UPDATE:
Right, the following PHP script decodes the UTF-8 bytes of 'é' and echoes the resulting Unicode code point, 233, to the browser:
PHP:
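// Returns the Unicode code point of the UTF-8 character starting at $offset
// and advances $offset past it ($offset becomes -1 at the end of the string).
function ordutf8($string, &$offset) {
    $code = ord(substr($string, $offset, 1));
    // Only 192..247 are lead bytes of a multi-byte sequence. ASCII (0xxxxxxx)
    // and invalid lead bytes are returned unchanged.
    if ($code >= 192 && $code < 248) {
        if ($code < 224) $bytesnumber = 2;      // 110xxxxx
        else if ($code < 240) $bytesnumber = 3; // 1110xxxx
        else $bytesnumber = 4;                  // 11110xxx
        // Strip the length-marker bits from the lead byte.
        $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
        for ($i = 2; $i <= $bytesnumber; $i++) {
            $offset++;
            $code2 = ord(substr($string, $offset, 1)) - 128; // 10xxxxxx
            $codetemp = $codetemp * 64 + $code2;
        }
        $code = $codetemp;
    }
    $offset += 1;
    if ($offset >= strlen($string)) $offset = -1;
    return $code;
}

$text = "é"; // the script file itself must be saved as UTF-8
$offset = 0;
while ($offset >= 0) {
    echo ordutf8($text, $offset);
}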
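Worked through by hand: 'é' is stored in UTF-8 as the two bytes 195 (11000011) and 169 (10101001), so the function computes:

    (195 - 192) * 64 + (169 - 128) = 3 * 64 + 41 = 233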
According to the following table, 233 is the decimal Unicode code point of 'é' (U+00E9), which is also the number used in its HTML numeric character reference, &#233;:
http://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=dec&unicodeinhtml=dec
So I guess I could use this method if I can also do the same conversion in C (i.e. from the UTF-8 bytes of the character in the string to its Unicode code point). That way the two values should match. If anyone knows of a method for doing that then I'd be grateful!
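For what it's worth, a direct port of the PHP function to C might look roughly like the following. This is only a sketch: the names utf8_next and print_code_points are made up for illustration (they are not from any library), and it assumes the C string holds valid, NUL-terminated UTF-8 and does no real validation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decodes the UTF-8 sequence starting at s[*offset],
 * returns its Unicode code point, and advances *offset past it.
 * *offset is set to -1 once the end of the string has been reached.
 * Assumption: s is valid, NUL-terminated UTF-8. */
static long utf8_next(const char *s, long *offset)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len = strlen(s);
    long code = p[*offset];

    if (code >= 192 && code < 248) {   /* lead byte of a multi-byte sequence */
        int nbytes = (code < 224) ? 2 : (code < 240) ? 3 : 4;
        int i;
        /* Keep only the payload bits of the lead byte. */
        code -= 192 + (nbytes > 2 ? 32 : 0) + (nbytes > 3 ? 16 : 0);
        /* Stop early rather than read past the terminating NUL. */
        for (i = 2; i <= nbytes && p[*offset + 1] != '\0'; i++) {
            (*offset)++;
            code = code * 64 + (p[*offset] - 128);   /* 10xxxxxx */
        }
    }
    (*offset)++;
    if ((size_t)*offset >= len)
        *offset = -1;
    return code;
}

/* Hypothetical usage helper: prints every code point in text, one per line. */
static void print_code_points(const char *text)
{
    long offset = 0;
    while (offset >= 0)
        printf("%ld\n", utf8_next(text, &offset));
}
```

Calling print_code_points("\xC3\xA9") passes 'é' as explicit UTF-8 bytes, so the result does not depend on how the C source file is encoded, and the value printed can be compared directly with what the PHP script echoes.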