PHP: Convert UTF-8 to Hex Codepoint values (Unicode Hexadecimal)

This is one of those strange things that sounds a lot easier to do than it is.

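I originally handled this by splitting the string into characters with preg_split. The PREG_SPLIT_NO_EMPTY flag matters here: without it, an empty pattern also yields an empty string at the start and end of the array, so $array[0] would be empty.

$array = preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY);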

This worked fine and the string was split into an array of Unicode characters. The next part was converting each character into a useful hexadecimal value.

$character = $array[0];
$value = hexdec(bin2hex($character));

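I originally thought this was the way to do it. I was wrong; don’t do it. bin2hex returns the hex of the raw UTF-8 bytes, not the Unicode code point, so the two only agree for ASCII characters: “é” is stored as the bytes C3 A9, which gives 0xC3A9 rather than U+00E9. I found no really simple built-in way to convert from UTF-8 to code point values. Instead, try the utf8ToUnicode function here: http://hsivonen.iki.fi/php-utf8/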
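To make the failure concrete, here is a small sketch comparing what the bin2hex approach reports with the actual code point. The sample character “é” (U+00E9) is only an illustration; it is written as escaped bytes so the snippet does not depend on the encoding of the source file.

```php
<?php
// "é" (U+00E9) is stored in UTF-8 as the two bytes 0xC3 0xA9.
$character = "\xC3\xA9";

// bin2hex() dumps the raw bytes of the string as hex digits.
$bytesAsHex = bin2hex($character);   // "c3a9"

// hexdec() reads those digits as one number: the bytes, not the code point.
$wrong = hexdec($bytesAsHex);        // 50089 (0xC3A9)

// The real code point of "é" is 0xE9.
$expected = 0xE9;                    // 233

var_dump($wrong === $expected);      // bool(false)
```

For plain ASCII the two numbers happen to match, which is why the mistake is easy to miss in quick tests.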

Include Henri Sivonen’s code from that page and call its utf8ToUnicode function, which returns an array of code points. The conversion then becomes simple:

$value = utf8ToUnicode($character);
$value = $value[0];
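The value you get back is a plain integer, so one more step is needed to show it as hex. The sketch below is kept self-contained by using a hypothetical stand-in, firstCodepoint(), in place of utf8ToUnicode(). It is not Henri Sivonen's code: it does no validation and assumes the input is non-empty, well-formed UTF-8. dechex() and printf() are standard PHP.

```php
<?php
// Hypothetical helper for illustration only (not from the php-utf8 library).
// Decodes the first UTF-8 character of $char into its integer code point.
// Assumes non-empty, well-formed UTF-8; malformed input is not detected.
function firstCodepoint($char)
{
    $lead = ord($char[0]);
    if ($lead < 0x80) {
        return $lead;            // single byte: ASCII
    } elseif ($lead < 0xE0) {
        $len = 2;
        $cp = $lead & 0x1F;      // 110xxxxx: keep the low 5 bits
    } elseif ($lead < 0xF0) {
        $len = 3;
        $cp = $lead & 0x0F;      // 1110xxxx: keep the low 4 bits
    } else {
        $len = 4;
        $cp = $lead & 0x07;      // 11110xxx: keep the low 3 bits
    }
    // Each continuation byte (10xxxxxx) contributes 6 more bits.
    for ($i = 1; $i < $len; $i++) {
        $cp = ($cp << 6) | (ord($char[$i]) & 0x3F);
    }
    return $cp;
}

$value = firstCodepoint("\xE2\x82\xAC");  // the euro sign, U+20AC
$hex = dechex($value);                    // "20ac"
$label = sprintf("U+%04X", $value);       // "U+20AC"

echo $hex, "\n", $label, "\n";
```

With the real library, the same dechex() or sprintf() call would be applied to the $value[0] obtained above.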

I am only posting this because of the sheer amount of time it took for me to find this information. I hope it helps you out.

One thought on “PHP: Convert UTF-8 to Hex Codepoint values (Unicode Hexadecimal)”

  1. Thanks for this!

    I’d already found Henri Sivonen’s code, but your tale of success encouraged me to reassess it, after initially dismissing it.

    I did have to convert the result to hex with dechex($value) at the end, and for some reason I also had to modify the utf8ToUnicode() function to accept the variable passed by value, but I now have everything I need.

    Thanks again for posting!
