Yorick Posted December 21, 2010

Hello,

I've got a huge database that is filled with text. It is encoded in UTF-8, and some of the symbols used (like emoticons) are stored as code points in the Unicode Private Use Area (http://www.fileformat.info/info/unicode/block/private_use_area/utf8test.htm). Now I want to replace those private-use code points with the corresponding smilies, etcetera. So my question is: how do I replace specific UTF-8 sequences with something else in PHP?

Thanks in advance!
requinix Posted December 21, 2010

UTF-8 is just an encoding. Behind it are actual bytes of data.

utf8_encode() won't convert a private-use code point into its UTF-8 sequence: it converts ISO-8859-1 text to UTF-8, so utf8_encode("\xE8\xB9") encodes the two characters U+00E8 and U+00B9, not the single code point U+E8B9. Work with the bytes directly instead. U+E8B9 is 0xEE 0xA2 0xB9 in UTF-8.

Get the byte encoding of each character, if you don't have that already, and do a binary-safe search-and-replace for each emoticon. If you want to do it in PHP:

$text = str_replace("\xEE\xA2\xB9", ":)", $text);
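If there are many emoticons, working out each byte sequence by hand gets tedious. Here is a minimal sketch of one way to derive the bytes from the code point and replace everything in one pass. The function names pua_to_utf8() and replace_private_use() and the $emoticons table are made up for illustration and are not part of PHP. Only the U+E8B9 => ":)" pair comes from the example above, and even that mapping is a placeholder for whatever your data actually uses.

```php
<?php
// Encode a code point from the BMP Private Use Area (U+E000..U+F8FF)
// as its three-byte UTF-8 sequence: 1110xxxx 10xxxxxx 10xxxxxx.
function pua_to_utf8(int $cp): string
{
    if ($cp < 0xE000 || $cp > 0xF8FF) {
        throw new InvalidArgumentException('expected a code point in U+E000..U+F8FF');
    }
    return chr(0xE0 | ($cp >> 12))            // top 4 bits
         . chr(0x80 | (($cp >> 6) & 0x3F))    // middle 6 bits
         . chr(0x80 | ($cp & 0x3F));          // low 6 bits
}

// Hypothetical table: code point => replacement text.
// Fill this in with the code points that actually occur in your database.
$emoticons = [
    0xE8B9 => ':)',
];

// Build a byte-sequence => replacement map once.
$map = [];
foreach ($emoticons as $cp => $replacement) {
    $map[pua_to_utf8($cp)] = $replacement;
}

// strtr() with an array is binary-safe and does all replacements in one pass.
function replace_private_use(string $text, array $map): string
{
    return strtr($text, $map);
}

echo bin2hex(pua_to_utf8(0xE8B9)), "\n";                          // eea2b9
echo replace_private_use("Hello \xEE\xA2\xB9 world", $map), "\n"; // Hello :) world
```

strtr() is used here instead of a loop of str_replace() calls so that text produced by one replacement is never rescanned by a later one. A single str_replace() per emoticon, as in the post above, works just as well for a handful of symbols.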
Yorick (Author) Posted December 21, 2010

Great! That worked, thank you very much for the quick reply!