Subscribe to PHP Freaks RSS

thePHP.cc: Putting PHP 8 on the Roadmap

syndicated from www.phpdeveloper.org on February 2, 2018

On thePHP.cc site today they have a quick post that looks ahead at the future of the PHP language towards PHP version 8 and one planned feature - the deprecation of some multi-byte character handling.

Since the attempt to create a Unicode-based PHP implementation has failed, PHP 7 – just like PHP 5 – does not handle Unicode strings natively. The commonly used UTF-8 encoding, for example, is a multibyte encoding, as opposed to ASCII, where each character is represented by one single byte.

[...] UTF-8 is a variable-length encoding and each character (code point, to be exact) is represented by one to four bytes. For ASCII characters, everything works smoothly, because UTF-8 is a superset of ASCII. The problems start with non-ASCII characters.

The post covers some of the common issues with multi-byte Unicode characters in PHP and the role that the iconv and mbstring functions play in their handling. It shows how the mbstring handling allows developers to "cheat a little" and where, when PHP 8 comes around, the main issue will lie: the deprecation of thembstring.func_overload setting in the php.ini.