ideas:php6:unicode

ideas:php6:unicode [2014/02/20 06:38]
pajoye
ideas:php6:unicode [2017/09/22 13:28]
Line 1: Line 1:
-====== Unicode Support ====== 
-Author: Pierre Joye 
  
-Status: Under discussion 
- 
-Unicode still remains one of the top requested features in PHP. 
- 
-However, as Rasmus and others stated earlier, it is not a trivial job.
-Some of the key points we need to take care of are:
- 
-  * UTF-8 storage 
-  * UTF-8 support for almost all (if not all) existing string APIs
-  * Performance 
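
To illustrate why the first two points are tied together: once strings are stored as UTF-8, byte-oriented functions such as `strlen` report bytes rather than characters. The sketch below is only an illustration, not a proposed PHP or ICU API. The name `utf8_codepoint_count` is hypothetical, and the function assumes its input is already valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PHP or ICU): count the code points in a
 * buffer that is assumed to hold valid UTF-8. Continuation bytes have the
 * bit pattern 10xxxxxx, so every byte that does NOT match it starts a new
 * code point.
 */
static size_t utf8_codepoint_count(const char *s, size_t len)
{
    size_t count = 0;

    for (size_t i = 0; i < len; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For the six-byte string `"h\xC3\xA9llo"` ("héllo"), `strlen` returns 6 while this helper returns 5. The helper also has to scan the whole buffer, which hints at the performance point: length, offset and substring operations stop being constant-time once they count characters instead of bytes.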
- 
-As of today, I have not found any library covering at least two of
-these key points.
- 
-Please keep in mind that I am by no means a Unicode expert, and this
-summary is what I gathered from reading the documentation and
-discussion archives of ICU and other projects. Experiments still have
-to be done. However, I would rather discuss the options before going
-wild with an implementation (a huge task, even for basic feature
-coverage).
- 
-If any of the following statements is wrong or inaccurate, please fix
-it. I will keep a dedicated wiki page to summarize the discussions and
-options about Unicode support.
- 
-====== ICU ====== 
- 
-U_CHARSET_IS_UTF8 forces ICU to use UTF-8 by default. It is an ICU
-compile-time setting, so it cannot be set at PHP configure time. This
-means that users would have to create their own ICU build.
-Alternatively, we could bundle ICU, but this would be awkward and a
-maintenance nightmare for both PHP and the distros.
- 
-Alternatively, UText can be used to create UTF-8 strings. The APIs
-accepting UText allow almost everything we need. The drawback,
-however, is that a UTF-8 UText is read-only. Any operation altering
-its content will require duplication, clones or conversions. That may
-kill all the gains we get from using UTF-8 only.
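
The cost of a read-only string can be sketched without ICU. The code below is a minimal illustration under stated assumptions: `ro_utf8_view` and `ro_view_replace_ascii` are hypothetical names that only stand in for the idea of a read-only UTF-8 view. They are not ICU's UText type or API. Because the view cannot be written to, even a trivial change has to allocate and fill a full copy.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical read-only view over UTF-8 bytes. It models the idea of a
 * read-only string only; it is NOT ICU's UText.
 */
typedef struct {
    const char *bytes;
    size_t      len;
} ro_utf8_view;

/*
 * Replace every occurrence of the ASCII byte `from` with the ASCII byte
 * `to`. The view is read-only, so the result is a fresh NUL-terminated
 * heap copy that the caller must free(). Returns NULL if allocation fails.
 * Assumes `from` and `to` are ASCII: ASCII bytes never occur inside a
 * multi-byte UTF-8 sequence, so a bytewise swap keeps the text valid.
 */
static char *ro_view_replace_ascii(ro_utf8_view v, char from, char to)
{
    char *copy = malloc(v.len + 1);

    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, v.bytes, v.len);   /* the duplication the text warns about */
    copy[v.len] = '\0';

    for (size_t i = 0; i < v.len; i++) {
        if (copy[i] == from) {
            copy[i] = to;
        }
    }
    return copy;
}
```

Every modifying call pays for an allocation plus a copy of the whole string, regardless of how small the change is. With many string operations per request, this overhead could offset what is saved by avoiding conversions to and from UTF-16.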
- 
-U_CHARSET_IS_UTF8 is very appealing, but having to bundle ICU is
-actually a show stopper. Asking users to custom-build ICU is not an
-option either. I do not know whether the distros would be ready to
-provide two different builds of ICU; doing so may cause a lot of
-issues for all projects using ICU.
- 
-====== UTF8proc ====== 
- 
-utf8proc is very attractive, small and relatively fast. I see it as a
-good starting point. However, its features cover only a very small
-part of what PHP needs. It is easy to bundle but will require a fork
-and a lot of work to add all the missing features.
- 
-====== librope ====== 
- 
-Same comments as for utf8proc, with even fewer features.
- 
-I would like to start discussing our options now. I am not asking to
-go into all the implementation details from a userland point of view
-(such as u"some text" or whether to add new APIs), but only to see
-what we can do internally to work with UTF-8 strings.
- 
-====== References ====== 
-  * http://userguide.icu-project.org/strings/utf-8 
-  * https://github.com/josephg/librope 
-  * https://code.google.com/p/easl/ 
ideas/php6/unicode.txt · Last modified: 2017/09/22 13:28 (external edit)