rfc:altmbstring

Differences

rfc:altmbstring [2014/01/27 00:25] yohgaki
rfc:altmbstring [2017/09/22 13:28] (current) – external edit 127.0.0.1
Line 1: Line 1:
 ====== Request for Comments: Alternative implementation of mbstring using ICU ======
-  * Version: 1.1
+  * Version: 1.3
-  * Date: 2014-01-27
+  * Date: 2014-02-05
-  * Author: Moriyoshi Koizumi <moriyoshi@php.net>
+  * Author: Yasuo Ohgaki <yohgaki@ohgaki.net> Moriyoshi Koizumi <moriyoshi@php.net>
-  * Status: Under Discussion
+  * Status: Declined
   * First Published at: http://wiki.php.net/rfc/altmbstring
  
-Note: This RFC is consolidated into  
-https://wiki.php.net/rfc/multibyte_char_handling 
  
 ===== Introduction =====
Line 13: Line 11:
 This RFC discusses an alternative implementation of the mbstring extension that uses ICU instead of libmbfl.
  
-===== Rationale =====
+Note: This RFC is related to https://wiki.php.net/rfc/multibyte_char_handling. It aims at a long-term resolution of issues related to multibyte character encoding.
  
-Ever since its introduction in the very first version of PHP 4, mbstring extension has been controversial in a sense supposedly owing to the following reasons:
+Note: This RFC also addresses the LGPL license issue of the current mbstring module. An alternative to mbstring that does not have the license issue is preferred.
 + 
 +===== Rationale =====
  
 +   * LGPL license - libmbfl (multibyte filter) and Oniguruma (multibyte regular expression) are licensed under the LGPL. Users who compile PHP statically may run into license problems.
    * Lack of understanding -- Until recently, those who don't use Unicode or other non-single-byte codesets had not realized how essential the functionality covered by this extension is.
    * Huge bundled libraries -- One of the bundled libraries, libmbfl, consists of a large set of mapping tables between Unicode and legacy charsets. This may look redundant to those who aren't interested in manipulating multibyte strings.
Line 60: Line 62:
 ==== Features to be implemented ====
 
-   * All features that exist in mbstring will be ported to mbstring-ng unless there is technical difficulty.
+   * All features that exist in mbstring will be ported to mbstring-ng unless there are technical difficulties.
 
 ==== Known / remaining limitations and incompatibilities ====
Line 69: Line 71:
    * The group reference placeholders for mb_ereg_replace() are now $0, $1, $2... instead of \0, \1, \2. This could be avoided by implementing our own replacement routine instead of using uregex_replaceAll().
    * ILP64  :-P
 +
 +===== Proposal =====
 +
 +Introduce mbstring-ng as an EXPERIMENTAL module for testing compatibility with existing applications.
 +
 +===== Future Scope =====
 +
 +Compiling a multibyte-aware module by default is important for eliminating vulnerabilities related to character encoding. mbstring-ng should be compiled by default once it is ready, and should replace mbstring if possible.
 +
 +There will be a separate RFC on replacing mbstring with mbstring-ng, covering how it will be replaced, what to do with the legacy mbstring, and so on. Replacing the module would be a matter for PHP 6, as it would break some applications.
 +
 +It is better to remove LGPL-licensed code from a 'must have' module. mbstring-ng would resolve this issue.
 +
 +**Note: Even when PHP supports Unicode internally, multibyte-aware features and functions are needed to handle character encodings properly. Unicode does not solve all issues, so some module must handle them. Otherwise, the default string functions would need encoding parameters and would end up as copies of the mb_*() functions.**
 +
 +===== PHP Version =====
 +
 +PHP 5.6 and up
 +
 +===== VOTE =====
 +
 +VOTE: 2014/02/10 - 2014/02/17
 + 
 +<doodle title="Include mbstring-ng for PHP-5.6 as EXPERIMENTAL module" auth="yohgaki" voteType="single" closed="true">
 +   * Yes
 +   * No
 +</doodle>
 +
 +Thank you for voting!
 +
 +===== Reference =====
 +
 +  * https://wiki.php.net/rfc/multibyte_char_handling
  
 ===== Changelog =====
rfc/altmbstring.1390782321.txt.gz · Last modified: 2017/09/22 13:28 (external edit)