rfc:altmbstring
rfc:altmbstring [2014/01/26 00:25] – yohgaki | rfc:altmbstring [2017/09/22 13:28] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Request for Comments: Alternative implementation of mbstring using ICU ====== | ====== Request for Comments: Alternative implementation of mbstring using ICU ====== | ||
- | * Version: 1.0 | + | * Version: 1.3 |
- | * Date: 2009-07-27 | + | * Date: 2014-02-05 |
- | * Author: Moriyoshi Koizumi < | + | * Author: |
- | * Status: | + | * Status: |
* First Published at: http:// | * First Published at: http:// | ||
- | Note: This RFC is consolidated into | ||
- | https:// | ||
===== Introduction ===== | ===== Introduction ===== | ||
Line 13: | Line 11: | ||
This RFC discusses the alternative implementation of mbstring extension that in turn uses ICU instead of libmbfl. | This RFC discusses the alternative implementation of mbstring extension that in turn uses ICU instead of libmbfl. | ||
- | ===== Rationale ===== | + | Note: This RFC is related to |
+ | https:// | ||
- | Ever since its introduction in the very first version of PHP 4, mbstring | + | Note: This RFC also addresses the LGPL license issue of the current | ||
+ | |||
+ | ===== Rationale ===== | ||
+ | * LGPL license -- libmbfl (multibyte filter) and Oniguruma (multibyte regular expression) are licensed under the LGPL. Users who compile PHP statically may run into licensing problems. | ||
* Lack of understanding -- Until recently, those who don't use Unicode or other non-single-byte codesets did not realize how essential the functionality covered by this extension is. | * Lack of understanding -- Until recently, those who don't use Unicode or other non-single-byte codesets did not realize how essential the functionality covered by this extension is. | ||
* Huge bundled libraries -- One of the bundled libraries, libmbfl, consists of a large set of Unicode-to-legacy charset mapping tables and vice versa. This may look redundant to those who aren't interested in manipulating multibyte strings. | * Huge bundled libraries -- One of the bundled libraries, libmbfl, consists of a large set of Unicode-to-legacy charset mapping tables and vice versa. This may look redundant to those who aren't interested in manipulating multibyte strings. | ||
Line 58: | Line 60: | ||
* mb_substr_count() | * mb_substr_count() | ||
- | ==== Removed (deprecated) functions and reasons behind it ==== | + | ==== Features to be implemented |
- | | + | |
- | * mb_convert_case() -- Use mb_strtoupper(), | + | |
- | * mb_convert_kana() -- This can' | + | |
- | * mb_convert_variables() | + | |
- | * mb_decode_mimeheader() and mb_encode_mimeheader() -- Non-standard compliancy. | + | |
- | * mb_decode_numericentity() -- Removed in favor of html_entity_decode(). | + | |
- | * mb_encode_numericentity() -- Removed in favor of htmlentities() and htmlspecialchars(). | + | |
- | * mb_encoding_aliases() -- Just unnecessary. | + | |
- | * mb_ereg_match() -- Use mb_ereg() | + | |
- | * mb_ereg_search(), | + | |
- | * mb_eregi() -- Use mb_regex_options() and mb_ereg() | + | |
- | * mb_eregi_replace() -- I wonder why this function was added in the first place because giving ' | + | |
- | * mb_detect_order(), | + | |
- | * mb_regex_encoding() -- It is really confusing that the current mbstring allows two different encoding defaults for regex functions and the rest. Those settings are unified in the alternative version and so this is no longer necessary. | + | |
- | * mb_send_mail() -- The behavior of this function relies on the pseudo-locale setting called " | + | |
- | * mb_strrchr() -- Use mb_strrpos(). | + | |
- | * mb_strrichr() -- Use mb_strripos(). | + | |
==== Known / remaining limitations and incompatibilities ==== | ==== Known / remaining limitations and incompatibilities ==== | ||
Line 84: | Line 71: | ||
* The group reference placeholders for mb_ereg_replace() are now $0, $1, $2... instead of \0, \1, \2. This can be avoided if we don't use uregex_replaceAll() and implement our own. | * The group reference placeholders for mb_ereg_replace() are now $0, $1, $2... instead of \0, \1, \2. This can be avoided if we don't use uregex_replaceAll() and implement our own. | ||
* ILP64 :-P | * ILP64 :-P | ||
+ | |||
+ | ===== Proposal ===== | ||
+ | |||
+ | Introduce mbstring-ng as an EXPERIMENTAL module for testing compatibility with existing applications. | ||
+ | |||
+ | ===== Future Scope ===== | ||
+ | |||
+ | Compiling a multibyte-aware module by default is important for eliminating vulnerabilities related to character encoding. Compile mbstring-ng by default when it is ready. Replace mbstring with mbstring-ng if possible. | ||
+ | |||
+ | There will be an RFC for replacing mbstring with mbstring-ng, | ||
+ | |||
+ | It is better to remove LGPL-licensed code from a 'must have' module. mbstring-ng would resolve this issue. | ||
+ | |||
+ | **Note: Even when PHP supports Unicode internally, multibyte aware features/ | ||
+ | |||
+ | ===== PHP Version ===== | ||
+ | |||
+ | PHP 5.6 and up | ||
+ | |||
+ | ===== VOTE ===== | ||
+ | |||
+ | VOTE: 2014/02/10 - 2014/02/17 | ||
+ | |||
+ | <doodle title=" | ||
+ | * Yes | ||
+ | * No | ||
+ | </ | ||
+ | |||
+ | Thank you for voting! | ||
+ | |||
+ | ===== Reference ===== | ||
+ | |||
+ | * https:// | ||
===== Changelog ===== | ===== Changelog ===== | ||
- | - 2009-07-27 Moriyoshi Koizumi: Initial | + | - 2014-01-27 Yasuo Ohgaki: Updated to replace existing mbstring |
+ | |
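The limitations list notes that group reference placeholders for mb_ereg_replace() change from \0, \1, \2 to $0, $1, $2. As a rough illustration of what migrating an existing replacement string might involve, here is a minimal sketch. The helper name `convert_group_refs()` is hypothetical and is not part of mbstring or this RFC. The sketch assumes single-digit references only, assumes that a literal `$` has to be escaped with a backslash in the new syntax, and does not handle escaped backslashes.

```php
<?php
// Hypothetical helper, not part of mbstring or the RFC: rewrite a replacement
// string that uses backslash group references (\0 .. \9) into the
// dollar-style form ($0 .. $9).
function convert_group_refs(string $replacement): string
{
    return preg_replace_callback(
        // Match either a backslash followed by one digit, or a literal dollar sign.
        '/\\\\(\d)|\$/',
        function (array $m): string {
            if ($m[0] === '$') {
                // Assumption: a literal "$" must be escaped in the new syntax.
                return '\\$';
            }
            // Turn "\N" into "$N".
            return '$' . $m[1];
        },
        $replacement
    );
}

echo convert_group_refs('[\1-\2]'), "\n"; // prints [$1-$2]
```

A helper like this would only be a stopgap for application code. The limitation itself says the incompatibility could be avoided on the extension side by not relying on uregex_replaceAll().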