This is an old revision of the document!
PHP RFC: Multibyte Char Handling
- Version: 1.2
- Date: 2014-01-28
- Author: Yasuo Ohgaki, yohgaki@ohgaki.net
- Status: Under discussion
- First Published at: http://wiki.php.net/rfc/multibyte_char_handling
Introduction
For example, addslashes()/stripslashes() has vulnerability that allows PHP script execution and DoS under certain environment via char encoding based attack. Counter measure using php_mblen() is proposed at first in security ML. However, php_mblen() is locale dependent and unreliable. Therefore, this is proposed and created RFC to decide.
Proper and secure char encoding handling is mandatory for application to work correctly. However, current PHP lacks default multibyte char handling feature. As a result, escapeshellarg()/escapeshellcmd()/fgetcsv() rely on locale based multibyte char handling for security and proper operation. Currently, addslahses()/ver_export()/stripslashes() lacks proper multibyte char handling.
escapeshellarg()/escapeshellcmd()/fgetcsv() uses locale based multibyte char handling. i.e. php_mblem() Although this approach works mostly, it has several disadvantages.
- Locale is not reliable and has issues on some locale, such as Turkish locale.
- Locale is not visible for application programmers. Most developers just ignores locale.
New mbstring functions are added for users that are affected by this issue so that they can simply rename functions.
Proposal
PHP, including released versions, needs secure addslashes()/var_export()/stripslashes().
Compile mbstringp-ng as default compiled module, when mbstring-ng is ready. See following FRC for mbstring-ng details.
Alternative implementation of mbstring using ICU
Until mbstring-ng is ready, mbstring-ng is provided as EXPERIMENTAL module.
mbstring-ng implementation is subject to be changed. Vote for mbstring-ng is done separately.
Add mb_addslashes()/mb_var_export()/mb_stripslashes() to released versions
For PHP 5.3 and up, add mb_add_slashes()/mb_var_export()/mb_strip_slashes() has encoding option.
string mb_add_slashes(string $str [, string $encoding=internal_encoding]) string mb_var_export(mixed $var [, bool $return=FALSE [, string $encoding=internal_encoding]]) string mb_strip_slashes(string $str [, $encoding=internal_encoding])
Add mb version of function uses php_mblen()
For PHP 5.3 and up, add mb_escape_shell_arg()/mb_secape_shell_cmd()/mb_fget_csv()/etc that have extra encoding parameter like mb_add_slashes().
- mb_escape_shell_arg
- mb_escape_shsell_cmd
- functions/methods use php_fgetcsv API http://lxr.php.net/search?q=php_fgetcsv&project=PHP_5_5
Reference
We still needs php_mblen() for command line output since it's not good idea to detect locale and using internal_encoding opens new vulnerability.
Functions that should use locale are
- mb_escape_shell_arg()
- mb_escape_shell_cmd()
These function may override locale by encoding parameter. Since fgetcsv() uses locale now, do the same for fgetcsv().
mbstring usage and implementation
For PHP 5.3 and up, all changes done in mbstring.
mbstring is rather large module. Therefore, it is better to be able to build PHP without mbstring. Any function uses mbstring feature use “#if”, so that PHP could be built without mbstring if there is. Note that this RFC only use mbstring feature in mbstring module.
mbstring functions have history of remain insecure when single byte version of function's issue has been fixed. e.g. mb_prase_str(), mb_send_mail() Refactoring is preffered to avoid this issue, but refactoring is postponed until PHP6. i.e. There would be 2 codes that are mostly the same.
Note about short and long term resolution
Main objective of this RFC is to remove vulnerability like CVE-2014-1239 completely. To accomplish this objective, we need multibyte aware function by default which we don't have it right now.
We may compile current mbstring by default, but there is license issue for some users. mbstring-ng does not have such issue and it is preferred to use it as default with respect to license, but it's far from complete.
Since there is no feasible option right now, I'm proposing load map for short and long term resolution.
Please take mbstring-ng part as load map for long term resolution. There is “No BC issue” for short term resolution. (Add some functions to mbstring) I agree we would have BC issue for long term resolution. (Replace mbstring by mbstring-ng)
When mbstring-ng development is finished, we should have vote whether mbstring is replaced by mbstring-ng or not.
Backward Incompatible Changes
None for short term resolution. (Adding functions to mbstring)
Some for long term resolution. (Replacing mbstring by mbstring-ng)
Proposed PHP Version(s)
- PHP 5.3 and up - Introduce additional mb_*() functions
- PHP 5.6 or up - Introduce mbstring-ng and compile it as default compiled module when it's ready.
If mbstring-ng is not compiled as default until it's ready. mbstring co-exists until we are confident with mbstring-ng. For PHP 5.6, mbstring-ng will be EXPERIMENTAL module probably.
Impact to Existing Extensions
- mbstring will be replaced by mbstring-ng in future release and mbstring is moved to PECL.
Open Issues
Use of mbstring.func_overload INI for overriding single byte string functions by mbstring functions is left open issue for future releases.
Some users are annoyed by sloppy multilingual implementations using this option. There is feature request from user who want to remove mbstring.func_overload INI option.
https://bugs.php.net/bug.php?id=65785
However, func_overload is extended for now.
Proposed Voting Choices
Yas/No
Patches and Tests
- Prepared for review after vote.
- mbstring-ng is in https://wiki.php.net/rfc/altmbstring
- When mbstring-ng development is finished, it is reviewed and have vote for replacing mbstring.
Vote
VOTE: 2014/02/XX - 2014/02/XX
This vote is only for adding new mb_*() functions to released versions. mbstring-ng vote is done separately.
Implementation
After the project is implemented, this section should contain
- the version(s) it was merged to
- a link to the git commit(s)
- a link to the PHP manual entry for the feature
References
Related RFC
CVE
- CVE is assigned for addslashes issue. CVE-2014-1239
Rejected Features
Keep this updated with features that were discussed on the mail lists.