rfc:multibyte_char_handling

This is an old revision of the document!


PHP RFC: Multibyte Char Handling

Introduction

For example, addslashes()/stripslashes() has vulnerability that allows PHP script execution and DoS under certain environment via char encoding based attack. Counter measure using php_mblen() is proposed at first in security ML. However, php_mblen() is locale dependent and unreliable. Therefore, this is proposed and created RFC to decide.

Proper and secure char encoding handling is mandatory for application to work correctly. However, current PHP lacks default multibyte char handling feature. As a result, escapeshellarg()/escapeshellcmd()/fgetcsv() rely on locale based multibyte char handling for security and proper operation. Currently, addslahses()/ver_export()/stripslashes() lacks proper multibyte char handling.

escapeshellarg()/escapeshellcmd()/fgetcsv() uses locale based multibyte char handling. i.e. php_mblem() Although this approach works mostly, it has several disadvantages.

  • Locale is not reliable and has issues on some locale, such as Turkish locale.
  • Locale is not visible for application programmers. Most developers just ignores locale.

New mbstring functions are added for users that are affected by this issue so that they can simply rename functions.

Proposal

PHP, including released versions, needs secure addslashes()/var_export()/stripslashes().

Compile mbstringp-ng as default compiled module, when mbstring-ng is ready. See following FRC for mbstring-ng details.

Alternative implementation of mbstring using ICU

Until mbstring-ng is ready, mbstring-ng is provided as EXPERIMENTAL module.

mbstring-ng implementation is subject to be changed. Vote for mbstring-ng is done separately.

Add mb_addslashes()/mb_var_export()/mb_stripslashes() to released versions

For PHP 5.3 and up, add mb_add_slashes()/mb_var_export()/mb_strip_slashes() has encoding option.

string mb_add_slashes(string $str [, string $encoding=internal_encoding])
string mb_var_export(mixed $var [, bool $return=FALSE [, string $encoding=internal_encoding]])
string mb_strip_slashes(string $str [, $encoding=internal_encoding])

Add mb version of function uses php_mblen()

For PHP 5.3 and up, add mb_escape_shell_arg()/mb_secape_shell_cmd()/mb_fget_csv()/etc that have extra encoding parameter like mb_add_slashes().

Reference

We still needs php_mblen() for command line output since it's not good idea to detect locale and using internal_encoding opens new vulnerability.

Functions that should use locale are

- mb_escape_shell_arg() - mb_escape_shell_cmd()

These function may override locale by encoding parameter. Since fgetcsv() uses locale now, do the same for fgetcsv().

mbstring usage and implementation

For PHP 5.3 and up, all changes done in mbstring.

mbstring is rather large module. Therefore, it is better to be able to build PHP without mbstring. Any function uses mbstring feature use “#if”, so that PHP could be built without mbstring if there is. Note that this RFC only use mbstring feature in mbstring module.

mbstring functions have history of remain insecure when single byte version of function's issue has been fixed. e.g. mb_prase_str(), mb_send_mail() Refactoring is preffered to avoid this issue, but refactoring is postponed until PHP6. i.e. There would be 2 codes that are mostly the same.

Note about short and long term resolution

Main objective of this RFC is to remove vulnerability like CVE-2014-1239 completely. To accomplish this objective, we need multibyte aware function by default which we don't have it right now.

We may compile current mbstring by default, but there is license issue for some users. mbstring-ng does not have such issue and it is preferred to use it as default with respect to license, but it's far from complete.

Since there is no feasible option right now, I'm proposing load map for short and long term resolution.

Please take mbstring-ng part as load map for long term resolution. There is “No BC issue” for short term resolution. (Add some functions to mbstring) I agree we would have BC issue for long term resolution. (Replace mbstring by mbstring-ng)

When mbstring-ng development is finished, we should have vote whether mbstring is replaced by mbstring-ng or not.

Backward Incompatible Changes

None for short term resolution. (Adding functions to mbstring)

Some for long term resolution. (Replacing mbstring by mbstring-ng)

Proposed PHP Version(s)

  • PHP 5.3 and up - Introduce additional mb_*() functions
  • PHP 5.6 or up - Introduce mbstring-ng and compile it as default compiled module when it's ready.

If mbstring-ng is not compiled as default until it's ready. mbstring co-exists until we are confident with mbstring-ng. For PHP 5.6, mbstring-ng will be EXPERIMENTAL module probably.

Impact to Existing Extensions

  • mbstring will be replaced by mbstring-ng in future release and mbstring is moved to PECL.

Open Issues

Use of mbstring.func_overload INI for overriding single byte string functions by mbstring functions is left open issue for future releases.

Some users are annoyed by sloppy multilingual implementations using this option. There is feature request from user who want to remove mbstring.func_overload INI option.

https://bugs.php.net/bug.php?id=65785

However, func_overload is extended for now.

Proposed Voting Choices

Yas/No

Patches and Tests

  • Prepared for review after vote.
  • When mbstring-ng development is finished, it is reviewed and have vote for replacing mbstring.

Vote

VOTE: 2014/01/26 - 2014/02/09

This vote is only for adding new mb_*() functions to released versions. mbstring-ng vote is done separately.

Add required mb_*() functions and prepare mbstring-ng as a default compiled module
Real name Yes No
derick (derick)  
dm (dm)  
levim (levim)  
malukenho (malukenho)  
nikic (nikic)  
yohgaki (yohgaki)  
Count: 1 5

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged to
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature

References

Related RFC

CVE

  • CVE is assigned for addslashes issue. CVE-2014-1239

Rejected Features

Keep this updated with features that were discussed on the mail lists.

rfc/multibyte_char_handling.1391308343.txt.gz · Last modified: 2017/09/22 13:28 (external edit)