PHP RFC: Multibyte Char Handling


For example, addslashes()/stripslashes() has vulnerability that allows PHP script execution and DoS under certain environment via char encoding based attack. Counter measure using php_mblen() is proposed at first in security ML. However, php_mblen() is locale dependent and unreliable. Therefore, this is proposed and created RFC to decide.

Proper and secure char encoding handling is mandatory for application to work correctly. However, current PHP lacks default multibyte char handling feature. As a result, escapeshellarg()/escapeshellcmd()/fgetcsv() rely on locale based multibyte char handling for security and proper operation. Currently, addslahses()/ver_export()/stripslashes() lacks proper multibyte char handling.

escapeshellarg()/escapeshellcmd()/fgetcsv() uses locale based multibyte char handling. i.e. php_mblem() Although this approach works mostly, it has several disadvantages.

  • Locale is not reliable and has issues on some locale, such as Turkish locale.
  • Locale is not visible for application programmers. Most developers just ignores locale.

New mbstring functions are added for users that are affected by this issue so that they can simply rename functions.


PHP, including released versions, needs secure addslashes()/var_export()/stripslashes().

Add mb_addslashes()/mb_var_export()/mb_stripslashes() to released versions

For PHP 5.4 and up, add mb_add_slashes()/mb_var_export()/mb_strip_slashes() has encoding option.

  string mb_add_slashes(string $str [, string $encoding=internal_encoding])
  string mb_strip_slashes(string $str [, $encoding=internal_encoding])
  string mb_add_cslashes(string $str [, string $encoding=internal_encoding])
  string mb_strip_cslashes(string $str [, $encoding=internal_encoding])
  string mb_var_export(mixed $var [, bool $return=FALSE [, string $encoding=internal_encoding]])

addcslashes()/stripcslashs() needs to be multibyte aware for the same reason addslashes()/stripslashes().

Add mb version of function uses php_mblen()

For PHP 5.4 and up, add mb_escape_shell_arg()/mb_secape_shell_cmd()/mb_fget_csv()/etc that have extra encoding parameter like mb_add_slashes().

  • mb_escape_shell_arg()
  • mb_escape_shsell_cmd()
  • functions/methods use php_fgetcsv API
    • mb_file_get_csv()
    • mb_file_put_csv()


We still needs php_mblen() for command line output since it's not good idea to detect locale and using internal_encoding opens new vulnerability.

Functions that should use locale are

  • mb_escape_shell_arg()
  • mb_escape_shell_cmd()

These function may override locale by encoding parameter. Since fgetcsv() uses locale now, do the same for fgetcsv().


Some users are annoyed by sloppy multilingual implementations using this option. There is feature request from user who want to remove mbstring.func_overload INI option.


However, func_overload is extended for now.

mbstring usage and implementation

For PHP 5.4 and up, all changes done in mbstring.

mbstring functions have history of remain insecure when single byte version of function's issue has been fixed. e.g. mb_prase_str(), mb_send_mail() Refactoring is preffered to avoid this issue, but refactoring is postponed until PHP6. i.e. There would be 2 codes that are mostly the same.

In short, if some one fixes related function, do not forget to update mbstring code also.

Note about short and long term resolution

  • Short term resolution: Add required function to mbstring
  • Long term resolution: Replace mbstring with mbstring-ng to provide multibyte aware functions by default. mbstring-ng does not have license issue.

This RFC is for short term resolution.

Main objective is to remove vulnerability like CVE-2014-1239. To accomplish this objective, we need multibyte aware function by default which we don't have it now.

To remove vulnerability like CVE-2014-129 from user scripts, there must be multibyte aware functions by default. We may compile current mbstring by default, but there is license issue for some users. mbstring-ng does not have such issue and it is preferred to use it as default with respect to license, but it's far from complete.

Since there is no feasible option right now, short and long term resolution is needed.

Backward Incompatible Changes

None. (Adding functions to mbstring)

Proposed PHP Version(s)

  • PHP 5.4 and up - Introduce additional mb_*() functions

Future Scope

  • mbstring may be replaced by mbstring-ng in future release and mbstring may be moved to PECL.

There is other RFC for introducing mbstring-ng as a EXPERIMENTAL module.

When mbstring-ng development is finished, there will be a vote whether mbstring is replaced by mbstring-ng or not.

Open Issues

Proposed Voting Choices


Patches and Tests

  • Prepared for review after vote.


VOTE: 2014/02/10 - 2014/02/17

This vote is only for adding new mb_*() functions to released versions.

Add required mb_*() functions to fix vulnerability
Real name Yes No
aharvey (aharvey)  
brianlmoon (brianlmoon)  
chobieeee (chobieeee)  
derick (derick)  
klaussilveira (klaussilveira)  
krakjoe (krakjoe)  
levim (levim)  
mbeccati (mbeccati)  
nikic (nikic)  
peehaa (peehaa)  
treffynnon (treffynnon)  
yohgaki (yohgaki)  
Final result: 2 10
This poll has been closed.

Thank you for voting.

If you vote No for this, please provide alternative short term resolution for CVE-2014-1239.


After the project is implemented, this section should contain

  1. the version(s) it was merged to
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature


Rejected Features

Keep this updated with features that were discussed on the mail lists.

rfc/multibyte_char_handling.txt · Last modified: 2014/02/24 10:42 by yohgaki