Request for Comments: Alternative implementation of mbstring using ICU

Introduction

This RFC discusses an alternative implementation of the mbstring extension that uses ICU instead of libmbfl.

Note: This RFC is related to https://wiki.php.net/rfc/multibyte_char_handling. It is intended as a long-term resolution for issues related to multibyte character encoding.

Note: This RFC also addresses the license issue of the LGPL code used by the current mbstring module. It is preferable to have an alternative to mbstring that does not have this license issue.

Rationale

To overcome these issues, a complete rewrite of the extension has long been wanted. It never became a reality because there was no good Unicode library. Now that ICU is stable and we already rely on it (intl in 5.3), why not make it happen?

Preliminary stuff

The preliminary implementation is currently hosted on GitHub.

http://github.com/moriyoshi/mbstring-ng/

Implemented functions

Features to be implemented

Known / remaining limitations and incompatibilities

Proposal

Introduce mbstring-ng as an EXPERIMENTAL module for testing compatibility with existing applications.

Future Scope

Compiling a multibyte-aware module by default is important for eliminating vulnerabilities related to character encoding. Compile mbstring-ng by default when it is ready, and replace mbstring with mbstring-ng if possible.

There will be a separate RFC covering the replacement of mbstring by mbstring-ng: how it will be replaced, what to do with the legacy mbstring, and so on. Replacing the module would be a matter for PHP 6, as it would break some applications.

It is better to remove LGPL-licensed code from a 'must have' module. mbstring-ng would resolve this issue.

Note: Even when PHP supports Unicode internally, multibyte-aware features and functions are needed to handle character encodings properly. Unicode does not solve all issues, so there should be a module to handle them. Otherwise, the default string functions would need encoding parameters and would become copies of the mb_*() functions.
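
As a rough illustration of why byte-oriented string functions are not enough, here is a minimal sketch. It uses Python's built-in codecs rather than PHP, mbstring, or ICU, so that it is self-contained. The helpers `byte_length` and `char_length` are hypothetical names that stand in for the difference between a byte-oriented function such as strlen() and an encoding-aware one such as mb_strlen(); they are not part of mbstring-ng.

```python
# Illustrative only: Python's built-in codecs stand in for an
# encoding-aware library. These helpers are hypothetical and do not
# exist in mbstring or mbstring-ng.

def byte_length(data: bytes) -> int:
    """Length as a byte-oriented function sees it."""
    return len(data)


def char_length(data: bytes, encoding: str) -> int:
    """Length in code points, which requires knowing the encoding."""
    return len(data.decode(encoding))


text = "日本語"
utf8 = text.encode("utf-8")
sjis = text.encode("shift_jis")

print(byte_length(utf8), char_length(utf8, "utf-8"))      # 9 3
print(byte_length(sjis), char_length(sjis, "shift_jis"))  # 6 3

# The second byte of the Shift_JIS form of "表" is 0x5C, the same value
# as an ASCII backslash.
trap = "表".encode("shift_jis")
print(b"\\" in trap)                     # True: byte-level search matches
print("\\" in trap.decode("shift_jis"))  # False: no backslash character
```

The last two lines show the kind of mismatch behind encoding-related vulnerabilities: a byte-level search finds a backslash inside a multibyte character, while an encoding-aware search does not.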

PHP Version

PHP 5.6 and up

VOTE

VOTE: 2014/02/10 - 2014/02/17

Include mbstring-ng for PHP-5.6 as EXPERIMENTAL module
Real name Yes No
aharvey (aharvey)  
bwoebi (bwoebi)  
chobieeee (chobieeee)  
derick (derick)  
krakjoe (krakjoe)  
mbeccati (mbeccati)  
nikic (nikic)  
pollita (pollita)  
treffynnon (treffynnon)  
tyrael (tyrael)  
yohgaki (yohgaki)  
Final result: Yes 1, No 10
This poll has been closed.

Reference

Changelog

  1. 2014-01-27 Yasuo Ohgaki: Updated to replace existing mbstring
  2. 2009-07-27 Moriyoshi Koizumi: Initial