====== PHP RFC: Change the default of $characters in mb_trim function ======
* Version: 0.1
* Date: 2024-04-03
* Author: Yuya Hamada, youkidearitai@gmail.com
* Status: Draft
* First Published at: http://wiki.php.net/rfc/mb_trim
===== Introduction =====
Two problems have been found with $characters, the second parameter of mb_trim(). This RFC proposes a solution to both.
First, the default value of $characters added with mb_trim() contains non-ASCII characters. As a result, mbstring_arginfo.h, which is generated from mbstring.stub.php, contains a UTF-8 string literal. This prevents the file from compiling with some Visual C++ configurations:
https://github.com/php/php-src/issues/13789
Second, the current default value of $characters does not work for strings in character encodings other than UTF-8. With the default, the listed characters cannot be trimmed correctly from strings in other encodings:
https://github.com/php/php-src/issues/13815
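The snippet below illustrates the encoding problem. It is not code taken from the issue. It shows that one character, U+3000 IDEOGRAPHIC SPACE, has different bytes in UTF-8 and in Shift_JIS. Shift_JIS stands in here for any non-UTF-8 encoding, and the snippet assumes the mbstring extension is loaded.

```php
<?php
// Illustration only: U+3000 IDEOGRAPHIC SPACE in UTF-8 and in Shift_JIS.
$utf8 = "\u{3000}";
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

echo bin2hex($utf8), PHP_EOL; // e38080
echo bin2hex($sjis), PHP_EOL; // 8140

// The UTF-8 byte sequence does not occur in the Shift_JIS string, so a
// default list stored as UTF-8 bytes cannot match this character there.
var_dump(strpos($sjis, $utf8)); // bool(false)
```

Because of this, a default written as a UTF-8 literal cannot be relied on for strings in other encodings.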
Taking both issues into account, this RFC proposes changing the default value of $characters in mb_trim() to null.
===== Proposal =====
Change the default value of $characters in the mb_trim(), mb_ltrim() and mb_rtrim() functions to null:
<code php>
function mb_trim(string $string, ?string $characters = null, ?string $encoding = null): string
function mb_ltrim(string $string, ?string $characters = null, ?string $encoding = null): string
function mb_rtrim(string $string, ?string $characters = null, ?string $encoding = null): string
</code>
If $characters is null, the following characters are trimmed.

Same as trim():
* U+0020 SPACE (also in the Separator category)
* U+0009 CHARACTER TABULATION (\t)
* U+000A LINE FEED (\n)
* U+000B LINE TABULATION (\v)
* U+000D CARRIAGE RETURN (\r)

Not trimmed by trim(), probably because it is uncommon, but trimmed by mb_trim():
* U+000C FORM FEED (\f)

Trimmed by trim(), but not matched by regex \s:
* U+0000 NULL (\0)

The whole Separator (Z) category (19 code points), matched by regex \s:
* U+0020 SPACE
* U+00A0 NO-BREAK SPACE
* U+1680 OGHAM SPACE MARK
* U+2000 EN QUAD
* U+2001 EM QUAD
* U+2002 EN SPACE
* U+2003 EM SPACE
* U+2004 THREE-PER-EM SPACE
* U+2005 FOUR-PER-EM SPACE
* U+2006 SIX-PER-EM SPACE
* U+2007 FIGURE SPACE
* U+2008 PUNCTUATION SPACE
* U+2009 THIN SPACE
* U+200A HAIR SPACE
* U+2028 LINE SEPARATOR
* U+2029 PARAGRAPH SEPARATOR
* U+202F NARROW NO-BREAK SPACE
* U+205F MEDIUM MATHEMATICAL SPACE
* U+3000 IDEOGRAPHIC SPACE

Other characters (matched by regex \s):
* U+0085 NEXT LINE (NEL)
* U+180E MONGOLIAN VOWEL SEPARATOR
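Below is a minimal userland sketch of the proposed null default. It is not the php-src implementation, which is in the pull request under Implementation. The function sketch_mb_trim() and the constant SKETCH_DEFAULT_TRIM_CLASS are hypothetical names used only for illustration. The sketch converts the input to UTF-8, trims with a PCRE character class built from the list above, and converts the result back. Resolving null to the default set at call time keeps that set independent of $encoding. It assumes the mbstring and PCRE extensions and PHP 8.0 or later.

```php
<?php
// Hypothetical userland sketch, NOT the php-src implementation of mb_trim().
// Default set from the list above as a PCRE character class body
// (U+0009..U+000D covers \t, \n, \v, \f and \r).
const SKETCH_DEFAULT_TRIM_CLASS =
    '\x{0000}\x{0009}-\x{000D}\x{0020}\x{0085}\x{00A0}\x{1680}\x{180E}'
    . '\x{2000}-\x{200A}\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}';

function sketch_mb_trim(string $string, ?string $characters = null, ?string $encoding = null): string
{
    $encoding ??= mb_internal_encoding();

    // Work in UTF-8 so that the default set does not depend on $encoding.
    $utf8 = mb_convert_encoding($string, 'UTF-8', $encoding);

    if ($characters === null) {
        // null means "use the default set", resolved here at call time.
        $class = SKETCH_DEFAULT_TRIM_CLASS;
    } else {
        // Explicit characters are given in $encoding: convert, then escape them.
        $class = preg_quote(mb_convert_encoding($characters, 'UTF-8', $encoding), '/');
        if ($class === '') {
            return $string; // nothing to trim
        }
    }

    // \A and \z anchor to the true start and end of the string
    // ($ would also match before a trailing newline).
    $trimmed = preg_replace('/\A[' . $class . ']+|[' . $class . ']+\z/u', '', $utf8);
    if ($trimmed === null) {
        throw new RuntimeException('preg_replace() failed: ' . preg_last_error_msg());
    }

    return mb_convert_encoding($trimmed, $encoding, 'UTF-8');
}

// Omitting $characters (null) trims the default set.
var_dump(sketch_mb_trim("\u{3000}\t hello \u{00A0}\n")); // string(5) "hello"

// The same default set works for a Shift_JIS string ("\u{30C6}\u{30B9}\u{30C8}" is "テスト").
$sjis = mb_convert_encoding("\u{3000}\u{30C6}\u{30B9}\u{30C8}\u{3000}", 'SJIS', 'UTF-8');
var_dump(bin2hex(sketch_mb_trim($sjis, null, 'SJIS'))); // string(12) "836583588367"
```

Converting through UTF-8 is only one way to make the default encoding-independent. A native implementation could instead compare decoded code points directly.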
===== Backward Incompatible Changes =====
This could break existing userland functions with the same names.
===== Proposed PHP Version(s) =====
PHP 8.4
===== RFC Impact =====
==== To SAPIs ====
Will add the aforementioned functions to all PHP environments.
==== To Existing Extensions ====
Fixes mb_trim(), mb_ltrim() and mb_rtrim() in the mbstring extension.
==== To Opcache ====
No effect.
==== New Constants ====
No new constants.
==== php.ini Defaults ====
No changed php.ini settings.
===== Open Issues =====
* https://github.com/php/php-src/issues/13789
* https://github.com/php/php-src/issues/13815
===== Future Scope =====
===== Proposed Voting Choices =====
===== Implementation =====
* https://github.com/php/php-src/pull/13820
===== References =====
* https://wiki.php.net/rfc/mb_trim
===== Rejected Features =====