PHP RFC: Change the default of $characters in mb_trim function
- Version: 0.1
- Date: 2024-04-03
- Author: Yuya Hamada, youkidearitai@gmail.com
- Status: Draft
- First Published at: http://wiki.php.net/rfc/mb_trim
Introduction
We found two problems with $characters, the second argument of mb_trim. This RFC proposes a solution to both.
First, mbstring_arginfo.h, which is generated from mbstring.stub.php, has contained UTF-8 string literals since the mb_trim function was added, and this prevents it from compiling in some Visual C++ environments:
https://github.com/php/php-src/issues/13789
Second, because the default value of $characters is a UTF-8 string, the default does not trim correctly when mb_trim is used with other character encodings:
https://github.com/php/php-src/issues/13815
Taking both problems together, this RFC proposes that null is a more appropriate default for $characters in mb_trim.
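As a rough illustration of the second problem, the snippet below compares the bytes of one whitespace character in two encodings. The choice of U+3000 and of SJIS is an assumption made here for illustration and is not taken from the linked issue; the mbstring extension is assumed to be loaded.

```php
<?php
// Illustration only: the same code point has different bytes in different
// encodings, so a default list written as a UTF-8 literal cannot be compared
// byte for byte against text in another encoding.
$space = "\u{3000}"; // IDEOGRAPHIC SPACE, written as a UTF-8 literal

$utf8Bytes = bin2hex($space);
$sjisBytes = bin2hex(mb_convert_encoding($space, 'SJIS', 'UTF-8'));

echo $utf8Bytes, "\n"; // e38080
echo $sjisBytes, "\n"; // 8140
```

A null default avoids embedding such a literal in the signature, so the default list can be resolved per encoding at call time.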
Proposal
Change the default of $characters in the mb_trim, mb_ltrim and mb_rtrim functions to null:
function mb_trim(string $string, ?string $characters = null, ?string $encoding = null): string
function mb_ltrim(string $string, ?string $characters = null, ?string $encoding = null): string
function mb_rtrim(string $string, ?string $characters = null, ?string $encoding = null): string
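The following is a minimal sketch of how the proposed signatures might be called. It assumes PHP 8.4 or later with the mbstring extension and the default internal encoding of UTF-8; the sample strings are made up for illustration. The results described in the comments are what this proposal intends, not verified output.

```php
<?php
// Sketch only: guarded so the file still loads where mb_trim() is missing.
if (function_exists('mb_trim')) {
    $utf8 = "\u{3000}PHP \n"; // sample UTF-8 input (assumption)
    $sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

    // $characters omitted: the null default selects the built-in list.
    $a = mb_trim($utf8);

    // null passed explicitly so that $encoding can be given positionally.
    $b = mb_trim($sjis, null, 'SJIS');

    // Named argument form, skipping $characters entirely.
    $c = mb_ltrim($sjis, encoding: 'SJIS');

    // An explicit character list still overrides the default.
    $d = mb_rtrim('path///', '/');

    var_dump($a, $b, $c, $d);
}
```

With a nullable parameter, callers who only want to set $encoding do not have to repeat the default character list themselves.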
If $characters is null, the following characters are trimmed by default.
Same as trim:
- U+0020 SPACE (also in Separator category)
- U+0009 \t
- U+000A \n
- U+000B \v
- U+000D \r
Not removed by trim(), probably because it was not common enough, but appropriate for mb_trim:
- U+000C \f
Removed by trim(), but not matched by regex \s:
- U+0000 \0
Whole Separator (Z) category (19 codepoints), covered by regex \s:
- U+0020 SPACE
- U+00A0 NO-BREAK SPACE
- U+1680 OGHAM SPACE MARK
- U+2000 EN QUAD
- U+2001 EM QUAD
- U+2002 EN SPACE
- U+2003 EM SPACE
- U+2004 THREE-PER-EM SPACE
- U+2005 FOUR-PER-EM SPACE
- U+2006 SIX-PER-EM SPACE
- U+2007 FIGURE SPACE
- U+2008 PUNCTUATION SPACE
- U+2009 THIN SPACE
- U+200A HAIR SPACE
- U+2028 LINE SEPARATOR
- U+2029 PARAGRAPH SEPARATOR
- U+202F NARROW NO-BREAK SPACE
- U+205F MEDIUM MATHEMATICAL SPACE
- U+3000 IDEOGRAPHIC SPACE
Other symbols (included in regex \s):
- U+0085 NEXT LINE (NEL)
- U+180E MONGOLIAN VOWEL SEPARATOR
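The default list can be approximated in userland as a single regular expression character class. The function below is a hypothetical sketch for UTF-8 input only; its name is invented here, and it is not how mbstring implements mb_trim.

```php
<?php
// Hypothetical userland approximation of the default list in this RFC.
// UTF-8 only; other encodings would need conversion first.
function sketch_default_trim(string $string): string
{
    // Character class built from the code points listed in this RFC:
    // U+0000, U+0009-U+000D (\t \n \v \f \r), U+0020, U+0085, U+00A0,
    // U+1680, U+180E, U+2000-U+200A, U+2028, U+2029, U+202F, U+205F, U+3000.
    $class = '[\x{0000}\x{0009}-\x{000D}\x{0020}\x{0085}\x{00A0}\x{1680}'
           . '\x{180E}\x{2000}-\x{200A}\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}]';

    // /u treats pattern and subject as UTF-8; /D anchors $ at the very end.
    $result = preg_replace('/^' . $class . '+|' . $class . '+$/uD', '', $string);

    // preg_replace() returns null on failure (for example invalid UTF-8);
    // in that case the input is returned unchanged.
    return $result ?? $string;
}

var_dump(sketch_default_trim("\u{3000}\tabc\u{00A0}\n") === 'abc'); // bool(true)
```

Only leading and trailing characters from the class are removed; whitespace inside the string is left as it is.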
Backward Incompatible Changes
This could break existing userland code that defines a function with the same name.
Proposed PHP Version(s)
PHP 8.4
RFC Impact
To SAPIs
Will add the aforementioned functions to all PHP environments.
To Existing Extensions
Changes mb_trim(), mb_ltrim() and mb_rtrim() in the mbstring extension.
To Opcache
No effect.
New Constants
No new constants.
php.ini Defaults
No changed php.ini settings.
Open Issues
Future Scope
This section details areas where the feature might be improved in future, but that are not currently proposed in this RFC.
Proposed Voting Choices
Include these so readers know where you are heading and can discuss the proposed voting options.
Implementation
References
Rejected Features
Keep this updated with features that were discussed on the mail lists.