====== PHP RFC: Change the default of $characters in mb_trim function ======

  * Version: 0.1
  * Date: 2024-04-03
  * Author: Yuya Hamada, youkidearitai@gmail.com
  * Status: Draft
  * First Published at: http://wiki.php.net/rfc/mb_trim

===== Introduction =====

We found two problems with $characters, the second argument of mb_trim. This RFC proposes a solution to both.

First, since the mb_trim function was added, the mbstring_arginfo.h file generated from mbstring.stub.php contains UTF-8 strings, which prevents it from compiling in some Visual C++ environments: https://github.com/php/php-src/issues/13789

Second, the default value of $characters in the mb_trim function does not trim correctly when the string is in another character encoding: https://github.com/php/php-src/issues/13815

Taking these together, this RFC proposes that null is a more appropriate default for $characters in mb_trim.

===== Proposal =====

Change the default of $characters in the mb_trim, mb_ltrim and mb_rtrim functions:

  function mb_trim(string $string, ?string $characters = null, ?string $encoding = null): string
  function mb_ltrim(string $string, ?string $characters = null, ?string $encoding = null): string
  function mb_rtrim(string $string, ?string $characters = null, ?string $encoding = null): string

If $characters is null, the following characters are trimmed by default.
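As a rough illustration of the null default, the userland sketch below resolves null to a built-in set of code points at call time. The names sketch_default_trim_codepoints and sketch_mb_trim are hypothetical, the sketch assumes valid UTF-8 input only, and it uses a PCRE character class rather than whatever the mbstring extension does internally. The code points are the ones enumerated in this section.

```php
<?php
// Hypothetical userland sketch; NOT the mbstring implementation.
// Assumes UTF-8 input and the mbstring and pcre extensions.

/** Default code points, taken from the list in this RFC. */
function sketch_default_trim_codepoints(): array
{
    return array_merge(
        [0x0000, 0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020],
        [0x0085, 0x00A0, 0x1680, 0x180E],
        range(0x2000, 0x200A),            // EN QUAD .. HAIR SPACE
        [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
    );
}

function sketch_mb_trim(string $string, ?string $characters = null): string
{
    if ($characters === null) {
        // null means "use the default set", resolved at call time.
        $codepoints = sketch_default_trim_codepoints();
    } else {
        $codepoints = array_map(
            fn (string $ch): int => (int) mb_ord($ch, 'UTF-8'),
            mb_str_split($characters, 1, 'UTF-8')
        );
    }

    if ($codepoints === []) {
        return $string; // nothing to trim
    }

    // Build a character class from \x{...} escapes so no quoting is needed.
    $class = '';
    foreach ($codepoints as $cp) {
        $class .= sprintf('\x{%X}', $cp);
    }

    $pattern = '/\A[' . $class . ']+|[' . $class . ']+\z/u';

    // preg_replace() returns null on error (e.g. invalid UTF-8); fall back.
    return preg_replace($pattern, '', $string) ?? $string;
}

var_dump(sketch_mb_trim("\u{3000}abc\u{00A0}\n") === 'abc');
```

Passing an explicit $characters, for example sketch_mb_trim('xxhixx', 'x'), bypasses the default set entirely, which is the behavior a nullable parameter makes easy to express.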
Here's the list of characters trimmed:

Same as trim:
  * U+0020 SPACE (also in Separator category)
  * U+0009 \t
  * U+000A \n
  * U+000B \v
  * U+000D \r

Not removed in trim(), probably because it wasn't common enough, but OK for mb_trim:
  * U+000C \f

Removed in trim, but not included in regex \s:
  * U+0000 \0

Whole Separator Z category (20 codepoints), covered by regex \s:
  * U+0020 SPACE
  * U+00A0 NO-BREAK SPACE
  * U+1680 OGHAM SPACE MARK
  * U+2000 EN QUAD
  * U+2001 EM QUAD
  * U+2002 EN SPACE
  * U+2003 EM SPACE
  * U+2004 THREE-PER-EM SPACE
  * U+2005 FOUR-PER-EM SPACE
  * U+2006 SIX-PER-EM SPACE
  * U+2007 FIGURE SPACE
  * U+2008 PUNCTUATION SPACE
  * U+2009 THIN SPACE
  * U+200A HAIR SPACE
  * U+2028 LINE SEPARATOR
  * U+2029 PARAGRAPH SEPARATOR
  * U+202F NARROW NO-BREAK SPACE
  * U+205F MEDIUM MATHEMATICAL SPACE
  * U+3000 IDEOGRAPHIC SPACE

Other symbols (included in regex \s):
  * U+0085 NEXT LINE (NEL)
  * U+180E MONGOLIAN VOWEL SEPARATOR

===== Backward Incompatible Changes =====

This could break a function existing in userland with the same name.

===== Proposed PHP Version(s) =====

PHP 8.4

===== RFC Impact =====

==== To SAPIs ====

Will add the aforementioned functions to all PHP environments.

==== To Existing Extensions ====

Fixes mb_trim(), mb_ltrim() and mb_rtrim() in the mbstring extension.

==== To Opcache ====

No effect.

==== New Constants ====

No new constants.

==== php.ini Defaults ====

No changed php.ini settings.

===== Open Issues =====

  * https://github.com/php/php-src/issues/13789
  * https://github.com/php/php-src/issues/13815

===== Future Scope =====

This section details areas where the feature might be improved in future, but that are not currently proposed in this RFC.

===== Proposed Voting Choices =====

Include these so readers know where you are heading and can discuss the proposed voting options.

===== Implementation =====

  * https://github.com/php/php-src/pull/13820

===== References =====

  * https://wiki.php.net/rfc/mb_trim

===== Rejected Features =====

Keep this updated with features that were discussed on the mail lists.