rfc:mb_trim

Differences

This shows you the differences between two versions of the page.

rfc:mb_trim [2023/10/18 06:50] – Add from 8ctopus youkidearitai
rfc:mb_trim [2024/04/15 08:40] (current) – old revision restored (2023/11/24 06:26) youkidearitai
Line 2: Line 2:
   * Version: 0.1   * Version: 0.1
   * Date: 2023-10-18   * Date: 2023-10-18
-  * Author: Yuya Hamada (youkidearitai), youkidearitai@gmail.com based on 8ctopus(https://github.com/8ctopus), hello@octopuslabs.io +  * Author: Yuya Hamada (https://github.com/youkidearitai), youkidearitai@gmail.com based on 8ctopus(https://github.com/8ctopus), hello@octopuslabs.io 
-  * Status: Draft+  * Status: Implemented
   * First Published at: http://wiki.php.net/rfc/mb_trim   * First Published at: http://wiki.php.net/rfc/mb_trim
  
-This is a suggested template for PHP Request for Comments (RFCs). Change this template to suit your RFC. Not all RFCs need to be tightly specified. Not all RFCs need all the sections below. +===== Introduction ===== 
-Read https://wiki.php.net/rfc/howto carefully!+PHP does not have a multibyte equivalent of the trim function. It is possible to get close enough behavior using preg_replace("/^\s+|\s+$/u", '', $string), however adding a pre-built function to do this will improve the readability and clarity of PHP code. It will also standardize how it is done as it can be tricky. This feature would be of use to many PHP developers with varying levels of experience and would complete the mbstring extension.
  
 +One use case is trimming a byte order mark (BOM). mb_ltrim should work for this:
  
-Quoting [[http://news.php.net/php.internals/71525|Rasmus]]: +<code>
- +mb_ltrim($string, "\u{FEFF}\u{FFFE}")
-PHP is and should remain: +</code>
-> 1) a pragmatic web-focused language +
-> 2) a loosely typed language +
-> 3) a language which caters to the skill-levels and platforms of a wide range of users +
- +
-Your RFC should move PHP forward following his vision. As [[http://news.php.net/php.internals/66065|said by Zeev Suraski]] "Consider only features which have significant traction to a +
-large chunk of our userbase, and not something that could be useful in some +
-extremely specialized edge cases [...] Make sure you think about the full context, the huge audience out there, the consequences of  making the learning curve steeper with +
-every new feature, and the scope of the goodness that those new features bring." +
- +
-===== Introduction ===== +
-PHP does not have a multibyte equivalent of the trim function. It is possible to get close enough behavior using preg_replace("/^\s+|\s+$/u", '', $string), however adding a pre-built function to do this will improve the readability and clarity of PHP code. It will also standardize how it is done as it can be tricky. This feature would be of use to many PHP developers with varying levels of experience and would complete the mbstring extension.+
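
As a rough illustration of the regex workaround quoted in the Introduction, the sketch below wraps that preg_replace() call in a helper and compares it with trim(). The helper name regex_trim is hypothetical and not part of the RFC. Falling back to the original string when preg_replace() returns null (for example on invalid UTF-8) is also an assumption made here.

```php
<?php
// Hypothetical helper, not part of the RFC: the regex workaround from the Introduction.
function regex_trim(string $string): string
{
    // With the /u modifier the subject is treated as UTF-8 and \s also
    // matches Unicode whitespace such as U+3000 IDEOGRAPHIC SPACE.
    // preg_replace() returns null on error; in that case the input is returned unchanged.
    return preg_replace('/^\s+|\s+$/u', '', $string) ?? $string;
}

$input = "\u{3000}こんにちは\u{3000}";

// trim() works on single bytes and does not know U+3000, so nothing is removed.
var_dump(trim($input) === $input);             // bool(true)

// The regex version removes the ideographic spaces on both sides.
var_dump(regex_trim($input) === "こんにちは"); // bool(true)
```

The pattern works, but it has to be written correctly every time it is needed, which is the readability problem the Introduction describes.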
  
 ===== Proposal ===== ===== Proposal =====
 Add mb_trim() function: Add mb_trim() function:
  
 +<code>
 function mb_trim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}"): string function mb_trim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}"): string
 +</code>
 +<code>
 function mb_ltrim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}", ?string $encoding = null): string {} function mb_ltrim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}", ?string $encoding = null): string {}
 +</code>
 +<code>
 function mb_rtrim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}", ?string $encoding = null): string {} function mb_rtrim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}", ?string $encoding = null): string {}
 +</code>
  
 Here's the list of characters trimmed: Here's the list of characters trimmed:
  
 Same as trim: Same as trim:
 +<code>
 U+0020 SPACE (also in Separator category) U+0020 SPACE (also in Separator category)
 U+0009 \t U+0009 \t
Line 41: Line 37:
 U+000B \v U+000B \v
 U+000D \r U+000D \r
 +</code>
  
 not removed in trim(), probably it wasn't common enough, but ok for mb_trim not removed in trim(), probably it wasn't common enough, but ok for mb_trim
 +<code>
 U+000C \f U+000C \f
 +</code>
  
 Removed in trim, but not included in regex \s Removed in trim, but not included in regex \s
 +<code>
 U+0000 \0 U+0000 \0
 +</code>
  
 whole Separator Z category (20 codepoints) covered by regex \s: whole Separator Z category (20 codepoints) covered by regex \s:
 +<code>
 U+0020 SPACE U+0020 SPACE
 U+00A0 NO-BREAK SPACE U+00A0 NO-BREAK SPACE
Line 68: Line 70:
 U+205F MEDIUM MATHEMATICAL SPACE U+205F MEDIUM MATHEMATICAL SPACE
 U+3000 IDEOGRAPHIC SPACE U+3000 IDEOGRAPHIC SPACE
 +</code>
  
-Other symbols:+Other symbols (included in regex \s): 
 +<code>
 U+0085 NEXT LINE (NEL) U+0085 NEXT LINE (NEL)
 U+180E MONGOLIAN VOWEL SEPARATOR U+180E MONGOLIAN VOWEL SEPARATOR
 +</code>
 +
 +However, the ".." range notation that trim() accepts in $characters is not supported, e.g. \u{0000}..\u{FFFF}.
 +The reasons are:
 +
 +  * The range of Unicode characters is very large
 +    * Difficult to search
 +    * Difficult to store in memory
 +    * Ranges may not map consistently across character encodings
 +      * For example, to express Hiragana, UTF-8 uses [あ-ゞ], EUC-JP [あ-ゝゞ], and Shift_JIS [あ-ん].
 +
 +
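
To illustrate what dropping the ".." notation means in practice, here is a minimal userland sketch that treats every character in $characters literally. It is not the RFC's implementation: the name sketch_mb_trim is hypothetical, it handles UTF-8 only, and it assumes that an unsupported ".." simply counts as two literal dots.

```php
<?php
// Hypothetical userland sketch (UTF-8 only), not the RFC's implementation.
function sketch_mb_trim(string $string, string $characters): string
{
    if ($characters === '') {
        return $string; // nothing to trim
    }

    // preg_quote() escapes regex metacharacters such as "-", "]", "^", "\" and ".",
    // so every character becomes a literal member of the character class.
    $class = preg_quote($characters, '/');

    // \A and \z anchor at the absolute start and end of the subject.
    $pattern = '/\A[' . $class . ']+|[' . $class . ']+\z/u';

    // preg_replace() returns null on error (e.g. invalid UTF-8); fall back to the input.
    return preg_replace($pattern, '', $string) ?? $string;
}

// trim() expands "a..c" to the range a, b, c.
var_dump(trim("abcxcba", "a..c"));           // string(1) "x"

// The sketch sees only the literal characters "a", "." and "c".
var_dump(sketch_mb_trim("abcxcba", "a..c")); // string(5) "bcxcb"
```

Because no range is expanded, the set of trimmed characters is exactly what the caller wrote, regardless of how a range would map in another encoding.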
  
 ===== Backward Incompatible Changes ===== ===== Backward Incompatible Changes =====
Line 98: Line 114:
 ===== Open Issues ===== ===== Open Issues =====
 https://github.com/php/php-src/issues/9216 https://github.com/php/php-src/issues/9216
- 
-===== Unaffected PHP Functionality ===== 
-List existing areas/features of PHP that will not be changed by the RFC. 
- 
-This helps avoid any ambiguity, shows that you have thought deeply about the RFC's impact, and helps reduces mail list noise. 
  
 ===== Future Scope ===== ===== Future Scope =====
Line 110: Line 121:
 Include these so readers know where you are heading and can discuss the proposed voting options. Include these so readers know where you are heading and can discuss the proposed voting options.
  
-===== Patches and Tests ===== +===== Voting =====
-Links to any external patches and tests go here.+
  
-If there is no patch, make it clear who will create a patch, or whether a volunteer to help with implementation is needed. +<doodle title="Multibyte for trim function mb_trim, mb_ltrim and mb_rtrim" auth="Yuya Hamada" voteType="single"  closed="true" closeon="2023-11-17T00:00:00Z"> 
- +   * Yes 
-Make it clear if the patch is intended to be the final patch, or is just a prototype. +   * No 
- +</doodle>
-For changes affecting the core language, you should also provide a patch for the language specification.+
  
 ===== Implementation ===== ===== Implementation =====
 https://github.com/php/php-src/pull/12459 https://github.com/php/php-src/pull/12459
- 
-===== References ===== 
-Links to external references, discussions or RFCs 
  
 ===== Rejected Features ===== ===== Rejected Features =====
 Keep this updated with features that were discussed on the mail lists. Keep this updated with features that were discussed on the mail lists.
rfc/mb_trim.1697611818.txt.gz · Last modified: 2023/10/18 06:50 by youkidearitai