rfc:mb_trim
rfc:mb_trim [2023/10/18 06:50] – Add from 8ctopus youkidearitai | rfc:mb_trim [2024/04/15 08:40] (current) – old revision restored (2023/11/24 06:26) youkidearitai | ||
---|---|---|---|
Line 2: | Line 2: | ||
* Version: 0.1 | * Version: 0.1 | ||
* Date: 2023-10-18 | * Date: 2023-10-18 | ||
- | * Author: Yuya Hamada (youkidearitai), | + | * Author: Yuya Hamada (https:// |
- | * Status: | + | * Status: |
* First Published at: http:// | * First Published at: http:// | ||
- | This is a suggested template for PHP Request for Comments | + | ===== Introduction ===== |
- | Read https:// | + | PHP does not have a multibyte equivalent of the trim function. It is possible to get close enough behavior using preg_replace("/ |
+ | One use case is trimming a Byte Order Mark (BOM). mb_ltrim should work for this: | ||
- | Quoting [[http:// | + | <code> |
- | + | mb_ltrim($string, "\u{FEFF}\u{FFFE}"); | |
- | > PHP is and should remain: | + | </code> |
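Without mb_ltrim, a similar effect can be approximated for UTF-8 input with preg_replace. The following is only a sketch: the helper name strip_leading_bom is hypothetical and not part of this RFC, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper (not part of the RFC): removes leading U+FEFF / U+FFFE
// code points from a UTF-8 string, approximating
// mb_ltrim($string, "\u{FEFF}\u{FFFE}").
function strip_leading_bom(string $string): string
{
    // \A anchors the match at the very start of the subject.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8,
    // so each \x{...} escape matches one whole code point.
    $result = preg_replace('/\A[\x{FEFF}\x{FFFE}]+/u', '', $string);

    // preg_replace returns null on failure (for example invalid UTF-8);
    // in that case this sketch returns the input unchanged.
    return $result ?? $string;
}

// Usage: the BOM at the start is dropped, the rest is left alone.
$clean = strip_leading_bom("\u{FEFF}<?xml version=\"1.0\"?>");
```

Only leading occurrences are removed; a U+FEFF in the middle of the string is kept, which matches the left-trim semantics described above.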
- | > 1) a pragmatic web-focused language | + | |
- | > 2) a loosely typed language | + | |
- | > 3) a language which caters to the skill-levels and platforms of a wide range of users | + | |
- | + | ||
- | Your RFC should move PHP forward following his vision. As [[http:// | + | |
- | large chunk of our userbase, and not something that could be useful in some | + | |
- | extremely specialized edge cases [...] Make sure you think about the full context, the huge audience out there, the consequences of making the learning curve steeper with | + | |
- | every new feature, and the scope of the goodness that those new features bring." | + | |
- | + | ||
- | ===== Introduction ===== | + | |
- | PHP does not have a multibyte equivalent of the trim function. It is possible to get close enough behavior using preg_replace("/ | + | |
===== Proposal ===== | ===== Proposal ===== | ||
Add mb_trim() function: | Add mb_trim() function: | ||
+ | < | ||
function mb_trim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}" | function mb_trim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}" | ||
+ | </ | ||
+ | < | ||
function mb_ltrim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}", | function mb_ltrim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}", | ||
+ | </ | ||
+ | < | ||
function mb_rtrim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}", | function mb_rtrim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}", | ||
+ | </ | ||
Here's the list of characters trimmed: | Here's the list of characters trimmed: | ||
Same as trim: | Same as trim: | ||
+ | < | ||
U+0020 SPACE (also in Separator category) | U+0020 SPACE (also in Separator category) | ||
U+0009 \t | U+0009 \t | ||
Line 41: | Line 37: | ||
U+000B \v | U+000B \v | ||
U+000D \r | U+000D \r | ||
+ | </ | ||
Not removed by trim(), probably because it was not common enough, but acceptable for mb_trim: | Not removed by trim(), probably because it was not common enough, but acceptable for mb_trim: | ||
+ | < | ||
U+000C \f | U+000C \f | ||
+ | </ | ||
Removed in trim, but not included in regex \s | Removed in trim, but not included in regex \s | ||
+ | < | ||
U+0000 \0 | U+0000 \0 | ||
+ | </ | ||
whole Separator Z category (20 codepoints) covered by regex \s: | whole Separator Z category (20 codepoints) covered by regex \s: | ||
+ | < | ||
U+0020 SPACE | U+0020 SPACE | ||
U+00A0 NO-BREAK SPACE | U+00A0 NO-BREAK SPACE | ||
Line 68: | Line 70: | ||
U+205F MEDIUM MATHEMATICAL SPACE | U+205F MEDIUM MATHEMATICAL SPACE | ||
U+3000 IDEOGRAPHIC SPACE | U+3000 IDEOGRAPHIC SPACE | ||
+ | </ | ||
- | Other symbols: | + | Other symbols |
+ | < | ||
U+0085 NEXT LINE (NEL) | U+0085 NEXT LINE (NEL) | ||
U+180E MONGOLIAN VOWEL SEPARATOR | U+180E MONGOLIAN VOWEL SEPARATOR | ||
+ | </ | ||
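The default character list above can be tried out with a userland approximation. This is a minimal sketch under stated assumptions, not the proposed implementation: the names sketch_mb_trim and SKETCH_TRIM_CHARS are hypothetical, only UTF-8 input is handled, and no encoding parameter is modelled.

```php
<?php
// Illustrative constant: the default character list quoted in the proposal.
const SKETCH_TRIM_CHARS = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}";

// Hypothetical userland approximation of mb_trim() for UTF-8 strings.
function sketch_mb_trim(string $string, string $characters = SKETCH_TRIM_CHARS): string
{
    // An empty list means there is nothing to trim.
    if ($characters === '') {
        return $string;
    }

    // preg_quote escapes regex metacharacters (including ], ^, - and \)
    // and the NUL byte, so the list can be placed inside a character class.
    $class = '[' . preg_quote($characters, '/') . ']+';

    // Remove a run of listed characters at the start (\A) or at the end (\z).
    // The /u modifier makes the class match whole UTF-8 code points.
    $result = preg_replace('/\A' . $class . '|' . $class . '\z/u', '', $string);

    // preg_replace returns null on failure (for example invalid UTF-8);
    // this sketch then returns the input unchanged.
    return $result ?? $string;
}

// Usage: IDEOGRAPHIC SPACE and NO-BREAK SPACE are stripped from both ends.
$trimmed = sketch_mb_trim("\u{3000}日本語\u{00A0} ");
```

Left-only or right-only variants would keep just the \A or the \z alternative of the pattern.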
+ | |||
+ | On the other hand, the " | ||
+ | The reasons are listed below: | ||
+ | |||
+ | * The Unicode character set is very large | ||
+ | * Difficult to search | ||
+ | * Difficult to store in memory | ||
+ | * Mappings to other character encodings may be incompatible | ||
+ | * For example, to express Hiragana, UTF-8 uses [あ-ゞ], EUC-JP [あ-ゝゞ], | ||
+ | |||
+ | |||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
Line 98: | Line 114: | ||
===== Open Issues ===== | ===== Open Issues ===== | ||
https:// | https:// | ||
- | |||
- | ===== Unaffected PHP Functionality ===== | ||
- | List existing areas/ | ||
- | |||
- | This helps avoid any ambiguity, shows that you have thought deeply about the RFC's impact, and helps reduce mailing list noise. | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
Line 110: | Line 121: | ||
Include these so readers know where you are heading and can discuss the proposed voting options. | Include these so readers know where you are heading and can discuss the proposed voting options. | ||
- | ===== Patches and Tests ===== | + | ===== Voting |
- | Links to any external patches and tests go here. | + | |
- | If there is no patch, make it clear who will create a patch, or whether a volunteer to help with implementation is needed. | + | <doodle title=" |
- | + | * Yes | |
- | Make it clear if the patch is intended to be the final patch, or is just a prototype. | + | * No |
- | + | </ | |
- | For changes affecting the core language, you should also provide a patch for the language specification. | + | |
===== Implementation ===== | ===== Implementation ===== | ||
https:// | https:// | ||
- | |||
- | ===== References ===== | ||
- | Links to external references, discussions or RFCs | ||
===== Rejected Features ===== | ===== Rejected Features ===== | ||
Keep this updated with features that were discussed on the mail lists. | Keep this updated with features that were discussed on the mail lists. |
rfc/mb_trim.1697611818.txt.gz · Last modified: 2023/10/18 06:50 by youkidearitai