rfc:mb_trim
Differences
This shows you the differences between two versions of the page.
rfc:mb_trim [2023/10/18 06:16] – modify title youkidearitai | rfc:mb_trim [2024/04/15 08:40] (current) – old revision restored (2023/11/24 06:26) youkidearitai | ||
---|---|---|---|
Line 2: | Line 2: | ||
* Version: 0.1 | * Version: 0.1 | ||
* Date: 2023-10-18 | * Date: 2023-10-18 | ||
- | * Author: Yuya Hamada (youkidearitai), | + | * Author: Yuya Hamada (https:// |
- | * Status: | + | * Status: |
* First Published at: http:// | * First Published at: http:// | ||
- | This is a suggested template for PHP Request for Comments | + | ===== Introduction ===== |
- | Read https:// | + | PHP does not have a multibyte equivalent of the trim function. It is possible to get close enough behavior using preg_replace("/ |
+ | One use case is trimming a Byte Order Mark, for which mb_ltrim should work: | ||
- | Quoting [[http:// | + | < |
+ | mb_ltrim($string, | ||
+ | </code> | ||
- | > PHP is and should remain: | + | ===== Proposal ===== |
- | > 1) a pragmatic web-focused language | + | Add mb_trim() function: |
- | > 2) a loosely typed language | + | |
- | > 3) a language which caters to the skill-levels and platforms of a wide range of users | + | |
- | Your RFC should move PHP forward following his vision. As [[http:// | + | < |
- | large chunk of our userbase, and not something that could be useful in some | + | function mb_trim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}" |
- | extremely specialized edge cases [...] Make sure you think about the full context, the huge audience out there, the consequences of making the learning curve steeper with | + | </code> |
- | every new feature, and the scope of the goodness that those new features bring." | + | < |
+ | function mb_ltrim(string $string, string $characters = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}" | ||
+ | </ | ||
+ | < | ||
+ | function mb_rtrim(string $string, string $characters = " | ||
+ | </ | ||
- | ===== Introduction ===== | + | Here' |
- | The elevator pitch for the RFC. The first paragraph | + | |
- | ===== Proposal ===== | + | Same as trim: |
- | All the features and examples of the proposal. | + | < |
+ | U+0020 SPACE (also in Separator category) | ||
+ | U+0009 \t | ||
+ | U+000A \n | ||
+ | U+000B \v | ||
+ | U+000D \r | ||
+ | </ | ||
- | To [[http://news.php.net/php.internals/ | + | Not removed by trim(), probably because it was not common enough, but reasonable for mb_trim to remove |
- | for inclusion | + | < |
+ | U+000C \f | ||
+ | </ | ||
+ | |||
+ | Removed by trim(), but not matched by regex \s: | ||
+ | < | ||
+ | U+0000 \0 | ||
+ | </ | ||
+ | |||
+ | The whole Separator (Z) category (the 19 code points listed here), covered by regex \s: | ||
+ | < | ||
+ | U+0020 SPACE | ||
+ | U+00A0 NO-BREAK SPACE | ||
+ | U+1680 OGHAM SPACE MARK | ||
+ | U+2000 EN QUAD | ||
+ | U+2001 EM QUAD | ||
+ | U+2002 EN SPACE | ||
+ | U+2003 EM SPACE | ||
+ | U+2004 THREE-PER-EM SPACE | ||
+ | U+2005 FOUR-PER-EM SPACE | ||
+ | U+2006 SIX-PER-EM SPACE | ||
+ | U+2007 FIGURE SPACE | ||
+ | U+2008 PUNCTUATION SPACE | ||
+ | U+2009 THIN SPACE | ||
+ | U+200A HAIR SPACE | ||
+ | U+2028 LINE SEPARATOR | ||
+ | U+2029 PARAGRAPH SEPARATOR | ||
+ | U+202F NARROW NO-BREAK SPACE | ||
+ | U+205F MEDIUM MATHEMATICAL SPACE | ||
+ | U+3000 IDEOGRAPHIC SPACE | ||
+ | </code> | ||
+ | |||
+ | Other symbols (included in regex \s): | ||
+ | < | ||
+ | U+0085 NEXT LINE (NEL) | ||
+ | U+180E MONGOLIAN VOWEL SEPARATOR | ||
+ | </code> | ||
+ | |||
+ | On the other hand, the ".." range notation that trim() accepts in $characters is not supported, e.g. \u{0000}..\u{FFFF} | ||
+ | Because | ||
+ | |||
+ | * The Unicode code point range is very wide | ||
+ | * Difficult | ||
+ | * Difficult to store in memory | ||
+ | * Mapping with other character codes may be incompatible | ||
+ | * For example, to express Hiragana, UTF-8 uses [あ-ゞ], EUC-JP [あ-ゝゞ], | ||
- | Remember that the RFC contents should be easily reusable in the PHP Documentation. | ||
- | If applicable, you may wish to use the language specification as a reference. | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | What breaks, and what is the justification for it? | + | This could break existing userland functions that have the same names. |
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
Line 43: | Line 97: | ||
===== RFC Impact ===== | ===== RFC Impact ===== | ||
==== To SAPIs ==== | ==== To SAPIs ==== | ||
- | Describe | + | To SAPIs |
+ | Will add the aforementioned functions | ||
==== To Existing Extensions ==== | ==== To Existing Extensions ==== | ||
- | mbstring | + | Adds mb_trim(), mb_ltrim() and mb_rtrim() to the mbstring |
==== To Opcache ==== | ==== To Opcache ==== | ||
- | It is necessary to develop RFC's with opcache in mind, since opcache is a core extension distributed with PHP. | + | No effect. |
- | + | ||
- | Please explain how you have verified your RFC's compatibility with opcache. | + | |
==== New Constants ==== | ==== New Constants ==== | ||
- | Describe any new constants | + | No new constants. |
==== php.ini Defaults ==== | ==== php.ini Defaults ==== | ||
- | If there are any php.ini settings | + | No changed |
- | * hardcoded default values | + | |
- | * php.ini-development values | + | |
- | * php.ini-production values | + | |
===== Open Issues ===== | ===== Open Issues ===== | ||
- | Make sure there are no open issues when the vote starts! | ||
https:// | https:// | ||
- | |||
- | ===== Unaffected PHP Functionality ===== | ||
- | List existing areas/ | ||
- | |||
- | This helps avoid any ambiguity, shows that you have thought deeply about the RFC's impact, and helps reduces mail list noise. | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
Line 77: | Line 121: | ||
Include these so readers know where you are heading and can discuss the proposed voting options. | Include these so readers know where you are heading and can discuss the proposed voting options. | ||
- | ===== Patches and Tests ===== | + | ===== Voting |
- | Links to any external patches and tests go here. | + | |
- | If there is no patch, make it clear who will create a patch, or whether a volunteer to help with implementation is needed. | + | <doodle title=" |
- | + | * Yes | |
- | Make it clear if the patch is intended to be the final patch, or is just a prototype. | + | * No |
- | + | </ | |
- | For changes affecting the core language, you should also provide a patch for the language specification. | + | |
===== Implementation ===== | ===== Implementation ===== | ||
https:// | https:// | ||
- | |||
- | ===== References ===== | ||
- | Links to external references, discussions or RFCs | ||
===== Rejected Features ===== | ===== Rejected Features ===== | ||
Keep this updated with features that were discussed on the mail lists. | Keep this updated with features that were discussed on the mail lists. |
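To make the proposed behavior concrete, here is a minimal userland sketch of how the trimming described in the Proposal section could be approximated today with preg_replace. It is not the RFC's implementation: the names sketch_mb_trim, sketch_mb_ltrim, sketch_mb_rtrim, sketch_trim_class and SKETCH_TRIM_DEFAULT are hypothetical, UTF-8 input is assumed, and the default character list is copied from the proposed signature. As in the proposal, the ".." range notation is not interpreted; a "." in $characters is treated as a literal character.

```php
<?php
// Hypothetical userland sketch, NOT the RFC's implementation. Assumes UTF-8 input.

// Default characters, copied from the signature proposed in this RFC.
const SKETCH_TRIM_DEFAULT = " \f\n\r\t\v\x00\u{00A0}\u{1680}\u{2000}\u{2001}\u{2002}\u{2003}\u{2004}\u{2005}\u{2006}\u{2007}\u{2008}\u{2009}\u{200A}\u{2028}\u{2029}\u{202F}\u{205F}\u{3000}\u{0085}\u{180E}";

// Build a regex character class from a list of characters.
// Returns null for an empty list, because "[]" is not a valid class.
function sketch_trim_class(string $characters): ?string
{
    if ($characters === '') {
        return null;
    }
    $quoted = array_map(
        static fn(string $c): string => preg_quote($c, '/'),
        mb_str_split($characters, 1, 'UTF-8')
    );
    return '[' . implode('', $quoted) . ']';
}

// Remove listed characters from the start of the string.
function sketch_mb_ltrim(string $string, string $characters = SKETCH_TRIM_DEFAULT): string
{
    $class = sketch_trim_class($characters);
    if ($class === null) {
        return $string;
    }
    // preg_replace returns null on invalid UTF-8; fall back to the input.
    return preg_replace('/^' . $class . '+/u', '', $string) ?? $string;
}

// Remove listed characters from the end of the string.
function sketch_mb_rtrim(string $string, string $characters = SKETCH_TRIM_DEFAULT): string
{
    $class = sketch_trim_class($characters);
    if ($class === null) {
        return $string;
    }
    // The D modifier makes "$" match only at the very end of the subject.
    return preg_replace('/' . $class . '+$/uD', '', $string) ?? $string;
}

// Remove listed characters from both ends.
function sketch_mb_trim(string $string, string $characters = SKETCH_TRIM_DEFAULT): string
{
    return sketch_mb_rtrim(sketch_mb_ltrim($string, $characters), $characters);
}

// Usage: ideographic and no-break spaces are removed by the default list,
// and a leading Byte Order Mark can be removed by passing it explicitly.
var_dump(sketch_mb_trim("\u{3000}abc\u{00A0} ") === "abc");          // bool(true)
var_dump(sketch_mb_ltrim("\u{FEFF}text", "\u{FEFF}") === "text");    // bool(true)
```

Building an explicit character class rather than relying on \s keeps the sketch tied to the exact list in the proposed signature, which includes U+0000 even though \s does not match it.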
rfc/mb_trim.1697609805.txt.gz · Last modified: 2023/10/18 06:16 by youkidearitai