rfc:mb_str_pad
Differences
This shows you the differences between two versions of the page.
rfc:mb_str_pad [2023/05/19 17:45] – wording nielsdos | rfc:mb_str_pad [2023/11/13 19:55] (current) – link to docs nielsdos | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: mb_str_pad ====== | ====== PHP RFC: mb_str_pad ====== | ||
- | * Version: 0.1 | + | * Version: 0.1.2 |
* Date: 2023-05-19 | * Date: 2023-05-19 | ||
* Author: Niels Dossche (nielsdos), dossche.niels@gmail.com | * Author: Niels Dossche (nielsdos), dossche.niels@gmail.com | ||
- | * Status: | + | * Status: |
+ | * Target Version: PHP 8.3 | ||
+ | * Implementation: | ||
* First Published at: http:// | * First Published at: http:// | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Many string functions | + | In PHP, various |
===== Proposal ===== | ===== Proposal ===== | ||
- | All the features | + | This proposal aims to introduce a new mbstring function mb_str_pad(). Both the input string |
- | To [[http:// | + | < |
- | for inclusion in one of the world' | + | function mb_str_pad(string $string, int $length, string $pad_string = " ", int $pad_type = STR_PAD_RIGHT, |
+ | </ | ||
- | Remember that the RFC contents should be easily reusable in the PHP Documentation. | + | This proposal defines a character as a code point, which is how the other mbstring functions define characters as well. |
- | If applicable, you may wish to use the language specification | + | ==== Error conditions ==== |
+ | |||
+ | mb_str_pad() has the same error conditions as str_pad(): | ||
+ | * $pad_string must not be an empty string; otherwise a ValueError is thrown. | ||
+ | * $pad_type must be one of STR_PAD_LEFT, STR_PAD_RIGHT, | ||
+ | |||
+ | There is one additional error condition that str_pad() doesn' | ||
+ | * $encoding must be a valid and supported character encoding, if provided. | ||
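The error conditions listed above can be observed with a short script. This is only a sketch: it assumes PHP 8.0 or later (where str_pad() throws ValueError for invalid arguments), the `$errors` array is just an illustrative name, and the mb_str_pad() call is guarded because the function only exists from PHP 8.3 onward. That mb_str_pad() reports an unsupported encoding as a ValueError is an assumption based on how other mbstring functions behave.

```php
<?php
// Collect the class names of the errors we observe (illustrative variable).
$errors = [];

// An empty padding string is rejected once padding is actually needed.
try {
    str_pad('abc', 10, '');
} catch (ValueError $e) {
    $errors[] = get_class($e);
    echo $e->getMessage(), PHP_EOL;
}

// A $pad_type other than STR_PAD_LEFT, STR_PAD_RIGHT or STR_PAD_BOTH is rejected.
try {
    str_pad('abc', 10, ' ', 99);
} catch (ValueError $e) {
    $errors[] = get_class($e);
    echo $e->getMessage(), PHP_EOL;
}

// mb_str_pad() is only available from PHP 8.3, so guard the call.
if (function_exists('mb_str_pad')) {
    try {
        // Assumption: an unknown encoding name is reported as a ValueError.
        mb_str_pad('abc', 10, ' ', STR_PAD_RIGHT, 'not-an-encoding');
    } catch (ValueError $e) {
        $errors[] = get_class($e);
        echo $e->getMessage(), PHP_EOL;
    }
}
```

Note that str_pad() returns the input unchanged when the requested length does not exceed the input length, so the argument checks only trigger when padding would actually be added.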
+ | |||
+ | ==== Examples and Comparison Against str_pad() ==== | ||
+ | |||
+ | This section shows some examples and comparisons between str_pad() and mb_str_pad() output for multibyte strings. | ||
+ | str_pad() has trouble with special characters or letters used in some languages because those are encoded in multiple bytes. The first example demonstrates this by using the word " | ||
+ | |||
+ | <code php> | ||
+ | // This will pad such that the string will become 10 bytes long. | ||
+ | var_dump(str_pad(' | ||
+ | var_dump(str_pad(' | ||
+ | var_dump(str_pad(' | ||
+ | |||
+ | // This will pad such that the string will become 10 characters long, and in this case 11 bytes. | ||
+ | var_dump(mb_str_pad(' | ||
+ | var_dump(mb_str_pad(' | ||
+ | var_dump(mb_str_pad(' | ||
+ | </ | ||
+ | |||
+ | The problems with str_pad() become even more prominent for languages that use a non-Latin alphabet, such as Greek. | ||
+ | |||
+ | <code php> | ||
+ | var_dump(str_pad(' | ||
+ | var_dump(str_pad(' | ||
+ | var_dump(str_pad(' | ||
+ | |||
+ | var_dump(mb_str_pad(' | ||
+ | var_dump(mb_str_pad(' | ||
+ | var_dump(mb_str_pad(' | ||
+ | </ | ||
+ | |||
+ | Emojis and symbols work as well, which can be useful in CLI applications. The following example comes from the original feature request. | ||
+ | |||
+ | <code php> | ||
+ | var_dump(str_pad(' | ||
+ | var_dump(str_pad(' | ||
+ | var_dump(str_pad(' | ||
+ | |||
+ | var_dump(mb_str_pad(' | ||
+ | var_dump(mb_str_pad(' | ||
+ | var_dump(mb_str_pad(' | ||
+ | </ | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | Since this is a new function, and no existing functions change, | + | Since this is a new function and no existing functions change, |
- | TODO | + | |
+ | I did a quick search using GitHub' | ||
+ | |||
+ | Looking at the function / method // | ||
+ | * 47 in classes | ||
+ | * 12 free functions, checked if PHP doesn' | ||
+ | * 42 free functions, | ||
+ | |||
+ | This means that for 42 implementations, | ||
+ | |||
+ | Let's also take a look at correctness: | ||
+ | * 36 likely correct implementations. I did not test or read them thoroughly; I only ran some inputs through them automatically. | ||
+ | * 65 implementations which break if the padding string is a multibyte string. Almost all these implementations are very similar to each other. | ||
+ | |||
+ | As we can see, this function appears to be a little tricky to implement correctly. | ||
+ | Note that these results don't include numbers for inline implementations or for implementations under a different name. Hence the reported numbers are quite low. It is very likely more implementations exist under different names, but that doesn' | ||
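To illustrate why userland versions tend to break on a multibyte padding string, here is a minimal sketch of a code-point-aware pad. It is not the RFC's implementation: the name `sketch_mb_str_pad` is hypothetical, and the code assumes PHP 8.0 or later with the mbstring extension loaded. The important step is that the padding is cut with mb_substr() rather than substr(), so a multibyte pad character is never split in the middle.

```php
<?php
// Hypothetical userland helper; not part of the RFC or of mbstring.
function sketch_mb_str_pad(
    string $string,
    int $length,
    string $pad_string = ' ',
    int $pad_type = STR_PAD_RIGHT,
    string $encoding = 'UTF-8'
): string {
    if ($pad_string === '') {
        throw new ValueError('$pad_string must be a non-empty string');
    }
    if (!in_array($pad_type, [STR_PAD_LEFT, STR_PAD_RIGHT, STR_PAD_BOTH], true)) {
        throw new ValueError('$pad_type must be STR_PAD_LEFT, STR_PAD_RIGHT, or STR_PAD_BOTH');
    }

    // Count code points, not bytes.
    $missing = $length - mb_strlen($string, $encoding);
    if ($missing <= 0) {
        return $string;
    }

    // Build a padding of exactly $n code points.
    $fill = static function (int $n) use ($pad_string, $encoding): string {
        if ($n <= 0) {
            return '';
        }
        $repeats = intdiv($n, mb_strlen($pad_string, $encoding)) + 1;
        // Cut on code point boundaries so no character is split.
        return mb_substr(str_repeat($pad_string, $repeats), 0, $n, $encoding);
    };

    switch ($pad_type) {
        case STR_PAD_LEFT:
            return $fill($missing) . $string;
        case STR_PAD_BOTH:
            // Like str_pad(), the right side gets the extra character.
            $left = intdiv($missing, 2);
            return $fill($left) . $string . $fill($missing - $left);
        default:
            return $string . $fill($missing);
    }
}

var_dump(sketch_mb_str_pad('abc', 6, 'é', STR_PAD_LEFT));
var_dump(mb_strlen(sketch_mb_str_pad('▶▶', 6, '❤❓❇'), 'UTF-8'));
```

Like mb_str_pad() as proposed, this sketch counts code points, so strings containing combining marks or multi-code-point emoji may still not line up visually.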
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
Line 31: | Line 97: | ||
==== To Existing Extensions ==== | ==== To Existing Extensions ==== | ||
- | mbstring: A new function mb_str_pad() will be added to mbstring. The implementation of this function will leverage the existing internal functions of mbstring. No modifications will be made to any existing functions, and no new internal functions will be added. | + | mbstring: A new function mb_str_pad() will be added to mbstring. The implementation of this function will leverage the existing internal functions of mbstring. No modifications will be made to any existing functions, and no new internal functions will be added. By reusing existing internal functions, the maintenance burden of mb_str_pad() stays quite low. |
==== To Opcache ==== | ==== To Opcache ==== | ||
Line 49: | Line 115: | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
- | None. | + | In the future we could add a string padding function that works on grapheme clusters instead of code points: grapheme_str_pad(). This should be added to ext/intl. This will of course require another RFC. |
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
- | One primary yes/no vote to decide if the function may be introduced. | + | One primary yes/no vote to decide whether the function may be introduced; a 2/3 majority is required. |
+ | |||
+ | Voting starts on 2023-06-05 20:00 GMT+2, and ends on 2023-06-19 20:00 GMT+2. | ||
+ | |||
+ | <doodle title=" | ||
+ | * Yes | ||
+ | * No | ||
+ | </ | ||
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
Line 59: | Line 132: | ||
===== Implementation ===== | ===== Implementation ===== | ||
After the project is implemented, | After the project is implemented, | ||
- | - the version(s) it was merged into | + | - the version(s) it was merged into: PHP 8.3 |
- | - a link to the git commit(s) | + | - a link to the git commit(s): https:// |
- | - a link to the PHP manual entry for the feature | + | - a link to the PHP manual entry for the feature: https:// |
- | - a link to the language specification section (if any) | + | - a link to the language specification section (if any): N/A |
===== References ===== | ===== References ===== | ||
Line 69: | Line 142: | ||
===== Rejected Features ===== | ===== Rejected Features ===== | ||
Keep this updated with features that were discussed on the mail lists. | Keep this updated with features that were discussed on the mail lists. | ||
+ | |||
+ | ===== Changelog ===== | ||
+ | |||
+ | * 0.1.2: Clarify that we use the mbstring definition of character (i.e. code point) instead of grapheme cluster. | ||
+ | * 0.1.1: Initial version placed under discussion |
rfc/mb_str_pad.1684518321.txt.gz · Last modified: 2023/05/19 17:45 by nielsdos