rfc:mb_str_pad
Differences
rfc:mb_str_pad [2023/05/19 17:41] – created nielsdos | rfc:mb_str_pad [2023/11/13 19:55] (current) – link to docs nielsdos | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: mb_str_pad ====== | ====== PHP RFC: mb_str_pad ====== | ||
- | * Version: 0.1 | + | * Version: 0.1.2 |
* Date: 2023-05-19 | * Date: 2023-05-19 | ||
* Author: Niels Dossche (nielsdos), dossche.niels@gmail.com | * Author: Niels Dossche (nielsdos), dossche.niels@gmail.com | ||
- | * Status: | + | * Status: |
+ | * Target Version: PHP 8.3 | ||
+ | * Implementation: | ||
* First Published at: http:// | * First Published at: http:// | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Many string functions | + | In PHP, various |
===== Proposal ===== | ===== Proposal ===== | ||
- | All the features | + | This proposal aims to introduce a new mbstring function mb_str_pad(). Both the input string |
- | To [[http:// | + | < |
- | for inclusion in one of the world' | + | function mb_str_pad(string $string, int $length, string $pad_string = " ", int $pad_type = STR_PAD_RIGHT, |
+ | </ | ||
- | Remember that the RFC contents should be easily reusable in the PHP Documentation. | + | This proposal defines a character as a code point, which is also how the other mbstring functions define characters. | ||
- | If applicable, you may wish to use the language specification | + | ==== Error conditions ==== |
+ | |||
+ | mb_str_pad() has the same error conditions as str_pad(): | ||
+ | * $pad_string must not be an empty string; otherwise a ValueError is thrown. | ||
+ | * $pad_type must be one of STR_PAD_LEFT, STR_PAD_RIGHT, | ||
+ | |||
+ | There is one additional error condition that str_pad() doesn' | ||
+ | * $encoding must be a valid and supported character encoding, if provided. | ||
+ | |||
+ | ==== Examples and Comparison Against str_pad() ==== | ||
+ | |||
+ | This section shows some examples comparing the output of str_pad() and mb_str_pad() for multibyte strings. | ||
+ | str_pad() has trouble with special characters and with letters used in some languages, because those are encoded using multiple bytes. The first example demonstrates this by using the word " | ||
+ | |||
+ | <code php> | ||
+ | // This will pad such that the string will become 10 bytes long. | ||
+ | var_dump(str_pad(' | ||
+ | var_dump(str_pad(' | ||
+ | var_dump(str_pad(' | ||
+ | |||
+ | // This will pad such that the string will become 10 characters long, and in this case 11 bytes. | ||
+ | var_dump(mb_str_pad(' | ||
+ | var_dump(mb_str_pad(' | ||
+ | var_dump(mb_str_pad(' | ||
+ | </ | ||
+ | |||
+ | The problems with str_pad() become even more pronounced for languages that use a non-Latin alphabet, such as Greek. | ||
+ | |||
+ | <code php> | ||
+ | var_dump(str_pad(' | ||
+ | var_dump(str_pad(' | ||
+ | var_dump(str_pad(' | ||
+ | |||
+ | var_dump(mb_str_pad(' | ||
+ | var_dump(mb_str_pad(' | ||
+ | var_dump(mb_str_pad(' | ||
+ | </ | ||
+ | |||
+ | Emojis and symbols can be padded as well, which can be useful in CLI applications. This example comes from the original feature request report. | ||
+ | |||
+ | <code php> | ||
+ | var_dump(str_pad(' | ||
+ | var_dump(str_pad(' | ||
+ | var_dump(str_pad(' | ||
+ | |||
+ | var_dump(mb_str_pad(' | ||
+ | var_dump(mb_str_pad(' | ||
+ | var_dump(mb_str_pad(' | ||
+ | </ | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | None. | + | Since this is a new function and no existing functions change, there is no behavioural backwards incompatibility. The only backwards compatibility break occurs when a userland PHP project declares its own mb_str_pad() function without first checking if PHP doesn' | ||
+ | |||
+ | I did a quick search using GitHub' | ||
+ | |||
+ | Looking at the function / method // | ||
+ | * 47 in classes | ||
+ | * 12 free functions, checked if PHP doesn' | ||
+ | * 42 free functions, not checked (correctly) | ||
+ | |||
+ | This means that for 42 implementations, | ||
+ | |||
+ | Let's also take a look at correctness: | ||
+ | * 36 likely correct implementations. I did not test or read them thoroughly; I only ran some inputs through them automatically. | ||
+ | * 65 implementations which break if the padding string is a multibyte string. Almost all these implementations are very similar to each other. | ||
+ | |||
+ | As we can see, the function appears to be a little tricky to implement correctly. | ||
+ | Note that these results don't include inline implementations or implementations under a different name, so the reported numbers undercount the real total. It is very likely more implementations exist under different names, but that doesn' | ||
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
Line 30: | Line 97: | ||
==== To Existing Extensions ==== | ==== To Existing Extensions ==== | ||
- | mbstring: A new function mb_str_pad() will be added to mbstring. The implementation | + | mbstring: A new function mb_str_pad() will be added to mbstring. The implementation |
==== To Opcache ==== | ==== To Opcache ==== | ||
Line 48: | Line 115: | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
- | None. | + | In the future, a string padding function that works on grapheme clusters instead of code points, grapheme_str_pad(), could be added to ext/intl. This would of course require another RFC. | ||
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
- | One primary yes/no vote to decide if the function may be introduced. | + | One primary yes/no vote, requiring a 2/3 majority, to decide whether the function may be introduced. | ||
+ | |||
+ | Voting starts on 2023-06-05 20:00 GMT+2, and ends on 2023-06-19 20:00 GMT+2. | ||
+ | |||
+ | <doodle title=" | ||
+ | * Yes | ||
+ | * No | ||
+ | </ | ||
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
Line 58: | Line 132: | ||
===== Implementation ===== | ===== Implementation ===== | ||
After the project is implemented, | After the project is implemented, | ||
- | - the version(s) it was merged into | + | - the version(s) it was merged into: PHP 8.3 |
- | - a link to the git commit(s) | + | - a link to the git commit(s): https:// |
- | - a link to the PHP manual entry for the feature | + | - a link to the PHP manual entry for the feature: https:// |
- | - a link to the language specification section (if any) | + | - a link to the language specification section (if any): N/A |
===== References ===== | ===== References ===== | ||
Line 68: | Line 142: | ||
===== Rejected Features ===== | ===== Rejected Features ===== | ||
Keep this updated with features that were discussed on the mail lists. | Keep this updated with features that were discussed on the mail lists. | ||
+ | |||
+ | ===== Changelog ===== | ||
+ | |||
+ | * 0.1.2: Clarify that we use the mbstring definition of character (i.e. code point) instead of grapheme cluster. | ||
+ | * 0.1.1: Initial version placed under discussion |
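The Backward Incompatible Changes section notes that a userland mb_str_pad() only conflicts with the new function when it is declared without first checking whether PHP already provides one, and that many existing implementations break when the padding string is multibyte. The following is a minimal sketch of such a guarded fallback, not the RFC's C implementation. It assumes the mbstring extension is loaded, a stateless encoding such as UTF-8, and that the parameter after $pad_type is ?string $encoding = null (the signature in the diff is cut off at that point). It follows the listed error conditions and, like str_pad(), gives the extra padding character to the right side for STR_PAD_BOTH.

```php
<?php
// Sketch of a userland fallback for mb_str_pad(), declared only when PHP
// does not already provide the function (PHP 8.3+ with mbstring does).
// Assumptions: mbstring is loaded, the encoding is stateless (e.g. UTF-8),
// and the fifth parameter is ?string $encoding = null.
if (!function_exists('mb_str_pad')) {
    function mb_str_pad(
        string $string,
        int $length,
        string $pad_string = ' ',
        int $pad_type = STR_PAD_RIGHT,
        ?string $encoding = null
    ): string {
        // Same error conditions as str_pad().
        if ($pad_string === '') {
            throw new ValueError('mb_str_pad(): Argument #3 ($pad_string) must be a non-empty string');
        }
        if (!in_array($pad_type, [STR_PAD_LEFT, STR_PAD_RIGHT, STR_PAD_BOTH], true)) {
            throw new ValueError('mb_str_pad(): Argument #4 ($pad_type) must be STR_PAD_LEFT, STR_PAD_RIGHT, or STR_PAD_BOTH');
        }

        // mb_strlen() counts code points and throws a ValueError for an unknown encoding.
        $inputLength = mb_strlen($string, $encoding);
        if ($length <= $inputLength) {
            return $string; // nothing to pad (also covers negative lengths)
        }

        $missing = $length - $inputLength;
        // Like str_pad(), STR_PAD_BOTH puts the odd extra character on the right.
        $left = match ($pad_type) {
            STR_PAD_LEFT => $missing,
            STR_PAD_RIGHT => 0,
            STR_PAD_BOTH => intdiv($missing, 2),
        };
        $right = $missing - $left;

        // Repeat the padding, then cut it on a code point boundary so that a
        // multibyte $pad_string is never split in the middle of a character.
        $padLength = mb_strlen($pad_string, $encoding);
        $makePad = fn (int $n): string =>
            mb_substr(str_repeat($pad_string, intdiv($n, $padLength) + 1), 0, $n, $encoding);

        return $makePad($left) . $string . $makePad($right);
    }
}
```

With this sketch, mb_str_pad("a\u{00F1}o", 6, '*', STR_PAD_BOTH, 'UTF-8') adds one asterisk on the left and two on the right, because "año" counts as three characters even though it is four bytes long; str_pad() would only add two asterisks in total.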
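The Future Scope section and the 0.1.2 changelog entry contrast code points, which mb_str_pad() counts, with grapheme clusters, which a future grapheme_str_pad() would count. The sketch below makes the difference concrete using only PCRE, which is bundled with PHP: /./su matches one code point and \X matches one extended grapheme cluster. The helper names count_code_points() and count_graphemes() are hypothetical and only serve this illustration.

```php
<?php
// Illustration only: count code points vs. grapheme clusters with PCRE.
// The helper names are hypothetical, not part of any PHP API.
function count_code_points(string $s): int
{
    // With the u modifier, "." matches one code point; s lets it match newlines too.
    return preg_match_all('/./su', $s);
}

function count_graphemes(string $s): int
{
    // \X matches one extended grapheme cluster (a user-perceived character).
    return preg_match_all('/\X/u', $s);
}

$precomposed = "\u{00E9}";  // "é" as a single code point
$decomposed  = "e\u{0301}"; // "e" followed by COMBINING ACUTE ACCENT

// Columns: bytes, code points, grapheme clusters.
printf("%d %d %d\n", strlen($precomposed), count_code_points($precomposed), count_graphemes($precomposed)); // prints "2 1 1"
printf("%d %d %d\n", strlen($decomposed), count_code_points($decomposed), count_graphemes($decomposed));    // prints "3 2 1"
```

For the decomposed form, a code-point-based mb_str_pad() treats the input as two characters, so padding it to a given length adds one fewer padding character than a grapheme-based function would.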
rfc/mb_str_pad.1684518114.txt.gz · Last modified: 2023/05/19 17:41 by nielsdos