rfc:mb_str_pad
Differences
This shows you the differences between two versions of the page.
rfc:mb_str_pad [2023/05/19 19:00] – wording, proposal, examples nielsdos | rfc:mb_str_pad [2023/11/13 19:55] (current) – link to docs nielsdos | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: mb_str_pad ====== | ====== PHP RFC: mb_str_pad ====== | ||
- | * Version: 0.1 | + | * Version: 0.1.2 |
* Date: 2023-05-19 | * Date: 2023-05-19 | ||
* Author: Niels Dossche (nielsdos), dossche.niels@gmail.com | * Author: Niels Dossche (nielsdos), dossche.niels@gmail.com | ||
- | * Status: | + | * Status: |
+ | * Target Version: PHP 8.3 | ||
+ | * Implementation: | ||
* First Published at: http:// | * First Published at: http:// | ||
Line 10: | Line 12: | ||
===== Proposal ===== | ===== Proposal ===== | ||
- | The proposal | + | This proposal |
<code php> | <code php> | ||
function mb_str_pad(string $string, int $length, string $pad_string = " ", int $pad_type = STR_PAD_RIGHT, | function mb_str_pad(string $string, int $length, string $pad_string = " ", int $pad_type = STR_PAD_RIGHT, | ||
</ | </ | ||
+ | |||
+ | This proposal defines a character as a code point, which is how the other mbstring functions define characters as well.
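To illustrate what counting code points means in practice, here is a small sketch. It assumes PHP 8.3 or later (for mb_str_pad()) and UTF-8 input; the variable names are only illustrative. The string below is a single visible character built from two code points, so it counts as two characters for padding purposes.

```php
<?php
// "e" followed by a combining acute accent (U+0065 U+0301):
// one visible character, two code points, three bytes in UTF-8.
$decomposed = "e\u{0301}";

var_dump(strlen($decomposed));             // int(3): bytes
var_dump(mb_strlen($decomposed, 'UTF-8')); // int(2): code points

// Padding to a length of 5 counts code points, so 3 pad characters are added.
$padded = mb_str_pad($decomposed, 5, '-', STR_PAD_RIGHT, 'UTF-8');
var_dump(mb_strlen($padded, 'UTF-8'));     // int(5)
```

A reader who sees the accented letter as one character might expect four pad characters here; under the code point definition only three are added.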
==== Error conditions ==== | ==== Error conditions ==== | ||
Line 23: | Line 27: | ||
There is one additional error condition that str_pad() doesn' | There is one additional error condition that str_pad() doesn' | ||
- | * $encoding must be a valid and supported character encoding, if provided. | + | * $encoding must be a valid and supported character encoding, if provided. |
==== Examples and Comparison Against str_pad() ==== | ==== Examples and Comparison Against str_pad() ==== | ||
- | This section shows some examples and comparison | + | This section shows some examples and comparisons |
- | str_pad() has trouble with special characters or letters used in some languages, because those are encoded in multiple bytes. The first example demonstrates this by using the word " | + | str_pad() has trouble with special characters or letters used in some languages because those are encoded in multiple bytes. The first example demonstrates this by using the word " |
<code php> | <code php> | ||
- | var_dump(str_pad(' | + | // This will pad such that the string will become 10 bytes long. |
- | var_dump(str_pad(' | + | var_dump(str_pad(' |
- | var_dump(str_pad(' | + | var_dump(str_pad(' |
+ | var_dump(str_pad(' | ||
- | var_dump(mb_str_pad(' | + | // This will pad such that the string will become 10 characters long, and in this case 11 bytes. |
- | var_dump(mb_str_pad(' | + | var_dump(mb_str_pad(' |
+ | var_dump(mb_str_pad(' | ||
var_dump(mb_str_pad(' | var_dump(mb_str_pad(' | ||
</ | </ | ||
- | The problems with str_pad() become even more prominent for languages which use a non-Latin alphabet. | + | The problems with str_pad() become even more prominent for languages which use a non-Latin alphabet |
<code php> | <code php> | ||
Line 52: | Line 58: | ||
</ | </ | ||
- | We can also use emojis and symbols, which may be useful for some CLI applications. This is an example from the original | + | We can also use emojis and symbols, which may have some uses in CLI applications. This is an example from the original |
<code php> | <code php> | ||
Line 65: | Line 71: | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | Since this is a new function, and no existing functions change, | + | Since this is a new function and no existing functions change, |
- | TODO | + | |
+ | I did a quick search using GitHub' | ||
+ | |||
+ | Looking at the function / method // | ||
+ | * 47 in classes | ||
+ | * 12 free functions, checked if PHP doesn' | ||
+ | * 42 free functions, | ||
+ | |||
+ | This means that for 42 implementations, | ||
+ | |||
+ | Let's also take a look at correctness: | ||
+ | * 36 likely correct implementations. I did not test or read them thoroughly; I only ran some inputs through them automatically.
+ | * 65 implementations that break if the padding string is a multibyte string. Almost all of these implementations are very similar to each other.
+ | |||
+ | As we can see, this function appears to be a little tricky to implement correctly.
+ | Note that these results don't include numbers for inline implementations or for implementations under a different name. Hence the reported numbers are quite low. It is very likely more implementations exist under different names, but that doesn' | ||
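For comparison, the following is a minimal sketch of a userland helper that also handles a multibyte padding string. It is not the implementation proposed by this RFC, and it is not taken from any of the surveyed projects; the name example_mb_str_pad() and its parameter names are hypothetical. It assumes PHP 8.0 or later with mbstring loaded and a stateless encoding such as UTF-8, so that repeating the pad string with str_repeat() is safe.

```php
<?php
// Hypothetical helper for illustration only; not the RFC's implementation.
function example_mb_str_pad(
    string $string,
    int $length,
    string $padString = ' ',
    int $padType = STR_PAD_RIGHT,
    string $encoding = 'UTF-8'
): string {
    if ($padString === '') {
        throw new ValueError('The pad string must not be empty');
    }
    if (!in_array($padType, [STR_PAD_LEFT, STR_PAD_RIGHT, STR_PAD_BOTH], true)) {
        throw new ValueError('The pad type must be STR_PAD_LEFT, STR_PAD_RIGHT, or STR_PAD_BOTH');
    }

    // Measure in code points, not bytes.
    $missing = $length - mb_strlen($string, $encoding);
    if ($missing <= 0) {
        return $string;
    }

    // Builds exactly $count code points of padding, cutting the repeated
    // pad string on a code point boundary instead of a byte boundary.
    $makePad = static function (int $count) use ($padString, $encoding): string {
        if ($count === 0) {
            return '';
        }
        $padLength = mb_strlen($padString, $encoding);
        $repeats = intdiv($count + $padLength - 1, $padLength);
        return mb_substr(str_repeat($padString, $repeats), 0, $count, $encoding);
    };

    switch ($padType) {
        case STR_PAD_LEFT:
            return $makePad($missing) . $string;
        case STR_PAD_BOTH:
            // Like str_pad(), any odd leftover goes to the right side.
            $left = intdiv($missing, 2);
            return $makePad($left) . $string . $makePad($missing - $left);
        default:
            return $string . $makePad($missing);
    }
}

var_dump(example_mb_str_pad('▶▶', 6, '❤❓❇'));
```

The key design choice in this sketch is that both the length calculation and the truncation of the padding go through mbstring functions, so a multibyte pad string is never cut in the middle of a code point.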
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
Line 94: | Line 115: | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
- | None. | + | In the future, we could add a string padding function that works on grapheme clusters instead of code points: grapheme_str_pad(). It would belong in ext/intl and would require a separate RFC. |
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
- | One primary yes/no vote to decide if the function may be introduced. | + | One primary yes/no vote to decide whether the function may be introduced; it requires a 2/3 majority. |
+ | |||
+ | Voting starts on 2023-06-05 20:00 GMT+2, and ends on 2023-06-19 20:00 GMT+2. | ||
+ | |||
+ | <doodle title=" | ||
+ | * Yes | ||
+ | * No | ||
+ | </ | ||
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
Line 104: | Line 132: | ||
===== Implementation ===== | ===== Implementation ===== | ||
After the project is implemented, | After the project is implemented, | ||
- | - the version(s) it was merged into | + | - the version(s) it was merged into: PHP 8.3 |
- | - a link to the git commit(s) | + | - a link to the git commit(s): https:// |
- | - a link to the PHP manual entry for the feature | + | - a link to the PHP manual entry for the feature: https:// |
- | - a link to the language specification section (if any) | + | - a link to the language specification section (if any): N/A |
===== References ===== | ===== References ===== | ||
Line 114: | Line 142: | ||
===== Rejected Features ===== | ===== Rejected Features ===== | ||
Keep this updated with features that were discussed on the mail lists. | Keep this updated with features that were discussed on the mail lists. | ||
+ | |||
+ | ===== Changelog ===== | ||
+ | |||
+ | * 0.1.2: Clarify that we use the mbstring definition of character (i.e. code point) instead of grapheme cluster. | ||
+ | * 0.1.1: Initial version placed under discussion |
rfc/mb_str_pad.1684522846.txt.gz · Last modified: 2023/05/19 19:00 by nielsdos