
PHP RFC: Multibyte versions of ucfirst and lcfirst: mb_ucfirst, mb_lcfirst

Introduction

PHP has no multibyte equivalents of the ucfirst and lcfirst functions. Close-enough behavior can be achieved in userland as follows:

function mb_ucfirst(string $str, ?string $encoding = null): string
{
    return mb_convert_case(mb_substr($str, 0, 1, $encoding), MB_CASE_TITLE, $encoding) . mb_substr($str, 1, null, $encoding);
}
function mb_lcfirst(string $str, ?string $encoding = null): string
{
    return mb_strtolower(mb_substr($str, 0, 1, $encoding), $encoding) . mb_substr($str, 1, null, $encoding);
}

However, adding built-in functions for this will improve the readability and clarity of PHP code. It will also standardize how the operation is done, which can be tricky to get right.
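To illustrate the gap, here is a minimal sketch comparing the byte-oriented ucfirst with the userland approach shown above. The function names prefixed with example_ are hypothetical, chosen so the sketch cannot clash with a native or userland mb_ucfirst/mb_lcfirst. It assumes the mbstring extension is loaded and the source file is saved as UTF-8.

```php
<?php
// Hypothetical helpers (not part of the RFC): same logic as the userland
// functions above, under names that cannot collide with built-ins.
function example_mb_ucfirst(string $str, string $encoding = 'UTF-8'): string
{
    $first = mb_substr($str, 0, 1, $encoding);    // first character, not first byte
    $rest  = mb_substr($str, 1, null, $encoding); // everything after it
    return mb_convert_case($first, MB_CASE_TITLE, $encoding) . $rest;
}

function example_mb_lcfirst(string $str, string $encoding = 'UTF-8'): string
{
    $first = mb_substr($str, 0, 1, $encoding);
    $rest  = mb_substr($str, 1, null, $encoding);
    return mb_strtolower($first, $encoding) . $rest;
}

$word = 'éclair';

// ucfirst() works on single bytes, so the two-byte UTF-8 "é" is left untouched.
var_dump(ucfirst($word) === $word);        // bool(true)

echo example_mb_ucfirst($word), "\n";      // Éclair
echo example_mb_lcfirst('Éclair'), "\n";   // éclair
```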

Proposal

Add two functions, mb_ucfirst and mb_lcfirst.

function mb_ucfirst(string $string, ?string $encoding = null): string

mb_ucfirst converts the first character using Unicode title case.

function mb_lcfirst(string $string, ?string $encoding = null): string

From what I have researched about Unicode, these functions may not behave as expected in some languages. Such cases should be handled in userland.

For example, in Vietnamese the first letter is not always capitalized:

  • ngày Quốc khánh 2-9 (September 2nd National Day)
  • tiếng Nhật (Japanese)

As another example, Georgian should use title case rather than upper case. mb_strtoupper produces the following:

  • mb_strtoupper("აბგ") (ani bani gani, U+10D0 U+10D1 U+10D2) -> "ᲐᲑᲒ" (U+1C90 U+1C91 U+1C92)
  • mb_strtoupper("ǉ") (U+01C9) -> "Ǉ" (U+01C7)

With title case, mb_ucfirst gives the correct results:

  • mb_ucfirst("აბგ") -> "აბგ" (U+10D0 U+10D1 U+10D2)
  • mb_ucfirst("ǉ") -> "ǈ" (U+01C9 -> U+01C8)
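The difference between upper case and title case can be inspected with existing mbstring functions. The sketch below is illustrative only: example_codepoints is a hypothetical helper, and the code assumes PHP 7.4 or later (for mb_str_split) with mbstring loaded. Expected output is annotated only for the U+01C9 digraph. The Georgian lines are printed without an expected value because the result depends on the Unicode data shipped with the PHP build.

```php
<?php
// Hypothetical helper: render a UTF-8 string as a space-separated list of code points.
function example_codepoints(string $s): string
{
    $out = [];
    foreach (mb_str_split($s, 1, 'UTF-8') as $ch) {
        $out[] = sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
    }
    return implode(' ', $out);
}

// The digraph "lj" as a single code point, U+01C9.
$digraph = "\u{01C9}";

// Upper case turns both halves of the digraph into capitals.
echo example_codepoints(mb_strtoupper($digraph, 'UTF-8')), "\n";                  // U+01C7

// Title case capitalizes only the first half.
echo example_codepoints(mb_convert_case($digraph, MB_CASE_TITLE, 'UTF-8')), "\n"; // U+01C8

// Georgian ani bani gani: print both conversions of the first letter for comparison.
$georgian = "\u{10D0}\u{10D1}\u{10D2}";
$first    = mb_substr($georgian, 0, 1, 'UTF-8');
echo example_codepoints(mb_strtoupper($first, 'UTF-8')), "\n";
echo example_codepoints(mb_convert_case($first, MB_CASE_TITLE, 'UTF-8')), "\n";
```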

Backward Incompatible Changes

This could break existing userland code that defines a function with the same name.
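One common way for userland code to avoid such a clash is to declare its fallback only when no function of that name exists. The following is a sketch under that assumption, not something this RFC mandates. It assumes PHP 8.0 or later, where the mbstring functions accept a null encoding.

```php
<?php
// Declare a userland fallback only if mb_ucfirst() is not already defined,
// whether by the mbstring extension or by another library.
if (!function_exists('mb_ucfirst')) {
    function mb_ucfirst(string $string, ?string $encoding = null): string
    {
        $first = mb_substr($string, 0, 1, $encoding);
        return mb_convert_case($first, MB_CASE_TITLE, $encoding)
            . mb_substr($string, 1, null, $encoding);
    }
}

// Works whether the built-in or the fallback ends up being called.
echo mb_ucfirst('éclair', 'UTF-8'), "\n"; // Éclair
```

An unguarded declaration of a function named mb_ucfirst would instead fail with a redeclaration error once the built-in exists.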

Proposed PHP Version(s)

next PHP 8.x

RFC Impact

To SAPIs

Adds the aforementioned functions to all PHP environments.

To Existing Extensions

Adds mb_ucfirst(), mb_lcfirst() to the mbstring extension.

To Opcache

No effect.

New Constants

No new constants.

php.ini Defaults

No php.ini settings are changed.

Open Issues

Future Scope

This section details areas where the feature might be improved in future, but that are not currently proposed in this RFC.

Voting

Add mb_ucfirst and mb_lcfirst functions

Real name                        Yes   No
ashnazg (ashnazg)                ✓
beberlei (beberlei)              ✓
bwoebi (bwoebi)                  ✓
crell (crell)                    ✓
dams (dams)                      ✓
devnexen (devnexen)              ✓
galvao (galvao)                  ✓
kguest (kguest)                  ✓
kocsismate (kocsismate)          ✓
nielsdos (nielsdos)              ✓
saki (saki)                      ✓
sergey (sergey)                  ✓
thekid (thekid)                  ✓
tstarling (tstarling)            ✓
weierophinney (weierophinney)    ✓
Final result:                    15    0

This poll has been closed.

Implementation

Rejected Features

Keep this updated with features that were discussed on the mail lists.

rfc/mb_ucfirst.txt · Last modified: 2024/03/20 21:13 by youkidearitai