PHP RFC: Grapheme cluster for str_split function: grapheme_str_split
- Version: 0.1
- Date: 2024-03-04
- Author: Yuya Hamada, youkidearitai@gmail.com
- Status: Draft
- First Published at: http://wiki.php.net/rfc/grapheme_str_split
Introduction
PHP does not have a str_split function that works on grapheme clusters. This RFC proposes one, grapheme_str_split, implemented with ICU. Adding it to the Intl extension would strengthen PHP's support for grapheme clusters.
This feature makes it possible to split strings that contain emoji and variation selectors correctly.
The grapheme_str_split function splits a string on grapheme cluster boundaries:
$ sapi/cli/php -r 'var_dump(grapheme_str_split("🙇‍♂️"));'
array(1) {
  [0]=>
  string(13) "🙇‍♂️"
}
By comparison, the mb_str_split function splits a string by Unicode code point. (Of course, sometimes this is more convenient.)
$ sapi/cli/php -r 'var_dump(mb_str_split("🙇‍♂️"));'
array(4) {
  [0]=>
  string(4) "🙇"
  [1]=>
  string(3) "‍" // U+200D
  [2]=>
  string(3) "♂"
  [3]=>
  string(3) "️" // U+FE0F
}
Until now, splitting a string by grapheme cluster required the PCRE functions and the \X escape:
$ sapi/cli/php -r 'preg_match_all("/(\X)/u", "🙇‍♂️", $matches, PREG_OFFSET_CAPTURE); var_dump($matches[1]);'
array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(13) "🙇‍♂️"
    [1]=>
    int(0)
  }
}
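The three units involved here (bytes, code points, grapheme clusters) can be compared side by side with functions PHP already ships. The following is only a sketch: the helper name unit_counts is hypothetical and not part of this RFC, and it relies on PCRE alone so that it needs neither mbstring nor intl. It uses "e" followed by a combining accent rather than an emoji, because how emoji sequences are grouped depends on the Unicode version of the bundled PCRE library.

```php
<?php
// Hypothetical helper (not part of the RFC): counts one string in three units.
function unit_counts(string $s): array
{
    return [
        // strlen() counts bytes.
        'bytes'       => strlen($s),
        // With /u, "." matches one code point; /s lets it match newlines too.
        'code_points' => preg_match_all('/./su', $s),
        // \X matches one extended grapheme cluster.
        'graphemes'   => preg_match_all('/\X/u', $s),
    ];
}

// "e" + U+0301 (combining acute accent) is displayed as a single character.
// Result: bytes = 3, code_points = 2, graphemes = 1.
var_dump(unit_counts("e\u{0301}"));
```

str_split works on the first unit, mb_str_split on the second, and the proposed function on the third.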
Proposal
Add grapheme_str_split function.
function grapheme_str_split(string $string, int $length = 1): array {}
$string must be UTF-8 encoded. $length is the number of grapheme clusters in each element of the returned array.
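To illustrate how $length groups clusters, here is a minimal userland approximation built on the PCRE \X escape. It is a sketch under stated assumptions, not the proposed ICU-based implementation: the name userland_grapheme_str_split is hypothetical, and the error handling (throwing ValueError for a $length below 1 or for invalid UTF-8) is an assumption, since this RFC does not specify that behavior.

```php
<?php
// Hypothetical userland approximation; the RFC's function is implemented with ICU.
function userland_grapheme_str_split(string $string, int $length = 1): array
{
    // Assumption: reject non-positive lengths (not specified by the RFC).
    if ($length < 1) {
        throw new ValueError('$length must be greater than 0');
    }

    // \X matches one extended grapheme cluster; /u treats the input as UTF-8.
    // preg_match_all() returns false when the input is not valid UTF-8.
    if (preg_match_all('/\X/u', $string, $matches) === false) {
        throw new ValueError('$string must be valid UTF-8');
    }

    // Group the clusters $length at a time and join each group into one string.
    return array_map(
        fn(array $chunk): string => implode('', $chunk),
        array_chunk($matches[0], $length)
    );
}

// Five single-letter clusters, two per element; the last element is shorter.
var_dump(userland_grapheme_str_split("abcde", 2)); // ["ab", "cd", "e"]
```

Cluster boundaries in this sketch come from the PCRE library, so results for newer emoji sequences may differ from what ICU produces.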
Backward Incompatible Changes
This could break code that defines a userland function with the same name.
Proposed PHP Version(s)
next PHP 8.x
RFC Impact
To SAPIs
The aforementioned function will be added to all PHP environments.
To Existing Extensions
Add grapheme_str_split() to the intl extension.
To Opcache
No effect.
New Constants
No new constants.
php.ini Defaults
No changed php.ini settings.
Open Issues
No issues
Future Scope
This section details areas where the feature might be improved in future, but that are not currently proposed in this RFC.
Proposed Voting Choices
Include these so readers know where you are heading and can discuss the proposed voting options.
Implementation
Rejected Features
Keep this updated with features that were discussed on the mailing lists.