

PHP RFC: Grapheme cluster for str_split function: grapheme_str_split


PHP does not have a str_split-style function that operates on grapheme clusters. This RFC proposes adding one, grapheme_str_split, to the Intl extension, implemented with ICU. Adding it to Intl strengthens that extension's grapheme cluster support and makes it possible to handle emoji and variation selectors correctly.

grapheme_str_split splits a string on grapheme cluster boundaries, so an emoji built from several code points stays in one piece:

$ sapi/cli/php -r 'var_dump(grapheme_str_split("🙇‍♂️"));'
array(1) {
  [0]=>
  string(13) "🙇‍♂️"
}

For comparison, mb_str_split splits the same string by Unicode code point. (Of course, that is sometimes the more convenient behavior.)

$ sapi/cli/php -r 'var_dump(mb_str_split("🙇‍♂️"));'
array(4) {
  [0]=>
  string(4) "🙇"
  [1]=>
  string(3) "‍" // U+200D
  [2]=>
  string(3) "♂"
  [3]=>
  string(3) "️" // U+FE0F
}
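
To make the code point split concrete, the following sketch labels each element returned by mb_str_split with its code point. It assumes the mbstring extension is loaded. The emoji is written with \u{...} escapes so that the snippet does not depend on the file's encoding, and the variable names are only illustrative.

```php
<?php
// U+1F647 PERSON BOWING, U+200D ZERO WIDTH JOINER,
// U+2642 MALE SIGN, U+FE0F VARIATION SELECTOR-16
$emoji = "\u{1F647}\u{200D}\u{2642}\u{FE0F}";

// mb_str_split() returns one array element per code point.
$labels = array_map(
    fn(string $char): string => sprintf('U+%04X', mb_ord($char, 'UTF-8')),
    mb_str_split($emoji, 1, 'UTF-8')
);

echo implode(' ', $labels), PHP_EOL; // U+1F647 U+200D U+2642 U+FE0F
```

The four labels correspond to the four elements in the mb_str_split output, while a grapheme-aware split treats them as a single unit.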

Until now, splitting a string into grapheme clusters required the PCRE functions and the \X escape sequence.

$ sapi/cli/php -r 'preg_match_all("/(\X)/u", "🙇‍♂️", $matches, PREG_OFFSET_CAPTURE); var_dump($matches[1]);'
array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(13) "🙇‍♂️"
    [1]=>
    int(0)
  }
}


Add the grapheme_str_split function to the Intl extension:

function grapheme_str_split(string $string, int $length = 1): array {}

$string must be UTF-8 encoded. $length is the number of grapheme clusters in each element of the returned array.
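
The effect of $length can be approximated in userland. The sketch below is not the proposed intl implementation: it uses PCRE's \X instead of ICU, the name grapheme_str_split_sketch is hypothetical, and its handling of invalid input ($length below 1, malformed UTF-8) is an assumption made by analogy with mb_str_split rather than something this RFC specifies.

```php
<?php
/**
 * Hypothetical userland approximation of the proposed function.
 * Not the intl implementation: it relies on PCRE's \X instead of ICU.
 */
function grapheme_str_split_sketch(string $string, int $length = 1): array
{
    if ($length < 1) {
        // Assumption: mirror mb_str_split(), which rejects lengths below 1.
        throw new ValueError('$length must be greater than 0');
    }

    // \X matches one extended grapheme cluster; /u enables UTF-8 mode.
    if (preg_match_all('/\X/u', $string, $matches) === false) {
        return []; // Assumption: malformed UTF-8 yields an empty array here.
    }

    // Group the clusters $length at a time and glue each group back together.
    return array_map(
        fn(array $group): string => implode('', $group),
        array_chunk($matches[0], $length)
    );
}

// "e" + U+0301 COMBINING ACUTE ACCENT is two code points but one cluster.
$text = "e\u{0301}ab";
var_dump(grapheme_str_split_sketch($text));    // three elements
var_dump(grapheme_str_split_sketch($text, 2)); // two elements
```

With $length = 2 the accented letter and the following "a" share the first element, which shows that $length counts grapheme clusters rather than bytes or code points.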

Backward Incompatible Changes

This could break userland code that already defines a function with the same name.
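
One way a project might check whether it is affected is to ask PHP where the name currently comes from. The helper below is a hypothetical illustration (function_origin is not part of this RFC); it uses only function_exists() and ReflectionFunction::isInternal().

```php
<?php
// Hypothetical helper: report where a function name currently comes from.
function function_origin(string $name): string
{
    if (!function_exists($name)) {
        return 'undefined';
    }

    // isInternal() is true for functions provided by PHP or an extension.
    return (new ReflectionFunction($name))->isInternal() ? 'internal' : 'userland';
}

echo function_origin('grapheme_str_split'), PHP_EOL;
```

A result of "userland" on a PHP version without the built-in function points to a definition that would collide after upgrading. Wrapping such a definition in a function_exists() check is one common way to avoid the redeclaration error.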

Proposed PHP Version(s)

next PHP 8.x

RFC Impact


To SAPIs

The aforementioned function will be added to all PHP environments.

To Existing Extensions

Add grapheme_str_split() to the intl extension.

To Opcache

No effect.

New Constants

No new constants.

php.ini Defaults

No changed php.ini settings.

Open Issues

No open issues.

Future Scope

This section details areas where the feature might be improved in future, but that are not currently proposed in this RFC.

Proposed Voting Choices

Include these so readers know where you are heading and can discuss the proposed voting options.


Rejected Features

Keep this updated with features that were discussed on the mail lists.

rfc/grapheme_str_split.1709531211.txt.gz · Last modified: 2024/03/04 05:46 by youkidearitai
