PHP does not have a str_split equivalent that operates on grapheme clusters. This RFC proposes grapheme_str_split, a grapheme-cluster-aware str_split implemented with ICU in the Intl extension. Adding it would strengthen PHP's support for grapheme clusters and make it possible to split strings containing emoji and variation selectors correctly.
grapheme_str_split keeps each grapheme cluster intact:
$ sapi/cli/php -r 'var_dump(grapheme_str_split("🙆‍♂️"));'
array(1) {
  [0]=>
  string(13) "🙆‍♂️"
}
For comparison, mb_str_split splits a string by Unicode code point. (Of course, that is sometimes more convenient.)
$ sapi/cli/php -r 'var_dump(mb_str_split("🙆‍♂️"));'
array(4) {
  [0]=>
  string(4) "🙆"
  [1]=>
  string(3) "‍" // U+200D ZERO WIDTH JOINER
  [2]=>
  string(3) "♂"
  [3]=>
  string(3) "️" // U+FE0F VARIATION SELECTOR-16
}
Until now, splitting a string into grapheme clusters required the PCRE functions:
$ sapi/cli/php -r 'preg_match_all("/(\X)/u", "🙆‍♂️", $matches, PREG_OFFSET_CAPTURE); var_dump($matches[1]);'
array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(13) "🙆‍♂️"
    [1]=>
    int(0)
  }
}
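The PCRE approach above can be wrapped in a small userland helper, which shows roughly what callers have to write today. This is a minimal sketch, not the RFC's implementation: the name userland_grapheme_split is hypothetical, and it assumes a PCRE build whose \X follows the extended grapheme cluster rules.

```php
<?php
// Hypothetical userland helper (not part of the RFC): splits a UTF-8 string
// into grapheme clusters with PCRE's \X and groups $length clusters per element.
function userland_grapheme_split(string $string, int $length = 1): array|false
{
    if ($length < 1) {
        throw new ValueError('$length must be greater than 0');
    }

    // The /u modifier makes PCRE treat the subject as UTF-8.
    // preg_match_all() returns false on failure, e.g. for malformed UTF-8.
    if (preg_match_all('/\X/u', $string, $matches) === false) {
        return false;
    }

    $clusters = $matches[0];
    if ($length === 1) {
        return $clusters;
    }

    // Join every $length consecutive clusters into one array element.
    return array_map(
        fn(array $chunk): string => implode('', $chunk),
        array_chunk($clusters, $length)
    );
}

// "a" followed by U+0308 COMBINING DIAERESIS forms a single cluster.
var_dump(userland_grapheme_split("a\u{0308}-b"));
```

A built-in function removes the need for this kind of wrapper and for the regular-expression engine on what is otherwise a plain string operation.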
Other languages already offer this. Ruby, for example, supports grapheme clusters through String#grapheme_clusters:
s = "\u0061\u0308-pqr-\u0062\u0308-xyz-\u0063\u0308" # => "ä-pqr-b̈-xyz-c̈"
s.grapheme_clusters
# => ["ä", "-", "p", "q", "r", "-", "b̈", "-", "x", "y", "z", "-", "c̈"]
grapheme_str_split splits the same kind of input into grapheme clusters, keeping combining marks attached to their base characters:
$ sapi/cli/php -r 'var_dump(grapheme_str_split("ä-pqr-b̈-xyz-c̈"));'
array(13) {
  [0]=>
  string(2) "ä"
  [1]=>
  string(1) "-"
  [2]=>
  string(1) "p"
  [3]=>
  string(1) "q"
  [4]=>
  string(1) "r"
  [5]=>
  string(1) "-"
  [6]=>
  string(3) "b̈"
  [7]=>
  string(1) "-"
  [8]=>
  string(1) "x"
  [9]=>
  string(1) "y"
  [10]=>
  string(1) "z"
  [11]=>
  string(1) "-"
  [12]=>
  string(3) "c̈"
}
Add the grapheme_str_split function:
function grapheme_str_split(string $string, int $length = 1): array|false {}
$string must be UTF-8 encoded. $length is the number of grapheme clusters in each element of the returned array.
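To illustrate the $length parameter, the following sketch calls the proposed function with a chunk size of two clusters. It assumes a PHP build where grapheme_str_split() from the intl extension is available; the wrapper name split_in_pairs is hypothetical and only exists so the example still runs (returning null) on builds without the function.

```php
<?php
// Hypothetical wrapper for illustration: returns null when the proposed
// grapheme_str_split() is not available in the running PHP build.
function split_in_pairs(string $input): array|false|null
{
    if (!function_exists('grapheme_str_split')) {
        return null;
    }

    // Each array element holds up to two grapheme clusters.
    return grapheme_str_split($input, 2);
}

// "a" + U+0308 is one cluster, so the input holds five clusters in total.
var_dump(split_in_pairs("a\u{0308}-pqr"));
```

Because $length counts grapheme clusters rather than bytes or code points, a combining mark is never separated from its base character at a chunk boundary.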
This could break userland code that already defines a function with the same name.
PHP 8.4
To SAPIs: adds the aforementioned function to all PHP environments.
Add grapheme_str_split() to the intl extension.
No effect.
No new constants.
No changed php.ini settings.
No issues
This section details areas where the feature might be improved in future, but that are not currently proposed in this RFC.