


PHP RFC: Grapheme cluster for str_split function: grapheme_str_split

Introduction

I noticed that PHP does not have a str_split function that works on grapheme clusters, so I think PHP needs one: grapheme_str_split.

This feature would allow emoji and variation selectors to be handled correctly.

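For example, compare it with mb_str_split, which splits a string into Unicode code points (which is, of course, sometimes more convenient). The emoji 🙇‍♂️ (man bowing) used below is a single grapheme cluster made of four code points: U+1F647 PERSON BOWING DEEPLY, U+200D ZERO WIDTH JOINER, U+2642 MALE SIGN and U+FE0F VARIATION SELECTOR-16.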

$ sapi/cli/php -r 'var_dump(mb_str_split("🙇‍♂️"));'
array(4) {
  [0]=>
  string(4) "🙇"
  [1]=>
  string(3) "‍"
  [2]=>
  string(3) "♂"
  [3]=>
  string(3) "️"
}

grapheme_str_split, by contrast, correctly keeps the whole grapheme cluster together:

$ sapi/cli/php -r 'var_dump(grapheme_str_split("🙇‍♂️"));'
array(1) {
  [0]=>
  string(13) "🙇‍♂️"
}
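The same contrast can be reproduced with the always-available PCRE functions, without mbstring or intl. This is an illustration only, not part of the proposal: it assumes PHP 7+ (for the \u{...} string escapes) and a reasonably recent PCRE2 whose \X escape follows the Unicode extended grapheme cluster rules, including emoji ZWJ sequences. The preg_split('//u', ...) idiom stands in for mb_str_split(), and \X stands in for the proposed grapheme_str_split().

```php
<?php
// Illustration only: code-point splitting vs. grapheme-cluster matching with PCRE.
// "Man bowing": 4 code points, 13 bytes of UTF-8, 1 grapheme cluster.
$bow = "\u{1F647}\u{200D}\u{2642}\u{FE0F}";

// Split between every UTF-8 code point (comparable to mb_str_split($bow)).
$codePoints = preg_split('//u', $bow, -1, PREG_SPLIT_NO_EMPTY);

// \X matches one extended grapheme cluster (comparable to the proposed grapheme_str_split($bow)).
preg_match_all('/\X/u', $bow, $matches);
$clusters = $matches[0];

printf("code points: %d, grapheme clusters: %d\n", count($codePoints), count($clusters));
```

With a PCRE2 that implements the emoji ZWJ rule, this should report 4 code points and 1 grapheme cluster, mirroring the two var_dump() outputs.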

Proposal

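Add a grapheme_str_split() function that, by analogy with str_split() and mb_str_split(), splits a string into chunks of $length grapheme clusters: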

function grapheme_str_split(string $string, int $length = 1): array {}

Backward Incompatible Changes

This could break userland code that already declares a global function with the same name.
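Userland code that wants this behaviour before the function ships can avoid that clash by guarding its declaration. The sketch below is a hypothetical polyfill, not the proposed implementation: it assumes PHP 8 (for ValueError), uses PCRE's \X instead of the ICU-based segmentation in the intl extension (so results could differ where the two Unicode versions differ), and its error messages and empty-string behaviour are assumptions modelled on str_split().

```php
<?php
// Hypothetical userland polyfill. The function_exists() guard means this version is
// only declared when no native grapheme_str_split() exists, so it cannot clash with it.
if (!function_exists('grapheme_str_split')) {
    function grapheme_str_split(string $string, int $length = 1): array
    {
        if ($length < 1) {
            throw new ValueError('grapheme_str_split(): Argument #2 ($length) must be greater than 0');
        }
        // \X matches one extended grapheme cluster; with /u, invalid UTF-8 makes this return false.
        if (preg_match_all('/\X/u', $string, $matches) === false) {
            throw new ValueError('grapheme_str_split(): Argument #1 ($string) must be valid UTF-8');
        }
        // Group the clusters into chunks of $length and join each chunk back into a string.
        return array_map('implode', array_chunk($matches[0], $length));
    }
}
```

For example, grapheme_str_split("abcde", 2) returns ["ab", "cd", "e"], and "e\u{0301}" (e followed by a combining acute accent) stays a single element.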

Proposed PHP Version(s)

Next PHP 8.x

RFC Impact

To SAPIs

Will add the new function to all PHP environments in which the intl extension is enabled.

To Existing Extensions

Add grapheme_str_split() to the intl extension.

To Opcache

No effect.

New Constants

No new constants.

php.ini Defaults

No changed php.ini settings.

Open Issues

No issues

Future Scope

This section details areas where the feature might be improved in the future, but that are not currently proposed in this RFC.

Proposed Voting Choices

Include these so readers know where you are heading and can discuss the proposed voting options.

Implementation

Rejected Features

Keep this updated with features that were discussed on the mail lists.

Last modified: 2024/03/04 00:53 by youkidearitai
