rfc:grapheme_str_split

Differences

This shows you the differences between two versions of the page.

rfc:grapheme_str_split [2024/03/04 00:35] – created youkidearitai
rfc:grapheme_str_split [2024/04/10 19:21] (current) – Implemented grapheme_str_split function youkidearitai
Line 3: Line 3:
   * Date: 2024-03-04   * Date: 2024-03-04
   * Author: Yuya Hamada, youkidearitai@gmail.com   * Author: Yuya Hamada, youkidearitai@gmail.com
-  * Status: Draft+  * Status: Implemented
   * First Published at: http://wiki.php.net/rfc/grapheme_str_split   * First Published at: http://wiki.php.net/rfc/grapheme_str_split
  
 ===== Introduction ===== ===== Introduction =====
-I noticed PHP does not have a grapheme cluster based str_split function. So I think need str_split for grapheme cluster, grapheme_str_split function.+PHP does not have a grapheme-cluster-based str_split function. I therefore propose grapheme_str_split, a str_split for grapheme clusters built on [[https://unicode-org.github.io/icu/userguide/icu4c/|ICU]]. Adding this function to the Intl extension would strengthen its support for grapheme clusters and make it possible to handle emoji and variation selectors correctly.
  
-This feature will allow to correctly handle emoji and Variation Selectors.+The grapheme_str_split function splits a string correctly by grapheme cluster:
  
 +<code>
 +$ sapi/cli/php -r 'var_dump(grapheme_str_split("🙇‍♂️"));'
 +array(1) {
 +  [0]=>
 +  string(13) "🙇‍♂️"
 +}
 +</code>
  
 +By comparison, the mb_str_split function splits a string by Unicode code point. (Of course, this is sometimes more convenient.)
 +
 +<code>
 +$ sapi/cli/php -r 'var_dump(mb_str_split("🙇‍♂️"));'
 +array(4) {
 +  [0]=>
 +  string(4) "🙇"
 +  [1]=>
 +  string(3) "‍" // U+200D ZERO WIDTH JOINER
 +  [2]=>
 +  string(3) "♂"
 +  [3]=>
 +  string(3) "️" // U+FE0F VARIATION SELECTOR-16
 +}
 +</code>
 +
 +Until now, splitting a string by grapheme cluster required the PCRE functions and the \X escape.
 +
 +<code>
 +$ sapi/cli/php -r  'preg_match_all("/(\X)/u", "🙇‍♂️", $matches, PREG_OFFSET_CAPTURE); var_dump($matches[1]);'
 +array(1) {
 +  [0]=>
 +  array(2) {
 +    [0]=>
 +    string(13) "🙇‍♂️"
 +    [1]=>
 +    int(0)
 +  }
 +}
 +</code>
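
On PHP versions without grapheme_str_split, the PCRE approach shown above could be wrapped in a small userland helper. The following is only a minimal sketch: the name pcre_grapheme_split is hypothetical and not part of this RFC, and its handling of $length and of invalid input is an assumption rather than a description of the Intl implementation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland fallback (not part of the RFC): split a UTF-8 string
 * into chunks of $length grapheme clusters using PCRE's \X escape.
 */
function pcre_grapheme_split(string $string, int $length = 1): array|false
{
    if ($length < 1) {
        // Assumption: reject non-positive lengths, as str_split() does.
        throw new ValueError('$length must be greater than 0');
    }

    // \X matches one extended grapheme cluster; /u enables UTF-8 mode.
    if (preg_match_all('/\X/u', $string, $matches) === false) {
        return false; // for example, malformed UTF-8 input
    }

    // Group the clusters $length at a time and join each group into one string.
    return array_map(
        static fn (array $group): string => implode('', $group),
        array_chunk($matches[0], $length)
    );
}

// "a" followed by U+0308 COMBINING DIAERESIS forms a single grapheme cluster.
var_dump(pcre_grapheme_split("a\u{0308}bc", 2));
```

Which sequences count as one cluster here depends on the Unicode data shipped with the PCRE2 library in use, so results for newer emoji sequences may differ from ICU.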
 +
 +Other languages already offer this. For example, Ruby supports grapheme clusters through [[https://ruby-doc.org/3.2.2/String.html#method-i-grapheme_clusters|String#grapheme_clusters]]:
 +
 +<code>
 +s = "\u0061\u0308-pqr-\u0062\u0308-xyz-\u0063\u0308" # => "ä-pqr-b̈-xyz-c̈"
 +s.grapheme_clusters
 +# => ["ä", "-", "p", "q", "r", "-", "b̈", "-", "x", "y", "z", "-", "c̈"]
 +</code>
 +
 +grapheme_str_split splits such a string into the same thirteen grapheme clusters, keeping each combining mark with its base letter:
 +
 +<code>
 +$ sapi/cli/php -r 'var_dump(grapheme_str_split("ä-pqr-b̈-xyz-c̈"));'
 +array(13) {
 +  [0]=>
 +  string(2) "ä"
 +  [1]=>
 +  string(1) "-"
 +  [2]=>
 +  string(1) "p"
 +  [3]=>
 +  string(1) "q"
 +  [4]=>
 +  string(1) "r"
 +  [5]=>
 +  string(1) "-"
 +  [6]=>
 +  string(3) "b̈"
 +  [7]=>
 +  string(1) "-"
 +  [8]=>
 +  string(1) "x"
 +  [9]=>
 +  string(1) "y"
 +  [10]=>
 +  string(1) "z"
 +  [11]=>
 +  string(1) "-"
 +  [12]=>
 +  string(3) "c̈"
 +}
 +</code>
  
 ===== Proposal ===== ===== Proposal =====
Line 17: Line 96:
  
 <code> <code>
-function grapheme_str_split(string $string, int $length = 1): array {}+function grapheme_str_split(string $string, int $length = 1): array|false {}
 </code> </code>
 +
 +$string must be UTF-8 encoded. $length is the number of grapheme clusters in each element of the returned array.
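
The effect of $length can be illustrated with a short sketch. It assumes PHP 8.4 or later with the Intl extension loaded, and that the final element may hold fewer than $length clusters when the input does not divide evenly; the variable names are illustrative only.

```php
<?php
// Assumes PHP 8.4+ with the Intl extension, where grapheme_str_split() exists.

// "a" + U+0308, "b", "c" + U+0308, "d": four grapheme clusters, six code points.
$input = "a\u{0308}bc\u{0308}d";

$single = grapheme_str_split($input);    // one grapheme cluster per element
$pairs  = grapheme_str_split($input, 2); // two grapheme clusters per element

var_dump(count($single)); // four elements
var_dump(count($pairs));  // two elements
var_dump($pairs);
```

Each combining diaeresis stays attached to its base letter, so a chunk of two clusters here spans three code points.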
  
 ===== Backward Incompatible Changes ===== ===== Backward Incompatible Changes =====
Line 25: Line 106:
  
 ===== Proposed PHP Version(s) ===== ===== Proposed PHP Version(s) =====
-next PHP 8.x+PHP 8.4
  
 ===== RFC Impact ===== ===== RFC Impact =====
Line 54: Line 135:
  
 ===== Proposed Voting Choices ===== ===== Proposed Voting Choices =====
-Include these so readers know where you are heading and can discuss the proposed voting options.+<doodle title="Add grapheme cluster for str_split function: grapheme_str_split" auth="youkidearitai" voteType="single" closed="false" closeon="2024-04-10T00:00:00Z"> 
 +   * Yes 
 +   * No 
 +</doodle>
  
  
rfc/grapheme_str_split.1709512523.txt.gz · Last modified: 2024/03/04 00:35 by youkidearitai