rfc:grapheme_str_split

Differences

This shows you the differences between two versions of the page.


Previous revision: rfc:grapheme_str_split [2024/03/04 05:46], "fix nit" by youkidearitai
Current revision: rfc:grapheme_str_split [2024/04/10 19:21], "Implemented grapheme_str_split function" by youkidearitai
Old line 3, new line 3:
   * Date: 2024-03-04
   * Author: Yuya Hamada, youkidearitai@gmail.com
-  * Status: Draft
+  * Status: Implemented
   * First Published at: http://wiki.php.net/rfc/grapheme_str_split
  
 ===== Introduction =====
-I noticed that PHP does not have a str_split function based on grapheme clusters, so I propose grapheme_str_split, a grapheme-cluster-aware str_split implemented using ICU. Adding this function to the Intl extension would strengthen PHP's support for grapheme clusters and allow emoji and variation selectors to be handled correctly.
+I noticed that PHP does not have a str_split function based on grapheme clusters, so I propose grapheme_str_split, a grapheme-cluster-aware str_split implemented using [[https://unicode-org.github.io/icu/userguide/icu4c/|ICU]]. Adding this function to the Intl extension would strengthen PHP's support for grapheme clusters and allow emoji and variation selectors to be handled correctly.
  
 The grapheme_str_split function splits strings correctly at grapheme cluster boundaries.
Old line 27, new line 27:
   string(4) "🙇"
   [1]=>
-  string(3) "‍" // U+200D
+  string(3) "‍" // U+200D ZERO WIDTH JOINER
   [2]=>
   string(3) "♂"
   [3]=>
-  string(3) "️" // U+FE0F
+  string(3) "️" // U+FE0F VARIATION SELECTOR-16
 }
 </code>
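The dump above lists the four code points of one emoji ZWJ sequence: U+1F647, U+200D, U+2642 and U+FE0F. The Ruby sketch below is an illustrative cross-check outside PHP and is not part of this RFC. It builds that sequence and compares two ways of splitting it. Ruby's String#chars returns one element per code point, which matches the four-element dump. String#grapheme_clusters returns one element per extended grapheme cluster, which is the unit grapheme_str_split is meant to return. The variable names are illustrative only, and the single-cluster result assumes a recent Ruby whose Unicode data includes the emoji ZWJ sequence rules.

<code ruby>
# Illustrative cross-check in Ruby (not PHP): code points vs. grapheme clusters.
# The input is the emoji ZWJ sequence from the dump above:
# U+1F647 (person bowing), U+200D (ZERO WIDTH JOINER),
# U+2642 (male sign), U+FE0F (VARIATION SELECTOR-16).
bowing_man = "\u{1F647}\u{200D}\u{2642}\u{FE0F}"

# One element per code point, like the four-element dump above.
code_points = bowing_man.chars

# One element per extended grapheme cluster (Unicode UAX #29).
clusters = bowing_man.grapheme_clusters

# The byte sizes correspond to the string(4)/string(3) lengths in the dump.
puts "code points: #{code_points.size} #{code_points.map(&:bytesize).inspect}"
puts "grapheme clusters: #{clusters.size}"
</code>

Joining the single cluster back together gives the original string. Both splits keep every byte and differ only in the unit they split by.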
Old line 47, new line 47:
     int(0)
   }
 +}
 +</code>
 +
 +Examples from other languages: Ruby already supports grapheme clusters via [[https://ruby-doc.org/3.2.2/String.html#method-i-grapheme_clusters|String#grapheme_clusters]].
 +
 +<code>
 +s = "\u0061\u0308-pqr-\u0062\u0308-xyz-\u0063\u0308" # => "ä-pqr-b̈-xyz-c̈"
 +s.grapheme_clusters
 +# => ["ä", "-", "p", "q", "r", "-", "b̈", "-", "x", "y", "z", "-", "c̈"]
 +</code>
 +
 +grapheme_str_split likewise supports grapheme clusters formed with combining characters such as U+0308 COMBINING DIAERESIS.
 +
 +<code>
 +$ sapi/cli/php -r 'var_dump(grapheme_str_split("ä-pqr-b̈-xyz-c̈"));'
 +array(13) {
 +  [0]=>
 +  string(2) "ä"
 +  [1]=>
 +  string(1) "-"
 +  [2]=>
 +  string(1) "p"
 +  [3]=>
 +  string(1) "q"
 +  [4]=>
 +  string(1) "r"
 +  [5]=>
 +  string(1) "-"
 +  [6]=>
 +  string(3) "b̈"
 +  [7]=>
 +  string(1) "-"
 +  [8]=>
 +  string(1) "x"
 +  [9]=>
 +  string(1) "y"
 +  [10]=>
 +  string(1) "z"
 +  [11]=>
 +  string(1) "-"
 +  [12]=>
 +  string(3) "c̈"
 }
 </code>
Old line 54, new line 96:
  
 <code>
-function grapheme_str_split(string $string, int $length = 1): array {}
+function grapheme_str_split(string $string, int $length = 1): array|false {}
 </code>
  
-$string must be UTF-8 encoded.
-$length is the number of grapheme clusters in each element of the returned array.
+$string must be UTF-8 encoded. $length is the number of grapheme clusters in each element of the returned array.
+
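$length groups consecutive grapheme clusters into each array element, in the same way that str_split's $length groups bytes. The Ruby sketch below models that grouping with String#grapheme_clusters and Enumerable#each_slice. grapheme_chunks is a hypothetical helper. Two of its behaviours are assumptions borrowed from str_split and are not stated in this RFC: the final element may hold fewer than $length clusters, and a $length below 1 is rejected.

<code ruby>
# Hypothetical helper (not part of PHP or Ruby): split +string+ into chunks
# of +length+ grapheme clusters each. Assumption, mirroring PHP's str_split:
# the last chunk may contain fewer than +length+ clusters.
def grapheme_chunks(string, length = 1)
  raise ArgumentError, "length must be greater than 0" if length < 1

  # Split into extended grapheme clusters, then join each group of +length+.
  string.grapheme_clusters.each_slice(length).map(&:join)
end

# "b" followed by U+0308 COMBINING DIAERESIS forms a single cluster.
text = "ab\u0308cde"

p grapheme_chunks(text)     # 5 elements: a, b̈, c, d, e
p grapheme_chunks(text, 2)  # 3 elements: ab̈, cd, e
</code>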
  
 ===== Backward Incompatible Changes =====
Old line 65, new line 106:
  
 ===== Proposed PHP Version(s) =====
-next PHP 8.x
+PHP 8.4
  
 ===== RFC Impact =====
Old line 94, new line 135:
  
 ===== Proposed Voting Choices =====
-Include these so readers know where you are heading and can discuss the proposed voting options.
+<doodle title="Add grapheme cluster for str_split function: grapheme_str_split" auth="youkidearitai" voteType="single" closed="false" closeon="2024-04-10T00:00:00Z">
 +   * Yes 
 +   * No 
 +</doodle>
  
  
rfc/grapheme_str_split.1709531211.txt.gz · Last modified: 2024/03/04 05:46 by youkidearitai