rfc:grapheme_str_split

Differences

This shows you the differences between two versions of the page.

rfc:grapheme_str_split [2024/03/04 00:53] – fix nit youkidearitai
rfc:grapheme_str_split [2024/04/10 19:21] (current) – Implemented grapheme_str_split function youkidearitai
Line 3: Line 3:
   * Date: 2024-03-04
   * Author: Yuya Hamada, youkidearitai@gmail.com
-  * Status: Draft
+  * Status: Implemented
   * First Published at: http://wiki.php.net/rfc/grapheme_str_split
  
 ===== Introduction =====
-I noticed PHP does not have a grapheme cluster based str_split function. So I think need str_split for grapheme cluster, grapheme_str_split function.
+I noticed that PHP does not have a grapheme-cluster-based str_split function, so I propose adding one, grapheme_str_split, built on [[https://unicode-org.github.io/icu/userguide/icu4c/|ICU]]. Adding this function to the Intl extension would provide stronger support for grapheme clusters. It makes it possible to handle emoji and variation selectors correctly.
  
-This feature will allow to correctly handle emoji and Variation Selectors.
+The grapheme_str_split function correctly supports grapheme clusters.
 + 
 +<code> 
 +$ sapi/cli/php -r 'var_dump(grapheme_str_split("🙇‍♂️"));' 
 +array(1) { 
 +  [0]=> 
 +  string(13) "🙇‍♂️" 
 +
 +</code>
  
-For example, compare to mb_str_split function. mb_str_split function is str_split for Unicode codepoint.(Of course, sometimes this is more convenient.)
+For comparison, the mb_str_split function is str_split for Unicode code points. (Of course, sometimes this is more convenient.)
  
 <code>
Line 19: Line 27:
   string(4) "🙇"
   [1]=>
-  string(3) "‍"
+  string(3) "‍" // U+200D ZERO WIDTH JOINER
   [2]=>
   string(3) "♂"
   [3]=>
-  string(3) "️"
+  string(3) "️" // U+FE0F VARIATION SELECTOR-16
 }
 </code>
  
-grapheme_str_split function is correctly support for grapheme cluster.
+Until now, PCRE functions were required to support grapheme clusters.
  
 <code>
-$ sapi/cli/php -r 'var_dump(grapheme_str_split("🙇‍♂️"));'
+$ sapi/cli/php -r 'preg_match_all("/(\X)/u", "🙇‍♂️", $matches, PREG_OFFSET_CAPTURE); var_dump($matches[1]);'
 array(1) {
   [0]=>
-  string(13) "🙇‍♂️"
+  array(2) {
 +    [0]=> 
 +    string(13) "🙇‍♂️"
 +    [1]=> 
 +    int(0) 
 +  } 
 +
 +</code> 
 + 
 +Other languages offer this already. For example, Ruby supports grapheme clusters through [[https://ruby-doc.org/3.2.2/String.html#method-i-grapheme_clusters|String#grapheme_clusters]]:
 + 
 +<code> 
 +s = "\u0061\u0308-pqr-\u0062\u0308-xyz-\u0063\u0308" # => "ä-pqr-b̈-xyz-c̈" 
 +s.grapheme_clusters 
 +# => ["ä", "-", "p", "q", "r", "-", "b̈", "-", "x", "y", "z", "-", "c̈"]
 +</code> 
 + 
 +grapheme_str_split splits the same string into the same grapheme clusters (here, base letters with combining marks).
 + 
 +<code> 
 +$ sapi/cli/php -r 'var_dump(grapheme_str_split("ä-pqr-b̈-xyz-c̈"));' 
 +array(13) { 
 +  [0]=> 
 +  string(2) "ä" 
 +  [1]=> 
 +  string(1) "-" 
 +  [2]=> 
 +  string(1) "p" 
 +  [3]=> 
 +  string(1) "q" 
 +  [4]=> 
 +  string(1) "r" 
 +  [5]=> 
 +  string(1) "-" 
 +  [6]=> 
 +  string(3) "b̈" 
 +  [7]=> 
 +  string(1) "-" 
 +  [8]=> 
 +  string(1) "x" 
 +  [9]=> 
 +  string(1) "y" 
 +  [10]=> 
 +  string(1) "z" 
 +  [11]=> 
 +  string(1) "-" 
 +  [12]=> 
 +  string(3) "c̈"
 }
 </code>
Line 41: Line 96:
  
 <code>
-function grapheme_str_split(string $string, int $length = 1): array {}
+function grapheme_str_split(string $string, int $length = 1): array|false {}
 </code>
 +
 +$string must be UTF-8 encoded. $length is the number of grapheme clusters in each element of the returned array.
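
The effect of $length can be illustrated with a minimal userland sketch. This is not the Intl implementation: split_graphemes_sketch is a hypothetical helper, and it assumes that PCRE's \X (the approach shown in the PCRE example) finds the same cluster boundaries as ICU for the input used. It only shows how $length groups clusters into array elements.

<code>
<?php
// Hypothetical helper for illustration only; not part of PHP or the Intl extension.
function split_graphemes_sketch(string $string, int $length = 1): array
{
    if ($length < 1) {
        throw new ValueError('$length must be greater than 0');
    }
    // \X matches one extended grapheme cluster; the /u modifier requires valid UTF-8.
    if (preg_match_all('/\X/u', $string, $matches) === false) {
        throw new InvalidArgumentException('$string must be valid UTF-8');
    }
    // Group $length clusters per element, then join each group back into one string.
    return array_map(
        fn(array $chunk): string => implode('', $chunk),
        array_chunk($matches[0], $length)
    );
}

// "a" followed by U+0308 COMBINING DIAERESIS forms a single grapheme cluster.
var_dump(split_graphemes_sketch("a\u{0308}-pqr", 2));
</code>

With $length set to 2, the five clusters of this input are grouped into three elements: "ä-", "pq" and "r". The last element is shorter because only one cluster remains.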
  
 ===== Backward Incompatible Changes =====
Line 49: Line 106:
  
 ===== Proposed PHP Version(s) =====
-next PHP 8.x
+PHP 8.4
  
 ===== RFC Impact =====
Line 78: Line 135:
  
 ===== Proposed Voting Choices =====
-Include these so readers know where you are heading and can discuss the proposed voting options.
+<doodle title="Add grapheme cluster for str_split function: grapheme_str_split" auth="youkidearitai" voteType="single" closed="false" closeon="2024-04-10T00:00:00Z">
 +   * Yes 
 +   * No 
 +</doodle>
  
  
rfc/grapheme_str_split.1709513593.txt.gz · Last modified: 2024/03/04 00:53 by youkidearitai