While reading Unicode Standard Annex #29 (UAX #29), I noticed that it places no limit on the number of code points in a grapheme cluster. This means a grapheme cluster can crash a computer, because computing resources are limited but the size of a grapheme cluster is not.
This proposal adds a function that makes grapheme clusters safe to handle by checking them against a code point limit.
<?php
$f = "あい👨👨👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦👦うえお";
var_dump(grapheme_limit_codepoints($f)); // returns false because the third grapheme cluster is greater than 32 code points

$f = "あいうえお👨👨👦";
var_dump(grapheme_limit_codepoints($f)); // returns true

$f = "あいうえおH̵̛͕̞̦̰̜͍̰̥̟͆̏͂̌͑ͅä̷͔̟͓̬̯̟͍̭͉͈̮͙̣̯̬͚̞̭̍̀̾͠m̴̡̧̛̝̯̹̗̹̤̲̺̟̥̈̏͊̔̑̍͆̌̀̚͝͝b̴̢̢̫̝̠̗̼̬̻̮̺̭͔̘͑̆̎̚ư̵̧̡̥̙̭̿̈̀̒̐̊͒͑r̷̡̡̲̼̖͎̫̮̜͇̬͌͘g̷̹͍͎̬͕͓͕̐̃̈́̓̆̚͝ẻ̵̡̼̬̥̹͇̭͔̯̉͛̈́̕r̸̮̖̻̮̣̗͚͖̝̂͌̾̓̀̿̔̀͋̈́͌̈́̋͜👨👨👦";
var_dump(grapheme_limit_codepoints($f)); // returns true: this is Zalgo text for "Hamburger", but each grapheme cluster is lower than 32 code points
?>
Checks that the number of code points in each grapheme cluster of $string is lower than $limit.
<?php function grapheme_limit_codepoints(string $string, int $limit = GRAPHEME_LIMIT_CODEPOINTS): bool {} ?>
GRAPHEME_LIMIT_CODEPOINTS is 32, based on the Stream-Safe Text Format described in UAX #15. Unicode's official answer is not to rely on the Stream-Safe Text Format, but I think it makes sense to use it here.
Check the number of code points in each grapheme cluster, then measure the length with grapheme_strlen().
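The per-cluster check described above can be approximated in userland. The following is only a rough sketch under stated assumptions, not the proposed implementation: the names grapheme_limit_codepoints_sketch() and SKETCH_LIMIT_CODEPOINTS are hypothetical, clusters are split with PCRE's \X rather than ICU (so boundaries may differ from the intl extension), a cluster is rejected only when it has more than $limit code points, and invalid UTF-8 is assumed to return false.

```php
<?php
// Hypothetical stand-in for the proposed GRAPHEME_LIMIT_CODEPOINTS constant.
const SKETCH_LIMIT_CODEPOINTS = 32;

// Hypothetical userland approximation of the proposed function.
function grapheme_limit_codepoints_sketch(string $string, int $limit = SKETCH_LIMIT_CODEPOINTS): bool
{
    // \X matches one extended grapheme cluster as PCRE defines it.
    if (preg_match_all('/\X/u', $string, $matches) === false) {
        return false; // assumption: invalid UTF-8 counts as a failure
    }

    foreach ($matches[0] as $cluster) {
        // Count the code points in this cluster; /s lets "." match newlines too.
        $codepoints = preg_match_all('/./su', $cluster);

        // Assumption: only clusters with more than $limit code points are rejected.
        if ($codepoints > $limit) {
            return false;
        }
    }

    return true;
}
```

For instance, a base letter followed by 40 combining acute accents (41 code points in one cluster) would be rejected with the default limit, while the same string passes if a limit of 64 is given.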
Simple example:
This could break a function existing in userland with the same name.
The next version after PHP 8.5 (PHP 8.6 or PHP 9.0).
None
Adds grapheme_limit_codepoints() to the intl extension.
None
None
Please consult the php/policies repository for the current voting guidelines.
Primary Vote requiring a 2/3 majority to accept the RFC: