====== PHP RFC: Add locale for case insensitive grapheme functions ======
  * Version: 2.0
  * Date: 2025-06-07
  * Author: Yuya Hamada, youkidearitai@gmail.com
  * Status: Under Discussion
  * First Published at: https://wiki.php.net/rfc/grapheme_add_locale_for_case_insensitive

===== Introduction =====
The grapheme functions are currently locale-independent. This RFC adds a locale parameter to the following grapheme functions, which is expected to improve their Unicode support:

  * grapheme_strpos
  * grapheme_stripos
  * grapheme_strrpos
  * grapheme_strripos
  * grapheme_strstr
  * grapheme_stristr
  * grapheme_levenshtein

With this RFC, these functions can apply locale-specific rules. For example, in Turkish the dotted capital letter İ (U+0130) is the uppercase form of i, so a case-insensitive search matches only under a Turkish locale:

<code php>
var_dump(grapheme_stripos("i", "\u{0130}", 0, "tr_TR")); // Result is 0
var_dump(grapheme_stripos("i", "\u{0130}", 0, "en_US")); // Result is false
</code>

===== Proposal =====
Add a $locale parameter to these functions:

<code php>
function grapheme_strpos(string $haystack, string $needle, int $offset = 0, string $locale = ""): int|false {}
function grapheme_stripos(string $haystack, string $needle, int $offset = 0, string $locale = ""): int|false {}
function grapheme_strrpos(string $haystack, string $needle, int $offset = 0, string $locale = ""): int|false {}
function grapheme_strripos(string $haystack, string $needle, int $offset = 0, string $locale = ""): int|false {}
function grapheme_substr(string $string, int $offset, ?int $length = null, string $locale = ""): string|false {}
function grapheme_strstr(string $haystack, string $needle, bool $beforeNeedle = false, string $locale = ""): string|false {}
function grapheme_stristr(string $haystack, string $needle, bool $beforeNeedle = false, string $locale = ""): string|false {}
function grapheme_levenshtein(string $string1, string $string2, int $insertion_cost = 1, int $replacement_cost = 1, int $deletion_cost = 1, string $locale = ""): int|false {}
</code>

Specifying a collation strength in the locale can change how CJK characters match. For example, take the same kanji with and without a variation selector:

<code php>
$nabe = '邊';
$nabe_E0101 = "邊\u{E0101}"; // 邊 followed by VARIATION SELECTOR-18 (U+E0101)
</code>
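The effect of collation strength on these two strings can be reproduced today with the existing ''Collator'' class from ext/intl. The sketch below assumes the intl extension is loaded, and ''collatorEquals()'' is a hypothetical helper, not part of this RFC. It passes Collator strength constants directly instead of the proposed $locale argument, so it illustrates the underlying ICU collation behavior rather than the RFC's implementation. It also reproduces the Turkish example from the introduction.

```php
<?php
// Sketch only: requires ext/intl. collatorEquals() is a hypothetical helper,
// not part of the RFC. It uses the existing Collator class to show how the
// ICU collation strength decides whether two strings compare as equal.

function collatorEquals(string $a, string $b, string $locale, int $strength): bool
{
    $collator = new Collator($locale);
    $collator->setStrength($strength);
    return $collator->compare($a, $b) === 0;
}

$nabe = '邊';
$nabe_E0101 = "邊\u{E0101}"; // U+908A followed by VARIATION SELECTOR-18

// The variation selector is completely ignorable at the primary to tertiary levels...
var_dump(collatorEquals($nabe, $nabe_E0101, 'ja_JP', Collator::TERTIARY));  // bool(true)
// ...but IDENTICAL strength adds a final comparison of the normalized code points.
var_dump(collatorEquals($nabe, $nabe_E0101, 'ja_JP', Collator::IDENTICAL)); // bool(false)

// The same mechanism underlies the Turkish example: at SECONDARY strength,
// tr_TR treats İ (U+0130) as a case variant of i, while en_US does not.
var_dump(collatorEquals('i', "\u{0130}", 'tr_TR', Collator::SECONDARY)); // bool(true)
var_dump(collatorEquals('i', "\u{0130}", 'en_US', Collator::SECONDARY)); // bool(false)
```

The proposed API folds this choice into the locale string (for example ''ja_JP-u-ks-identic'') instead of taking a separate strength argument.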
The proposed functions then behave as follows, without and with an identical-strength locale:

<code php>
var_dump(grapheme_levenshtein($nabe, $nabe_E0101)); // result is 0
var_dump(grapheme_levenshtein($nabe, $nabe_E0101, locale: "ja_JP-u-ks-identic")); // result is 1
var_dump(grapheme_strpos($nabe, $nabe_E0101)); // result is 0
var_dump(grapheme_strpos($nabe, $nabe_E0101, locale: "ja_JP-u-ks-identic")); // result is false
</code>

If $locale is not valid, the function returns false and records the error with ''intl_error_code_set'' and ''intl_error_set_custom_msg''. Userland code can therefore retrieve the reason with ''intl_get_error_code'' and ''intl_get_error_message''.

The strength used by the grapheme_stri* functions remains unchanged: UCOL_SECONDARY is used.

The previously proposed $strength parameter has been removed to avoid complexity, and because the strength can already be specified through $locale.

===== Backward Incompatible Changes =====
None, as long as the added parameter is left at its default value.

===== Proposed PHP Version(s) =====
8.5

===== RFC Impact =====
==== To SAPIs ====
No effects.

==== To Existing Extensions ====
No effects.

==== To Opcache ====
No effects.

==== New Constants ====
None.

===== Open Issues =====
None.

===== Future Scope =====
This section details areas where the feature might be improved in future, but that are not currently proposed in this RFC.

===== Proposed Voting Choices =====
Voting has been stopped while I revise this RFC.

===== Patches and Tests =====
https://github.com/php/php-src/pull/18792

===== Implementation =====
https://github.com/php/php-src/pull/18792

===== References =====
https://unicode-org.github.io/icu/userguide/transforms/casemappings.html#full-language-specific-case-mapping

===== Rejected Features =====
Keep this updated with features that were discussed on the mailing lists.