The grapheme functions are currently not locale-aware. This RFC adds a locale parameter to the grapheme functions, notably the case-insensitive ones, so that their Unicode handling can follow locale-specific rules.
With this RFC, the functions can take the locale into account. For example:
var_dump(grapheme_stripos("i", "\u{0130}", 0, "tr_TR")); // Result is 0
var_dump(grapheme_stripos("i", "\u{0130}", 0, "en_US")); // Result is false
Add a $locale parameter to the following functions. The collation strength is not a separate parameter; it is specified through $locale.
function grapheme_strpos(string $haystack, string $needle, int $offset = 0, string $locale = ""): int|false {}
function grapheme_stripos(string $haystack, string $needle, int $offset = 0, string $locale = ""): int|false {}
function grapheme_strrpos(string $haystack, string $needle, int $offset = 0, string $locale = ""): int|false {}
function grapheme_strripos(string $haystack, string $needle, int $offset = 0, string $locale = ""): int|false {}
function grapheme_substr(string $string, int $offset, ?int $length = null, string $locale = ""): string|false {}
function grapheme_strstr(string $haystack, string $needle, bool $beforeNeedle = false, string $locale = ""): string|false {}
function grapheme_stristr(string $haystack, string $needle, bool $beforeNeedle = false, string $locale = ""): string|false {}
function grapheme_levenshtein(string $string1, string $string2, int $insertion_cost = 1, int $replacement_cost = 1, int $deletion_cost = 1, string $locale = ""): int|false {}
Specifying the strength through the locale (here with the -u-ks-identic keyword) can change how CJK characters match. For example:
$nabe = '邊';
$nabe_E0101 = "邊\u{E0101}";
var_dump(grapheme_levenshtein($nabe, $nabe_E0101)); // result is 0
var_dump(grapheme_levenshtein($nabe, $nabe_E0101, locale: "ja_JP-u-ks-identic")); // result is 1
var_dump(grapheme_strpos($nabe, $nabe_E0101)); // result is 0
var_dump(grapheme_strpos($nabe, $nabe_E0101, locale: "ja_JP-u-ks-identic")); // result is false
If $locale is not valid, the function returns false and records the error internally with intl_error_set_code and intl_error_set_custom_msg. PHP userland code can therefore call intl_get_error_code and intl_get_error_message to find out the reason.
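The failure path might be handled in userland roughly as follows. This is a minimal sketch, assuming a PHP build (8.5 or later) in which the $locale parameter from this RFC is available, and assuming no earlier intl call has left an error behind. The helper name find_ignoring_case is hypothetical and not part of the RFC.

```php
<?php
// Hypothetical helper (not part of the RFC): wraps grapheme_stripos() and
// turns a recorded intl error into an exception with the intl error details.
function find_ignoring_case(string $haystack, string $needle, string $locale): int|false
{
    if (PHP_VERSION_ID < 80500) {
        // Assumption: the $locale parameter only exists from PHP 8.5 on.
        throw new RuntimeException('The $locale parameter requires PHP 8.5 or later');
    }

    $pos = grapheme_stripos($haystack, $needle, 0, $locale);

    // false can mean "not found" or "failure". The intl error code tells
    // them apart: U_ZERO_ERROR means no error was recorded.
    // Assumption: the intl error state was clean before this call.
    if ($pos === false && intl_get_error_code() !== U_ZERO_ERROR) {
        throw new RuntimeException(sprintf(
            'grapheme_stripos failed: %s (code %d)',
            intl_get_error_message(),
            intl_get_error_code()
        ));
    }

    return $pos;
}
```

A caller would then treat a plain false as "needle not found" and catch the exception to report a rejected locale.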
The strength used by the grapheme_stri* functions remains unchanged: UCOL_SECONDARY is used.
The $strength parameter was removed to avoid complexity, and because the strength can be specified through $locale.
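For readers unfamiliar with collation strengths, the effect of secondary strength can be observed with the existing Collator class. This sketch only illustrates the ICU strength levels (Collator::SECONDARY corresponds to ICU's UCOL_SECONDARY). It does not use the new $locale parameter and does not show how the grapheme_stri* functions are implemented.

```php
<?php
// Uses the existing intl Collator class to show what "secondary" means.
$collator = new Collator('en_US');
$collator->setStrength(Collator::SECONDARY);

// Case is a tertiary difference, so it is ignored at secondary strength.
$caseIgnored = $collator->compare('a', 'A') === 0;

// An accent is a secondary difference, so it still counts.
$accentIgnored = $collator->compare('a', "\u{00E1}") === 0;

var_dump($caseIgnored);   // bool(true)
var_dump($accentIgnored); // bool(false)
```

This is why a case-insensitive match at secondary strength treats "a" and "A" as equal while still distinguishing "a" from "á".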
None, as long as the added parameter is left at its default value.
8.5
No effects.
No effects.
No effects.
No effects.
Nothing.
This section details areas where the feature might be improved in future, but that are not currently proposed in this RFC.
I apologize for stopping the vote. I will fix this RFC.
Keep this updated with features that were discussed on the mail lists.