====== PHP RFC: Grapheme cluster for levenshtein, grapheme_levenshtein function ======

  * Version: 0.1
  * Date: 2024-10-14
  * Author: Yuya Hamada, youkidearitai@gmail.com
  * Status: Draft
  * First Published at: http://wiki.php.net/rfc/grapheme_levenshtein

===== Introduction =====

I created mb_levenshtein ([[https://wiki.php.net/rfc/mb_levenshtein]]). However, the discussion suggested that a Levenshtein function operating on grapheme clusters would be more logical. I agreed, so I created a proof of concept.

Ref: [[https://github.com/php/php-src/issues/16428]]

For example, combining characters are handled correctly:

  var_dump(grapheme_levenshtein("\u{0065}\u{0301}", "\u{00e9}"));
  // The result is 0 with grapheme_levenshtein. mb_levenshtein does not handle this case well.

Variation selectors are also handled correctly:

  // $nabe and $nabe_E0100 look identical when rendered.
  // However, $nabe_E0100 carries a variation selector: U+908A U+E0100.
  // So the grapheme_levenshtein result should be 0.
  $nabe = '邊';
  $nabe_E0100 = "\u{908A}\u{E0100}";
  var_dump(grapheme_levenshtein($nabe, $nabe_E0100));
  // The result is 0 with grapheme_levenshtein. mb_levenshtein returns 1, which is not the desired result.

===== Proposal =====

Add the grapheme_levenshtein function.

  function grapheme_levenshtein(string $string1, string $string2, int $insertion_cost = 1, int $replacement_cost = 1, int $deletion_cost = 1): int|false {}

$string1 and $string2 must be UTF-8. The function returns false if either argument cannot be parsed as UTF-8.

===== Backward Incompatible Changes =====

This could break userland code that defines a function with the same name.

===== Proposed PHP Version(s) =====

PHP 8.5

===== RFC Impact =====

==== To SAPIs ====

The function will be added to all PHP environments.

==== To Existing Extensions ====

Adds grapheme_levenshtein() to the intl extension.

==== To Opcache ====

No effect.

==== New Constants ====

No new constants.

==== php.ini Defaults ====

No php.ini settings are changed.
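===== Illustrative Sketch =====

To make the intended semantics of the signature in the Proposal section more concrete, the following is a minimal userland sketch. It is not the C implementation in the pull request. The name sketch_grapheme_levenshtein and the approach (NFC normalization via Normalizer, splitting on PCRE's \X, byte-wise comparison of clusters) are assumptions made for illustration only. In particular, this sketch treats a variation selector as a difference, so it does not reproduce the result of 0 for the 邊 example in the Introduction.

```php
<?php
// Hypothetical userland sketch; NOT the proposed intl implementation.
// Assumes the intl extension (Normalizer) and PCRE with Unicode support.
function sketch_grapheme_levenshtein(
    string $string1,
    string $string2,
    int $insertion_cost = 1,
    int $replacement_cost = 1,
    int $deletion_cost = 1
): int|false {
    // Normalize to NFC so canonically equivalent sequences compare equal.
    // Normalizer::normalize() returns false on invalid input.
    $n1 = Normalizer::normalize($string1, Normalizer::FORM_C);
    $n2 = Normalizer::normalize($string2, Normalizer::FORM_C);
    if ($n1 === false || $n2 === false) {
        return false;
    }

    // \X matches one extended grapheme cluster; /u enforces valid UTF-8.
    if (preg_match_all('/\X/u', $n1, $m1) === false
        || preg_match_all('/\X/u', $n2, $m2) === false) {
        return false;
    }
    $a = $m1[0];
    $b = $m2[0];
    $n = count($b);

    // Classic two-row dynamic programming, one cell per grapheme cluster.
    $prev = [];
    for ($j = 0; $j <= $n; $j++) {
        $prev[$j] = $j * $insertion_cost;
    }
    foreach ($a as $clusterA) {
        $curr = [$prev[0] + $deletion_cost];
        for ($j = 0; $j < $n; $j++) {
            $replace = $prev[$j] + ($clusterA === $b[$j] ? 0 : $replacement_cost);
            $delete  = $prev[$j + 1] + $deletion_cost;
            $insert  = $curr[$j] + $insertion_cost;
            $curr[$j + 1] = min($replace, $delete, $insert);
        }
        $prev = $curr;
    }
    return $prev[$n];
}

var_dump(sketch_grapheme_levenshtein("\u{0065}\u{0301}", "\u{00e9}"));
var_dump(sketch_grapheme_levenshtein("kitten", "sitting"));
```

Under these assumptions the first call yields 0, because both inputs normalize to the same single cluster, and the second yields 3. The cost parameters follow the argument order of the existing levenshtein() function: insertion, replacement, deletion.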
===== Open Issues =====

https://github.com/php/php-src/issues/16428

===== Future Scope =====

This section details areas where the feature might be improved in future, but that are not currently proposed in this RFC.

===== Proposed Voting Choices =====

Include these so readers know where you are heading and can discuss the proposed voting options.

===== Voting =====

TBD.

===== Implementation =====

https://github.com/php/php-src/pull/16043

===== References =====

Nothing.

===== Rejected Features =====

Keep this updated with features that were discussed on the mail lists.