====== PHP RFC: Multibyte for levenshtein, mb_levenshtein function ======

  * Version: 0.1
  * Date: 2024-09-25
  * Author: Yuya Hamada, youkidearitai@gmail.com
  * Status: Voting
  * First Published at: http://wiki.php.net/rfc/mb_levenshtein

===== Introduction =====

A multibyte-aware Levenshtein distance has been requested in the past. This RFC therefore proposes a new mb_levenshtein function to provide it.

ref: [[https://github.com/php/php-src/issues/10180]]

===== Levenshtein distance difference mb_levenshtein vs grapheme_levenshtein =====

The mb_levenshtein function computes the Levenshtein distance in Unicode code points rather than bytes. This is useful whenever strings must be compared code point by code point. For example, it distinguishes a precomposed character from the equivalent combining sequence:

<code php>
// Both strings render as "é": "e" + U+0301 (two code points) vs. U+00E9 (one code point).
// The code points differ, so the distance is 2 (one replacement plus one deletion).
var_dump(mb_levenshtein("\u{0065}\u{0301}", "\u{00e9}"));
</code>

There are cases where these two strings should be treated as identical. For those cases, grapheme_levenshtein will be proposed separately.

===== Proposal =====

Add the mb_levenshtein function. The first five parameters match the existing levenshtein() function; $encoding is added as the last parameter.

<code php>
function mb_levenshtein(string $string1, string $string2, int $insertion_cost = 1, int $replacement_cost = 1, int $deletion_cost = 1, ?string $encoding = null): int {}
</code>

===== Backward Incompatible Changes =====

A userland function that is already named mb_levenshtein would conflict with the new function.

===== Proposed PHP Version(s) =====

PHP 8.5

===== RFC Impact =====

==== To SAPIs ====

Adds the aforementioned function to all PHP environments.

==== To Existing Extensions ====

Adds mb_levenshtein() to the mbstring extension.

==== To Opcache ====

No effect.

==== New Constants ====

No new constants.

==== php.ini Defaults ====

No changed php.ini settings.

===== Open Issues =====

https://github.com/php/php-src/issues/10180

===== Future Scope =====

This section details areas where the feature might be improved in future, but that are not currently proposed in this RFC.

A grapheme_levenshtein function, which would treat a precomposed character and its combining sequence as the same, is planned as a separate proposal.

===== Proposed Voting Choices =====

Add the mb_levenshtein function as described in this RFC: Yes or No.
===== Voting =====

  * Yes
  * No

===== Implementation =====

https://github.com/php/php-src/pull/16043

===== References =====

Userland implementation is here:

  * https://github.com/KEINOS/mb_levenshtein

===== Rejected Features =====

Keep this updated with features that were discussed on the mailing lists.
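
===== Appendix: Illustrative Userland Sketch =====

To make the "distance in code points" behaviour concrete, the following is a minimal userland sketch. It is not the implementation from the pull request. The name codepoint_levenshtein is hypothetical, chosen to avoid clashing with the proposed function, and the sketch assumes PHP 8.0 or later with mbstring enabled, so that mb_str_split() accepts a null encoding. It applies the standard dynamic-programming algorithm to arrays of code points and assumes the same argument order as the proposed signature.

<code php>
<?php
declare(strict_types=1);

// Hypothetical helper for illustration only; not part of the RFC or its patch.
function codepoint_levenshtein(
    string $string1,
    string $string2,
    int $insertion_cost = 1,
    int $replacement_cost = 1,
    int $deletion_cost = 1,
    ?string $encoding = null
): int {
    // Split both strings into arrays of single code points.
    $a = mb_str_split($string1, 1, $encoding);
    $b = mb_str_split($string2, 1, $encoding);
    $lenA = count($a);
    $lenB = count($b);

    // Row 0: cost of building each prefix of $string2 from an empty string.
    $prev = [];
    for ($j = 0; $j <= $lenB; $j++) {
        $prev[$j] = $j * $insertion_cost;
    }

    for ($i = 1; $i <= $lenA; $i++) {
        // Column 0: cost of deleting the first $i code points of $string1.
        $curr = [$i * $deletion_cost];
        for ($j = 1; $j <= $lenB; $j++) {
            $replace = $prev[$j - 1] + ($a[$i - 1] === $b[$j - 1] ? 0 : $replacement_cost);
            $insert  = $curr[$j - 1] + $insertion_cost;
            $delete  = $prev[$j] + $deletion_cost;
            $curr[$j] = min($replace, $insert, $delete);
        }
        $prev = $curr;
    }

    return $prev[$lenB];
}

// "café" vs "cafe": one code point differs, but "é" occupies two bytes in UTF-8.
var_dump(codepoint_levenshtein("caf\u{00e9}", "cafe")); // int(1)
var_dump(levenshtein("caf\u{00e9}", "cafe"));           // int(2), byte-based

// Combining sequence vs precomposed character: no code point matches.
var_dump(codepoint_levenshtein("\u{0065}\u{0301}", "\u{00e9}")); // int(2)
</code>

The sketch keeps only two rows of the cost table in memory, a common space optimisation. Whether the actual patch does the same is not stated in this RFC.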