A multibyte-aware Levenshtein distance has been the subject of feature requests in the past. This RFC therefore proposes adding an mb_levenshtein function to provide it.
The mb_levenshtein function computes the Levenshtein distance in units of code points rather than bytes. This is useful for comparing strings at the level of Unicode code points. For example, it can be used to compare a base character followed by a combining character against its precomposed form.
var_dump(mb_levenshtein("\u{0065}\u{0301}", "\u{00e9}")); // both render as "é"; the result is 2 (one replacement plus one deletion).
Of course, there are times when these two strings should be considered the same. For that case, a grapheme_levenshtein function will be proposed separately.
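To make the difference between bytes and code points concrete, the following sketch applies only existing functions (strlen, mb_strlen and the byte-oriented levenshtein) to the two strings from the example. It assumes the mbstring extension is loaded and that the strings are UTF-8; the variable names are illustrative.

```php
<?php
declare(strict_types=1);

// "e" followed by U+0301 COMBINING ACUTE ACCENT, versus precomposed U+00E9.
$decomposed  = "\u{0065}\u{0301}";
$precomposed = "\u{00e9}";

// Code-point view: two code points versus one.
var_dump(mb_strlen($decomposed, 'UTF-8'));   // int(2)
var_dump(mb_strlen($precomposed, 'UTF-8'));  // int(1)

// Byte view: 0x65 0xCC 0x81 versus 0xC3 0xA9.
var_dump(strlen($decomposed));   // int(3)
var_dump(strlen($precomposed));  // int(2)

// levenshtein() works on bytes, and the two strings share no byte,
// so the byte-level distance is 3.
var_dump(levenshtein($decomposed, $precomposed)); // int(3)
```

The byte-level result reflects the UTF-8 encoding rather than the characters themselves, which is the gap a code-point-based function is meant to close.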
Add the mb_levenshtein function.
function mb_levenshtein(string $string1, string $string2, int $insertion_cost = 1, int $replacement_cost = 1, int $deletion_cost = 1, ?string $encoding = null): int {}
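As a rough illustration of the intended semantics, here is a minimal userland sketch with the same parameter list. The name userland_mb_levenshtein is hypothetical, and this is not the implementation proposed for mbstring. It assumes PHP 8.0 or later with mbstring loaded, and it assumes the cost parameters behave as they do in the existing levenshtein() function, i.e. they are the costs of turning $string1 into $string2.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland sketch of a code-point Levenshtein distance.
 * Not the mbstring implementation proposed in this RFC.
 */
function userland_mb_levenshtein(
    string $string1,
    string $string2,
    int $insertion_cost = 1,
    int $replacement_cost = 1,
    int $deletion_cost = 1,
    ?string $encoding = null
): int {
    // Split both inputs into arrays of single code points.
    $a = mb_str_split($string1, 1, $encoding);
    $b = mb_str_split($string2, 1, $encoding);
    $lenA = count($a);
    $lenB = count($b);

    // Row 0: reaching a prefix of $b from the empty string needs only insertions.
    $prev = [];
    for ($j = 0; $j <= $lenB; $j++) {
        $prev[$j] = $j * $insertion_cost;
    }

    for ($i = 1; $i <= $lenA; $i++) {
        // Column 0: reducing a prefix of $a to the empty string needs only deletions.
        $curr = [$i * $deletion_cost];
        for ($j = 1; $j <= $lenB; $j++) {
            $same    = $a[$i - 1] === $b[$j - 1];
            $replace = $prev[$j - 1] + ($same ? 0 : $replacement_cost);
            $insert  = $curr[$j - 1] + $insertion_cost;
            $delete  = $prev[$j] + $deletion_cost;
            $curr[$j] = min($replace, $insert, $delete);
        }
        $prev = $curr;
    }

    return $prev[$lenB];
}

var_dump(userland_mb_levenshtein("\u{00e9}", "e", 1, 1, 1, 'UTF-8')); // int(1)
var_dump(levenshtein("\u{00e9}", "e"));                               // int(2), counted in bytes
```

The sketch keeps only two rows of the dynamic-programming table, so memory use grows with the length of $string2 rather than with the product of both lengths.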
This could break userland code that already defines a global function with the same name.
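One way a project might check for such a conflict is sketched below. The helper name function_origin is hypothetical; it relies only on function_exists() and ReflectionFunction::isInternal(), and reports whether a global function name is provided by the engine or an extension, defined in userland, or not defined at all.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: reports where a global function name comes from.
function function_origin(string $name): string
{
    if (!function_exists($name)) {
        return 'undefined';
    }

    return (new ReflectionFunction($name))->isInternal() ? 'internal' : 'userland';
}

var_dump(function_origin('levenshtein'));     // string(8) "internal"
var_dump(function_origin('function_origin')); // string(8) "userland"

// The answer for this name depends on the PHP version and on the code loaded.
var_dump(function_origin('mb_levenshtein'));
```

Userland libraries that ship their own fallback commonly wrap the definition in a function_exists() check, so that a built-in function of the same name takes precedence instead of triggering a redeclaration error.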
PHP 8.5
To SAPIs: the aforementioned function will be added to all PHP environments.
Adds mb_levenshtein() to the mbstring extension.
No effect.
No new constants.
No changed php.ini settings.
This section details areas where the feature might be improved in future, but that are not currently proposed in this RFC.
Userland implementation is here: