====== PHP RFC: Multibyte for levenshtein, mb_levenshtein function ======
* Version: 0.1
* Date: 2024-09-25
* Author: Yuya Hamada, youkidearitai@gmail.com
* Status: Voting
* First Published at: http://wiki.php.net/rfc/mb_levenshtein
===== Introduction =====
A multibyte-aware Levenshtein distance has been requested in the past. This RFC therefore proposes a new mb_levenshtein function to provide it.
ref: [[https://github.com/php/php-src/issues/10180]]
===== Difference between mb_levenshtein and grapheme_levenshtein =====
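The mb_levenshtein function computes the Levenshtein distance in units of Unicode code points.
This is useful when the code points themselves are what should be compared. For example, a precomposed character and its decomposed form (a base letter followed by a combining mark) are treated as different sequences: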
var_dump(mb_levenshtein("\u{0065}\u{0301}", "\u{00e9}")); // Both render as "é", but the result is 2 (one replacement plus one deletion).
There are certainly cases where these two strings should be considered identical. For those cases, a grapheme_levenshtein function will be proposed separately.
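To make the comparison above concrete, the following sketch lists the code points of both spellings of "é" using existing mbstring functions. The helper name ''code_points'' is hypothetical and exists only for this illustration. UTF-8 is assumed as the encoding.

```php
<?php
// Decomposed "é": U+0065 (e) followed by U+0301 (combining acute accent).
$decomposed = "\u{0065}\u{0301}";
// Precomposed "é": the single code point U+00E9.
$precomposed = "\u{00e9}";

// Hypothetical helper for illustration: list the code points of a UTF-8 string.
function code_points(string $s): array
{
    return array_map(
        fn(string $c): string => sprintf('U+%04X', mb_ord($c, 'UTF-8')),
        mb_str_split($s, 1, 'UTF-8')
    );
}

var_dump(code_points($decomposed));     // two entries: U+0065, U+0301
var_dump(code_points($precomposed));    // one entry: U+00E9
var_dump($decomposed === $precomposed); // bool(false)
```

Because the two strings differ as code point sequences, a code point based distance cannot treat them as equal. A grapheme based function would be needed for that.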
===== Proposal =====
Add the mb_levenshtein function to the mbstring extension.
function mb_levenshtein(string $string1, string $string2, int $insertion_cost = 1, int $replacement_cost = 1, int $deletion_cost = 1, ?string $encoding = null): int {}
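As an illustration of the intended semantics, here is a minimal userland sketch that mirrors the signature above. It is not the implementation from the pull request. The name ''sketch_mb_levenshtein'' is hypothetical, and the cost parameters are assumed to behave like those of the existing levenshtein() function (string1 is transformed into string2).

```php
<?php
declare(strict_types=1);

// Hypothetical userland sketch; the native implementation may differ.
function sketch_mb_levenshtein(
    string $string1,
    string $string2,
    int $insertion_cost = 1,
    int $replacement_cost = 1,
    int $deletion_cost = 1,
    ?string $encoding = null
): int {
    // Split both inputs into code points instead of bytes.
    $a = mb_str_split($string1, 1, $encoding);
    $b = mb_str_split($string2, 1, $encoding);
    $n = count($a);
    $m = count($b);

    // Row 0: turning an empty prefix into a prefix of $string2 needs only insertions.
    $prev = [];
    for ($j = 0; $j <= $m; $j++) {
        $prev[$j] = $j * $insertion_cost;
    }

    for ($i = 1; $i <= $n; $i++) {
        // Column 0: turning a prefix of $string1 into an empty string needs only deletions.
        $curr = [$i * $deletion_cost];
        for ($j = 1; $j <= $m; $j++) {
            $replace = $prev[$j - 1] + ($a[$i - 1] === $b[$j - 1] ? 0 : $replacement_cost);
            $insert  = $curr[$j - 1] + $insertion_cost;
            $delete  = $prev[$j] + $deletion_cost;
            $curr[$j] = min($replace, $insert, $delete);
        }
        $prev = $curr;
    }

    return $prev[$m];
}

// "café" versus "cafe": one code point differs, but two bytes differ in UTF-8.
var_dump(sketch_mb_levenshtein("caf\u{00e9}", "cafe")); // int(1)
var_dump(levenshtein("caf\u{00e9}", "cafe"));           // int(2)
```

The sketch keeps only two rows of the dynamic programming table, so memory use grows with the length of the second string rather than with the product of both lengths.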
===== Backward Incompatible Changes =====
Userland code that already declares a global function named mb_levenshtein will break, because the function can no longer be redeclared.
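Userland code can avoid the conflict by guarding its own declaration. The following is a sketch of such a guard, assuming a simplified fallback with unit costs only and the default internal encoding. It does not reproduce the full proposed signature.

```php
<?php
// Declare the fallback only when the engine does not already provide the function.
if (!function_exists('mb_levenshtein')) {
    // Simplified fallback (assumption): unit costs, default internal encoding.
    function mb_levenshtein(string $string1, string $string2): int
    {
        $a = mb_str_split($string1);
        $b = mb_str_split($string2);
        $prev = range(0, count($b));
        foreach ($a as $i => $ca) {
            $curr = [$i + 1];
            foreach ($b as $j => $cb) {
                $curr[] = min(
                    $prev[$j] + ($ca === $cb ? 0 : 1), // match or replacement
                    $curr[$j] + 1,                      // insertion
                    $prev[$j + 1] + 1                   // deletion
                );
            }
            $prev = $curr;
        }
        return $prev[count($b)];
    }
}

var_dump(mb_levenshtein("kitten", "sitting")); // int(3)
```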
===== Proposed PHP Version(s) =====
PHP 8.5
===== RFC Impact =====
==== To SAPIs ====
Adds the aforementioned function to all PHP environments that have the mbstring extension enabled.
==== To Existing Extensions ====
Adds mb_levenshtein() to the mbstring extension.
==== To Opcache ====
No effect.
==== New Constants ====
No new constants.
==== php.ini Defaults ====
No changed php.ini settings.
===== Open Issues =====
https://github.com/php/php-src/issues/10180
===== Future Scope =====
A grapheme_levenshtein function, which compares grapheme clusters instead of code points, is not part of this RFC and will be proposed separately.
===== Proposed Voting Choices =====
A Yes/No vote on adding the mb_levenshtein function.
===== Voting =====
* Yes
* No
===== Implementation =====
https://github.com/php/php-src/pull/16043
===== References =====
A userland implementation is available here:
* https://github.com/KEINOS/mb_levenshtein
===== Rejected Features =====