====== PHP RFC: Grapheme cluster for levenshtein, grapheme_levenshtein function ======
* Version: 0.1
* Date: 2024-10-14
* Author: Yuya Hamada, youkidearitai@gmail.com
* Status: Draft
* First Published at: http://wiki.php.net/rfc/grapheme_levenshtein
===== Introduction =====
I previously proposed mb_levenshtein [[https://wiki.php.net/rfc/mb_levenshtein]]. However, the discussion suggested that a Levenshtein function operating on grapheme clusters would be more logical. I agree, so I created a proof of concept (PoC).
ref: [[https://github.com/php/php-src/issues/16428]]
For example, combining character sequences are handled correctly:
var_dump(grapheme_levenshtein("\u{0065}\u{0301}", "\u{00e9}")); // grapheme_levenshtein() returns 0; mb_levenshtein() does not treat these strings as equal.
Variation selectors are also handled correctly:
// $nabe and $nabe_E0100 look identical when rendered.
// However, $nabe_E0100 is U+908A followed by the variation selector U+E0100.
// So grapheme_levenshtein() should return 0.
$nabe = '邊';
$nabe_E0100 = "邊󠄀";
var_dump(grapheme_levenshtein($nabe, $nabe_E0100)); // grapheme_levenshtein() returns 0; mb_levenshtein() returns 1, which is not the desired result.
===== Proposal =====
Add a grapheme_levenshtein() function:
function grapheme_levenshtein(string $string1, string $string2, int $insertion_cost = 1, int $replacement_cost = 1, int $deletion_cost = 1): int|false {}
$string1 and $string2 must be valid UTF-8. The function returns false if either string cannot be parsed as UTF-8.
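To make the intended semantics concrete, here is a minimal userland sketch of a Levenshtein distance computed over grapheme clusters instead of bytes. It is not the proposed intl implementation: the name sketch_grapheme_levenshtein is hypothetical, and splitting with PCRE's \X is an assumption made for illustration. The sketch compares clusters byte for byte, so it does not reproduce the canonical-equivalence and variation-selector results shown in the Introduction; it only illustrates the cluster-level dynamic programming and the false return on invalid UTF-8.

```php
<?php
// Hypothetical userland helper for illustration only; NOT the proposed intl implementation.
function sketch_grapheme_levenshtein(
    string $string1,
    string $string2,
    int $insertion_cost = 1,
    int $replacement_cost = 1,
    int $deletion_cost = 1
): int|false {
    // \X matches one extended grapheme cluster.
    // With the /u flag, PCRE rejects invalid UTF-8 and preg_match_all() returns false.
    if (preg_match_all('/\X/u', $string1, $m1) === false
        || preg_match_all('/\X/u', $string2, $m2) === false) {
        return false;
    }
    $a = $m1[0]; // grapheme clusters of $string1
    $b = $m2[0]; // grapheme clusters of $string2
    $lenB = count($b);

    // $prev[$j]: cost of turning the clusters of $a seen so far into the first $j clusters of $b.
    $prev = [];
    for ($j = 0; $j <= $lenB; $j++) {
        $prev[$j] = $j * $insertion_cost;
    }

    foreach ($a as $i => $clusterA) {
        $curr = [($i + 1) * $deletion_cost];
        foreach ($b as $j => $clusterB) {
            // Assumption: clusters are equal only if their bytes are identical.
            $substitute = $prev[$j] + ($clusterA === $clusterB ? 0 : $replacement_cost);
            $delete = $prev[$j + 1] + $deletion_cost;
            $insert = $curr[$j] + $insertion_cost;
            $curr[$j + 1] = min($substitute, $delete, $insert);
        }
        $prev = $curr;
    }

    return $prev[$lenB];
}
```

For instance, sketch_grapheme_levenshtein("e\u{0301}", "e") returns 1, because "e" plus the combining acute accent forms a single cluster, whereas the byte-based levenshtein() returns 2 for the same input.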
===== Backward Incompatible Changes =====
A userland function that is already named grapheme_levenshtein would conflict with the new function.
===== Proposed PHP Version(s) =====
PHP 8.5
===== RFC Impact =====
==== To SAPIs ====
Adds the aforementioned function to all PHP environments.
==== To Existing Extensions ====
Adds grapheme_levenshtein() to the intl extension.
==== To Opcache ====
No effect.
==== New Constants ====
No new constants.
==== php.ini Defaults ====
No php.ini settings are changed.
===== Open Issues =====
https://github.com/php/php-src/issues/16428
===== Future Scope =====
This section details areas where the feature might be improved in future, but that are not currently proposed in this RFC.
===== Proposed Voting Choices =====
Include these so readers know where you are heading and can discuss the proposed voting options.
===== Voting =====
TBD.
===== Implementation =====
https://github.com/php/php-src/pull/16043
===== References =====
Nothing.
===== Rejected Features =====
Keep this updated with features that were discussed on the mail lists.