rfc:unicode_text_processing

Differences

This shows you the differences between two versions of the page.

rfc:unicode_text_processing [2022/11/09 16:47] – created first rough draft derick
rfc:unicode_text_processing [2022/12/15 15:28] – Argue the case for a C-based implementation, and mention ICU implementation details derick
Line 3: Line 3:
   * Date: 2022-11-09   * Date: 2022-11-09
   * Author: Derick Rethans <derick@php.net>   * Author: Derick Rethans <derick@php.net>
-  * Status: *Rough* Draft +  * Status: Draft
   * First Published at: http://wiki.php.net/rfc/unicode_text_processing   * First Published at: http://wiki.php.net/rfc/unicode_text_processing
  
Line 10: Line 10:
  
 This RFC suggests to introduce a new class to make using and processing This RFC suggests to introduce a new class to make using and processing
-(Unicode) text significantly more developer friendly compared to the wealth of +(Unicode) text significantly more developer friendly compared to the 
-functionality that the intl extension provides. The goal is to make it easy for +wealth of functionality that the intl extension provides. The goal is to 
-developers to do Unicode text processing correctly. The RFC does not aim to +create an API that developers can use to do Unicode text processing 
-introduce a class that does everything that the intl extension provides with +correctly, without having to know all the intricacies. 
-regards to Unicode strings.+ 
 +Although PHP has decent maths features, it is sorely missing performant 
 +Unicode text processing that is always available in the core. 
 + 
 +==== Definitions ==== 
 + 
 +^ Term ^ Description ^ 
 +| Grapheme | A Unicode "character". A **single** character includes: a normal character (p), a character with diacritics (ô), a character with space modifiers, or an emoji (☺). | 
  
 ===== Proposal ===== ===== Proposal =====
  
-To introduce a new "Text" class, with methods to operate on the text stored +To introduce a new "Text" class, with methods to operate on the text 
-in the objects.+stored in the objects.
  
 Methods on the class will all return a new (immutable) object. Methods on the class will all return a new (immutable) object.
 +
 +The proposal is to make the ''Text'' class part of the PHP core. This would
 +mean that it is therefore always available to users. As the implementation
 +requires ICU, this would also mean that PHP will depend on the ICU library.
 +
 +==== Basics ====
 +
 +Text objects are constructed by passing a UTF-8 encoded string to the
 +constructor.
 +
 +The ''__toString()'' method collapses the internally stored text into a
 +UTF-8 encoded string, which can be used by all existing PHP functions
 +that accept strings.
 +
 +The internal representation of the text is UTF-16, as that's what ICU uses.
 +Unlike the PHP 6 approach, the conversion to/from the internal
 +representation only happens on the boundaries: UTF-8 to UTF-16 through
 +the constructor, and the reverse through the ''__toString()'' method.
 +
 +There are multiple groups of methods indicated below. Some are to
 +represent PHP's existing string functions (substr, wordwrap, etc.), but
 +with meaningful names.
 +
 +Design Goals:
 +
 +  * keep it simple
 +  * default behaviour should be the most expected
 +  * prefer a method per function, instead of allowing the behaviour of a method to be changed through (optional) arguments.
 +  * operations are on **graphemes**
 +  * no redundant methods that can be constructed from other methods, unless they already exist in PHP, or are frequently used
 +  * more as we discuss this...
 +
 +Non Design Goals:
 +
 +  * introduce every feature of the intl extension
 +
 +Each section below contains a list of expected methods. This list is
 +currently not exhaustive. Please join the discussion on the mailing list
 +to suggest modifications or additions, keeping the design goals in mind.
 +
 +If an argument to any of the methods is listed as ''string|Text'',
 +passing in a ''string'' value will have the same semantics as replacing
 +the passed value with ''new Text($string)''. The locale from the Text
 +object that this method is called on is also used for this new wrapped
 +value, if necessary.
 +
 +==== Locales and Internationalisation ====
 +
 +By default each string will have the "root" collator associated with it,
 +but it is possible to configure a specific collator by using the
 +''$collator'' argument in the constructor. The ''$collator'' is specified as
 +a string describing an ICU locale name:
 +https://unicode-org.github.io/icu/userguide/collation/api.html#instantiating-the-predefined-collators
 +
 +For example, the locale (or collation) name ''en-u-ks-level1'' means
 +case-insensitive sorting for the English locale. This will require
 +extensive documentation.
 +
 +Numerical order collation (such as PHP's ''natsort()'') can be achieved
 +by adding the ''kn'' flag to the locale name, such as in ''de-u-kn''
 +(case-sensitive German, with numerics in value order).
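As a rough illustration of what these locale names do, the following sketch uses the existing ''Collator'' class from the Intl extension rather than the proposed ''Text'' class. It assumes ext/intl built against ICU 54 or later, which accepts BCP 47 collation keywords in the locale name; the variable names are only for illustration.

```php
<?php
// Assumes ext/intl with ICU 54+, which accepts BCP 47 collation keywords in the locale name.

// ks-level1: compare at primary strength, so differences in case are ignored.
$caseInsensitive = new Collator('en-u-ks-level1');
$sameWord = $caseInsensitive->compare('apple', 'APPLE'); // 0 means "equal"

// kn: compare runs of digits by numeric value, similar to natsort().
$numeric = new Collator('de-u-kn');
$files = ['img12', 'img10', 'img2'];
$numeric->sort($files); // sorts the array in place
```

Under this RFC the same locale strings would be passed as the ''$collator'' argument instead.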
 +
 +Other options are described in BCP47:
 +https://github.com/unicode-org/cldr/blob/main/common/bcp47/collation.xml
 +and defaults at http://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Settings
 +
 +Building a locale/collation string will also be possible by using a
 +''TextCollator'' object, to allow for better and easier-to-read customization
 +of collations. The class performs the same function as ''\Intl\Collator''
 +(https://www.php.net/manual/en/class.collator.php), except that it has
 +descriptive methods to set collation properties. The reason for a separate
 +class is so that you don't have to depend on the ''Intl'' extension, and to
 +make it more developer-friendly. It converts the configured options to a
 +string, which can then be used in any location where ''string $collator'' is
 +used in the function signatures to the methods on the ''Text'' class.
 +
 +
 +==== Construction ====
 +
 +This section lists all the methods that construct a Text object.
 +
 +=== __construct(string $text, string $locale = 'root/standard') ===
  
 The constructor takes a UTF-8 encoded text, and stores this in an internal The constructor takes a UTF-8 encoded text, and stores this in an internal
 structure. The constructor will also convert the given text to Unicode structure. The constructor will also convert the given text to Unicode
 Canonical Form. Passing in non-well-formed UTF-8 will result in an Canonical Form. Passing in non-well-formed UTF-8 will result in an
-`InvalidEncodingException`. The constructor will also strip out a BOM+''InvalidEncodingException''. The constructor will also strip out a BOM
 (Byte-Order-Mark) character, if present. (Byte-Order-Mark) character, if present.
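
The constructor steps described above can be approximated in userland today. This is only a sketch: ''prepareText()'' is a hypothetical helper, ''InvalidArgumentException'' stands in for the proposed ''InvalidEncodingException'', and "Unicode Canonical Form" is assumed to mean NFC. It relies on ''Normalizer'' from ext/intl.

```php
<?php
// Hypothetical helper, not part of the RFC; assumes ext/intl for Normalizer.
function prepareText(string $text): string
{
    // Strip a leading UTF-8 byte-order mark (EF BB BF), if present.
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        $text = substr($text, 3);
    }

    // preg_match() with the /u modifier returns false for malformed UTF-8.
    if (preg_match('//u', $text) !== 1) {
        throw new InvalidArgumentException('Text is not well-formed UTF-8');
    }

    // Assumption: "Unicode Canonical Form" is read here as NFC.
    return Normalizer::normalize($text, Normalizer::FORM_C);
}

// "o" followed by U+0302 COMBINING CIRCUMFLEX ACCENT, preceded by a BOM.
$prepared = prepareText("\xEF\xBB\xBFo\u{0302}");
```

After this call ''$prepared'' holds the single precomposed code point U+00F4 (ô).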
  
-By default each string will have the "root" locale associated with it, but it +=== static Text::create(string $text, string $locale = 'root/standard') ===
-is possible to configure a specific locale by using the `$locale` argument in +
-the constructor.+
  
-The ``__toString()`` method collapses the internally stored text into a +The Symfony String package offers a function to construct a String 
-UTF-8 encoded string, which can be used by all existing PHP functions that +through a single-character function (''u''), which you can import into the 
-accept strings. +file scope (with ''use'').
  
-Methods fall into multiple groups. Some to implement PHP's existing +This method serves a similar purpose, so that you can shorten ''new Text()'' to 
-string functions (substr, wordwrap, etc.), but with meaningful names. A +''t'' after having imported the method into the file's scope, 
-design goal is to rather create more methods, than allowing the behaviour of +for example with ''use \Text::create as t''.
-methods to be changed through (optional) arguments.+
  
-The internal representation would be UTF-16, as that's what ICU uses. Unlike +=== static Text::join(array(string|Text) $elements, string|Text $separator, string $collator = NULL) ===
-the PHP 6 approach, the conversion to/from the internal representation only +
-happens on the boundaries: UTF-8 to UTF-16 through the constructor, and the +
-reverse through the ``__toString()`` method.+
  
-==== Groups of Methods ==== +Creates a new Text object by concatenating the Text elements in 
 +''$elements'', inserting ''$separator'' in between each element.
  
-Each section will contain a list of expected methods, which from the start +The semantics are like: ''implode(string $separator, array(string) $array)''
-might not be exhaustive. Please join the discussion on the mailing list to +
-suggest modifications or additions, keeping the design goals in mind.+
  
-=== Construction === +If the ''$collator'' is not specified, it uses the collator of the first 
 +element in the ''$elements'' array. This collator is then also set on the created 
 +object.
  
-``__construct(string $text, string $locale = 'C')`` +If the ''$elements'' array is empty, an empty ''Text'' object with the 
 +''root'' locale is created.
  
-=== Standard String Operations === 
  
-All string operators operate on **graphemes**, which are generally: a normal +==== Standard String Operations ====
-character, a character with diacritics, a character with space modifiers, or +
-an emojis.+
  
-I am not sure if these should accept `string|Text` or only `Text` as 
-`$textToFind`. Accepting a string makes for a easier to use API, but with the 
-caveat that we internally need to convert it pretty much to a `Text` object 
-any way. 
  
-``splitByText(Text $separator, int $limit = PHP_INT_MAX): array(Text)`` +=== split(string|Text $separator, int $limit = PHP_INT_MAX): array(Text) ===
- Returns an array of Text objects, each of which is a substring of `$this`, +
- formed by splitting it on boundaries formed by the text `$separator`.+
  
- Like `explode($separator, $limit)`.+Returns an array of Text objects, each of which is a substring of ''$this'', 
 +formed by splitting it on boundaries formed by the text ''$separator''.
  
-``static Text::joinFromTexts(array(Text) $elements, Text $separator`` +Like ''explode($separator, $limit)''.
- Creates a new Text object by concatenating the each Text element in +
- `$elements`, inserting `$separator` in between each element.
  
- Semantics like `implode(string $separator, array(string) $array);` 
  
-``subString(int $offset, int $length) : Text|false`` +=== subString(int $offset, int $length) : Text|false ===
- Returns a sub-string, starting at `$offset` for `$length` graphemes.+
  
- Like: `grapheme_substr($this, $offset, $length)` +Returns a sub-string, starting at ''$offset'' for ''$length'' graphemes.
- https://www.php.net/manual/en/function.grapheme-substr.php+
  
-``trimLeft`` +Like: ''grapheme_substr($this, $offset, $length)'' 
-``trimRight`` +https://www.php.net/manual/en/function.grapheme-substr.php
-``trim`` +
- Removes white space at the start of, the end of, or both sides of the text.
  
- Like: `ltrim`, `rtrim`, and `trim`, but with using the unicode definition +=== trimLeft, trimRight, trim ===
- of what white space is. https://unicode.org/reports/tr44/#White_Space+
  
-``wrap(int $maxWidth, bool $cutLongWords = false) : array(Text)`` +Removes white space at the start of, the end of, or both sides of the text.
- Wraps a text to a given number of graphemes into an array of Text objects.+
  
- Like: `wordwrap`, but based on graphemes and returning an array instead of +Like: ''ltrim'', ''rtrim'', and ''trim'', but using the Unicode definition 
- inserting a break character.+of what white space is. https://unicode.org/reports/tr44/#White_Space
  
- If `$cutLongWords` is set, no Text element will be larger than +=== wrap(int $maxWidth, bool $cutLongWords = false) : array(Text) ===
- `$maxWidth`.+
  
-``replaceText(Text $search, Text $replace)`` ?? +Wraps a text to a given number of graphemes per line, into an array of Text 
 +objects.
  
-``replaceTextCaseInsensitively(Text $search, Text $replace)`` ?? +Like: ''wordwrap'', but based on graphemes and returning an array instead of 
- Will have to use locales too.+inserting a break character.
  
-``reverse()`` +If ''$cutLongWords'' is set, no Text element will be larger than 
- Reverses a text, taking into account grapheme boundaries. +''$maxWidth''.
  
-=== Finding text in text ===+ 
 +=== replaceText(string|Text $search, string|Text $replace, int $replaceFrom = 0, int $replaceTo = -1 ) === 
 + 
 +Replaces occurrences of ''$search'' with 
 +''$replace''.
 + 
 +The locale of ''$search'' is used to find sub-strings that 
 +match, if it is a ''Text'' object, otherwise the locale embedded in the object 
 +that the method is called on. 
 + 
 +The ''$replaceFrom'' and ''$replaceTo'' arguments control which found 
 +items are being replaced. The ''$replaceFrom'' argument is the first 
 +occurrence that is being replaced (0-indexed), and ''$replaceTo'' is the 
 +last item. Positive numbers are counted from the first occurrence of 
 +''$search'' in the Text, and negative numbers from the last found 
 +occurrence. 
 + 
 +In order to find sub-strings case-insensitively, you can use the ''$collator'' 
 +argument to the constructor of the ''$search'' argument. 
 + 
 +=== reverse() === 
 + 
 +Reverses a text, taking into account grapheme boundaries. 
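
To show why grapheme boundaries matter here, the following sketch reverses a string per grapheme cluster in plain PHP. ''reverseGraphemes()'' is a hypothetical helper, not the proposed implementation; it relies on PCRE's ''\X'' escape (one extended grapheme cluster) together with the ''/u'' modifier.

```php
<?php
// Hypothetical helper; uses PCRE's \X (extended grapheme cluster) with the /u modifier.
function reverseGraphemes(string $text): string
{
    // Collect every grapheme cluster, then join them in reverse order.
    preg_match_all('/\X/u', $text, $matches);

    return implode('', array_reverse($matches[0]));
}

$input = "abo\u{0302}";               // "ab" followed by "o" + COMBINING CIRCUMFLEX ACCENT
$reversed = reverseGraphemes($input); // the accent stays attached to the "o"
```

A byte-wise ''strrev()'' on the same input would instead split the multi-byte combining mark and produce invalid UTF-8.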
 + 
 + 
 +==== Finding Text in Text ====
  
 Methods to find text in other text. Methods to find text in other text.
  
-``getPositionOfFirstOccurrence(string|Text $textToFind, int $offset) : int|false`` +In all these methods, the locale of ''$search'' is used to find sub-strings that  
- Returns the position (in grapheme units) of the first occurrence of +match, if it is a ''Text'' object, otherwise the locale embedded in the object 
- `$textToFind` starting at the (grapheme) `$offset`, or false if not found.+that the method is called on.
  
- Like: `grapheme_strpos($this, $textToFind, $offset)` 
- https://www.php.net/manual/en/function.grapheme-strpos.php 
  
-``getPositionOfLastOccurrence(string|Text $textToFind, int $offset) : int|false`` +=== getPositionOfFirstOccurrence(string|Text $search, int $offset) : int|false ===
- Like `getPositionOfFirstOccurrence` but then from the end of the text.+
  
-``returnFromFirstOccurence(string|Text $textToFind) : Text|false`` +Returns the position (in grapheme units) of the first occurrence of 
- Returns the `Text` starting with the `$textToFind` if found, and +''$search'' starting at the (grapheme) ''$offset'', or false if not found.
- otherwise `false`.+
  
- Like: `grapheme_strstr($this, $textToFind)` +Like: ''grapheme_strpos($this, $search, $offset)'' 
- (https://www.php.net/manual/en/function.grapheme-strstr.php)+https://www.php.net/manual/en/function.grapheme-strpos.php
  
-``returnFromLastOccurence(string|Text $textToFind) : Text|false`` +*I think this method name is too long*
- Like `returnFromFirstOccurence` but then from the end of the text.+
  
-`compareWith(Text $other) : int` (or also the Text's compare handler) +=== getPositionOfLastOccurrence(string|Text $search, int $offset) : int|false ===
- Needs to use a locale, and sorting text strength (to avoid all the many +
- options)... perhaps use Intl's collator instead? Or have two methods?+
  
-`compareWithNaturalOrder(Text $other) : int` 
- Like `strnatcmp`/`strnatcasecmp`. Would be a short cut for using 
- `compareWithCollator` with a `$collator` with the NUMERIC_COLLATION option 
- turned on. 
  
-`compareWithCollator(Text $other, \Intl\Collator $collator) : int`+Like ''getPositionOfFirstOccurrence'' but then from the end of the text.
  
-``contains(Text $string)`` 
- Returns true if the text `$string` can be found in the text. 
  
- Like `str_contains`.+=== returnFromFirstOccurence(string|Text $search) : Text|false ===
  
-``endsWith(Text $string)``+Returns the ''Text'' starting with the ''$search'' if found, and 
 +otherwise ''false''.
  
-``startsWith(Text $string)``+Like: ''grapheme_strstr($this, $search)'' 
 +(https://www.php.net/manual/en/function.grapheme-strstr.php)
  
  
-Case-insensitive variants are not included. If you need this, convert the +=== returnFromLastOccurence(string|Text $search) : Text|false ===
-text(s) with ``toLower`` first. Or allow for using Intl's Collator? That'd be +
-nicer...+
  
-=== Case Conversions ===+Like ''returnFromFirstOccurence'' but then from the end of the text.
  
-``toLower`` +=== contains(string|Text $search) ===
- Converts the text to lower case, using the lower case variant of each +
- Unicode code point that makes up the text.+
  
-``toUpper``+Returns true if the text ''$search'' can be found in the text.
  
-``toTitle`` +Like ''str_contains''.
-+
-``firstToLower`` +
- Converts the first grapheme in the text to a lower case variant.+
  
-``firstToUpper`` 
  
-``firstToTitle``+=== endsWith(string|Text $search) : bool ===
  
 +Compares the last ''$search->length()'' graphemes of ''$this'' with ''$search''.
  
-=== Counting ===+Case-insensitive comparison can be achieved by setting the right 
 +''$collator'' on ''$search''.
  
-`getByteCount()` +Could be constructed from ''getPositionOfLastOccurrence()'' and 
- Returns the size in bytes that the text will take when converted to UTF-8.+''length()'', but it's an often required method, and standard PHP has it 
 +too.
  
-`length()` 
-`getCharacterCount()` 
- Returns the number of characters that make up the text. A character (also 
- sometimes called a grapheme) consists of the base-character, and all
- combining diacritics. Unicode calls these "extended grapheme clusters". 
- http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries 
  
-`getCodePointCount()` +=== startsWith(string|Text $search) : bool ===
- Returns the number of Unicode code points that make up the text. +
- (Not sure if we should add this, as it doesn't really have any use).+
  
-`countWords()` +Compares the first ''$search->length()'' graphemes of ''$this'' with ''$search''.
- Pretty much a shortcut for::+ 
 +Case-insensitive comparison can be achieved by setting the right 
 +''$collator'' on ''$search''
 + 
 +Could be constructed from ''getPositionOfFirstOccurrence()'', 
 +but it's an often required method, and standard PHP has it 
 +too. 
 + 
 + 
 +==== Comparing Text Objects ==== 
 + 
 +=== compareWith(Text $other, string $collator = NULL) : int === 
 + 
 +Uses the configured ''$collator'' of ''$this'' to compare it against 
 +''$other'', unless the ''$collator'' argument is specified as an override. 
 + 
 +This same method is also used for comparing two Text objects as "compare 
 +handler". Here only the locale on ''$this'' is taken into account. 
 + 
 + 
 +==== Case Conversions ==== 
 + 
 +These operations all use the collation that is configured on the Text object. 
 + 
 +=== toLower === 
 + 
 +Converts the text to lower case, using the lower case variant of each 
 +Unicode code point that makes up the text. 
 + 
 +=== toUpper === 
 + 
 +The same, but then to upper case. 
 + 
 +=== toTitle === 
 + 
 +The same, but then to title case (the first letter of each word). 
 + 
 +=== firstToLower === 
 + 
 +Converts the first grapheme in the text to a lower case variant. 
 + 
 +=== firstToUpper === 
 + 
 +The same, but then to upper case. 
 + 
 +=== firstToTitle === 
 + 
 +The same, but then to title case (the first letter of each word). 
 + 
 + 
 +=== wordsToLower === 
 + 
 +Converts the first grapheme in every word to a lower case variant. 
 + 
 +=== wordsToUpper === 
 + 
 +The same, but then to upper case. 
 + 
 +=== wordsToTitle === 
 + 
 +The same, but then to title case (the first letter of each word). 
 + 
 + 
 +==== Counting ==== 
 + 
 + 
 +=== getByteCount() === 
 + 
 +Returns the size in bytes that the text will take when converted to UTF-8. 
 + 
 + 
 +=== length(), getCharacterCount() === 
 + 
 +Returns the number of characters that make up the text. A character (also 
 +sometimes called a grapheme) consists of the base-character, and all 
 +combining diacritics. Unicode calls these "extended grapheme clusters"
 +http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries 
 + 
 + 
 +=== getCodePointCount() === 
 + 
 +Returns the number of Unicode code points that make up the text. 
 +(Not sure if we should add this, as it doesn't really have any use). 
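
The three counts above differ for the same text. A small sketch using functions that exist today (not the proposed methods) makes the distinction concrete; it assumes ext/intl for ''grapheme_strlen()'', and the variable names are only illustrative.

```php
<?php
// Assumes ext/intl for grapheme_strlen().
$text = "o\u{0302}"; // "o" followed by U+0302 COMBINING CIRCUMFLEX ACCENT

$byteCount      = strlen($text);                  // size in UTF-8 bytes
$codePointCount = preg_match_all('/./su', $text); // number of Unicode code points
$graphemeCount  = grapheme_strlen($text);         // number of user-perceived characters
```

Here the single visible character takes three bytes and two code points, but counts as one grapheme.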
 + 
 + 
 +=== getWordCount() === 
 + 
 +Pretty much a shortcut for::
  
  $count = 0;  $count = 0;
  foreach ($text->getWordIterator() as $word) { $count++; }  foreach ($text->getWordIterator() as $word) { $count++; }
  
- Uses the locale, just like the iterators.+Uses the locale, just like the iterators.
  
  
-=== Iterators ===+==== Iterators ====
  
 These functions return an iterator that can be used to iterate over the text. These functions return an iterator that can be used to iterate over the text.
 The results of the iterators are affected by the text's locale. The results of the iterators are affected by the text's locale.
 +
 +These are inspired by ICU4J's BreakIterators
 +(https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/BreakIterator.html)
 +and Intl's create*Instance methods on ''Intl\BreakIterator''
 +(https://www.php.net/manual/en/class.intlbreakiterator.php).
  
-``getCharacterIterator``+=== getCharacterIterator ===
  
-``getLineIterator``+Returns an Iterator that locates boundaries between logical characters. 
 +Because of the structure of the Unicode encoding, a logical character may be 
 +stored internally as more than one Unicode code point. (A with an umlaut may 
 +be stored as an 'a' followed by a separate combining umlaut character, for 
 +example, but the user still thinks of it as one character.) This iterator 
 +allows various processes (especially text editors) to treat as characters the 
 +units of text that a user would think of as characters, rather than the units 
 +of text that the computer sees as "characters".
  
-``getSentenceIterator``+=== getWordIterator ===
  
-``getTitleIterator``+Returns an Iterator that locates boundaries between words. This is useful 
 +for double-click selection or "find whole words" searches. This type of 
 +iterator makes sure there is a boundary position at the beginning and end 
 +of each legal word. (Numbers count as words, too.) Whitespace and punctuation 
 +are kept separate from real words. 
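
For comparison, this is roughly how word boundaries can be walked with the existing ''IntlBreakIterator'' from ext/intl, which the proposed iterators are modelled on. It is a sketch rather than the proposed API: the ''en'' locale and the variable names are assumptions, and boundary offsets are treated as byte offsets into the UTF-8 string.

```php
<?php
// Assumes ext/intl. Boundary offsets are byte offsets into the UTF-8 string.
$text = 'Hello, world 42!';

$iterator = IntlBreakIterator::createWordInstance('en');
$iterator->setText($text);

$words = [];
$start = $iterator->first();
for ($end = $iterator->next(); $end !== IntlBreakIterator::DONE; $start = $end, $end = $iterator->next()) {
    // The rule status describes the segment that ends at the current boundary.
    // Statuses below WORD_NONE_LIMIT mark whitespace and punctuation, not words.
    if ($iterator->getRuleStatus() >= IntlBreakIterator::WORD_NONE_LIMIT) {
        $words[] = substr($text, $start, $end - $start);
    }
}
```

Counting the collected segments gives the kind of result that ''getWordCount()'' is described as returning.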
  
-``getWordIterator``+=== getLineIterator ===
  
 +Returns an Iterator that locates positions where it is legal for a text
 +editor to wrap lines. This is similar to word breaking, but not the same:
 +punctuation and whitespace are generally kept with words (you don't want a
 +line to start with whitespace, for example), and some special characters can
 +force a position to be considered a line-break position or prevent a position
 +from being a line-break position. 
  
-=== Transliteration ===+=== getSentenceIterator === 
 + 
 +Returns an Iterator that locates boundaries between sentences. 
 + 
 + 
 +=== getTitleIterator === 
 + 
 +Returns an Iterator that locates boundaries between title breaks.  
 + 
 + 
 +==== Transliteration ====
  
 Converts text between scripts and other properties. Converts text between scripts and other properties.
  
-``transliterate(string $transliterationString)`` 
  
-``transliterate(\Intl\Transliterator $transliterator)``+=== transliterate(string $transliterationString===
  
-With the first one being a "simple" one to use, and the second using Intl's +Transliterates the content of the ''Text'' object according to the rules as 
-Transliterator for more complex cases.+specified in the ''$transliterationString''.
  
-Should we add shortcuts for a set of often used ones, such as `Any-Latin`? I +There are a few constants for specific and often used cases, such as creating 
-think so, as it's the majority use case.+an ASCII transliterated version of any Text:
  
-``toLatin`` + - const Text::toAscii : A shortcut for a transliteration string that converts 
- Converts any script to Latin.+   any script to Latin, and also strips all the accents.
  
-``removeAccents`` + - const Text::toLatin : A shortcut for a transliteration string that converts 
- Removes the accents from a (latin script) text.+   any script to Latin, but does not remove the accents.
  
- A shortcut for the transliteration string `"Latin-ASCII"` (or a more + - const Text::removeAccents : Removes the accents from a Text. A shortcut for 
- suitable one, which I believe is `"NFD; [:Nonspacing Mark:] Remove; +   the transliteration string ''"NFD; [:Nonspacing Mark:] Remove; NFC"''.
- NFC."`.+
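
The accent-removal rule can be tried with the existing ''Transliterator'' class from ext/intl. This sketch shows only what the rule string does, not how the proposed constants would be implemented; the variable names are illustrative.

```php
<?php
// Assumes ext/intl. Uses the compound rule given for removeAccents:
// decompose (NFD), drop the combining marks, then recompose (NFC).
$removeAccents = Transliterator::create('NFD; [:Nonspacing Mark:] Remove; NFC');

$plain = $removeAccents->transliterate('Crème brûlée');
```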
  
 +===== Implementation Details =====
 +
 +The functionality as is described in this RFC is mostly implemented by using
 +functionality from the ICU library, which is also used by the Intl extension.
 +
 +In order for PHP to continue to work on as wide a range of platforms and
 +distributions as possible, the minimum required ICU version will be the lowest
 +version shipped by the common Linux distributions that would include the
 +version of PHP in which this functionality is implemented.
  
 ===== Backward Incompatible Changes ===== ===== Backward Incompatible Changes =====
  
-Introducing a new class could impact code bases that already use this class +Introducing a new ''Text'' class could impact code bases that already use this 
-name. But as PHP owns the global namespace, this should not deter us from +class name. But as PHP owns the global namespace, this should not deter us 
-adding such a core class. +from adding such a core class.
  
 ===== Proposed PHP Version(s) ===== ===== Proposed PHP Version(s) =====
Line 261: Line 457:
 ===== Open Issues ===== ===== Open Issues =====
  
-==== Class Name ==== 
  
-I have currently picked "Text", as it describes that the object does not only +===== Questions and Answers ===== 
-represent single words (strings). Alternatively, we can pick something like + 
-"Utext" (for Unicode Text), but I find that a distraction.+==== Why is this not a composer package? ==== 
 + 
 +The goal of this RFC is that PHP users can always rely on performant text 
 +processing capabilities.
  
 +Text processors written in PHP already exist, but suffer from performance
 +issues (PHP is slower than C), and are sometimes tailored to specific use
 +cases. By having them written in C, and utilising ICU's well tested and often
 +updated rules and algorithms, both the performance and correctness issues will
 +be addressed.
  
 ===== Future Scope ===== ===== Future Scope =====
Line 283: Line 486:
 ===== Implementation ===== ===== Implementation =====
  
-After the project is implemented, this section should contain +After the project is implemented, this section should contain
   - the version(s) it was merged into   - the version(s) it was merged into
   - a link to the git commit(s)   - a link to the git commit(s)
rfc/unicode_text_processing.txt · Last modified: 2022/12/21 11:48 by derick