rfc:unicode_text_processing
Differences
This shows you the differences between two versions of the page.
rfc:unicode_text_processing [2022/12/18 17:29] – Fix several typos or difficult wording theodorejb | rfc:unicode_text_processing [2024/09/11 14:16] (current) – derick | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Unicode Text Processing ====== | ====== PHP RFC: Unicode Text Processing ====== | ||
- | * Version: 0.9 | + | * Version: 0.9.2 |
- | * Date: 2022-12-16 (Original date: 2022-12-15) | + | * Date: 2022-12-21 (Original date: 2022-12-15) |
* Author: Derick Rethans < | * Author: Derick Rethans < | ||
* Status: Draft | * Status: Draft | ||
* First Published at: http:// | * First Published at: http:// | ||
- | |||
===== Introduction ===== | ===== Introduction ===== | ||
Line 32: | Line 31: | ||
The proposal is to make the '' | The proposal is to make the '' | ||
- | mean that it is therefore always available to users. As the implementation | + | mean that it is therefore always available to user. As the implementation |
requires ICU, this would also mean that PHP will depend on the ICU library. | requires ICU, this would also mean that PHP will depend on the ICU library. | ||
Line 72: | Line 71: | ||
If an argument to any of the methods is listed as '' | If an argument to any of the methods is listed as '' | ||
passing in a '' | passing in a '' | ||
- | the passed value with '' | + | the passed value with '' |
- | object that this method is called on is also used for this new wrapped | + | from the Text object that this method is called on is also used for this new |
- | value, if necessary. | + | wrapped |
- | ==== Locales and Internationalisation ==== | + | ==== Locales, Collators, |
- | By default each string will have the " | + | By default each string will have the "root" locale and " |
- | but it is possible to configure a specific collator by using the | + | associated with it, but it is possible to configure a specific |
- | '' | + | collator by using the '' |
- | a string describing an ICU locale name: | + | addition to the locale, and affects sorting and finding operations. |
+ | |||
+ | The '' | ||
+ | name: | ||
https:// | https:// | ||
- | For example, the locale (or collation) name '' | + | The methods on the Text object all use the '' |
- | case-insensitive sorting for the English locale. | + | |
- | extensive documentation. | + | For example, the locale (and collation) name '' |
+ | case-insensitive sorting | ||
+ | The format of this locale/ | ||
- | Numerical order collation (such as PHP's '' | + | Numerical order collation (such as PHP's '' |
- | by adding the '' | + | adding the '' |
- | (case-sensitive German, with numerics in value order). | + | (case-sensitive German |
Other options are described in BCP47: | Other options are described in BCP47: | ||
Line 109: | Line 113: | ||
==== Construction ==== | ==== Construction ==== | ||
- | This section lists all the methods | + | This section lists all the method |
- | === __construct(string $text, string $locale | + | === __construct(string $text, string $collation |
The constructor takes a UTF-8 encoded text, and stores this in an internal | The constructor takes a UTF-8 encoded text, and stores this in an internal | ||
Line 120: | Line 124: | ||
if present. | if present. | ||
- | === static Text:: | ||
- | The Symfony String package offers a static function to construct a String | + | === static Text:: |
+ | |||
+ | The Symfony String package, offers a static function to construct a String | ||
through a single-character function ('' | through a single-character function ('' | ||
file scope (with '' | file scope (with '' | ||
This method solves a similar use, so that you can shorten '' | This method solves a similar use, so that you can shorten '' | ||
- | '' | + | '' |
- | '' | + | For example with '' |
- | === static Text:: | ||
- | Creates a new Text object by concatenating the Text element | + | === static Text:: |
+ | |||
+ | Creates a new Text object by concatenating | ||
+ | into a new Text object. | ||
+ | |||
+ | If the '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | === static Text:: | ||
+ | |||
+ | Creates a new Text object by looping over all the string/Text elements | ||
'' | '' | ||
The semantics are like: '' | The semantics are like: '' | ||
- | If the '' | + | If the '' |
- | element | + | element |
- | object. | + | created |
- | If the '' | + | If the '' |
- | '' | + | '' |
+ | If the iterator produces a non-string/ | ||
+ | will be thrown. | ||
==== Standard String Operations ==== | ==== Standard String Operations ==== | ||
Line 180: | Line 197: | ||
If '' | If '' | ||
'' | '' | ||
- | |||
- | |||
- | === replaceText(string|Text $search, string|Text $replace, int $replaceFrom = 0, int $replaceTo = -1 ) : \Text === | ||
- | |||
- | Replaces occurrences of '' | ||
- | |||
- | The locale of '' | ||
- | match, if it is a '' | ||
- | that the method is called on. | ||
- | |||
- | The '' | ||
- | items are being replaced. The '' | ||
- | argument that is being replaced (0-indexed), | ||
- | last item. Positive numbers are counted from the first occurrence of | ||
- | '' | ||
- | occurrence. | ||
- | |||
- | In order to find sub-strings case-insensitively, | ||
- | argument to '' | ||
=== reverse() : \Text === | === reverse() : \Text === | ||
Line 209: | Line 207: | ||
Methods to find text in other text. | Methods to find text in other text. | ||
- | In all these methods, the locale of '' | + | In all these methods, the locale |
- | match, if it is a '' | + | sub-strings that match, if it is a '' |
- | that the method is called on. | + | collator that are embedded in the object that the method is called on is used. |
Line 241: | Line 239: | ||
(https:// | (https:// | ||
- | Alternative suggested names: '' | + | Alternative suggested names: '' |
Line 247: | Line 245: | ||
Like '' | Like '' | ||
+ | |||
+ | Alternative suggested names: '' | ||
+ | |||
=== contains(string|Text $search) === | === contains(string|Text $search) === | ||
Line 253: | Line 254: | ||
Like '' | Like '' | ||
- | |||
- | Alternative suggested names: '' | ||
Line 262: | Line 261: | ||
Case-insensitive comparison can be achieved by setting the right | Case-insensitive comparison can be achieved by setting the right | ||
- | '' | + | '' |
Could be constructed from '' | Could be constructed from '' | ||
Line 274: | Line 273: | ||
Case-insensitive comparison can be achieved by setting the right | Case-insensitive comparison can be achieved by setting the right | ||
- | '' | + | '' |
Could be constructed from '' | Could be constructed from '' | ||
but it's an often required method, and standard PHP has it | but it's an often required method, and standard PHP has it | ||
too. | too. | ||
+ | |||
+ | === replaceText(string|Text $search, string|Text $replace, int $replaceFrom = 0, int $replaceTo = -1 ) : \Text === | ||
+ | |||
+ | Replaces occurrences of '' | ||
+ | |||
+ | The '' | ||
+ | items are being replaced. The '' | ||
+ | argument that is being replaced (0-indexed), | ||
+ | last item. Positive numbers are counted from the first occurrence of | ||
+ | '' | ||
+ | occurrence. | ||
+ | |||
+ | In order to find sub-strings case-insensitively, | ||
+ | argument to '' | ||
==== Comparing Text Objects ==== | ==== Comparing Text Objects ==== | ||
- | === compareWith(Text $other, string $collator | + | === compareWith(Text $other, string $collation |
- | Uses the configured '' | + | Uses the configured '' |
- | '' | + | '' |
This same method is also used for comparing two Text objects as " | This same method is also used for comparing two Text objects as " | ||
- | handler" | + | handler" |
+ | taken into account. | ||
+ | |||
+ | === equals(Text $other, string $collation = NULL) : boolean === | ||
+ | |||
+ | Alias for '' | ||
Line 383: | Line 401: | ||
These functions return an iterator that can be used to iterate over the text. | These functions return an iterator that can be used to iterate over the text. | ||
The values returned by the iterators are affected by the text's locale. | The values returned by the iterators are affected by the text's locale. | ||
- | i | + | |
These are inspired by ICU4J' | These are inspired by ICU4J' | ||
(https:// | (https:// | ||
Line 477: | Line 495: | ||
- Add a method like mb_strcut, to extract a string of at most a given number of bytes from a position, as encoded in UTF-8. | - Add a method like mb_strcut, to extract a string of at most a given number of bytes from a position, as encoded in UTF-8. | ||
- | - Tidy up language related to locale/ | ||
===== Questions and Answers ===== | ===== Questions and Answers ===== | ||
Line 522: | Line 539: | ||
===== Changes ===== | ===== Changes ===== | ||
+ | |||
+ | 0.9.2 — 2022-12-21 | ||
+ | |||
+ | * Tim Düsterhus: Added concat and equals methods; changed join to accept an iterator. | ||
+ | * Enhance explanation of locales and collations, and standardize on using '' | ||
0.9.1 — 2022-12-16 | 0.9.1 — 2022-12-16 |
rfc/unicode_text_processing.1671384551.txt.gz · Last modified: 2022/12/18 17:29 by theodorejb