rfc:strtolower-ascii
Differences
This shows you the differences between two versions of the page.
rfc:strtolower-ascii [2021/09/22 23:59] – edits due to cscott's review tstarling | rfc:strtolower-ascii [2021/12/10 21:52] (current) – status: accepted tstarling | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Locale-independent case conversion ====== | ====== PHP RFC: Locale-independent case conversion ====== | ||
- | * Version: | + | * Version: |
* Date: 2021-09-22 | * Date: 2021-09-22 | ||
* Author: Tim Starling < | * Author: Tim Starling < | ||
- | * Status: | + | * Status: |
* Target version: PHP 8.2 | * Target version: PHP 8.2 | ||
* Implementation: | * Implementation: | ||
Line 15: | Line 15: | ||
Prior to PHP 8.0, PHP's locale was set from the environment. When a user installs Linux, the installer asks which language the system should use. The user might not fully appreciate the consequences of this decision: it not only sets the user interface language for built-in commands, it also pervasively changes how string handling in the C library works. For example, a user selecting " | Prior to PHP 8.0, PHP's locale was set from the environment. When a user installs Linux, the installer asks which language the system should use. The user might not fully appreciate the consequences of this decision: it not only sets the user interface language for built-in commands, it also pervasively changes how string handling in the C library works. For example, a user selecting " | ||
- | In an era of network connectivity and standardized text-based protocols, natural language is a minority application for case conversion. But even if the user did want natural language case conversion, they would be unlikely to achieve success with strtolower(). This is because it processes the string one byte at a time, feeding each byte to the C library' | + | In an era of standardized text-based protocols, natural language is a minority application for case conversion. But even if the user did want natural language case conversion, they would be unlikely to achieve success with strtolower(). This is because it processes the string one byte at a time, feeding each byte to the C library' |
PHP 8.0 stopped respecting the locale environment variables. So the locale is always " | PHP 8.0 stopped respecting the locale environment variables. So the locale is always " | ||
Line 40: | Line 40: | ||
===== Proposal ===== | ===== Proposal ===== | ||
- | |||
- | ==== Main changes ==== | ||
The following PHP string functions will do ASCII case conversion: | The following PHP string functions will do ASCII case conversion: | ||
Line 55: | Line 53: | ||
* str_ireplace | * str_ireplace | ||
- | Note that strcasecmp(), | + | Also: |
- | php_strtolower() and php_strtoupper() are the internal C API equivalent of strtoupper() and strtolower(). After reviewing the callers of these functions in the core tree, I decided that they should also be part of this change. They will henceforth | + | * In arsort(), asort(), krsort(), ksort(), rsort(): SORT_FLAG_CASE will mean sorting by ASCII case folding. |
+ | * array_change_key_case() | ||
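As an illustration of what "sorting by ASCII case folding" could mean for a string comparison, here is a minimal C sketch. The helper names `ascii_fold()` and `ascii_casecmp()` are hypothetical and are not taken from the PHP source; the sketch only assumes that folding maps the bytes `A`-`Z` to `a`-`z` and leaves every other byte, including bytes of multibyte UTF-8 sequences, unchanged.

```c
#include <stddef.h>

/* Hypothetical helper: fold one byte to ASCII lower case.
 * Only 'A'..'Z' are changed; bytes >= 0x80 pass through untouched. */
static unsigned char ascii_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical helper: compare two NUL-terminated strings byte by byte
 * after ASCII case folding. Returns <0, 0 or >0, like strcmp(). */
static int ascii_casecmp(const char *a, const char *b)
{
    for (;; a++, b++) {
        unsigned char ca = ascii_fold((unsigned char)*a);
        unsigned char cb = ascii_fold((unsigned char)*b);

        if (ca != cb) {
            return (ca < cb) ? -1 : 1;
        }
        if (ca == '\0') {
            return 0; /* both strings ended at the same position */
        }
    }
}
```

With a comparison like this, "Apple" and "apple" compare equal regardless of the process locale, while two strings that differ only in a non-ASCII byte stay distinct.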
- | ==== Consequent changes ==== | + | Note that strcasecmp(), |
- | The flow-on effects of the change to the behavior of php_strtolower() and php_strtoupper() are a microcosm of the damaging and inappropriate uses locale-sensitive | + | ASCII case conversion is identical |
- | + | ||
- | * strip_tags(): | + | |
- | * grapheme_stripos() and grapheme_strripos() currently have a locale-sensitive " | + | |
- | * ldap_get_entries(): | + | |
- | * mb_send_mail(): | + | |
- | * oci_pconnect(): | + | |
- | * PDO DBLIB: ASCII will be used when stringifying UNIQUE column values and converting them to uppercase. | + | |
- | * SoapClient: function names will be indexed by the ASCII lowercase name, consistent | + | |
- | * get_meta_tags(): | + | |
- | * http stream wrapper: HTTP headers will be matched by the ASCII lower case name. | + | |
- | * phpinfo(): Anchor names contain the lower-case version of the extension name. This will become ASCII lower case. | + | |
- | * xml_parser_set_option(): | + | |
- | * Stream protocol names will be matched by ASCII case insensitivity. | + | |
- | * PHP manual docref URLs will be constructed by ASCII case conversion of the class and function. | + | |
- | * rfc1867.c: When processing | + | |
- | + | ||
- | ==== New functions ==== | + | |
- | + | ||
- | I am proposing that locale-sensitive case conversion be provided by functions called ctype_tolower() and ctype_toupper(). Effectively, | + | |
- | + | ||
- | * tolower() and toupper() are in ctype.h, so it fits with ctype' | + | |
- | * The limitations of the implementation are shared by the other ctype functions and so are less likely to be surprising. | + | |
- | * The result is consistent with ctype_islower() and ctype_isupper(). | + | |
- | * It's easy to do, and maybe someone will want them. | + | |
- | + | ||
- | Some statements in the manual about what the ctype extension is for will have to be updated. | + | |
- | + | ||
- | For completeness, | + | |
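The ASCII case conversion described in this section can be sketched in C as follows. This is a minimal illustration, not the RFC's implementation: the function name `ascii_strtolower()` is hypothetical, and the sketch assumes only that the bytes `A`-`Z` are mapped to `a`-`z` while every other byte is left alone, independent of the current locale.

```c
#include <stddef.h>

/* Hypothetical helper: lowercase the ASCII letters of a buffer in place.
 * Unlike a loop over the C library's tolower(), this never consults the
 * locale, so bytes >= 0x80 (for example the bytes of a UTF-8 sequence)
 * are never modified. */
static void ascii_strtolower(char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];

        if (c >= 'A' && c <= 'Z') {
            s[i] = (char)(c + ('a' - 'A'));
        }
    }
}
```

Applied to a header name such as "Content-Type", the sketch yields "content-type"; a trailing UTF-8 encoded "É" (bytes 0xC3 0x89) would be left byte-for-byte intact rather than being altered one byte at a time.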
===== Alternatives considered ===== | ===== Alternatives considered ===== | ||
Line 104: | Line 75: | ||
It is not possible for strtolower() to raise a deprecation warning depending on its input, because there is no way to tell whether a given case transformation was intended by the caller. | It is not possible for strtolower() to raise a deprecation warning depending on its input, because there is no way to tell whether a given case transformation was intended by the caller. | ||
+ | |||
+ | I considered introducing ctype_tolower() and ctype_toupper(), | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
- | This RFC is part of a program of reducing locale dependence | + | I didn't include strnatcasecmp() and natcasesort() in this RFC, because they also use isdigit() and isspace(). They could be migrated |
+ | |||
+ | There are about 50 direct callers of tolower() and toupper() which I haven' | ||
- | ===== Proposed | + | ===== Voting ===== |
- | I would consider making the introduction of ctype_tolower() and ctype_toupper() be optional. But if that seems uncontroversial during the discussion phase, we can just have a yes/no vote. | + | Voting period: 2021-11-25 to 2021-12-09. |
+ | <doodle title=" | ||
+ | * Yes | ||
+ | * No | ||
+ | </ | ||
rfc/strtolower-ascii.1632355185.txt.gz · Last modified: 2021/09/22 23:59 by tstarling