rfc:strtolower-ascii

Differences

This shows you the differences between two versions of the page.

Previous revision: rfc:strtolower-ascii [2021/09/24 03:18] – remove "consequent changes" since nikic will merge most of them separately without an RFC (tstarling)
Current revision: rfc:strtolower-ascii [2021/12/10 21:52] (current) – status: accepted (tstarling)
Line 1 (old) / Line 1 (new):
 ====== PHP RFC: Locale-independent case conversion ======
-  * Version: 1.1
+  * Version: 1.2
   * Date: 2021-09-22
   * Author: Tim Starling <tstarling@wikimedia.org>
-  * Status: Under Discussion
+  * Status: Accepted
   * Target version: PHP 8.2
   * Implementation: https://github.com/php/php-src/pull/7506
Line 40 (old) / Line 40 (new):

 ===== Proposal =====
-
-==== Main changes ====

 The following PHP string functions will do ASCII case conversion:
Line 61 (old) / Line 59 (new):

 Note that strcasecmp(), strncasecmp() and substr_compare() with $case_insensitive = true were already using ASCII case conversion.
- 
-php_strtolower() and php_strtoupper() are the internal C API equivalent of strtoupper() and strtolower(). After reviewing the callers of these functions in the core tree, I decided that they should also be part of this change. They will henceforth do ASCII case conversion. 
- 
-For consistency, I also made the case comparison functions in zend_operators.c do ASCII case conversion, specifically string_compare_function_ex, string_case_compare_function, zend_binary_zval_strcasecmp and zend_binary_zval_strncasecmp. 
  
 ASCII case conversion is identical to case conversion with the "C" locale. So these changes have no effect unless setlocale() was called.
- 
-==== New functions ==== 
- 
-I am proposing that locale-sensitive case conversion be provided by functions called ctype_tolower() and ctype_toupper(). Effectively, strtolower() will be renamed to ctype_tolower() and strtoupper() will be renamed to ctype_toupper(). My reasons are: 
- 
-  * tolower() and toupper() are in ctype.h, so it fits with ctype's theme of providing access to ctype.h functions. 
-  * The limitations of the implementation are shared by the other ctype functions and so are less likely to be surprising. 
-  * The result is consistent with ctype_islower() and ctype_isupper(). 
-  * It's easy to do, and maybe someone will want them. 
- 
-Some statements in the manual about what the ctype extension is for will have to be updated. 
- 
-For completeness, I have introduced a family of upper case functions to zend_operators.c by analogy with the lower case functions, most of which are currently not called. 
  
 ===== Alternatives considered =====
Line 94 (old) / Line 75 (new):

 It is not possible for strtolower() to raise a deprecation warning depending on its input, because there is no way to tell whether a given case transformation was intended by the caller.
+
+I considered introducing ctype_tolower() and ctype_toupper(), which would do locale-sensitive case conversion like the old strtolower() and strtoupper(), but Nikita suggested that we may want to make the ctype extension generally be locale-independent, which would make these functions redundant.
  
 ===== Future Scope =====
  
-I didn't include strnatcasecmp() and natcasesort() in this RFC, because they also use isdigit() and isspace(), and because they are intended for natural language processing. They could be migrated in future.
+I didn't include strnatcasecmp() and natcasesort() in this RFC, because they also use isdigit() and isspace(). They could be migrated in future.
  
-There are about 50 direct callers of tolower() and toupper() which I haven't migrated. They are similar in flavor to the php_strtolower() callers.
+There are about 50 direct callers of tolower() and toupper() which I haven't migrated.
  
-===== Proposed Voting Choices =====
+===== Voting =====
  
-The introduction of ctype_tolower() and ctype_toupper() can be a separate vote, if they seem controversial during the discussion stage.
+Voting period: 2021-11-25 to 2021-12-09.
  
+<doodle title="Use locale-independent case conversion for string functions as proposed?" auth="tstarling" voteType="single" closed="true">
+   * Yes
+   * No
+</doodle>
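
The Proposal section says that strtolower() and strtoupper() will do ASCII case conversion, which is identical to case conversion in the "C" locale. The following is a minimal C sketch of what that means at the byte level, assuming an ASCII-compatible execution character set. It is not the php-src implementation, and the names ascii_tolower_char(), ascii_toupper_char(), ascii_strtolower_inplace() and ascii_strtoupper_inplace() are hypothetical. Only the bytes A to Z (or a to z) are changed. Every other byte, including bytes 0x80 and above that a single-byte locale might otherwise remap, passes through unchanged, so the result does not depend on setlocale() and UTF-8 multibyte sequences are left intact.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: lowercase one byte using ASCII rules only
 * ("C" locale behaviour). Unlike tolower(), this never consults the
 * locale selected by setlocale(). */
static unsigned char ascii_tolower_char(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical helper: uppercase one byte using ASCII rules only. */
static unsigned char ascii_toupper_char(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? (unsigned char)(c - ('a' - 'A')) : c;
}

/* Hypothetical in-place conversion of a length-delimited buffer.
 * An explicit length is used instead of stopping at NUL, because
 * PHP strings are binary-safe and may contain NUL bytes. */
static void ascii_strtolower_inplace(char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        s[i] = (char)ascii_tolower_char((unsigned char)s[i]);
    }
}

static void ascii_strtoupper_inplace(char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        s[i] = (char)ascii_toupper_char((unsigned char)s[i]);
    }
}
```

By contrast, calling tolower() from <ctype.h> in the same loop would make the output depend on whatever locale setlocale() last selected. For example, under a Turkish single-byte locale, tolower('I') may not return 'i'.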
  
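The RFC notes that strcasecmp(), strncasecmp() and substr_compare() with $case_insensitive = true already used ASCII case conversion. A case-insensitive comparison built on the same rule might look like the minimal C sketch below. The names fold_ascii() and ascii_casecmp_len() are hypothetical. The convention that a shorter string sorts first when it is a case-insensitive prefix of the longer one is an assumption made for illustration, not a description of the zend_operators.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: ASCII-only lowercase of one byte, independent
 * of the current locale. */
static int fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Hypothetical ASCII case-insensitive comparison of two
 * length-delimited (binary-safe) strings. Returns <0, 0 or >0 in the
 * style of strcmp(); if one string is a case-insensitive prefix of the
 * other, the shorter one sorts first. */
static int ascii_casecmp_len(const char *a, size_t a_len,
                             const char *b, size_t b_len)
{
    size_t n = a_len < b_len ? a_len : b_len;
    for (size_t i = 0; i < n; i++) {
        int ca = fold_ascii((unsigned char)a[i]);
        int cb = fold_ascii((unsigned char)b[i]);
        if (ca != cb) {
            return ca - cb;
        }
    }
    if (a_len == b_len) {
        return 0;
    }
    return a_len < b_len ? -1 : 1;
}
```

With ASCII folding, the Latin-1 bytes 0xC4 (Ä) and 0xE4 (ä) compare as different, whereas a tolower()-based comparison under an ISO-8859-1 locale would typically treat them as equal.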
rfc/strtolower-ascii.txt · Last modified: 2021/12/10 21:52 by tstarling