rfc:trailing_whitespace_numerics

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
rfc:trailing_whitespace_numerics [2017/01/18 04:02] – formatting ajfrfc:trailing_whitespace_numerics [2020/07/23 21:50] (current) – mark as superseded ajf
Line 1: Line 1:
 ====== PHP RFC: Permit trailing whitespace in numeric strings ====== ====== PHP RFC: Permit trailing whitespace in numeric strings ======
   * Version: 1.0   * Version: 1.0
-  * Date: 2017-01-18 +  * Date: 2019-03-06 
-  * Author: Andrea Faulds, ajf@ajf.me +  * Author: Andrea Faulds, <ajf@ajf.me> 
-  * Status: Draft+  * Status: Superseded by George Peter Baynard's [[rfc:saner-numeric-strings|PHP RFC: Saner numeric strings]] (partly based on this RFC), with permission.
   * First Published at: http://wiki.php.net/rfc/trailing_whitespace_numerics   * First Published at: http://wiki.php.net/rfc/trailing_whitespace_numerics
  
-===== Introduction ===== +===== Technical Background ===== 
-PHP currently ignores whitespace at the start of a numeric string: <php>"  123"</php> and <php>"123"</php> are considered equivalent. However, it considers whitespace at the end of a numeric string to be “non-well-formed”: <php>+"123   "</php> produces an <php>E_NOTICE</php>-level errorand <php>\is_numeric("123   ")</php> returns <php>false</php>.+The PHP language has a concept of //numeric strings//, strings which can be interpreted as numbersThis concept is used in a few places:
  
-This can be unhelpful. One reason for this is because trailing whitespace occurs in similar situations to leading whitespace. For example, a user might copy and paste a number into a form fieldThe likelihood of unintentionally pasting trailing whitespace, in this case, is similar to pasting leading whitespaceand both are equally meaninglessPHP unhelpfully only complains about the latter: whether you want to reject unneeded whitespace, or ignore itPHP's behaviour only does half the job.+  * Explicit conversions of strings to number typese.g. <php>$= "123"; $b = (float)$a; // float(123)</php> 
 +  * Implicit conversions of strings to number types, e.g. <php>$a = "123"; $b = intdiv($a1); // int(123)</php> (if ''strict_types=1'' is not set) 
 +  * Comparisonse.g. <php>$a = "123"; $b = "123.0"; $c = ($a == $b); // bool(true)</php> 
 +  * The <php>is_numeric()</php> functione.g. <php>$a = "123"; $b = is_numeric($a); // bool(true)</php>
  
-Additionallythere are some scenarios specific to trailing whitespaceFor instance, when reading a number out of multi-line text, a string may contain line-ending characters. Currently, PHP would complain when this number is used.+A string can be categorised in three ways according to its numericnessas [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#the-string-type|described by the language specification]]:
  
-Moreoveraccepting leading whitespace yet rejecting trailing whitespace is inconsistent and surprising.+  * A //numeric string// is a string containing only a [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#grammar-str-number|number]]optionally preceded by whitespace characters. For example, <php>"123"</php> or <php>"  1.23e2"</php>
 +  * A //leading-numeric string// is a string that begins with a numeric string but is followed by non-number characters  (including whitespace characters). For example, <php>"123abc"</php> or <php>"123 "</php>
 +  * A //non-numeric string// is a string which is neither a numeric string nor a leading-numeric string.
  
-===== Proposal ===== +The difference between a numeric string and a leading-numeric string is significantbecause certain operations distinguish between these:
-This RFC proposes to change PHP's behaviour, such that trailing whitespace is accepted in a numeric string, much like leading whitespace. This would make PHP's more consistentless surprising, and save time by avoiding the need to trim trailing whitespace from numeric strings.+
  
-For the PHP interpreterthis would be accomplished by modifying the ''is_numeric_string'' C function (and its variants) in the Zend EngineThis would therefore affect PHP features which make use of this functionincluding:+  * <php>is_numeric()</php> returns <php>TRUE</php> only for numeric strings 
 +  * Arithmetic operations (e.g. <php>$a * $b</php><php>$a + $b</php>) accept and implicitly convert both numeric and leading-numeric strings, but trigger the <php>E_NOTICE</php> “A non well formed numeric value encountered” for leading-numeric strings 
 +  * When ''strict_types=1'' is not set, <php>int</php> and <php>float</php> parameter and return type declarations will accept and implicitly convert both numeric and leading-numeric strings, but likewise trigger the same <php>E_NOTICE</php> 
 +  * Type casts and other explicit conversions to integer or float (e.g. <php>(int)</php><php>(float)</php>, <php>settype()</php>) accept all strings, converting both numeric and leading-numeric strings and producing 0 for non-numeric strings 
 +  * String-to-string comparisons with <php>==</php> etc perform numeric comparison if only both strings are numeric strings 
 +  * String-to-int/float comparisons with <php>==</php> etc type-juggle the string (and thus perform numeric comparison) if it is either a numeric string or a non-numeric string
  
-  * [[rfc:invalid_strings_in_arithmetic|Arithmetic operators]] will no longer produce an <php>E_NOTICE</php>-level error when used with a numeric string with trailing whitespace +It is notable that while a numeric string may contain leading whitespace, only leading-numeric string may contain trailing whitespace.
-  * The <php>int</php> and <php>float</php> type declarations willin weak typing mode, no longer produce an <php>E_NOTICE</php>-level error when passed numeric string with trailing whitespace +
-  * Type checks for built-in/extension (“internal”) PHP functions will, in weak typing mode, no longer produce an <php>E_NOTICE</php>-level error when passed a numeric string with trailing whitespace +
-  * The comparison operators will now consider numeric strings with trailing whitespace to be numeric, therefore meaning that, for example, <php>"123  " == 123</php> produces <php>true</php>, much like <php>"  123" == 123</php> does at present +
-  * The <php>\is_numeric</php> function will now accept numeric strings with trailing whitespace, and return <php>true</php>+
  
-The PHP language specification's [[https://github.com/php/php-langspec/blob/master/spec/05-types.md#the-string-type|definition of str-numeric]] would be modified to be as follows (additions shown in bold):+===== The Problem =====
  
-<blockquote> +The current behaviour of treating strings with leading whitespace as more numeric than strings with trailing whitespace is inconsistent and has no obvious benefit. It is an unintuitive, surprising behaviour.
-''str-numeric::'' +
-''   str-whitespace''<sub>''opt''</sub>''   sign''<sub>''opt''</sub>''   str-number    ''**''str-whitespace''**<sub>**''opt''**</sub>'' +
-'' +
-</blockquote>+
  
-===== Backward Incompatible Changes ===== +The inconsistency itself can require more work from the programmer. If rejecting number strings from user input that contain whitespace is useful to your application — perhaps it must be passed on to a back-end system that cannot handle whitespace — you cannot rely on e.g. <php>is_numeric()</php> to make sure of this for you, it only rejects trailing whitespace; yet simultaneously, if accepting number strings from user input that contain whitespace is useful to your application — perhaps to tolerate accidentally copied-and-pasted spaces — you cannot rely on e.g. <php>$a + $b</php> to make sure of this for youit only accepts leading whitespace.
-<php>\is_numeric()</php> now returns <php>true</php> rather than <php>false</php> for numeric strings with trailing whitespace. The author does not expect this is likely to cause significant backwards-compatibility issuesbecause only trailing whitespace and not not leading whitespace being invalid is uncommon. Additionally, the new behaviour may be the one intended.+
  
-===== Proposed PHP Version(s) ===== +Beyond the inconsistency, the current rejection of trailing whitespace is annoying for programs reading data from files or similar whitespace-separated data streams: 
-This is proposed for the next PHP 7.x. At the time of writingthat would be PHP 7.2.+ 
 +<code php> 
 +<?php 
 + 
 +$total 0; 
 +foreach (file("numbers.txt") as $number) { 
 +    $total +$number; // Currently produces “Notice: A non well formed numeric value encountered” on every iteration, because $number ends in "\n" 
 +
 +?> 
 +</code> 
 + 
 +Finally, the current behaviour makes [[rfc:string_to_number_comparison|potential simplifications to numeric string handling]] less palatable if they make leading-numeric strings be tolerated in less places, because of a perception that a lot of existing code may rely on the tolerance of trailing whitespace. 
 + 
 +===== Proposal ===== 
 + 
 +For the next PHP 7.x (currently PHP 7.4), this RFC proposes that trailing whitespace be accepted in numeric strings just as leading whitespace is. 
 + 
 +For the PHP interpreter, this would be accomplished by modifying the ''is_numeric_string'' C function (and its variants) in the Zend Engine. This would therefore affect PHP features which make use of this functionincluding: 
 + 
 +  * [[rfc:invalid_strings_in_arithmetic|Arithmetic operators]] would no longer produce an <php>E_NOTICE</php>-level error when used with a numeric string with trailing whitespace 
 +  * The <php>int</php> and <php>float</php> type declarations would no longer produce an <php>E_NOTICE</php>-level error when passed a numeric string with trailing whitespace 
 +  * Type checks for built-in/extension (“internal”) PHP functions would no longer produce an <php>E_NOTICE</php>-level error when passed a numeric string with trailing whitespace 
 +  * The comparison operators will now consider numeric strings with trailing whitespace to be numeric, therefore meaning that, for example, <php>"123  " == "  123"</php> produces <php>true</php>, instead of <php>false</php> 
 +  * The <php>\is_numeric</php> function would return <php>true</php> for numeric strings with trailing whitespace 
 +  * The <php>++</php> and <php>--</php> operators woukd convert numeric strings with trailing whitespace to integers or floats, as appropriate, rather than applying the alphanumeric increment rules 
 + 
 +The PHP language specification's [[https://github.com/php/php-langspec/blob/master/spec/05-types.md#the-string-type|definition of str-numeric]] would be modified by the addition of ''str-whitespace''<sub>''opt''</sub> after ''str-number''
 + 
 +This change would be almost completely backwards-compatible, as no string that was previously accepted would now be rejected. However, if an application relies on trailing whitespace not being considered well-formed, it would need updating.
  
 ===== RFC Impact ===== ===== RFC Impact =====
 ==== To Existing Extensions ==== ==== To Existing Extensions ====
-Any extension using ''is_numeric_string'', its variants, and other functions which themselves use it, on will be affected.+Any extension using ''is_numeric_string'', its variants, or other functions which themselves use it, will be affected.
  
 ==== To Opcache ==== ==== To Opcache ====
-FIXME: I have not yet verified the RFC's compatibility with opcache.+In the patch, all tests pass with Opcache enabled. I am not aware of any issues arising here.
  
 ===== Unaffected PHP Functionality ===== ===== Unaffected PHP Functionality =====
Line 51: Line 79:
  
 ===== Future Scope ===== ===== Future Scope =====
-None conceivable.+If adopted, this would make Nikita Popov's [[rfc:string_to_number_comparison|PHP RFC: Saner string to number comparisons]] look more reasonable. 
 + 
 +I would also plan a second RFC in a similar vein to Nikita's, which would simplify things by removing the concept of leading-numeric strings: strings are either numeric and accepted, or non-numeric and not accepted.
  
 ===== Proposed Voting Choices ===== ===== Proposed Voting Choices =====
-This is a language changeand requires a 2/3 majority. The vote is a two-choice Yes/No vote on whether to accept the RFC and apply its changes to the next applicable version of PHP.+Per the Voting RFCthere would be a single Yes/No vote requiring a 2/3 majority.
  
 ===== Patches and Tests ===== ===== Patches and Tests =====
 A pull request for a complete PHP interpreter patch, including a test file, can be found here: https://github.com/php/php-src/pull/2317 A pull request for a complete PHP interpreter patch, including a test file, can be found here: https://github.com/php/php-src/pull/2317
  
-FIXME: There is not yet a language specification patch.+I do not yet have a language specification patch.
  
 ===== Implementation ===== ===== Implementation =====
Line 67: Line 97:
   - a link to the PHP manual entry for the feature   - a link to the PHP manual entry for the feature
   - a link to the language specification section (if any)   - a link to the language specification section (if any)
 +
 +===== Changelog =====
 +
 +  * 2020-06-24: Take-over by George Peter Banyard with the consent of Andrea Faulds
 +  * 2019-03-06, v1.0: First non-draft version, dropped the second proposal from the RFC for now, I can make that as a follow-up RFC
 +  * 2019-02-07 (draft): Added proposal to remove “non-well-formed” numeric strings at the suggestion of Nikita Popov, renamed to “Revise trailing character handling for numeric strings”
 +  * 2017-01-18 (draft): First draft as “Permit trailing whitespace in numeric strings”
rfc/trailing_whitespace_numerics.txt · Last modified: 2020/07/23 21:50 by ajf