rfc:trailing_whitespace_numerics

Differences



rfc:trailing_whitespace_numerics [2017/01/18 03:59] ajf
rfc:trailing_whitespace_numerics [2020/06/24 13:34] – Update implementation link for one based onto master/PHP 8.0 girgias
Line 1: Line 1:
 ====== PHP RFC: Permit trailing whitespace in numeric strings ======
   * Version: 1.0
-  * Date: 2017-01-18
+  * Date: 2019-03-06
-  * Author: Andrea Faulds, ajf@ajf.me
+  * Author: Andrea Faulds, <ajf@ajf.me>, George Peter Banyard <girgias@php.net>
-  * Status: Draft
+  * Status: Under Discussion
   * First Published at: http://wiki.php.net/rfc/trailing_whitespace_numerics
 +  * Implementation: https://github.com/php/php-src/pull/5762
  
-===== Introduction =====
+===== Technical Background =====
-PHP currently ignores whitespace at the start of a numeric string: <php>"  123"</php> and <php>"123"</php> are considered equivalent. However, it considers whitespace at the end of a numeric string to be “non-well-formed”: <php>+"123   "</php> produces an <php>E_NOTICE</php>-level error, and <php>\is_numeric("123   ")</php> returns <php>false</php>.
+The PHP language has a concept of //numeric strings//, strings which can be interpreted as numbers. This concept is used in a few places:
  
-This can be unhelpful. One reason is that trailing whitespace occurs in similar situations to leading whitespace. For example, a user might copy and paste a number into a form field. The likelihood of unintentionally pasting trailing whitespace, in this case, is similar to pasting leading whitespace, and both are equally meaningless. PHP unhelpfully only complains about the former: whether you want to reject unneeded whitespace, or ignore it, PHP's behaviour only does half the job.
+  * Explicit conversions of strings to number types, e.g. <php>$a = "123"; $b = (float)$a; // float(123)</php>
 +  * Implicit conversions of strings to number types, e.g. <php>$a = "123"; $b = intdiv($a, 1); // int(123)</php> (if ''strict_types=1'' is not set)
 +  * Comparisons, e.g. <php>$a = "123"; $b = "123.0"; $c = ($a == $b); // bool(true)</php>
 +  * The <php>is_numeric()</php> function, e.g. <php>$a = "123"; $b = is_numeric($a); // bool(true)</php>
  
-Additionally, there are some scenarios specific to trailing whitespace. For instance, when reading a number out of multi-line text, a string may contain line-ending characters. Currently, PHP would complain when this number is used.
+A string can be categorised in three ways according to its numericness, as [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#the-string-type|described by the language specification]]:
  
-Moreover, accepting leading whitespace yet rejecting trailing whitespace is inconsistent and surprising.
+  * A //numeric string// is a string containing only a [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#grammar-str-number|number]], optionally preceded by whitespace characters. For example, <php>"123"</php> or <php>"  1.23e2"</php>.
 +  * A //leading-numeric string// is a string that begins with a numeric string but is followed by non-number characters (including whitespace characters). For example, <php>"123abc"</php> or <php>"123 "</php>.
 +  * A //non-numeric string// is a string which is neither a numeric string nor a leading-numeric string.
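The three categories above can be approximated in userland PHP. This is a minimal sketch under stated assumptions, not the engine's ''is_numeric_string'' code: ''classify_numeric_string()'' is a hypothetical helper, the whitespace set (space, tab, line feed, carriage return, vertical tab, form feed) is assumed from the specification's ''str-whitespace'', only decimal integer and floating-point syntax is handled, and the ''$allowTrailingWhitespace'' flag models the change this RFC proposes.

<code php>
<?php

/**
 * Hypothetical helper (not a PHP function): classifies a string as
 * "numeric", "leading-numeric" or "non-numeric" following the categories
 * described in the RFC. $allowTrailingWhitespace = true models the proposal.
 */
function classify_numeric_string(string $s, bool $allowTrailingWhitespace = false): string
{
    // Assumed str-whitespace set: space, \t, \n, \r, vertical tab, form feed.
    $ws = '[ \t\n\r\x0B\x0C]';
    // Optional sign followed by a decimal integer or floating-point literal.
    $number = '[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?';
    // Under the proposal, whitespace may also follow the number.
    $trailing = $allowTrailingWhitespace ? $ws . '*' : '';

    // \A and \z anchor the whole string; "$" would also match before a final "\n".
    if (preg_match('/\A' . $ws . '*' . $number . $trailing . '\z/', $s) === 1) {
        return 'numeric';
    }
    // Anything that merely starts with a numeric string.
    if (preg_match('/\A' . $ws . '*' . $number . '/', $s) === 1) {
        return 'leading-numeric';
    }
    return 'non-numeric';
}

foreach (['123', '  1.23e2', '123abc', '123 ', "42\n", 'abc'] as $s) {
    printf(
        "%-12s current: %-16s proposed: %s\n",
        json_encode($s),
        classify_numeric_string($s),
        classify_numeric_string($s, true)
    );
}
</code>

Under these assumptions, <php>"123 "</php> and a line read with its <php>"\n"</php> still attached are leading-numeric today but numeric under the proposal, while <php>"123abc"</php> remains leading-numeric either way.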
  
-===== Proposal =====
+The difference between a numeric string and a leading-numeric string is significant, because certain operations distinguish between these:
-This RFC proposes to change PHP's behaviour, such that trailing whitespace is accepted in a numeric string, much like leading whitespace. This would make PHP more consistent and less surprising, and would save time by avoiding the need to trim trailing whitespace from numeric strings.
+
  
-For the PHP interpreter, this would be accomplished by modifying the ''is_numeric_string'' C function (and its variants) in the Zend Engine. This would therefore affect PHP features which make use of this function, including:
+  * <php>is_numeric()</php> returns <php>TRUE</php> only for numeric strings
 +  * Arithmetic operations (e.g. <php>$a * $b</php>, <php>$a + $b</php>) accept and implicitly convert both numeric and leading-numeric strings, but trigger the <php>E_NOTICE</php> “A non well formed numeric value encountered” for leading-numeric strings
 +  * When ''strict_types=1'' is not set, <php>int</php> and <php>float</php> parameter and return type declarations will accept and implicitly convert both numeric and leading-numeric strings, but likewise trigger the same <php>E_NOTICE</php>
 +  * Type casts and other explicit conversions to integer or float (e.g. <php>(int)</php>, <php>(float)</php>, <php>settype()</php>) accept all strings, converting both numeric and leading-numeric strings and producing 0 for non-numeric strings
 +  * String-to-string comparisons with <php>==</php> etc. perform numeric comparison only if both strings are numeric strings
 +  * String-to-int/float comparisons with <php>==</php> etc. type-juggle the string (and thus perform numeric comparison) if it is either a numeric string or a non-numeric string
  
-* [[rfc:invalid_strings_in_arithmetic|Arithmetic operators]] will no longer produce an <php>E_NOTICE</php>-level error when used with a numeric string with trailing whitespace
+It is notable that while a numeric string may contain leading whitespace, only a leading-numeric string may contain trailing whitespace.
-* The <php>int</php> and <php>float</php> type declarations will, in weak typing mode, no longer produce an <php>E_NOTICE</php>-level error when passed a numeric string with trailing whitespace
+
-* Type checks for built-in/extension (“internal”) PHP functions will, in weak typing mode, no longer produce an <php>E_NOTICE</php>-level error when passed a numeric string with trailing whitespace
+
-* The comparison operators will now consider numeric strings with trailing whitespace to be numeric, therefore meaning that, for example, <php>"123  " == 123</php> produces <php>true</php>, much like <php>"  123" == 123</php> does at present
+
-* The <php>\is_numeric</php> function will now accept numeric strings with trailing whitespace, and return <php>true</php>
+
  
-The PHP language specification's [[https://github.com/php/php-langspec/blob/master/spec/05-types.md#the-string-type|definition of str-numeric]] would be modified to be as follows (additions shown in bold):
+===== The Problem =====
  
-<blockquote>
+The current behaviour of treating strings with leading whitespace as more numeric than strings with trailing whitespace is inconsistent and has no obvious benefit. It is an unintuitive, surprising behaviour.
-''str-numeric::''
+
-''   str-whitespace''<sub>''opt''</sub>''   sign''<sub>''opt''</sub>''   str-number    ''**''str-whitespace''**<sub>**''opt''**</sub>''
+
-''
+
-</blockquote>
+
  
-===== Backward Incompatible Changes =====
+The inconsistency itself can require more work from the programmer. If rejecting number strings from user input that contain whitespace is useful to your application (perhaps it must be passed on to a back-end system that cannot handle whitespace), you cannot rely on e.g. <php>is_numeric()</php> to make sure of this for you: it only rejects trailing whitespace. Yet if accepting number strings from user input that contain whitespace is useful to your application (perhaps to tolerate accidentally copied-and-pasted spaces), you cannot rely on e.g. <php>$a + $b</php> to make sure of this for you either: it only accepts leading whitespace.
-<php>\is_numeric()</php> now returns <php>true</php> rather than <php>false</php> for numeric strings with trailing whitespace. The author does not expect this is likely to cause significant backwards-compatibility issues, because code relying on trailing whitespace, but not leading whitespace, being invalid is uncommon. Additionally, the new behaviour may be the one intended.
+
  
-===== Proposed PHP Version(s) =====
+Beyond the inconsistency, the current rejection of trailing whitespace is annoying for programs reading data from files or similar whitespace-separated data streams:
-This is proposed for the next PHP 7.x. At the time of writing, that would be PHP 7.2.
+
 +<code php> 
 +<?php 
 + 
 +$total = 0;
 +foreach (file("numbers.txt") as $number) {
 +    $total += $number; // Currently produces “Notice: A non well formed numeric value encountered” on every iteration, because $number ends in "\n"
 +
 +?> 
 +</code> 
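Under the current rules, code like the loop above typically has to strip the line ending itself before doing arithmetic. The following is a minimal sketch of that workaround, assuming a temporary file created with <php>tempnam()</php> in place of the RFC's ''numbers.txt''; <php>trim()</php> is only one option, and it also strips leading whitespace and NUL bytes.

<code php>
<?php

// A temporary file stands in for the RFC's numbers.txt (an assumption for this sketch).
$path = tempnam(sys_get_temp_dir(), 'numbers');
file_put_contents($path, "1\n2\n3\n");

$total = 0;
foreach (file($path) as $number) {
    // file() keeps the "\n" at the end of each line, which makes "1\n" a
    // leading-numeric string; trim() turns it back into the numeric string "1".
    $total += trim($number);
}
unlink($path);

echo $total, "\n"; // prints 6
</code>

This extra ''trim()'' call is exactly the kind of boilerplate the proposal would make unnecessary.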
 + 
 +Finally, the current behaviour makes [[rfc:string_to_number_comparison|potential simplifications to numeric string handling]] less palatable if they cause leading-numeric strings to be tolerated in fewer places, because of a perception that a lot of existing code may rely on the tolerance of trailing whitespace.
 + 
 +===== Proposal ===== 
 + 
 +For the next PHP 7.x (currently PHP 7.4), this RFC proposes that trailing whitespace be accepted in numeric strings just as leading whitespace is. 
 + 
 +For the PHP interpreter, this would be accomplished by modifying the ''is_numeric_string'' C function (and its variants) in the Zend Engine. This would therefore affect PHP features which make use of this function, including:
 + 
 +  * [[rfc:invalid_strings_in_arithmetic|Arithmetic operators]] would no longer produce an <php>E_NOTICE</php>-level error when used with a numeric string with trailing whitespace 
 +  * The <php>int</php> and <php>float</php> type declarations would no longer produce an <php>E_NOTICE</php>-level error when passed a numeric string with trailing whitespace 
 +  * Type checks for built-in/extension (“internal”) PHP functions would no longer produce an <php>E_NOTICE</php>-level error when passed a numeric string with trailing whitespace 
 +  * The comparison operators would consider numeric strings with trailing whitespace to be numeric, meaning that, for example, <php>"123  " == "  123"</php> would produce <php>true</php> instead of <php>false</php>
 +  * The <php>\is_numeric</php> function would return <php>true</php> for numeric strings with trailing whitespace 
 +  * The <php>++</php> and <php>--</php> operators would convert numeric strings with trailing whitespace to integers or floats, as appropriate, rather than applying the alphanumeric increment rules
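The comparison change can be illustrated with a simplified model of <php>==</php> between two strings, following the rule stated in the Technical Background: numeric comparison only if both strings are numeric, byte-wise comparison otherwise. ''loose_string_equals()'' is a hypothetical helper, not a PHP function; it assumes ''str-whitespace'' is space, tab, line feed, carriage return, vertical tab and form feed, handles only decimal number syntax, and ignores engine details such as the handling of integer strings that overflow.

<code php>
<?php

/**
 * Hypothetical model of `$a == $b` for two strings: compare numerically
 * only if both are numeric strings, otherwise compare byte-for-byte.
 * $allowTrailingWhitespace = true models the proposed behaviour.
 */
function loose_string_equals(string $a, string $b, bool $allowTrailingWhitespace): bool
{
    $ws = '[ \t\n\r\x0B\x0C]';
    $trailing = $allowTrailingWhitespace ? $ws . '*' : '';
    // Capture group 1 holds the number without surrounding whitespace.
    $re = '/\A' . $ws . '*([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)' . $trailing . '\z/';

    if (preg_match($re, $a, $ma) === 1 && preg_match($re, $b, $mb) === 1) {
        // Both numeric: "+ 0" converts each captured number to int or float.
        return ($ma[1] + 0) == ($mb[1] + 0);
    }
    return $a === $b;
}

var_dump(loose_string_equals("123  ", "  123", false)); // bool(false)
var_dump(loose_string_equals("123  ", "  123", true));  // bool(true)
var_dump(loose_string_equals("123", "123.0", false));   // bool(true)
</code>

Only the pattern differs between the two modes, which mirrors how the proposal changes a single function (''is_numeric_string'') and lets every feature that calls it pick up the new behaviour.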
 + 
 +The PHP language specification's [[https://github.com/php/php-langspec/blob/master/spec/05-types.md#the-string-type|definition of str-numeric]] would be modified by the addition of ''str-whitespace''<sub>''opt''</sub> after ''str-number''
 + 
 +This change would be almost completely backwards-compatible, as no string that was previously accepted would now be rejected. However, if an application relies on trailing whitespace not being considered well-formed, it would need updating.
  
 ===== RFC Impact =====
 ==== To Existing Extensions ====
-Any extension using ''is_numeric_string'', its variants, and other functions which themselves use it, will be affected.
+Any extension using ''is_numeric_string'', its variants, or other functions which themselves use it, will be affected.
 
 ==== To Opcache ====
-FIXME: I have not yet verified the RFC's compatibility with opcache.
+In the patch, all tests pass with Opcache enabled. I am not aware of any issues arising here.
  
 ===== Unaffected PHP Functionality =====
Line 51: Line 80:
  
 ===== Future Scope =====
-None conceivable.
+If adopted, this would make Nikita Popov's [[rfc:string_to_number_comparison|PHP RFC: Saner string to number comparisons]] look more reasonable.
 + 
 +I also plan a second RFC in a similar vein to Nikita's, which would simplify things by removing the concept of leading-numeric strings: strings would either be numeric and accepted, or non-numeric and rejected.
  
 ===== Proposed Voting Choices =====
-This is a language change, and requires a 2/3 majority. The vote is a two-choice Yes/No vote on whether to accept the RFC and apply its changes to the next applicable version of PHP.
+Per the Voting RFC, there would be a single Yes/No vote requiring a 2/3 majority.
  
 ===== Patches and Tests =====
-A pull request for a complete PHP interpreter patch, including a test file, can be found here: https://github.com/php/php-src/pull/2317
+A pull request for a complete PHP interpreter patch, including a test file, can be found here: https://github.com/php/php-src/pull/5762
  
-FIXME: There is not yet a language specification patch.
+I do not yet have a language specification patch.
  
 ===== Implementation =====
Line 67: Line 98:
   - a link to the PHP manual entry for the feature
   - a link to the language specification section (if any)
 +
 +===== Changelog =====
 +
 +  * 2020-06-24: Take-over by George Peter Banyard with the consent of Andrea Faulds
 +  * 2019-03-06, v1.0: First non-draft version; dropped the second proposal from the RFC for now (I can make that a follow-up RFC)
 +  * 2019-02-07 (draft): Added proposal to remove “non-well-formed” numeric strings at the suggestion of Nikita Popov, renamed to “Revise trailing character handling for numeric strings”
 +  * 2017-01-18 (draft): First draft as “Permit trailing whitespace in numeric strings”
rfc/trailing_whitespace_numerics.txt · Last modified: 2020/07/23 21:50 by ajf