rfc:trailing_whitespace_numerics

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
rfc:trailing_whitespace_numerics [2017/01/18 03:59] – drafted ajfrfc:trailing_whitespace_numerics [2020/07/23 21:50] (current) – mark as superseded ajf
Line 1: Line 1:
 ====== PHP RFC: Permit trailing whitespace in numeric strings ====== ====== PHP RFC: Permit trailing whitespace in numeric strings ======
   * Version: 1.0   * Version: 1.0
-  * Date: 2017-01-18 +  * Date: 2019-03-06 
-  * Author: Andrea Faulds, ajf@ajf.me +  * Author: Andrea Faulds, <ajf@ajf.me> 
-  * Status: Draft+  * Status: Superseded by George Peter Baynard's [[rfc:saner-numeric-strings|PHP RFC: Saner numeric strings]] (partly based on this RFC), with permission.
   * First Published at: http://wiki.php.net/rfc/trailing_whitespace_numerics   * First Published at: http://wiki.php.net/rfc/trailing_whitespace_numerics
  
-This is suggested template for PHP Request for Comments (RFCs). Change this template to suit your RFC.  Not all RFCs need to be tightly specified.  Not all RFCs need all the sections below. +===== Technical Background ===== 
-Read https://wiki.php.net/rfc/howto carefully!+The PHP language has concept of //numeric strings//, strings which can be interpreted as numbers. This concept is used in a few places:
  
-===== Introduction ===== +  * Explicit conversions of strings to number types, e.g. <php>$a = "123"; $b = (float)$a; // float(123)</php> 
-PHP currently ignores whitespace at the start of a numeric string: <php>"  123"</php> and <php>"123"</php> are considered equivalent. Howeverit considers whitespace at the end of a numeric string to be “non-well-formed”: <php>+"123   "</php> produces an <php>E_NOTICE</php>-level errorand <php>\is_numeric("123   ")</php> returns <php>false</php>.+  * Implicit conversions of strings to number types, e.g. <php>$a = "123"; $b = intdiv($a, 1); // int(123)</php> (if ''strict_types=1'' is not set) 
 +  * Comparisonse.g. <php>$a = "123"; $b = "123.0"; $c = ($a == $b); // bool(true)</php> 
 +  * The <php>is_numeric()</php> functione.g. <php>$a = "123"; $b = is_numeric($a)// bool(true)</php>
  
-This can be unhelpful. One reason for this is because trailing whitespace occurs in similar situations to leading whitespace. For examplea user might copy and paste a number into a form fieldThe likelihood of unintentionally pasting trailing whitespace, in this case, is similar to pasting leading whitespace, and both are equally meaninglessPHP unhelpfully only complains about the latterwhether you want to reject unneeded whitespace, or ignore it, PHP's behaviour only does half the job.+A string can be categorised in three ways according to its numericnessas [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#the-string-type|described by the language specification]]:
  
-Additionallythere are some scenarios specific to trailing whitespace. For instancewhen reading a number out of multi-line text, a string may contain line-ending characters. CurrentlyPHP would complain when this number is used.+  * A //numeric string// is a string containing only a [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#grammar-str-number|number]]optionally preceded by whitespace characters. For example<php>"123"</php> or <php>"  1.23e2"</php>
 +  * A //leading-numeric string// is a string that begins with a numeric string but is followed by non-number characters  (including whitespace characters)For example<php>"123abc"</php> or <php>"123 "</php>
 +  * A //non-numeric string// is a string which is neither a numeric string nor a leading-numeric string.
  
-Moreover, accepting leading whitespace yet rejecting trailing whitespace is inconsistent and surprising.+The difference between a numeric string and a leading-numeric string is significant, because certain operations distinguish between these:
  
-===== Proposal ===== +  * <php>is_numeric()</php> returns <php>TRUE</php> only for numeric strings 
-This RFC proposes to change PHP's behavioursuch that trailing whitespace is accepted in a numeric stringmuch like leading whitespace. This would make PHP's more consistent, less surprising, and save time by avoiding the need to trim trailing whitespace from numeric strings.+  * Arithmetic operations (e.g. <php>$a * $b</php><php>$+ $b</php>) accept and implicitly convert both numeric and leading-numeric stringsbut trigger the <php>E_NOTICE</php> “A non well formed numeric value encountered” for leading-numeric strings 
 +  * When ''strict_types=1'' is not set<php>int</php> and <php>float</php> parameter and return type declarations will accept and implicitly convert both numeric and leading-numeric strings, but likewise trigger the same <php>E_NOTICE</php> 
 +  * Type casts and other explicit conversions to integer or float (e.g. <php>(int)</php>, <php>(float)</php>, <php>settype()</php>) accept all strings, converting both numeric and leading-numeric strings and producing 0 for non-numeric strings 
 +  * String-to-string comparisons with <php>==</php> etc perform numeric comparison if only both strings are numeric strings 
 +  * String-to-int/float comparisons with <php>==</php> etc type-juggle the string (and thus perform numeric comparison) if it is either a numeric string or a non-numeric string
  
-For the PHP interpreterthis would be accomplished by modifying the ''is_numeric_string'' C function (and its variants) in the Zend EngineThis would therefore affect PHP features which make use of this function, including:+It is notable that while a numeric string may contain leading whitespaceonly a leading-numeric string may contain trailing whitespace.
  
-* [[rfc:invalid_strings_in_arithmetic|Arithmetic operators]] will no longer produce an <php>E_NOTICE</php>-level error when used with a numeric string with trailing whitespace +====The Problem =====
-* The <php>int</php> and <php>float</php> type declarations will, in weak typing mode, no longer produce an <php>E_NOTICE</php>-level error when passed a numeric string with trailing whitespace +
-* Type checks for built-in/extension (“internal”) PHP functions will, in weak typing mode, no longer produce an <php>E_NOTICE</php>-level error when passed a numeric string with trailing whitespace +
-* The comparison operators will now consider numeric strings with trailing whitespace to be numeric, therefore meaning that, for example, <php>"123 == 123</php> produces <php>true</php>, much like <php>"  123" == 123</php> does at present +
-The <php>\is_numeric</php> function will now accept numeric strings with trailing whitespace, and return <php>true</php>+
  
-The PHP language specification's [[https://github.com/php/php-langspec/blob/master/spec/05-types.md#the-string-type|definition of str-numeric]] would be modified to be as follows (additions shown in bold):+The current behaviour of treating strings with leading whitespace as more numeric than strings with trailing whitespace is inconsistent and has no obvious benefit. It is an unintuitive, surprising behaviour.
  
-<blockquote> +The inconsistency itself can require more work from the programmer. If rejecting number strings from user input that contain whitespace is useful to your application — perhaps it must be passed on to a back-end system that cannot handle whitespace — you cannot rely on e.g. <php>is_numeric()</phpto make sure of this for you, it only rejects trailing whitespace; yet simultaneously, if accepting number strings from user input that contain whitespace is useful to your application — perhaps to tolerate accidentally copied-and-pasted spaces — you cannot rely on e.g. <php>$a + $b</phpto make sure of this for you, it only accepts leading whitespace.
-''str-numeric::'' +
-''   str-whitespace''<sub>''opt''</sub>''   sign''<sub>''opt''</sub>''   str-number    ''**''str-whitespace''**<sub>**''opt''**</sub>'' +
-'' +
-</blockquote>+
  
-===== Backward Incompatible Changes ===== +Beyond the inconsistency, the current rejection of trailing whitespace is annoying for programs reading data from files or similar whitespace-separated data streams:
-<php>\is_numeric()</php> now returns <php>true</php> rather than <php>false</php> for numeric strings with trailing whitespace. The author does not expect this is likely to cause significant backwards-compatibility issues, because only trailing whitespace and not not leading whitespace being invalid is uncommon. Additionally, the new behaviour may be the one intended.+
  
-===== Proposed PHP Version(s) ===== +<code php> 
-This is proposed for the next PHP 7.x. At the time of writingthat would be PHP 7.2.+<?php 
 + 
 +$total 0; 
 +foreach (file("numbers.txt") as $number) { 
 +    $total +$number; // Currently produces “Notice: A non well formed numeric value encountered” on every iteration, because $number ends in "\n" 
 +
 +?> 
 +</code> 
 + 
 +Finally, the current behaviour makes [[rfc:string_to_number_comparison|potential simplifications to numeric string handling]] less palatable if they make leading-numeric strings be tolerated in less places, because of a perception that a lot of existing code may rely on the tolerance of trailing whitespace. 
 + 
 +===== Proposal ===== 
 + 
 +For the next PHP 7.x (currently PHP 7.4), this RFC proposes that trailing whitespace be accepted in numeric strings just as leading whitespace is. 
 + 
 +For the PHP interpreter, this would be accomplished by modifying the ''is_numeric_string'' C function (and its variants) in the Zend Engine. This would therefore affect PHP features which make use of this functionincluding: 
 + 
 +  * [[rfc:invalid_strings_in_arithmetic|Arithmetic operators]] would no longer produce an <php>E_NOTICE</php>-level error when used with a numeric string with trailing whitespace 
 +  * The <php>int</php> and <php>float</php> type declarations would no longer produce an <php>E_NOTICE</php>-level error when passed a numeric string with trailing whitespace 
 +  * Type checks for built-in/extension (“internal”) PHP functions would no longer produce an <php>E_NOTICE</php>-level error when passed a numeric string with trailing whitespace 
 +  * The comparison operators will now consider numeric strings with trailing whitespace to be numeric, therefore meaning that, for example, <php>"123  " == "  123"</php> produces <php>true</php>, instead of <php>false</php> 
 +  * The <php>\is_numeric</php> function would return <php>true</php> for numeric strings with trailing whitespace 
 +  * The <php>++</php> and <php>--</php> operators woukd convert numeric strings with trailing whitespace to integers or floats, as appropriate, rather than applying the alphanumeric increment rules 
 + 
 +The PHP language specification's [[https://github.com/php/php-langspec/blob/master/spec/05-types.md#the-string-type|definition of str-numeric]] would be modified by the addition of ''str-whitespace''<sub>''opt''</sub> after ''str-number''
 + 
 +This change would be almost completely backwards-compatible, as no string that was previously accepted would now be rejected. However, if an application relies on trailing whitespace not being considered well-formed, it would need updating.
  
 ===== RFC Impact ===== ===== RFC Impact =====
 ==== To Existing Extensions ==== ==== To Existing Extensions ====
-Any extension using ''is_numeric_string'', its variants, and other functions which themselves use it, on will be affected.+Any extension using ''is_numeric_string'', its variants, or other functions which themselves use it, will be affected.
  
 ==== To Opcache ==== ==== To Opcache ====
-FIXME: I have not yet verified the RFC's compatibility with opcache.+In the patch, all tests pass with Opcache enabled. I am not aware of any issues arising here.
  
 ===== Unaffected PHP Functionality ===== ===== Unaffected PHP Functionality =====
Line 54: Line 79:
  
 ===== Future Scope ===== ===== Future Scope =====
-None conceivable.+If adopted, this would make Nikita Popov's [[rfc:string_to_number_comparison|PHP RFC: Saner string to number comparisons]] look more reasonable. 
 + 
 +I would also plan a second RFC in a similar vein to Nikita's, which would simplify things by removing the concept of leading-numeric strings: strings are either numeric and accepted, or non-numeric and not accepted.
  
 ===== Proposed Voting Choices ===== ===== Proposed Voting Choices =====
-This is a language changeand requires a 2/3 majority. The vote is a two-choice Yes/No vote on whether to accept the RFC and apply its changes to the next applicable version of PHP.+Per the Voting RFCthere would be a single Yes/No vote requiring a 2/3 majority.
  
 ===== Patches and Tests ===== ===== Patches and Tests =====
 A pull request for a complete PHP interpreter patch, including a test file, can be found here: https://github.com/php/php-src/pull/2317 A pull request for a complete PHP interpreter patch, including a test file, can be found here: https://github.com/php/php-src/pull/2317
  
-FIXME: There is not yet a language specification patch.+I do not yet have a language specification patch.
  
 ===== Implementation ===== ===== Implementation =====
Line 70: Line 97:
   - a link to the PHP manual entry for the feature   - a link to the PHP manual entry for the feature
   - a link to the language specification section (if any)   - a link to the language specification section (if any)
 +
 +===== Changelog =====
 +
 +  * 2020-06-24: Take-over by George Peter Banyard with the consent of Andrea Faulds
 +  * 2019-03-06, v1.0: First non-draft version, dropped the second proposal from the RFC for now, I can make that as a follow-up RFC
 +  * 2019-02-07 (draft): Added proposal to remove “non-well-formed” numeric strings at the suggestion of Nikita Popov, renamed to “Revise trailing character handling for numeric strings”
 +  * 2017-01-18 (draft): First draft as “Permit trailing whitespace in numeric strings”
rfc/trailing_whitespace_numerics.txt · Last modified: 2020/07/23 21:50 by ajf