rfc:trailing_whitespace_numerics [2017/01/18 03:59] – ajf | rfc:trailing_whitespace_numerics [2020/07/23 21:50] (current) – mark as superseded ajf |
---|
====== PHP RFC: Permit trailing whitespace in numeric strings ====== | ====== PHP RFC: Permit trailing whitespace in numeric strings ====== |
* Version: 1.0 | * Version: 1.0 |
* Date: 2017-01-18 | * Date: 2019-03-06 |
* Author: Andrea Faulds, ajf@ajf.me | * Author: Andrea Faulds, <ajf@ajf.me> |
* Status: Draft | * Status: Superseded by George Peter Banyard's [[rfc:saner-numeric-strings|PHP RFC: Saner numeric strings]] (partly based on this RFC), with permission. |
* First Published at: http://wiki.php.net/rfc/trailing_whitespace_numerics | * First Published at: http://wiki.php.net/rfc/trailing_whitespace_numerics |
| |
===== Introduction ===== | ===== Technical Background ===== |
PHP currently ignores whitespace at the start of a numeric string: <php>" 123"</php> and <php>"123"</php> are considered equivalent. However, it considers whitespace at the end of a numeric string to be “non-well-formed”: <php>+"123 "</php> produces an <php>E_NOTICE</php>-level error, and <php>\is_numeric("123 ")</php> returns <php>false</php>. | The PHP language has a concept of //numeric strings//, strings which can be interpreted as numbers. This concept is used in a few places: |
| |
This can be unhelpful. One reason is that trailing whitespace occurs in similar situations to leading whitespace. For example, a user might copy and paste a number into a form field. The likelihood of unintentionally pasting trailing whitespace, in this case, is similar to that of pasting leading whitespace, and both are equally meaningless. PHP unhelpfully only complains about trailing whitespace: whether you want to reject unneeded whitespace, or ignore it, PHP's behaviour only does half the job. | * Explicit conversions of strings to number types, e.g. <php>$a = "123"; $b = (float)$a; // float(123)</php>
| * Implicit conversions of strings to number types, e.g. <php>$a = "123"; $b = intdiv($a, 1); // int(123)</php> (if ''strict_types=1'' is not set) |
| * Comparisons, e.g. <php>$a = "123"; $b = "123.0"; $c = ($a == $b); // bool(true)</php> |
| * The <php>is_numeric()</php> function, e.g. <php>$a = "123"; $b = is_numeric($a); // bool(true)</php> |
| |
Additionally, there are some scenarios specific to trailing whitespace. For instance, when reading a number out of multi-line text, a string may contain line-ending characters. Currently, PHP would complain when this number is used. | A string can be categorised in three ways according to its numericness, as [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#the-string-type|described by the language specification]]: |
| |
Moreover, accepting leading whitespace yet rejecting trailing whitespace is inconsistent and surprising. | * A //numeric string// is a string containing only a [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#grammar-str-number|number]], optionally preceded by whitespace characters. For example, <php>"123"</php> or <php>" 1.23e2"</php>. |
| * A //leading-numeric string// is a string that begins with a numeric string but is followed by non-number characters (including whitespace characters). For example, <php>"123abc"</php> or <php>"123 "</php>. |
| * A //non-numeric string// is a string which is neither a numeric string nor a leading-numeric string. |
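The three categories above can be sketched in userland PHP. This is an illustrative approximation only: the function name ''classify_numeric_string'' and the regular expressions are assumptions made for this example (they are not the Zend Engine's implementation), and they follow the rules as described above, where trailing whitespace makes a string leading-numeric rather than numeric.

```php
<?php
// Hypothetical helper (not part of PHP): classifies a string into the three
// categories described above, using the rules from before this proposal.
function classify_numeric_string(string $s): string
{
    // Optional leading whitespace: space, \t, \n, \r, vertical tab, form feed.
    $ws  = '[ \t\n\r\x0B\f]*';
    // Optional sign, then an integer or floating-point literal.
    $num = '[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?';

    // The /D modifier stops "$" from matching before a final newline,
    // so "123\n" is not mistaken for a numeric string.
    if (preg_match('/^' . $ws . $num . '$/D', $s)) {
        return 'numeric';
    }
    if (preg_match('/^' . $ws . $num . '/', $s)) {
        return 'leading-numeric';
    }
    return 'non-numeric';
}
```

For example, ''classify_numeric_string(" 1.23e2")'' gives ''numeric'', while ''classify_numeric_string("123 ")'' and ''classify_numeric_string("123abc")'' both give ''leading-numeric''.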
| |
===== Proposal ===== | The difference between a numeric string and a leading-numeric string is significant, because certain operations distinguish between these: |
This RFC proposes to change PHP's behaviour, such that trailing whitespace is accepted in a numeric string, much like leading whitespace. This would make PHP's more consistent, less surprising, and save time by avoiding the need to trim trailing whitespace from numeric strings. | |
| |
For the PHP interpreter, this would be accomplished by modifying the ''is_numeric_string'' C function (and its variants) in the Zend Engine. This would therefore affect PHP features which make use of this function, including: | * <php>is_numeric()</php> returns <php>TRUE</php> only for numeric strings |
| * Arithmetic operations (e.g. <php>$a * $b</php>, <php>$a + $b</php>) accept and implicitly convert both numeric and leading-numeric strings, but trigger the <php>E_NOTICE</php> “A non well formed numeric value encountered” for leading-numeric strings |
| * When ''strict_types=1'' is not set, <php>int</php> and <php>float</php> parameter and return type declarations will accept and implicitly convert both numeric and leading-numeric strings, but likewise trigger the same <php>E_NOTICE</php> |
| * Type casts and other explicit conversions to integer or float (e.g. <php>(int)</php>, <php>(float)</php>, <php>settype()</php>) accept all strings, converting both numeric and leading-numeric strings and producing 0 for non-numeric strings |
| * String-to-string comparisons with <php>==</php> etc perform numeric comparison only if both strings are numeric strings
| * String-to-int/float comparisons with <php>==</php> etc type-juggle the string (and thus perform numeric comparison) if it is either a numeric string or a non-numeric string |
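Some of the distinctions listed above can be observed directly. The sketch below is limited to operations that do not raise notices (''is_numeric()'', explicit casts, and string-to-string comparison); the helper name ''describe_numeric_handling'' is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper: reports how is_numeric() and explicit casts treat $s.
function describe_numeric_handling(string $s): array
{
    return [
        'is_numeric' => is_numeric($s), // true only for numeric strings
        'int_cast'   => (int) $s,       // explicit casts accept any string
        'float_cast' => (float) $s,
    ];
}

var_dump(describe_numeric_handling('123'));    // a numeric string
var_dump(describe_numeric_handling('123abc')); // a leading-numeric string
var_dump(describe_numeric_handling('abc'));    // a non-numeric string

// String-to-string comparison is numeric only when both sides are numeric.
var_dump('123' == '123.0'); // bool(true)
var_dump('abc' == 'ABC');   // bool(false)
```

The casts convert the leading-numeric string ''"123abc"'' to 123 and the non-numeric string ''"abc"'' to 0, while ''is_numeric()'' rejects both.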
| |
* [[rfc:invalid_strings_in_arithmetic|Arithmetic operators]] will no longer produce an <php>E_NOTICE</php>-level error when used with a numeric string with trailing whitespace | It is notable that while a numeric string may contain leading whitespace, only a leading-numeric string may contain trailing whitespace. |
* The <php>int</php> and <php>float</php> type declarations will, in weak typing mode, no longer produce an <php>E_NOTICE</php>-level error when passed a numeric string with trailing whitespace | |
* Type checks for built-in/extension (“internal”) PHP functions will, in weak typing mode, no longer produce an <php>E_NOTICE</php>-level error when passed a numeric string with trailing whitespace | |
* The comparison operators will now consider numeric strings with trailing whitespace to be numeric, therefore meaning that, for example, <php>"123 " == 123</php> produces <php>true</php>, much like <php>" 123" == 123</php> does at present | |
* The <php>\is_numeric</php> function will now accept numeric strings with trailing whitespace, and return <php>true</php> | |
| |
The PHP language specification's [[https://github.com/php/php-langspec/blob/master/spec/05-types.md#the-string-type|definition of str-numeric]] would be modified to be as follows (additions shown in bold): | ===== The Problem ===== |
| |
<blockquote> | The current behaviour of treating strings with leading whitespace as more numeric than strings with trailing whitespace is inconsistent and has no obvious benefit. It is an unintuitive, surprising behaviour. |
''str-numeric::'' | |
'' str-whitespace''<sub>''opt''</sub>'' sign''<sub>''opt''</sub>'' str-number ''**''str-whitespace''**<sub>**''opt''**</sub>'' | |
'' | |
</blockquote> | |
| |
===== Backward Incompatible Changes ===== | The inconsistency itself can require more work from the programmer. If your application needs to reject numbers from user input that contain whitespace (perhaps because they are passed on to a back-end system that cannot handle whitespace), you cannot rely on e.g. <php>is_numeric()</php> to do this for you, because it only rejects trailing whitespace. Conversely, if your application needs to accept numbers from user input that contain whitespace (perhaps to tolerate accidentally copied-and-pasted spaces), you cannot rely on e.g. <php>$a + $b</php> to do this for you, because it only accepts leading whitespace. |
<php>\is_numeric()</php> now returns <php>true</php> rather than <php>false</php> for numeric strings with trailing whitespace. The author does not expect this to cause significant backwards-compatibility issues, because it is uncommon for an application to need trailing whitespace, but not leading whitespace, to be invalid. Additionally, the new behaviour may be the one intended. | |
| |
===== Proposed PHP Version(s) ===== | Beyond the inconsistency, the current rejection of trailing whitespace is annoying for programs reading data from files or similar whitespace-separated data streams: |
This is proposed for the next PHP 7.x. At the time of writing, that would be PHP 7.2. | |
| <code php> |
| <?php |
| |
| $total = 0; |
| foreach (file("numbers.txt") as $number) { |
| $total += $number; // Currently produces “Notice: A non well formed numeric value encountered” on every iteration, because $number ends in "\n" |
| } |
| ?> |
| </code> |
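Without the proposed change, code like the loop above has to strip trailing whitespace itself. A minimal sketch of that workaround follows; the function name ''sum_numeric_lines'' is hypothetical, and the set of characters passed to ''rtrim()'' is an assumption about which characters count as whitespace here.

```php
<?php
// Hypothetical helper: sums lines such as those returned by file(),
// trimming trailing whitespace by hand before each value is used.
function sum_numeric_lines(array $lines)
{
    $total = 0;
    foreach ($lines as $line) {
        // Assumed whitespace set: space, tab, newline, carriage return,
        // vertical tab, form feed.
        $number = rtrim($line, " \t\n\r\v\f");
        if (!is_numeric($number)) {
            continue; // skip blank or malformed lines
        }
        $total += $number;
    }
    return $total;
}
```

It could be called as ''sum_numeric_lines(file("numbers.txt"))''. Passing the ''FILE_IGNORE_NEW_LINES'' flag to ''file()'' is another way to drop the line ending, though it would not remove other trailing whitespace such as spaces.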
| |
| Finally, the current behaviour makes [[rfc:string_to_number_comparison|potential simplifications to numeric string handling]] less palatable if they make leading-numeric strings be tolerated in fewer places, because of a perception that a lot of existing code may rely on the tolerance of trailing whitespace.
| |
| ===== Proposal ===== |
| |
| For the next PHP 7.x (currently PHP 7.4), this RFC proposes that trailing whitespace be accepted in numeric strings just as leading whitespace is. |
| |
| For the PHP interpreter, this would be accomplished by modifying the ''is_numeric_string'' C function (and its variants) in the Zend Engine. This would therefore affect PHP features which make use of this function, including: |
| |
| * [[rfc:invalid_strings_in_arithmetic|Arithmetic operators]] would no longer produce an <php>E_NOTICE</php>-level error when used with a numeric string with trailing whitespace |
| * The <php>int</php> and <php>float</php> type declarations would no longer produce an <php>E_NOTICE</php>-level error when passed a numeric string with trailing whitespace |
| * Type checks for built-in/extension (“internal”) PHP functions would no longer produce an <php>E_NOTICE</php>-level error when passed a numeric string with trailing whitespace |
| * The comparison operators would consider numeric strings with trailing whitespace to be numeric, meaning that, for example, <php>"123 " == " 123"</php> would produce <php>true</php> instead of <php>false</php>
| * The <php>\is_numeric</php> function would return <php>true</php> for numeric strings with trailing whitespace |
| * The <php>++</php> and <php>--</php> operators would convert numeric strings with trailing whitespace to integers or floats, as appropriate, rather than applying the alphanumeric increment rules
| |
| The PHP language specification's [[https://github.com/php/php-langspec/blob/master/spec/05-types.md#the-string-type|definition of str-numeric]] would be modified by the addition of ''str-whitespace''<sub>''opt''</sub> after ''str-number''. |
| |
| This change would be almost completely backwards-compatible, as no string that was previously accepted would now be rejected. However, if an application relies on trailing whitespace not being considered well-formed, it would need updating. |
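The proposed acceptance rule can be approximated in userland, which may help when checking whether existing code depends on the old behaviour. This is a sketch only: ''is_numeric_allowing_trailing_ws'' is a hypothetical name, and the whitespace set is assumed to match the characters already accepted as leading whitespace.

```php
<?php
// Hypothetical emulation of the proposed rule: a numeric string may be
// followed by optional trailing whitespace.
function is_numeric_allowing_trailing_ws(string $s): bool
{
    // Assumed whitespace set: space, tab, newline, carriage return,
    // vertical tab, form feed.
    return is_numeric(rtrim($s, " \t\n\r\v\f"));
}
```

Under this emulation ''"123 "'' and ''" 123 "'' are accepted, while ''"123 abc"'', ''"1 2"'' and whitespace-only strings are still rejected.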
| |
===== RFC Impact ===== | ===== RFC Impact ===== |
==== To Existing Extensions ==== | ==== To Existing Extensions ==== |
Any extension using ''is_numeric_string'', its variants, or other functions which themselves use it, will be affected. | Any extension using ''is_numeric_string'', its variants, or other functions which themselves use it, will be affected. |
| |
==== To Opcache ==== | ==== To Opcache ==== |
FIXME: I have not yet verified the RFC's compatibility with opcache. | In the patch, all tests pass with Opcache enabled. I am not aware of any issues arising here. |
| |
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== |
| |
===== Future Scope ===== | ===== Future Scope ===== |
None conceivable. | If adopted, this would make Nikita Popov's [[rfc:string_to_number_comparison|PHP RFC: Saner string to number comparisons]] look more reasonable. |
| |
| I would also plan a second RFC in a similar vein to Nikita's, which would simplify things by removing the concept of leading-numeric strings: strings are either numeric and accepted, or non-numeric and not accepted. |
| |
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== |
This is a language change, and requires a 2/3 majority. The vote is a two-choice Yes/No vote on whether to accept the RFC and apply its changes to the next applicable version of PHP. | Per the Voting RFC, there would be a single Yes/No vote requiring a 2/3 majority. |
| |
===== Patches and Tests ===== | ===== Patches and Tests ===== |
A pull request for a complete PHP interpreter patch, including a test file, can be found here: https://github.com/php/php-src/pull/2317 | A pull request for a complete PHP interpreter patch, including a test file, can be found here: https://github.com/php/php-src/pull/2317 |
| |
FIXME: There is not yet a language specification patch. | I do not yet have a language specification patch. |
| |
===== Implementation ===== | ===== Implementation ===== |
- a link to the PHP manual entry for the feature | - a link to the PHP manual entry for the feature |
- a link to the language specification section (if any) | - a link to the language specification section (if any) |
| |
| ===== Changelog ===== |
| |
| * 2020-06-24: Take-over by George Peter Banyard with the consent of Andrea Faulds |
| * 2019-03-06, v1.0: First non-draft version, dropped the second proposal from the RFC for now, I can make that as a follow-up RFC |
| * 2019-02-07 (draft): Added proposal to remove “non-well-formed” numeric strings at the suggestion of Nikita Popov, renamed to “Revise trailing character handling for numeric strings” |
| * 2017-01-18 (draft): First draft as “Permit trailing whitespace in numeric strings” |