PHP RFC: Permit trailing whitespace in numeric strings
- Version: 1.0
- Date: 2019-03-06
- Author: Andrea Faulds, ajf@ajf.me
- Status: Superseded by George Peter Banyard's PHP RFC: Saner numeric strings (partly based on this RFC), with permission.
- First Published at: http://wiki.php.net/rfc/trailing_whitespace_numerics
Technical Background
The PHP language has a concept of numeric strings, strings which can be interpreted as numbers. This concept is used in a few places:
- Explicit conversions of strings to number types, e.g. $a = "123"; $b = (float)$a; // float(123)
- Implicit conversions of strings to number types, e.g. $a = "123"; $b = intdiv($a, 1); // int(123) (if strict_types=1 is not set)
- Comparisons, e.g. $a = "123"; $b = "123.0"; $c = ($a == $b); // bool(true)
- The is_numeric() function, e.g. $a = "123"; $b = is_numeric($a); // bool(true)
A string can be categorised in three ways according to its numericness, as described by the language specification:
- A numeric string is a string containing only a number, optionally preceded by whitespace characters. For example, "123" or " 1.23e2".
- A leading-numeric string is a string that begins with a numeric string but is followed by non-number characters (including whitespace characters). For example, "123abc" or "123 ".
- A non-numeric string is a string which is neither a numeric string nor a leading-numeric string.
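The three categories can be sketched as a small classifier. This is only an illustration of the definitions above, not how the Zend Engine does it: classify_numericness() is a hypothetical helper, and the regular expression is an assumed approximation of the specification's number grammar (optional sign, decimal integer or float, optional exponent).

```php
<?php
// Hypothetical helper (not part of PHP or of this RFC): classifies a string
// according to the current, pre-proposal definitions.
function classify_numericness(string $s): string
{
    // Optional leading whitespace, optional sign, then an integer or float
    // literal with an optional exponent. Assumed approximation of the grammar.
    $numeric = '[ \t\n\r\x0B\f]*[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?';

    if (preg_match('/\A' . $numeric . '\z/', $s) === 1) {
        return 'numeric';          // the whole string is a number
    }
    if (preg_match('/\A' . $numeric . '/', $s) === 1) {
        return 'leading-numeric';  // a number followed by something else
    }
    return 'non-numeric';
}

foreach (["123", " 1.23e2", "123abc", "123 ", "abc"] as $s) {
    printf("%-10s => %s\n", var_export($s, true), classify_numericness($s));
}
```

Note that "123 " is classified as leading-numeric here while " 1.23e2" is numeric, which is exactly the asymmetry this RFC addresses.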
The difference between a numeric string and a leading-numeric string is significant, because certain operations distinguish between these:
- is_numeric() returns TRUE only for numeric strings
- Arithmetic operations (e.g. $a * $b, $a + $b) accept and implicitly convert both numeric and leading-numeric strings, but trigger the E_NOTICE “A non well formed numeric value encountered” for leading-numeric strings
- When strict_types=1 is not set, int and float parameter and return type declarations will accept and implicitly convert both numeric and leading-numeric strings, but likewise trigger the E_NOTICE for leading-numeric strings
- Type casts and other explicit conversions to integer or float (e.g. (int), (float), settype()) accept all strings, converting both numeric and leading-numeric strings and producing 0 for non-numeric strings
- String-to-string comparisons with == etc perform numeric comparison only if both strings are numeric strings
- String-to-int/float comparisons with == etc type-juggle the string (and thus perform numeric comparison) if it is either a numeric string or a non-numeric string
It is notable that while a numeric string may contain leading whitespace, only a leading-numeric string may contain trailing whitespace.
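A short script can make some of these distinctions visible. It is a sketch only: run_collecting_diagnostics() is a hypothetical helper, and "123abc" is used as the leading-numeric example because its treatment is the same in every PHP 7.x release. The level and wording of the diagnostic depend on the PHP version, so the script only counts diagnostics.

```php
<?php
// Hypothetical helper: runs $fn and collects any notices or warnings it raises.
function run_collecting_diagnostics(callable $fn): array
{
    $messages = [];
    set_error_handler(function (int $severity, string $message) use (&$messages): bool {
        $messages[] = $message;
        return true; // suppress PHP's default handling
    });
    try {
        $result = $fn();
    } finally {
        restore_error_handler();
    }
    return [$result, $messages];
}

$numeric = " 123";          // numeric: leading whitespace only
$leadingNumeric = "123abc"; // leading-numeric: trailing non-number characters

var_dump(is_numeric($numeric));        // bool(true)
var_dump(is_numeric($leadingNumeric)); // bool(false)

var_dump((int) $leadingNumeric); // int(123), casts accept any string
var_dump((int) "abc");           // int(0)

var_dump("1e3" == "1000"); // bool(true), both operands are numeric strings
var_dump("abc" == "ABC");  // bool(false), ordinary string comparison

[$sum, $messages] = run_collecting_diagnostics(function () use ($leadingNumeric) {
    return $leadingNumeric + 1;
});
var_dump($sum);             // int(124), converted despite the trailing "abc"
var_dump(count($messages)); // int(1), one diagnostic was raised
```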
The Problem
The current behaviour of treating strings with leading whitespace as more numeric than strings with trailing whitespace is inconsistent and has no obvious benefit. It is an unintuitive, surprising behaviour.
The inconsistency itself can require more work from the programmer. If rejecting number strings from user input that contain whitespace is useful to your application (perhaps because the input must be passed on to a back-end system that cannot handle whitespace), you cannot rely on e.g. is_numeric() to enforce this, because it only rejects trailing whitespace. Conversely, if accepting number strings from user input that contain whitespace is useful to your application (perhaps to tolerate accidentally copied-and-pasted spaces), you cannot rely on e.g. $a + $b to do so, because it only accepts leading whitespace without raising a notice.
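In practice, code that needs either policy today has to add its own whitespace handling. A minimal sketch, assuming the whitespace set is space, tab, newline, carriage return, vertical tab and form feed; both function names and the constant are hypothetical:

```php
<?php
// Assumed set of whitespace characters relevant to numeric strings.
const NUMERIC_WHITESPACE = " \t\n\r\v\f";

// Hypothetical helper: accepts only numbers with no surrounding whitespace.
function is_unpadded_number(string $s): bool
{
    return is_numeric($s) && trim($s, NUMERIC_WHITESPACE) === $s;
}

// Hypothetical helper: tolerates whitespace on either side of the number.
function is_padded_number(string $s): bool
{
    return is_numeric(trim($s, NUMERIC_WHITESPACE));
}

var_dump(is_unpadded_number("123"));  // bool(true)
var_dump(is_unpadded_number(" 123")); // bool(false)
var_dump(is_unpadded_number("123 ")); // bool(false)
var_dump(is_padded_number(" 123 "));  // bool(true)
var_dump(is_padded_number("12 3"));   // bool(false)
```

Because both helpers trim explicitly, their results do not depend on whether is_numeric() itself accepts trailing whitespace.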
Beyond the inconsistency, the current rejection of trailing whitespace is annoying for programs reading data from files or similar whitespace-separated data streams:
<?php
$total = 0;
foreach (file("numbers.txt") as $number) {
    // Currently produces “Notice: A non well formed numeric value encountered”
    // on every iteration, because $number ends in "\n"
    $total += $number;
}
?>
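Under the current rules, one way to avoid the notice is to strip the line endings before doing arithmetic. The sketch below uses file() with FILE_IGNORE_NEW_LINES. The temporary file and its contents are assumed sample data standing in for numbers.txt.

```php
<?php
// Assumed sample data standing in for "numbers.txt".
$path = tempnam(sys_get_temp_dir(), 'num');
file_put_contents($path, "1\n2\n3.5\n");

$total = 0;
// FILE_IGNORE_NEW_LINES removes the trailing "\n" from each line, so every
// $number is a numeric string rather than a leading-numeric one.
foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $number) {
    $total += $number;
}
unlink($path);

var_dump($total); // float(6.5)
```

This works, but it is the kind of extra step the proposal would make unnecessary.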
Finally, the current behaviour makes potential simplifications to numeric string handling less palatable if they make leading-numeric strings be tolerated in fewer places, because of a perception that a lot of existing code may rely on the tolerance of trailing whitespace.
Proposal
For the next PHP 7.x release (currently PHP 7.4), this RFC proposes that trailing whitespace be accepted in numeric strings just as leading whitespace is.
For the PHP interpreter, this would be accomplished by modifying the is_numeric_string C function (and its variants) in the Zend Engine. This would therefore affect PHP features which make use of this function, including:
- Arithmetic operators would no longer produce an E_NOTICE-level error when used with a numeric string with trailing whitespace
- The int and float type declarations would no longer produce an E_NOTICE-level error when passed a numeric string with trailing whitespace
- Type checks for built-in/extension (“internal”) PHP functions would no longer produce an E_NOTICE-level error when passed a numeric string with trailing whitespace
- The comparison operators would consider numeric strings with trailing whitespace to be numeric, meaning that, for example, "123 " == " 123" produces true instead of false
- The \is_numeric function would return true for numeric strings with trailing whitespace
- The ++ and -- operators would convert numeric strings with trailing whitespace to integers or floats, as appropriate, rather than applying the alphanumeric increment rules
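The following probe shows how a few of these changes can be observed from userland. It rests on an assumption not stated in this RFC: that the running interpreter is PHP 7.0 or later, and that PHP 8.0 and later already treat trailing whitespace as proposed here (through the superseding RFC). The $proposedBehaviour flag encodes that assumption.

```php
<?php
// Assumption: PHP 8.0+ accepts trailing whitespace in numeric strings,
// while PHP 7.x still treats "123 " as leading-numeric.
$proposedBehaviour = PHP_VERSION_ID >= 80000;

$padded = "123 ";

// is_numeric() and string-to-string comparison change their result.
var_dump(is_numeric($padded));
var_dump($padded == " 123");

// Arithmetic gives the same value either way; only the diagnostic differs.
$diagnostics = 0;
set_error_handler(function () use (&$diagnostics): bool {
    $diagnostics++;
    return true; // suppress PHP's default handling
});
$sum = $padded + 1;
restore_error_handler();

var_dump($sum);         // int(124)
var_dump($diagnostics); // 1 under the current rules, 0 under the proposed ones
```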
The PHP language specification's definition of str-numeric would be modified by the addition of str-whitespace opt after str-number (the opt suffix marks the whitespace as optional).
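The effect of the grammar change can be illustrated with a pair of regular expressions. This is a sketch, not the specification's text: matches_str_numeric() is a hypothetical helper, and the patterns are an assumed approximation of str-whitespace and str-number.

```php
<?php
// Hypothetical helper: tests a string against an approximation of str-numeric,
// either as currently specified or with the proposed trailing whitespace.
function matches_str_numeric(string $s, bool $proposed): bool
{
    $whitespace = '[ \t\n\r\x0B\f]*'; // stands in for str-whitespace opt
    $number = '[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?'; // stands in for str-number

    // The proposal only appends optional whitespace after the number.
    $pattern = '/\A' . $whitespace . $number . ($proposed ? $whitespace : '') . '\z/';

    return preg_match($pattern, $s) === 1;
}

var_dump(matches_str_numeric("123 ", false));    // bool(false)
var_dump(matches_str_numeric("123 ", true));     // bool(true)
var_dump(matches_str_numeric(" 123", false));    // bool(true)
var_dump(matches_str_numeric("123 abc", true));  // bool(false)
```

Every string matched by the current pattern is also matched by the proposed one, which is why no previously accepted string would be rejected.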
This change would be almost completely backwards-compatible, as no string that was previously accepted would now be rejected. However, if an application relies on trailing whitespace not being considered well-formed, it would need updating.
RFC Impact
To Existing Extensions
Any extension using is_numeric_string, its variants, or other functions which themselves use it, will be affected.
To Opcache
In the patch, all tests pass with Opcache enabled. I am not aware of any issues arising here.
Unaffected PHP Functionality
This does not affect the filter extension, which handles numeric strings itself in a different fashion.
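As an illustration of that different handling, the integer validation filter trims surrounding whitespace itself before validating, to the best of my understanding. The sketch assumes the filter extension is enabled, which is the default.

```php
<?php
// Whitespace on both sides is tolerated by the filter's own parsing.
$padded = filter_var(" 123 ", FILTER_VALIDATE_INT);
var_dump($padded); // int(123)

// Trailing non-number characters are rejected outright.
$invalid = filter_var("123abc", FILTER_VALIDATE_INT);
var_dump($invalid); // bool(false)
```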
Future Scope
If adopted, this would make Nikita Popov's PHP RFC: Saner string to number comparisons look more reasonable.
I would also plan a second RFC in a similar vein to Nikita's, which would simplify things by removing the concept of leading-numeric strings: strings are either numeric and accepted, or non-numeric and not accepted.
Proposed Voting Choices
Per the Voting RFC, there would be a single Yes/No vote requiring a 2/3 majority.
Patches and Tests
A pull request for a complete PHP interpreter patch, including a test file, can be found here: https://github.com/php/php-src/pull/2317
I do not yet have a language specification patch.
Implementation
After the project is implemented, this section should contain
- the version(s) it was merged to
- a link to the git commit(s)
- a link to the PHP manual entry for the feature
- a link to the language specification section (if any)
Changelog
- 2020-06-24: Take-over by George Peter Banyard with the consent of Andrea Faulds
- 2019-03-06, v1.0: First non-draft version; dropped the second proposal from the RFC for now, as it can be made a follow-up RFC
- 2019-02-07 (draft): Added proposal to remove “non-well-formed” numeric strings at the suggestion of Nikita Popov, renamed to “Revise trailing character handling for numeric strings”
- 2017-01-18 (draft): First draft as “Permit trailing whitespace in numeric strings”