The PHP language has a concept of numeric strings, strings which can be interpreted as numbers. This concept is used in a few places:
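- Explicit conversions of strings to number types, e.g. $a = "123"; $b = (float)$a; // float(123)
- Implicit conversions of strings to number types when strict_types=1 is not set, e.g. $a = "123"; $b = intdiv($a, 1); // int(123)
- Comparisons, e.g. $a = "123"; $b = "123.0"; $c = ($a == $b); // bool(true)
- The is_numeric() function, e.g. $a = "123"; $b = is_numeric($a); // bool(true)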
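A string can be categorised in three ways according to its numericness, as described by the language specification:
- A numeric string contains only a number, optionally preceded by whitespace. For example, "123" or " 1.23e2".
- A leading-numeric string begins with a numeric string but is followed by other characters, including whitespace. For example, "123abc" or "123 ".
- A non-numeric string is neither a numeric string nor a leading-numeric string.
The difference between a numeric string and a leading-numeric string is significant, because certain operations distinguish between these: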
- is_numeric() returns TRUE only for numeric strings
- Arithmetic operations (e.g. $a * $b, $a + $b) accept and implicitly convert both numeric and leading-numeric strings, but trigger the E_NOTICE “A non well formed numeric value encountered” for leading-numeric strings
- When strict_types=1 is not set, int and float parameter and return type declarations will accept and implicitly convert both numeric and leading-numeric strings, but likewise trigger the same E_NOTICE
- Type casts and other explicit conversions to integer or float (e.g. (int), (float), settype()) accept all strings, converting both numeric and leading-numeric strings and producing 0 for non-numeric strings
- String-to-string comparisons with == etc. perform numeric comparison only if both strings are numeric strings
- String-to-number comparisons with == etc. type-juggle the string (and thus perform numeric comparison) if it is either a numeric string or a non-numeric string
It is notable that while a numeric string may contain leading whitespace, only a leading-numeric string may contain trailing whitespace.
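The three categories above can be sketched in userland. This is a minimal illustration, assuming a regular expression that approximates the number syntax. The helper name classify_numeric_string() is hypothetical and is not part of PHP; the engine does this work in C.

```php
<?php
// Userland sketch of the three categories under the current rules.
// classify_numeric_string() is a hypothetical helper, not a PHP built-in.
// It uses a regular expression, so its result does not depend on the
// PHP version that runs it.

function classify_numeric_string(string $s): string
{
    // Optional leading whitespace, optional sign, then an integer or
    // float literal with an optional exponent.
    $pattern = '/^[ \t\n\r\x0B\f]*[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?/';

    if (preg_match($pattern, $s, $match) !== 1) {
        return 'non-numeric';
    }

    // Anything left over after the number, including whitespace, makes the
    // string merely leading-numeric under the current rules.
    return strlen($match[0]) === strlen($s) ? 'numeric' : 'leading-numeric';
}

foreach (['123', ' 1.23e2', '123abc', '123 ', 'abc'] as $s) {
    printf("%-10s => %s\n", var_export($s, true), classify_numeric_string($s));
}
```

Note that " 1.23e2" is classified as numeric while "123 " is only leading-numeric, which is the asymmetry discussed next.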
The current behaviour of treating strings with leading whitespace as more numeric than strings with trailing whitespace is inconsistent and has no obvious benefit. It is an unintuitive, surprising behaviour.
The inconsistency itself can require more work from the programmer. If your application needs to reject number strings from user input that contain whitespace (perhaps because they must be passed on to a back-end system that cannot handle whitespace), you cannot rely on e.g. is_numeric() to do this for you, because it only rejects trailing whitespace. Yet if your application needs to accept number strings from user input that contain whitespace (perhaps to tolerate accidentally copied-and-pasted spaces), you cannot rely on e.g. $a + $b either, because it only accepts leading whitespace without raising an E_NOTICE.
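As an illustration of that extra work, an application today might wrap is_numeric() in helpers like the following. This is only a sketch: both function names and the NUMERIC_WHITESPACE constant are hypothetical, and the whitespace set is an assumption about which characters should count.

```php
<?php
// Hypothetical helpers; neither is part of PHP.

// Characters treated as whitespace around a number (an assumption).
const NUMERIC_WHITESPACE = " \t\n\r\v\f";

// Tolerant check: strip whitespace from both ends, then ask is_numeric().
function is_numeric_ignoring_whitespace(string $s): bool
{
    return is_numeric(trim($s, NUMERIC_WHITESPACE));
}

// Strict check: reject any surrounding whitespace, leading or trailing.
function is_numeric_without_whitespace(string $s): bool
{
    return $s === trim($s, NUMERIC_WHITESPACE) && is_numeric($s);
}

var_dump(is_numeric_ignoring_whitespace(" 123 ")); // bool(true)
var_dump(is_numeric_without_whitespace(" 123"));   // bool(false)
```

Both helpers trim explicitly, so they behave the same whether or not the interpreter itself tolerates trailing whitespace.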
Beyond the inconsistency, the current rejection of trailing whitespace is annoying for programs reading data from files or similar whitespace-separated data streams:
<?php
$total = 0;
foreach (file("numbers.txt") as $number) {
    // Currently produces “Notice: A non well formed numeric value encountered”
    // on every iteration, because $number ends in "\n"
    $total += $number;
}
?>
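For comparison, here is a sketch of two ways such a loop can be written today so that the line endings never reach the arithmetic. The temporary file and its contents are made up for the example; FILE_IGNORE_NEW_LINES, FILE_SKIP_EMPTY_LINES and rtrim() are standard PHP.

```php
<?php
// Create a small sample file so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'numbers');
file_put_contents($path, "1\n2\n3\n");

// Option 1: ask file() to drop the line endings itself.
$total = 0;
foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $number) {
    $total += $number;
}

// Option 2: strip trailing whitespace from each line by hand.
$trimmedTotal = 0;
foreach (file($path) as $number) {
    $trimmedTotal += rtrim($number);
}

unlink($path);

// Both totals are 6 for the sample data above.
var_dump($total, $trimmedTotal);
```

Either workaround is easy to forget, which is part of why the notice in the original loop comes as a surprise.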
Finally, the current behaviour makes potential simplifications to numeric string handling less palatable if they would cause leading-numeric strings to be tolerated in fewer places, because of a perception that a lot of existing code may rely on the tolerance of trailing whitespace.
For the next PHP 7.x (currently PHP 7.4), this RFC proposes that trailing whitespace be accepted in numeric strings just as leading whitespace is.
For the PHP interpreter, this would be accomplished by modifying the is_numeric_string C function (and its variants) in the Zend Engine. This would therefore affect PHP features which make use of this function, including:
- Arithmetic operations would no longer produce an E_NOTICE-level error when used with a numeric string with trailing whitespace
- int and float type declarations would no longer produce an E_NOTICE-level error when passed a numeric string with trailing whitespace
- Type checks for built-in and extension (“internal”) functions would no longer produce an E_NOTICE-level error when passed a numeric string with trailing whitespace
- String-to-string comparisons with == etc. would treat numeric strings with trailing whitespace as numeric, so that "123 " == " 123" produces true, instead of false
- The \is_numeric function would return true for numeric strings with trailing whitespace
- The ++ and -- operators would convert numeric strings with trailing whitespace to integers or floats, as appropriate, rather than applying the alphanumeric increment rules
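A small probe script can show which behaviour a given interpreter has for some of these features. This is a sketch only, and count_diagnostics() is a hypothetical helper, not part of PHP. With the current behaviour described above one would expect false, false and one diagnostic. With the proposed change one would expect true, true and none.

```php
<?php
// Count the notices or warnings raised while running a callable.
// count_diagnostics() is a hypothetical helper, not a PHP built-in.
function count_diagnostics(callable $fn): int
{
    $count = 0;
    set_error_handler(function () use (&$count) {
        $count++;
        return true; // do not fall through to the default handler
    });
    try {
        $fn();
    } finally {
        restore_error_handler();
    }
    return $count;
}

$trailing = "123 ";

// Does is_numeric() accept trailing whitespace on this interpreter?
$acceptedByIsNumeric = is_numeric($trailing);

// Is the string compared numerically against one with leading whitespace?
$equalToLeading = ($trailing == " 123");

// Does arithmetic on the string raise a diagnostic?
$diagnostics = count_diagnostics(function () use ($trailing) {
    return $trailing + 1;
});

// The output depends on the PHP version running the script.
var_dump(PHP_VERSION, $acceptedByIsNumeric, $equalToLeading, $diagnostics);
```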
The PHP language specification's definition of str-numeric would be modified by the addition of str-whitespace(opt) after str-number.
This change would be almost completely backwards-compatible, as no string that was previously accepted would now be rejected. However, if an application relies on trailing whitespace not being considered well-formed, it would need updating.
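The grammar change and its compatibility property can be approximated with two regular expressions. This is a sketch under stated assumptions: the patterns only approximate the specification's productions, and the constant and function names are invented for the example.

```php
<?php
// Regular-expression approximations of the grammar; these constants and
// helper names are illustrative assumptions, not taken from the spec.
const STR_WHITESPACE_OPT = '[ \t\n\r\x0B\f]*';
const STR_NUMBER = '[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?';

// Current rule: optional leading whitespace, then the number, then the end.
function matches_current_str_numeric(string $s): bool
{
    return preg_match('/^' . STR_WHITESPACE_OPT . STR_NUMBER . '\z/', $s) === 1;
}

// Proposed rule: the same, with optional whitespace also allowed at the end.
function matches_proposed_str_numeric(string $s): bool
{
    $pattern = '/^' . STR_WHITESPACE_OPT . STR_NUMBER . STR_WHITESPACE_OPT . '\z/';
    return preg_match($pattern, $s) === 1;
}

foreach (['123', ' 123', '123 ', " 123\n", '123abc'] as $s) {
    printf(
        "%-10s current=%s proposed=%s\n",
        var_export($s, true),
        var_export(matches_current_str_numeric($s), true),
        var_export(matches_proposed_str_numeric($s), true)
    );
}
```

Because the proposed pattern only adds an optional suffix, every string matched by the current pattern is also matched by the proposed one. Strings such as "123abc" are matched by neither.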
Any extension using is_numeric_string, its variants, or other functions which themselves use it, will be affected.
In the patch, all tests pass with Opcache enabled. I am not aware of any issues arising here.
This does not affect the filter extension, which handles numeric strings itself in a different fashion.
If adopted, this would make Nikita Popov's PHP RFC: Saner string to number comparisons look more reasonable.
I would also plan a second RFC in a similar vein to Nikita's, which would simplify things by removing the concept of leading-numeric strings: strings are either numeric and accepted, or non-numeric and not accepted.
Per the Voting RFC, there would be a single Yes/No vote requiring a 2/3 majority.
A pull request for a complete PHP interpreter patch, including a test file, can be found here: https://github.com/php/php-src/pull/2317
I do not yet have a language specification patch.