The PHP language has a concept of numeric strings, strings which can be interpreted as numbers.
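A string can be categorised in three ways according to its numeric-ness, as described by the language specification:
- A numeric string is a string containing only a number, optionally preceded by whitespace characters. For example, "123" or " 1.23e2".
- A leading-numeric string is a string that begins with a numeric string but is followed by non-number characters, including whitespace characters. For example, "123abc" or "123 ".
- A non-numeric string is a string which is neither a numeric string nor a leading-numeric string.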
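The three categories above can be approximated with a regular expression. The following is a minimal sketch, not the engine's implementation: the helper name classify_numeric_string is hypothetical, and the whitespace set (space, \t, \n, \r, \v, \f) and the number grammar are assumptions based on the examples in this document.

```php
<?php
// Hypothetical helper: classifies a string using the pre-proposal categories.
// The pattern is an approximation of the number grammar, not the engine's code.
function classify_numeric_string(string $s): string
{
    // Optional leading whitespace, optional sign, digits with an optional
    // fraction (or a bare fraction), and an optional exponent.
    $number = '[ \t\n\r\x0B\f]*[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?';

    if (preg_match('/^' . $number . '\z/', $s) === 1) {
        return 'numeric';          // the whole string is a number
    }
    if (preg_match('/^' . $number . '/', $s) === 1) {
        return 'leading-numeric';  // a number followed by something else
    }
    return 'non-numeric';
}

var_dump(classify_numeric_string("123"));      // string(7) "numeric"
var_dump(classify_numeric_string(" 1.23e2"));  // string(7) "numeric"
var_dump(classify_numeric_string("123abc"));   // string(15) "leading-numeric"
var_dump(classify_numeric_string("123 "));     // string(15) "leading-numeric"
var_dump(classify_numeric_string("abc"));      // string(11) "non-numeric"
```

Note that trailing whitespace alone is enough to make a string leading-numeric under these definitions.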
A fourth way PHP might deal with numeric strings is when using an integer string for an array index. An integer string is stricter than a numeric string as it has the following additional constraints:
- It does not accept leading whitespace.
- It does not accept leading zeros (0).
How PHP deals with array indexes is shown in the following code snippet:
    $a = [
        "4" => "Integer index",
        "03" => "Integer index with leading 0/octal",
        "2str" => "leading numeric string",
        " 1" => "leading whitespace",
        "5.5" => "Float",
    ];
    var_dump($a);
Which results in the following output:
    array(5) {
      [4]=>
      string(13) "Integer index"
      ["03"]=>
      string(34) "Integer index with leading 0/octal"
      ["2str"]=>
      string(22) "leading numeric string"
      [" 1"]=>
      string(18) "leading whitespace"
      ["5.5"]=>
      string(5) "Float"
    }
This RFC does not affect how array indexes behave, and thus won't mention them again.
Another aspect which should be noted is that arithmetic/bitwise operators will convert all operands to their numeric/integer equivalent and emit a notice/warning on malformed/invalid numeric strings. The exceptions are the &, |, and ^ bitwise operators when both operands are strings, and the ~ operator: in those cases the operation is performed on the ASCII values of the characters that make up the strings and the result is a string, as per the documentation on bitwise operators.
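To illustrate the string-versus-integer distinction for bitwise operators, here is a small sketch. The operand values are chosen for illustration only; bin2hex() is used to display results that are not printable characters.

```php
<?php
// Both operands are strings: the operator works on the character codes.
$or  = "a" | "b";   // 0x61 | 0x62 = 0x63, i.e. "c"
$xor = "5" ^ "6";   // 0x35 ^ 0x36 = 0x03, a control character, not "3"
$not = ~"a";        // ~0x61 = 0x9E, a single byte

// One integer operand: the string is converted to a number first.
$mixed = 5 ^ "6";   // 5 ^ 6 = 3

var_dump($or);            // string(1) "c"
var_dump(bin2hex($xor));  // string(2) "03"
var_dump(bin2hex($not));  // string(2) "9e"
var_dump($mixed);         // int(3)
```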
One final behaviour of PHP which needs to be presented is how PHP performs weak comparisons, i.e. a comparison with one of the following binary operators: ==, !=, <>, <, >, <=, and >=, in the string-to-string case and in the string-to-int/float case.
String-to-string comparisons are performed numerically if and only if both strings are numeric strings.
String-to-int/float comparisons are always performed numerically; the string is therefore silently type-juggled regardless of its numeric-ness.
This RFC does not propose to modify this behaviour, see PHP RFC: Saner string to number comparisons instead.
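The comparison rules above can be illustrated with a few cases. This sketch deliberately sticks to comparisons whose result does not depend on trailing whitespace or on comparing an integer with a non-numeric string, since those cases are affected by this RFC and by the Saner string to number comparisons RFC respectively.

```php
<?php
// Both operands are numeric strings, so they are compared as numbers.
var_dump("1e3" == "1000");  // bool(true)
var_dump("10" == "10.0");   // bool(true)
var_dump(" 42" == "42");    // bool(true), leading whitespace is allowed

// At least one operand is not a numeric string, so they are compared as strings.
var_dump("abc" == "ABC");   // bool(false)
var_dump("1e3" == "1e3x");  // bool(false)

// String-to-int: the numeric string is compared as a number.
var_dump(100 == "1e2");     // bool(true)
```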
The concept of numeric strings is used in a few places, and the distinction between a numeric string and a leading-numeric string is significant as certain operations distinguish between these:
- (int) and (float) type casts, or settype(), convert numeric and leading-numeric strings and silently produce 0 for non-numeric strings.
- Implicit conversions in weak mode (no strict_types declare statement, or strict_types=0) due to type declarations [note: internal functions behave similarly in PHP 8], e.g.:
      function foo(int $i) { var_dump($i); }
      foo("123");    // int(123)
      foo(" 123");   // int(123)
      foo("123 ");   // int(123) with E_NOTICE "A non well formed numeric value encountered"
      foo("123abc"); // int(123) with E_NOTICE "A non well formed numeric value encountered"
      foo("string"); // TypeError
- \is_numeric() returns true only for numeric strings, e.g.:
      var_dump(is_numeric("123"));    // bool(true)
      var_dump(is_numeric(" 123"));   // bool(true)
      var_dump(is_numeric("123 "));   // bool(false)
      var_dump(is_numeric("123abc")); // bool(false)
- String offsets accessed using strings, e.g.:
      $str = 'The world';
      var_dump($str['4']);      // string(1) "w"
      var_dump($str['04']);     // string(1) "w"
      var_dump($str['4str']);   // string(1) "w" with E_NOTICE "A non well formed numeric value encountered"
      var_dump($str[' 4']);     // string(1) "w"
      var_dump($str['4.5']);    // string(1) "w" with E_WARNING "Illegal string offset '4.5'"
      var_dump($str['string']); // string(1) "T" with E_WARNING "Illegal string offset 'string'"
- With the arithmetic operators -, +, *, /, %, or **, strings will be converted to int/float, emitting the E_NOTICE/E_WARNING as needed, e.g.:
      var_dump(123 + "123");    // int(246)
      var_dump(123 + " 123");   // int(246)
      var_dump(123 + "123 ");   // int(246) with E_NOTICE "A non well formed numeric value encountered"
      var_dump(123 + "123abc"); // int(246) with E_NOTICE "A non well formed numeric value encountered"
      var_dump(123 + "string"); // int(123) with E_WARNING "A non-numeric value encountered"
- Bitwise operations with an integer operand behave the same way, e.g.:
      var_dump(123 & "123");    // int(123)
      var_dump(123 & " 123");   // int(123)
      var_dump(123 & "123 ");   // int(123) with E_NOTICE "A non well formed numeric value encountered"
      var_dump(123 & "123abc"); // int(123) with E_NOTICE "A non well formed numeric value encountered"
      var_dump(123 & "abc");    // int(0) with E_WARNING "A non-numeric value encountered"
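For comparison with the diagnostics shown in the list above, the silent behaviour of explicit casts and settype() can be sketched as follows. The input strings are illustrative; none of these lines emits a notice or warning.

```php
<?php
// Explicit casts never emit a diagnostic, whatever the string contains.
var_dump((int) "123");         // int(123)
var_dump((int) " 123");        // int(123)
var_dump((int) "123 ");        // int(123)
var_dump((int) "123abc");      // int(123)
var_dump((int) "string");      // int(0)
var_dump((float) "1.5e3abc");  // float(1500)

// settype() follows the same conversion rules.
$value = "123abc";
settype($value, "integer");
var_dump($value);              // int(123)
```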
The current behaviour of numeric strings has various issues:
- \is_numeric() is misleading, as it will reject values that a weak-mode parameter check will accept.
Unify the various numeric string modes into a single concept: Numeric characters only with both leading and trailing whitespace allowed. Any other type of string is non-numeric and will throw TypeErrors when used in a numeric context.
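As a rough illustration of the unified concept, the following sketch approximates it with a regular expression. The helper name is_numeric_under_proposal is hypothetical, and the whitespace set and number grammar are assumptions; the actual change is made in the engine's C code.

```php
<?php
// Hypothetical helper approximating the proposed definition: a number with
// optional whitespace on either side. Not the engine's implementation.
function is_numeric_under_proposal(string $s): bool
{
    $ws     = '[ \t\n\r\x0B\f]*';
    $number = '[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?';

    return preg_match('/^' . $ws . $number . $ws . '\z/', $s) === 1;
}

var_dump(is_numeric_under_proposal("123"));     // bool(true)
var_dump(is_numeric_under_proposal(" 123 "));   // bool(true)
var_dump(is_numeric_under_proposal("123abc"));  // bool(false)
var_dump(is_numeric_under_proposal(""));        // bool(false)
```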
This means that all strings which currently emit the E_NOTICE “A non well formed numeric value encountered” will be reclassified to emit the E_WARNING “A non-numeric value encountered”, unless the leading-numeric string contains only trailing whitespace. The various cases which currently emit an E_WARNING will be promoted to TypeErrors.
One exception to this is type declarations: as they only accept proper numeric strings, some cases that currently emit an E_NOTICE will result in a TypeError. See below for an example.
For string offsets accessed using numeric strings the following changes will be made:
Should float strings used as string offsets continue to produce a warning, or throw a TypeError? Our position is that this case should be a TypeError, as it simplifies the implementation and is consistent with the handling of other strings (see this commit).
The following cases will produce this behaviour under the proposal:
- Type declarations in weak mode will accept numeric strings with trailing whitespace and reject leading-numeric strings:
      function foo(int $i) { var_dump($i); }
      foo("123 ");   // int(123)
      foo("123abc"); // TypeError
- \is_numeric will return true for numeric strings with trailing whitespace:
      var_dump(is_numeric("123 ")); // bool(true)
- The ++ and -- operators would convert numeric strings with trailing whitespace to integers or floats, as appropriate, rather than applying the alphanumeric increment rules:
      $d = "5 ";
      var_dump(++$d); // int(6)
- String-to-string comparisons will treat strings with trailing whitespace as numeric:
      var_dump("123" == "123 "); // bool(true)
These changes will be accomplished by modifying the is_numeric_string C function (and its variants) in the Zend Engine.
For the string offset behaviour changes, the following Zend Engine C functions and their JIT equivalents will be modified: zend_check_string_offset() and zend_fetch_dimension_address_read().
The PHP language specification's definition of str-numeric would be modified by the addition of str-whitespace opt after str-number and the removal of the following sentence: “A leading-numeric string is a string whose initial characters follow the requirements of a numeric string, and whose trailing characters are non-numeric”.
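There are three backward incompatible changes:
- Code relying on numeric strings with trailing whitespace being considered non-well-formed.
- Code with liberal use of leading-numeric strings might need to use explicit type casts.
- Code relying on the fact that '' (an empty string) evaluates to 0 for arithmetic/bitwise operations.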
The first reason is a precise requirement and therefore should be checked explicitly. A small polyfill to check for the previous is_numeric() behaviour:
    if (is_numeric($str) && strlen($str) === strlen(rtrim($str))) {...}
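The check above can be wrapped in a function for reuse. This is only a sketch: the function name is hypothetical, and it relies on rtrim()'s default character list to detect trailing whitespace, exactly as the one-line check does.

```php
<?php
// Hypothetical wrapper reproducing the pre-proposal is_numeric() result:
// a numeric string that does not end in whitespace.
function is_numeric_without_trailing_whitespace(string $str): bool
{
    return is_numeric($str) && strlen($str) === strlen(rtrim($str));
}

var_dump(is_numeric_without_trailing_whitespace("123"));   // bool(true)
var_dump(is_numeric_without_trailing_whitespace(" 123"));  // bool(true)
var_dump(is_numeric_without_trailing_whitespace("123 "));  // bool(false)
var_dump(is_numeric_without_trailing_whitespace("abc"));   // bool(false)
```

The result for "123 " is false both before and after the proposed change: before, is_numeric() itself rejects it; after, the length comparison does.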
Breaking the second reason allows various bugs to be caught ahead of time, and the previous behaviour can be obtained by adding explicit casts, e.g.:
    var_dump((int) "2px");     // int(2)
    var_dump((float) "2px");   // float(2)
    var_dump((int) "2.5px");   // int(2)
    var_dump((float) "2.5px"); // float(2.5)
The third reason already emitted an E_WARNING. We considered special-casing this to evaluate to 0, but this would be inconsistent with how type declarations deal with an empty string, namely throwing a TypeError. Therefore a TypeError will also be thrown in this case. The error can be avoided by explicitly checking for an empty string and changing it to 0.
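One possible way to perform the explicit empty-string check is a small helper applied before arithmetic. The function name empty_string_to_zero is hypothetical and this is only one of several ways to write the guard.

```php
<?php
// Hypothetical helper: maps an empty string to 0 before arithmetic,
// leaving every other value untouched.
function empty_string_to_zero($value)
{
    return $value === '' ? 0 : $value;
}

var_dump(empty_string_to_zero('') + 5);    // int(5)
var_dump(empty_string_to_zero('10') + 5);  // int(15)
```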
PHP 8.0.
Any extension using the C is_numeric_string function, its variants, or other functions which themselves use it, will be affected.
None that I am aware of.
This does not affect the filter extension, which handles numeric strings itself in a different fashion.
\is_numeric to accept or reject numeric strings with leading/trailing whitespace
Per the Voting RFC, there is a single Yes/No vote requiring a 2/3 majority for the main proposal. A secondary Yes/No vote requiring a 50%+1 majority will decide whether float strings used as string offsets should continue to produce a warning (with different wording) instead of consistently becoming a TypeError.
Primary vote:
Secondary vote:
A pull request for a complete PHP interpreter patch, including test files, can be found here: https://github.com/php/php-src/pull/5762
A language specification patch still needs to be done.
A possible documentation patch still needs to be done.
To Andrea Faulds for the PHP RFC: Permit trailing whitespace in numeric strings, on which this RFC and patch are based.
To Theodore Brown and Larry Garfield for reviewing the RFC.