The PHP language has a concept of numeric strings, strings which can be interpreted as numbers.
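A string can be categorised in three ways according to its numeric-ness, as described by the language specification:
- A numeric string is a string containing only a number, optionally preceded by whitespace characters, e.g. "123" or " 1.23e2".
- A leading-numeric string is a string whose initial characters follow the requirements of a numeric string, and whose trailing characters are non-numeric, e.g. "123abc" or "123 ".
- A non-numeric string is a string which is neither a numeric string nor a leading-numeric string.

A fourth way PHP might deal with numeric strings is when using an integer string for an array index. An integer string is stricter than a numeric string, as it has the following additional constraints:
- it must be a decimal integer (no fractional part or exponent), optionally preceded by a minus sign;
- it must not contain leading whitespace;
- it must not contain leading zeros (except for the string "0" itself).

How PHP deals with array indexes is shown in the following code snippet: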
```php
$a = [
    "4"    => "Integer index",
    "03"   => "Integer index with leading 0/octal",
    "2str" => "leading numeric string",
    " 1"   => "leading whitespace",
    "5.5"  => "Float",
];
var_dump($a);
```
Which results in the following output:
```
array(5) {
  [4]=>
  string(13) "Integer index"
  ["03"]=>
  string(34) "Integer index with leading 0/octal"
  ["2str"]=>
  string(22) "leading numeric string"
  [" 1"]=>
  string(18) "leading whitespace"
  ["5.5"]=>
  string(5) "Float"
}
```
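As a rough illustration of the integer-string rules, the sketch below uses a hypothetical helper, is_integer_string() (not a PHP built-in), and compares it with the keys PHP actually produces. It deliberately ignores the range check the engine also applies, so strings beyond PHP_INT_MAX would be misclassified; array_key_first() requires PHP 7.3 or later.

```php
<?php

// Hypothetical helper (not part of PHP): approximates whether a string is
// stored as an integer array key. Unlike the engine, it does not check that
// the value fits between PHP_INT_MIN and PHP_INT_MAX.
function is_integer_string(string $s): bool
{
    // Either "0", or an optional minus sign followed by a non-zero digit and
    // further digits: no whitespace, no leading zeros, no "-0", no decimals.
    return preg_match('/^(?:0|-?[1-9][0-9]*)$/D', $s) === 1;
}

foreach (["4", "03", "2str", " 1", "5.5", "-5", "-0"] as $key) {
    // Let the engine decide by using the string as an array key.
    $engine = is_int(array_key_first([$key => true]));
    printf(
        "%-6s sketch: %-5s engine: %s\n",
        var_export($key, true),
        var_export(is_integer_string($key), true),
        var_export($engine, true)
    );
}
```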
This RFC does not affect how array indexes behave, and thus won't mention them again.
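The three categories described at the start of this section can be approximated in userland. The sketch below uses a hypothetical classify_numeric_string() helper (not a PHP function) whose regular expression models decimal integers, decimals and exponent notation, optionally preceded by whitespace and a sign. It illustrates the rules that apply before this RFC; it is not the engine's is_numeric_string() implementation.

```php
<?php

// Hypothetical helper (not part of PHP): classifies a string according to the
// three categories, using the rules that apply before this RFC.
function classify_numeric_string(string $s): string
{
    // Optional leading whitespace, optional sign, then digits with an optional
    // fractional part (or a bare fractional part), then an optional exponent.
    $number = '[ \t\n\r\x0B\x0C]*[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?';

    if (preg_match('/^' . $number . '$/D', $s) === 1) {
        return 'numeric';         // the whole string is a number
    }
    if (preg_match('/^' . $number . '/', $s) === 1) {
        return 'leading-numeric'; // a number followed by anything else
    }
    return 'non-numeric';
}

foreach (['123', ' 1.23e2', '123abc', '123 ', 'string'] as $s) {
    echo var_export($s, true), ' => ', classify_numeric_string($s), PHP_EOL;
}
```

Note that trailing whitespace makes "123 " leading-numeric under these rules, while leading whitespace keeps " 1.23e2" numeric.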
Another aspect worth noting is that arithmetic and bitwise operators convert all operands to their numeric (for bitwise operators, integer) equivalent, emitting an E_NOTICE for a leading-numeric string and an E_WARNING for a non-numeric string. The exceptions are the &, |, and ^ bitwise operators when both operands are strings, and the ~ operator when its operand is a string: in these cases the operation is performed on the ASCII values of the characters that make up the strings, and the result is a string, as per the documentation on bitwise operators.
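A short illustration of this exception for bitwise operators (the values are chosen for illustration; the comments show the expected var_dump() output):

```php
<?php

// Both operands are integers: ordinary bitwise AND.
var_dump(12 & 9);     // int(8)

// Only one operand is a string: "12" is converted to the integer 12 first.
var_dump("12" & 9);   // int(8)

// Both operands are strings: the AND is applied byte by byte, so
// '1' (0x31) & '9' (0x39) = '1' (0x31), and the result has the length of
// the shorter string.
var_dump("12" & "9"); // string(1) "1"

// ~ on a string flips every byte of it.
var_dump(~"\x0F" === "\xF0"); // bool(true)
```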
Finally, it is worth presenting how PHP performs weak comparisons, i.e. comparisons using one of the binary operators ==, !=, <>, <, >, <=, and >=, in the string-to-string case and in the string-to-int/float case:
- String-to-string comparisons are performed numerically if and only if both strings are numeric strings.
- String-to-int/float comparisons are always performed numerically, so the string is silently type-juggled regardless of its numeric-ness.

This RFC does not propose to modify this behaviour; see the PHP RFC: Saner string to number comparisons instead.
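A few illustrative weak comparisons for the two cases described above. The last line shows the string-to-int behaviour as described here; the Saner string to number comparisons RFC changes that case, so its result differs from PHP 8.0 onwards.

```php
<?php

// String-to-string: both strings are numeric, so they are compared as numbers.
var_dump("1e3" == "1000"); // bool(true)
var_dump("123" < "23");    // bool(false): 123 < 23 is false

// String-to-string: neither string is numeric, so they are compared as
// strings, and "1" sorts before "2".
var_dump("123a" < "23a");  // bool(true)

// String-to-int: "123abc" is juggled to 123 and compared numerically.
// (bool(true) before PHP 8.0; bool(false) from PHP 8.0 onwards.)
var_dump(123 == "123abc");
```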
The concept of numeric strings is used in a few places, and the distinction between a numeric string and a leading-numeric string is significant, as certain operations treat them differently:

- (int) and (float) type casts, as well as settype(), convert numeric and leading-numeric strings to numbers and silently produce 0 for non-numeric strings.
- int and float type declarations in coercive typing mode (no strict_types declare statement, or strict_types=0) accept numeric strings, accept leading-numeric strings with an E_NOTICE, and throw a TypeError for non-numeric strings [note: internal functions behave similarly in PHP 8], e.g.:

```php
function foo(int $i) { var_dump($i); }

foo("123");    // int(123)
foo(" 123");   // int(123)
foo("123 ");   // int(123) with E_NOTICE "A non well formed numeric value encountered"
foo("123abc"); // int(123) with E_NOTICE "A non well formed numeric value encountered"
foo("string"); // TypeError
```

- \is_numeric() returns true only for numeric strings, e.g.:

```php
var_dump(is_numeric("123"));    // bool(true)
var_dump(is_numeric(" 123"));   // bool(true)
var_dump(is_numeric("123 "));   // bool(false)
var_dump(is_numeric("123abc")); // bool(false)
```

- String offsets accept numeric strings silently and leading-numeric strings with an E_NOTICE, whereas float strings and non-numeric strings emit an E_WARNING "Illegal string offset" (a non-numeric string is treated as offset 0), e.g.:

```php
$str = 'The world';
var_dump($str['4']);      // string(1) "w"
var_dump($str['04']);     // string(1) "w"
var_dump($str['4str']);   // string(1) "w" with E_NOTICE "A non well formed numeric value encountered"
var_dump($str[' 4']);     // string(1) "w"
var_dump($str['4.5']);    // string(1) "w" with E_WARNING "Illegal string offset '4.5'"
var_dump($str['string']); // string(1) "T" with E_WARNING "Illegal string offset 'string'"
```

- Arithmetic operations, i.e. +, -, *, /, %, or **, convert strings to int/float, emitting an E_NOTICE or E_WARNING as needed, e.g.:

```php
var_dump(123 + "123");    // int(246)
var_dump(123 + " 123");   // int(246)
var_dump(123 + "123 ");   // int(246) with E_NOTICE "A non well formed numeric value encountered"
var_dump(123 + "123abc"); // int(246) with E_NOTICE "A non well formed numeric value encountered"
var_dump(123 + "string"); // int(123) with E_WARNING "A non-numeric value encountered"
```

- Bitwise operations with an integer operand behave in the same way, e.g.:

```php
var_dump(123 & "123");    // int(123)
var_dump(123 & " 123");   // int(123)
var_dump(123 & "123 ");   // int(123) with E_NOTICE "A non well formed numeric value encountered"
var_dump(123 & "123abc"); // int(123) with E_NOTICE "A non well formed numeric value encountered"
var_dump(123 & "abc");    // int(0) with E_WARNING "A non-numeric value encountered"
```
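For the (int) and (float) casts and settype() mentioned above, a few illustrative conversions; none of them emits a notice or warning:

```php
<?php

// Numeric and leading-numeric strings are converted; non-numeric strings
// become 0. Casts never emit a diagnostic.
var_dump((int) "123");      // int(123)
var_dump((int) " 123");     // int(123)
var_dump((int) "123abc");   // int(123)
var_dump((int) "string");   // int(0)
var_dump((float) "1.5e3x"); // float(1500)

// settype() converts the variable in place, following the same rules.
$value = "42abc";
settype($value, "integer");
var_dump($value);           // int(42)
```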
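The current behaviour of numeric strings has various issues, for example:
- Leading and trailing whitespace are treated differently: " 123" is a numeric string, whereas "123 " is only a leading-numeric string.
- \is_numeric() is misleading, as it rejects values that a weak-mode parameter check accepts, such as "123 ".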
Unify the various numeric string modes into a single concept: Numeric characters only with both leading and trailing whitespace allowed. Any other type of string is non-numeric and will throw TypeErrors when used in a numeric context.
This means that all strings which currently emit the E_NOTICE “A non well formed numeric value encountered” will instead emit the E_WARNING “A non-numeric value encountered”, unless the characters following the number are all whitespace, in which case the string becomes a numeric string and no diagnostic is emitted. The various cases which currently emit an E_WARNING will be promoted to TypeErrors.

One exception to this is type declarations: as they only accept numeric strings, some cases which currently emit an E_NOTICE will throw a TypeError instead. See below for an example.
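A sketch of the proposed single concept, using hypothetical helpers that are not part of PHP: classify_proposed() allows whitespace on both sides of the number, and arithmetic_outcome() maps each category to what an arithmetic operator does with it under the proposal. The regular expression only models decimal integers, decimals and exponent notation.

```php
<?php

// Hypothetical helper (not part of PHP): classification under the proposal,
// where both leading and trailing whitespace are allowed in a numeric string.
function classify_proposed(string $s): string
{
    $ws     = '[ \t\n\r\x0B\x0C]*';
    $number = '[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?';

    if (preg_match('/^' . $ws . $number . $ws . '$/D', $s) === 1) {
        return 'numeric';
    }
    if (preg_match('/^' . $ws . $number . '/', $s) === 1) {
        return 'leading-numeric';
    }
    return 'non-numeric';
}

// Hypothetical helper: outcome for an operand of an arithmetic operator under
// the proposal (type declarations are stricter and reject leading-numeric).
function arithmetic_outcome(string $s): string
{
    switch (classify_proposed($s)) {
        case 'numeric':
            return 'converted silently';
        case 'leading-numeric':
            return 'converted with E_WARNING "A non-numeric value encountered"';
        default:
            return 'TypeError';
    }
}

foreach (['123', '123 ', '123abc', 'string', ''] as $s) {
    echo var_export($s, true), ' => ', arithmetic_outcome($s), PHP_EOL;
}
```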
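For string offsets accessed using numeric strings, changes along the same lines will be made. For float strings such as "4.5", which currently emit an E_WARNING, the open question is whether they should continue to produce a warning (with different wording) or throw a TypeError. Our position is that this case should be a TypeError, as it simplifies the implementation and is consistent with the handling of other strings (see this commit).

Under the proposal, the following cases will also change behaviour: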
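- Type declarations accept numeric strings with trailing whitespace but reject other leading-numeric strings:

```php
function foo(int $i) { var_dump($i); }

foo("123 ");   // int(123)
foo("123abc"); // TypeError
```

- \is_numeric() will return true for numeric strings with trailing whitespace:

```php
var_dump(is_numeric("123 ")); // bool(true)
```

- The ++ and -- operators will convert numeric strings with trailing whitespace to integers or floats, as appropriate, rather than applying the alphanumeric increment rules:

```php
$d = "5 ";
var_dump(++$d); // int(6)
```

- String-to-string comparisons involving numeric strings with trailing whitespace will be performed numerically:

```php
var_dump("123" == "123 "); // bool(true)
```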
These changes will be accomplished by modifying the is_numeric_string C function (and its variants) in the Zend Engine.
For the string offset behaviour changes, the following Zend Engine C functions and their JIT equivalents will be modified: zend_check_string_offset() and zend_fetch_dimension_address_read().
The PHP language specification's definition of str-numeric would be modified by adding an optional str-whitespace (str-whitespace_opt) after str-number, and by removing the following sentence: “A leading-numeric string is a string whose initial characters follow the requirements of a numeric string, and whose trailing characters are non-numeric”.
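There are three backward incompatible changes:
- Code relying on \is_numeric() rejecting numeric strings with trailing whitespace.
- Code relying on leading-numeric strings, such as "2px", being accepted in numeric contexts with only an E_NOTICE.
- Code relying on '' (an empty string) evaluating to 0 for arithmetic/bitwise operations.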
The first case is a precise requirement and should therefore be checked explicitly. A small polyfill to check for the previous \is_numeric() behaviour:
```php
if (is_numeric($str) && strlen($str) === strlen(rtrim($str, " \t\n\r\v\f"))) {
    // $str is numeric and has no trailing whitespace
}
```
Breaking the second case will allow various bugs to be caught ahead of time, and the previous behaviour can be obtained by adding explicit casts, e.g.:
```php
var_dump((int) "2px");     // int(2)
var_dump((float) "2px");   // float(2)
var_dump((int) "2.5px");   // int(2)
var_dump((float) "2.5px"); // float(2.5)
```
The third case already emits an E_WARNING. We considered special-casing the empty string to evaluate to 0, but this would be inconsistent with how type declarations deal with an empty string, namely by throwing a TypeError. Therefore a TypeError will also be thrown in this case. The error can be avoided by explicitly checking for an empty string and replacing it with 0.
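A minimal sketch of that explicit check, using a hypothetical $input variable that may hold an empty string:

```php
<?php

$input = '';

// Map the empty string to 0 explicitly before using it in arithmetic, so the
// code does not rely on '' being implicitly treated as 0.
$value = ($input === '') ? 0 : $input;

var_dump($value + 1); // int(1)
```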
This RFC targets PHP 8.0.
Any extension using the C function is_numeric_string(), its variants, or other functions which themselves use it, will be affected.
None that I am aware of.
This does not affect the filter extension, which handles numeric strings itself in a different fashion.
\is_numeric to accept or reject numeric strings with leading/trailing whitespace

Per the Voting RFC, there is a single Yes/No vote requiring a 2/3 majority for the main proposal. A secondary Yes/No vote requiring a 50%+1 majority will decide whether float strings used as string offsets should continue to produce a warning (with different wording) instead of consistently becoming a TypeError.
A pull request for a complete PHP interpreter patch, including test files, can be found here: https://github.com/php/php-src/pull/5762
A language specification patch still needs to be done.
A possible documentation patch still needs to be done.
To Andrea Faulds for the PHP RFC: Permit trailing whitespace in numeric strings, on which this RFC and patch are based.
To Theodore Brown and Larry Garfield for reviewing the RFC.