PHP RFC: Saner numeric strings

Technical Background

The PHP language has a concept of numeric strings, strings which can be interpreted as numbers.

A string can be categorised in three ways according to its numeric-ness, as described by the language specification:

  • A numeric string is a string containing only a number, optionally preceded by white-space characters. For example, "123" or " 1.23e2".
  • A leading-numeric string is a string that begins with a numeric string but is followed by non-number characters (including white-space characters). For example, "123abc" or "123 ".
  • A non-numeric string is a string which is neither a numeric string nor a leading-numeric string.
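
The three categories can be approximated in userland. The following is a minimal sketch, not the engine's implementation: the function name classify_numericness() is hypothetical, and the regular expression is an assumption about what counts as "a number" (optional sign, decimal digits, optional fraction and exponent, with the usual ASCII white-space allowed in front).

```php
<?php
// Hypothetical helper, not part of PHP: classifies a string according to the
// three categories listed above (the pre-proposal definitions).
function classify_numericness(string $s): string
{
    // Optional leading white-space, optional sign, digits with an optional
    // fraction (or a bare fraction), then an optional exponent.
    $number = '/^[ \t\n\r\x0B\x0C]*[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?/';

    if (preg_match($number, $s, $match) !== 1) {
        return 'non-numeric';
    }

    // A match that stops short of the end leaves trailing characters behind.
    return strlen($match[0]) === strlen($s) ? 'numeric' : 'leading-numeric';
}

foreach (['123', ' 1.23e2', '123abc', '123 ', 'abc'] as $example) {
    printf("%-10s => %s\n", var_export($example, true), classify_numericness($example));
}
```

With these inputs the first two strings are classified as numeric, the next two as leading-numeric, and the last as non-numeric.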

A fourth case arises when an integer string is used as an array index. An integer string is stricter than a numeric string, as it has the following additional constraints:

  • It doesn't accept leading white-spaces
  • It doesn't accept leading zeros (0)

How PHP deals with array indexes is shown in the following code snippet:

$a = [
    "4" => "Integer index",
    "03" => "Integer index with leading 0/octal",
    "2str" => "leading numeric string",
    " 1" => "leading white-space",
    "5.5" => "Float",
];
var_dump($a);

Which results in the following output:

array(5) {
  [4]=>
  string(13) "Integer index"
  ["03"]=>
  string(34) "Integer index with leading 0/octal"
  ["2str"]=>
  string(22) "leading numeric string"
  [" 1"]=>
  string(19) "leading white-space"
  ["5.5"]=>
  string(5) "Float"
}
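
The key normalisation can also be checked by inspecting the type of each key. This is a small illustrative sketch; the variable names are arbitrary.

```php
<?php
$a = ["4" => 'a', "03" => 'b', "2str" => 'c', " 1" => 'd', "5.5" => 'e'];

// Record each key together with its type after PHP has normalised it.
$keyTypes = [];
foreach (array_keys($a) as $key) {
    $keyTypes[] = var_export($key, true) . ' is ' . gettype($key);
}
print implode("\n", $keyTypes) . "\n";
```

Only "4" satisfies the integer string constraints, so it is the only key stored as an integer; the other four remain strings.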

This RFC does not affect how array indexes behave, and thus won't mention them again.

One final behaviour that needs to be presented is how PHP performs weak comparisons, i.e. comparisons using one of the binary operators ==, !=, <>, <, >, <=, and >=, in the string-to-string case and in the string-to-int/float case.

String-to-string comparisons are performed numerically if and only if both strings are numeric strings.

String-to-int/float comparisons are always performed numerically; the string is therefore silently type-juggled regardless of its numeric-ness.

This RFC does not propose to modify this behaviour, see PHP RFC: Saner string to number comparisons instead.

The concept of numeric strings is used in a few places, and the distinction between a numeric string and a leading-numeric string is significant as certain operations distinguish between these:

  • Explicit conversions of strings to number types, such as (int) and (float) type casts or settype(), convert numeric and leading-numeric strings and produce 0 for non-numeric strings silently, e.g.:
    var_dump((float) "123");    // float(123)
    var_dump((float) "   123"); // float(123)
    var_dump((float) "123   "); // float(123)
    var_dump((float) "123abc"); // float(123)
    var_dump((float) "string"); // float(0)
  • Implicit conversions of strings to number types in weak typing mode (i.e. no strict_types declare statement, or declare(strict_types=0)) due to type declarations [note: internal functions behave similarly in PHP 8], e.g.
    function foo(int $i) { var_dump($i); }
    foo("123");    // int(123)
    foo("   123"); // int(123)
    foo("123   "); // int(123) with E_NOTICE "A non well formed numeric value encountered"
    foo("123abc"); // int(123) with E_NOTICE "A non well formed numeric value encountered"
    foo("string"); // TypeError
  • \is_numeric() returns true only for numeric strings, e.g.
    var_dump(is_numeric("123"));     // bool(true)
    var_dump(is_numeric("   123"));  // bool(true)
    var_dump(is_numeric("123   "));  // bool(false)
    var_dump(is_numeric("123abc"));  // bool(false)
  • String offsets, e.g.
    $str = 'The world';
    var_dump($str['4']);      // string(1) "w"
    var_dump($str['04']);     // string(1) "w"
    var_dump($str['4str']);   // string(1) "w" with E_NOTICE "A non well formed numeric value encountered"
    var_dump($str[' 4']);     // string(1) "w"
    var_dump($str['4.5']);    // string(1) "w" with E_WARNING "Illegal string offset '4.5'"
    var_dump($str['string']); // string(1) "T" with E_WARNING "Illegal string offset 'string'"
  • Arithmetic operations, i.e. +, -, *, /, %, or **: strings are converted to int/float, with an E_NOTICE/E_WARNING emitted as needed, e.g.
    var_dump(123 + "123");    // int(246)
    var_dump(123 + "   123"); // int(246)
    var_dump(123 + "123   "); // int(246) with E_NOTICE "A non well formed numeric value encountered"
    var_dump(123 + "123abc"); // int(246) with E_NOTICE "A non well formed numeric value encountered"
    var_dump(123 + "string"); // int(123) with E_WARNING "A non-numeric value encountered"
  • Increment/Decrement operators, i.e. ++ and --, e.g.
    $a = "5";
    var_dump(++$a); // int(6)
    $b = " 5";
    var_dump(++$b); // int(6)
    $c = "5z";
    var_dump(++$c); // string(2) "6a"
    $d = "5 ";
    var_dump(++$d); // string(2) "5 "
  • String-to-string comparisons, e.g.
    var_dump("123" == "123.0");  // bool(true)
    var_dump("123" == "   123"); // bool(true)
    var_dump("123" == "123   "); // bool(false)
    var_dump("123" == "123abc"); // bool(false)
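
Because the diagnostics listed above depend on the PHP version in use, it can help to observe them directly rather than rely on comments. The following is a minimal sketch under stated assumptions: probe() is a hypothetical helper (not part of PHP) that runs a callable, collects any notice or warning messages through a temporary error handler, and records any exception thrown instead of letting it propagate.

```php
<?php
// Hypothetical helper: returns [result, list of diagnostic messages].
function probe(callable $operation): array
{
    $diagnostics = [];

    // Collect notices and warnings instead of printing them.
    set_error_handler(function (int $severity, string $message) use (&$diagnostics): bool {
        $diagnostics[] = $message;
        return true; // Suppress the default handler.
    });

    try {
        $result = $operation();
    } catch (\Throwable $e) {
        // Errors such as TypeError are recorded rather than propagated.
        $result = null;
        $diagnostics[] = get_class($e) . ': ' . $e->getMessage();
    } finally {
        restore_error_handler();
    }

    return [$result, $diagnostics];
}

foreach (['123', '   123', '123   ', '123abc', 'string'] as $operand) {
    list($result, $diagnostics) = probe(function () use ($operand) {
        return 123 + $operand;
    });
    printf(
        "123 + %-9s => %s [%s]\n",
        var_export($operand, true),
        var_export($result, true),
        implode('; ', $diagnostics)
    );
}
```

No expected output is given here on purpose, since the messages differ between versions.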

The Problem

The current behaviour of numerical strings has various issues:

  • Numeric strings with leading whitespace are considered more numeric than numeric strings with trailing whitespace.
  • Strings which happen to start with a digit, e.g. hashes, may at times be interpreted as numbers, which can lead to bugs.
  • \is_numeric() is misleading, as it will reject values that a weak-mode parameter check will accept.
  • The concept of leading-numeric strings is rather strange, with unintuitive/surprising behaviour.
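
To illustrate the point about hashes, the sketch below uses two hash-like strings that happen to start with the same digit. The values are chosen for illustration only. Explicit casts are used because they handle leading-numeric strings the same way regardless of the changes discussed in this RFC; implicit conversions are where such strings can slip through unnoticed.

```php
<?php
// Hash-like strings chosen for illustration only.
$hashA = '5f4dcc3b5aa765d61d8327deb882cf99';
$hashB = '5ebe2294ecd0e0f08eab7690d2a6ee69';

// Interpreted as numbers, only the leading digit survives, so both become 5.
var_dump((int) $hashA === (int) $hashB); // bool(true)

// Compared as strings, they are the distinct values they appear to be.
var_dump($hashA === $hashB);             // bool(false)
```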


Proposal

The proposal is to unify the various numeric string modes into a single concept: numeric characters only, with both leading and trailing white-space allowed.

This means that all strings which currently emit the E_NOTICE “A non well formed numeric value encountered” will instead emit the E_WARNING “A non-numeric value encountered”, unless the leading-numeric string contains only trailing white-space.
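
The unified concept can be approximated in userland as follows. This is a sketch only: is_numeric_unified() is a hypothetical name, and the set of characters treated as trailing white-space is an assumption that mirrors the characters is_numeric() already tolerates in front of a number.

```php
<?php
// Hypothetical userland helper approximating the unified definition.
function is_numeric_unified(string $s): bool
{
    // Strip trailing white-space, then defer to is_numeric(), which
    // already tolerates leading white-space.
    return is_numeric(rtrim($s, " \t\n\r\v\f"));
}

var_dump(is_numeric_unified("123   ")); // bool(true)
var_dump(is_numeric_unified("   123")); // bool(true)
var_dump(is_numeric_unified("123abc")); // bool(false)
var_dump(is_numeric_unified("   "));    // bool(false)
```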

For string offsets accessed using numeric strings the following changes will be made:

  • Leading-numeric strings will emit the “Illegal string offset” warning instead of the “A non well formed numeric value encountered” notice, and continue to evaluate to their respective values.
  • Non-numeric strings which emitted the “Illegal string offset” warning will throw an “Illegal offset type” TypeError.
  • A secondary implementation vote will decide whether numeric strings which correspond to well-formed floats will emit the more usual “String offset cast occurred” warning instead of the “Illegal string offset” warning.

The following cases will produce this behaviour under the proposal:

  • Type declarations
    function foo(int $i) { var_dump($i); }
    foo("123   "); // int(123)
    foo("123abc"); // TypeError
  • \is_numeric will return true for numeric strings with trailing white-spaces
    var_dump(is_numeric("123   "));  // bool(true)
  • String offsets
    $str = 'The world';
    var_dump($str['4str']);   // string(1) "w" with E_WARNING "Illegal string offset '4str'"
    var_dump($str['4.5']);    // string(1) "w" with E_WARNING "String offset cast occurred" if the secondary vote is accepted
    var_dump($str['string']); // TypeError
  • Arithmetic operations
    var_dump(123 + "123   "); // int(246)
    var_dump(123 + "123abc"); // int(123) with E_WARNING "A non-numeric value encountered"
    var_dump(123 + "string"); // int(123) with E_WARNING "A non-numeric value encountered"
  • The ++ and -- operators would convert numeric strings with trailing whitespace to integers or floats, as appropriate, rather than applying the alphanumeric increment rules
    $d = "5 ";
    var_dump(++$d); // int(6)
  • String-to-string comparisons
    var_dump("123" == "123   "); // bool(true)

These changes will be accomplished by modifying the is_numeric_string C function (and its variants) in the Zend Engine.

For the string offset behaviour changes, the following Zend Engine C functions and their JIT equivalents will be modified: zend_check_string_offset() and zend_fetch_dimension_address_read().

The PHP language specification's definition of str-numeric would be modified by the addition of str-whitespaceopt after str-number and the removal of the following sentence: “A leading-numeric string is a string whose initial characters follow the requirements of a numeric string, and whose trailing characters are non-numeric”.

Backward Incompatible Changes

There are two backward incompatible changes:

  • Code relying on numeric strings with trailing white-space being considered non-well-formed.
  • Code with liberal use of leading-numeric strings might need to use explicit type casts.

The first case is a precise requirement and should therefore be checked explicitly. A small polyfill that reproduces the previous is_numeric() behaviour:

if (is_numeric($str) && strlen($str) === strlen(rtrim($str)) ){...}
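
The same check can be wrapped in a reusable function. In this sketch the function name is hypothetical, and the explicit character list passed to rtrim() is an assumption about which characters count as trailing white-space; rtrim()'s default list differs slightly, as it includes the NUL byte and omits the form feed.

```php
<?php
// Hypothetical helper reproducing the stricter, pre-proposal check:
// numeric, and with no trailing white-space.
function is_numeric_without_trailing_whitespace(string $str): bool
{
    $trimmed = rtrim($str, " \t\n\r\v\f");

    return $trimmed === $str && is_numeric($str);
}

var_dump(is_numeric_without_trailing_whitespace("123"));    // bool(true)
var_dump(is_numeric_without_trailing_whitespace("   123")); // bool(true)
var_dump(is_numeric_without_trailing_whitespace("123   ")); // bool(false)
var_dump(is_numeric_without_trailing_whitespace("123abc")); // bool(false)
```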

Breaking the second case will allow various bugs to be caught ahead of time, and the previous behaviour can be obtained by adding explicit casts, e.g.:

var_dump((int) "2px"); // int(2)
var_dump((float) "2px"); // float(2)
var_dump((int) "2.5px"); // int(2)
var_dump((float) "2.5px"); // float(2.5)

Proposed PHP Version

PHP 8.0.

RFC Impact

To Existing Extensions

Any extension using the C is_numeric_string, its variants, or other functions which themselves use it, will be affected.

To Opcache

None that I am aware of.

Unaffected PHP Functionality

This does not affect the filter extension, which handles numeric strings itself in a different fashion.
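
As a brief illustration of how the filter extension differs, the sketch below compares FILTER_VALIDATE_INT with is_numeric() on a few inputs. It only shows that the two apply different rules and is not a complete description of the filter extension's behaviour.

```php
<?php
// The integer filter rejects anything that is not a plain decimal integer.
var_dump(filter_var('123', FILTER_VALIDATE_INT));    // int(123)
var_dump(filter_var('123abc', FILTER_VALIDATE_INT)); // bool(false)
var_dump(filter_var('1e3', FILTER_VALIDATE_INT));    // bool(false)

// is_numeric() accepts exponent notation, so the two checks disagree here.
var_dump(is_numeric('1e3'));                         // bool(true)
```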

Future Scope

  • Adding an E_NOTICE for numerical strings with leading/trailing white-spaces
  • Adding a flag to \is_numeric to accept or reject numerical strings with leading/trailing white-spaces
  • Align string offset behaviour with array offsets
  • Promote remaining “Illegal string offset” warnings to Type Errors in PHP 9
  • Warn on illegal offsets when used within isset() or empty()

Proposed Voting Choices

Per the Voting RFC, there would be a single Yes/No vote requiring a 2/3 majority.

Patches and Tests

A pull request for a complete PHP interpreter patch, including a test file, can be found here: https://github.com/php/php-src/pull/5762

A language specification patch still needs to be done.

A possible documentation patch still needs to be done.


Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged to
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature
  4. a link to the language specification section (if any)


Acknowledgements

To Andrea Faulds for the PHP RFC: Permit trailing whitespace in numeric strings, on which this RFC and patch are based.

To Theodore Brown and Larry Garfield for reviewing the RFC.


Changelog

  • 2020-07-10: Major rewrite
  • 2020-07-02: Explain difference between array and string offsets, and how the RFC will impact string offsets
  • 2020-07-01: Add explicit cast behaviour for leading numeric strings
  • 2020-06-28: Initial version
rfc/saner-numeric-strings.1594564785.txt.gz · Last modified: 2020/07/12 14:39 by theodorejb