
PHP RFC: Permit trailing whitespace in numeric strings

Technical Background

The PHP language has a concept of numeric strings, strings which can be interpreted as numbers. This concept is used in a few places:

A string can be categorised in three ways according to its numericness, as described by the language specification:

The difference between a numeric string and a leading-numeric string is significant, because certain operations distinguish between these:

It is notable that while a numeric string may contain leading whitespace, only a leading-numeric string may contain trailing whitespace.
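
The three categories can be modelled in userland with a pair of regular expressions. This is only a sketch: the function name classify_numeric_string() is hypothetical, the number pattern approximates the interpreter's rules for decimal integers and floats, and the whitespace set (space, tab, newline, carriage return, vertical tab, form feed) is an assumption about what the engine skips.

```php
<?php
// Hypothetical helper, not part of PHP: approximates the current classification.
function classify_numeric_string(string $s): string
{
    // Assumed whitespace set: space, \t, \n, \r, vertical tab, form feed.
    $ws = '[ \t\n\r\x0B\x0C]*';
    // Optional sign, then an integer or float, then an optional exponent.
    $number = '[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?';

    // Whole string is optional leading whitespace plus a number.
    if (preg_match('/\A' . $ws . $number . '\z/', $s) === 1) {
        return 'numeric';
    }
    // String only starts with a number; anything may follow, including whitespace.
    if (preg_match('/\A' . $ws . $number . '/', $s) === 1) {
        return 'leading-numeric';
    }
    return 'non-numeric';
}

echo classify_numeric_string(' 123'), PHP_EOL;   // numeric
echo classify_numeric_string('123 '), PHP_EOL;   // leading-numeric
echo classify_numeric_string('123abc'), PHP_EOL; // leading-numeric
echo classify_numeric_string('abc'), PHP_EOL;    // non-numeric
```

Because the sketch does its own matching instead of calling is_numeric(), its results do not depend on which behaviour the running interpreter implements.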

The Problem

The current behaviour of treating strings with leading whitespace as more numeric than strings with trailing whitespace is inconsistent and has no obvious benefit. It is an unintuitive, surprising behaviour.

The inconsistency itself can require more work from the programmer. Suppose rejecting number strings from user input that contain whitespace is useful to your application, perhaps because the input must be passed on to a back-end system that cannot handle whitespace. You cannot rely on e.g. is_numeric() to ensure this, because it only rejects trailing whitespace. Conversely, suppose accepting number strings from user input that contain whitespace is useful to your application, perhaps to tolerate accidentally copied-and-pasted spaces. You cannot rely on e.g. $a + $b to ensure this either, because it only accepts leading whitespace.
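
In practice this means an application has to spell out its whitespace policy itself. The sketch below shows one way to do that; both function names are hypothetical, and the character list passed to trim() is an assumption chosen to cover common ASCII whitespace.

```php
<?php
// Hypothetical helpers, not part of PHP.

// Strict policy: numeric, and no whitespace on either side.
function is_numeric_without_whitespace(string $s): bool
{
    return is_numeric($s) && $s === trim($s, " \t\n\r\v\f");
}

// Lenient policy: numeric once surrounding whitespace is stripped.
function is_numeric_ignoring_whitespace(string $s): bool
{
    return is_numeric(trim($s, " \t\n\r\v\f"));
}

var_dump(is_numeric_without_whitespace('42'));    // bool(true)
var_dump(is_numeric_without_whitespace(' 42'));   // bool(false)
var_dump(is_numeric_without_whitespace('42 '));   // bool(false)
var_dump(is_numeric_ignoring_whitespace(' 42 ')); // bool(true)
```

Both helpers give the same answers whether or not the interpreter itself accepts trailing whitespace, since the whitespace decision is made explicitly before or alongside the is_numeric() call.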

Beyond the inconsistency, the current rejection of trailing whitespace is annoying for programs reading data from files or similar whitespace-separated data streams:

<?php

$total = 0;
foreach (file("numbers.txt") as $number) {
    // Currently produces "Notice: A non well formed numeric value encountered"
    // on every iteration, because $number ends in "\n".
    $total += $number;
}
?>
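
Under the current behaviour, one way to avoid the notice is to strip line endings before doing arithmetic. The following is a sketch of that workaround; it writes a temporary file so that it is self-contained, and the file contents are made up for illustration.

```php
<?php
// Create a small sample file (illustrative data only).
$path = tempnam(sys_get_temp_dir(), 'num');
file_put_contents($path, "1\n2\n3\n");

$total = 0;
// FILE_IGNORE_NEW_LINES drops the trailing "\n" from each line;
// FILE_SKIP_EMPTY_LINES skips lines that are empty.
foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $number) {
    $total += $number;
}
unlink($path);

echo $total, PHP_EOL; // 6
```

This works, but it relies on every caller remembering the flags, which is the kind of extra effort the proposal aims to remove.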

Finally, the current behaviour makes potential simplifications to numeric string handling less palatable if they cause leading-numeric strings to be tolerated in fewer places, because of a perception that a lot of existing code may rely on the tolerance of trailing whitespace.

Proposal

For the next PHP 7.x release (currently PHP 7.4), this RFC proposes that trailing whitespace be accepted in numeric strings just as leading whitespace is.

For the PHP interpreter, this would be accomplished by modifying the is_numeric_string C function (and its variants) in the Zend Engine. This would therefore affect PHP features which make use of this function, including:

The PHP language specification's definition of str-numeric would be modified by the addition of str-whitespace_opt (that is, an optional str-whitespace) after str-number.
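
As a rough illustration of the grammar change, the amended str-numeric production can be mirrored by a regular expression that allows optional whitespace on both sides of the number. The function name and the exact whitespace and number patterns below are assumptions made for this sketch, not text from the specification.

```php
<?php
// Hypothetical helper: models the proposed rule, with whitespace
// allowed both before and after the number.
function matches_proposed_str_numeric(string $s): bool
{
    $ws = '[ \t\n\r\x0B\x0C]*';                           // assumed whitespace set
    $number = '(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?'; // assumed number pattern
    return preg_match('/\A' . $ws . '[+-]?' . $number . $ws . '\z/', $s) === 1;
}

var_dump(matches_proposed_str_numeric("123\n"));   // bool(true)
var_dump(matches_proposed_str_numeric(" 1.5e3 ")); // bool(true)
var_dump(matches_proposed_str_numeric("123 abc")); // bool(false)
var_dump(matches_proposed_str_numeric("   "));     // bool(false)
```

Note that a string consisting only of whitespace still fails to match, because the number itself is not optional.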

This change would be almost completely backwards-compatible, as no string that was previously accepted would now be rejected. However, if an application relies on trailing whitespace not being considered well-formed, it would need updating.
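
An application that relies on trailing whitespace being rejected could make that requirement explicit instead of depending on the interpreter. The sketch below is one way to do so; both function names are hypothetical, and it assumes is_numeric() follows the same rule as the other numeric-string checks.

```php
<?php
// Hypothetical helper: true when the running interpreter treats
// trailing whitespace as part of a numeric string.
function trailing_whitespace_is_numeric(): bool
{
    return is_numeric("1 ");
}

// Hypothetical guard that keeps rejecting trailing whitespace
// regardless of what the interpreter does.
function is_numeric_no_trailing_whitespace(string $s): bool
{
    return is_numeric($s) && $s === rtrim($s, " \t\n\r\v\f");
}

var_dump(trailing_whitespace_is_numeric());        // depends on the interpreter
var_dump(is_numeric_no_trailing_whitespace("1 ")); // bool(false)
var_dump(is_numeric_no_trailing_whitespace(" 1")); // bool(true)
```

The probe function is only useful for diagnostics or tests; the guard is what preserves the old rejection behaviour.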

RFC Impact

To Existing Extensions

Any extension using is_numeric_string, its variants, or other functions which themselves use it, will be affected.

To Opcache

In the patch, all tests pass with Opcache enabled. I am not aware of any issues arising here.

Unaffected PHP Functionality

This does not affect the filter extension, which handles numeric strings itself in a different fashion.

Future Scope

If adopted, this would make Nikita Popov's PHP RFC: Saner string to number comparisons look more reasonable.

I would also plan a second RFC in a similar vein to Nikita's, which would simplify things by removing the concept of leading-numeric strings: strings are either numeric and accepted, or non-numeric and not accepted.

Proposed Voting Choices

Per the Voting RFC, there would be a single Yes/No vote requiring a 2/3 majority.

Patches and Tests

A pull request for a complete PHP interpreter patch, including a test file, can be found here: https://github.com/php/php-src/pull/2317

I do not yet have a language specification patch.

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged to
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature
  4. a link to the language specification section (if any)

Changelog