rfc:saner-numeric-strings [2020/07/14 14:19] – Nits girgias | rfc:saner-numeric-strings [2020/07/24 06:54] – Better credit to Andrea Faulds girgias |
---|
* Version: 1.4 | * Version: 1.4 |
* Date: 2020-06-28 | * Date: 2020-06-28 |
| * Original Author: Andrea Faulds <ajf@ajf.me> |
| * Original RFC: [[http://wiki.php.net/rfc/trailing_whitespace_numerics|PHP RFC: Permit trailing whitespace in numeric strings]] |
* Author: George Peter Banyard <girgias@php.net> | * Author: George Peter Banyard <girgias@php.net> |
* Status: Under Discussion | * Status: Voting |
* First Published at: http://wiki.php.net/rfc/saner-numeric-strings | * First Published at: http://wiki.php.net/rfc/saner-numeric-strings |
* Implementation: https://github.com/php/php-src/pull/5762 | * Implementation: https://github.com/php/php-src/pull/5762 |
A string can be categorised in three ways according to its numeric-ness, as [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#the-string-type|described by the language specification]]: | A string can be categorised in three ways according to its numeric-ness, as [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#the-string-type|described by the language specification]]: |
| |
* A //numeric string// is a string containing only a [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#grammar-str-number|number]], optionally preceded by white-space characters. For example, <php>"123"</php> or <php>" 1.23e2"</php>. | * A //numeric string// is a string containing only a [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#grammar-str-number|number]], optionally preceded by whitespace characters. For example, <php>"123"</php> or <php>" 1.23e2"</php>. |
* A //leading-numeric string// is a string that begins with a numeric string but is followed by non-number characters (including white-space characters). For example, <php>"123abc"</php> or <php>"123 "</php>. | * A //leading-numeric string// is a string that begins with a numeric string but is followed by non-number characters (including whitespace characters). For example, <php>"123abc"</php> or <php>"123 "</php>. |
* A //non-numeric string// is a string which is neither a numeric string nor a leading-numeric string. | * A //non-numeric string// is a string which is neither a numeric string nor a leading-numeric string. |
| |
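The three categories above can be approximated with two regular expressions. This is only a sketch: `classify_numeric_string` is a hypothetical helper, not a PHP function, and its pattern is an assumption that covers decimal integers and floats rather than the language specification's full grammar.

```php
<?php
// Hypothetical helper, not part of PHP: classifies a string using the three
// categories described above (the definitions in force before this RFC).
function classify_numeric_string(string $s): string
{
    // Optional leading whitespace, optional sign, then an integer or float.
    $number = '[ \t\n\r\x0b\f]*[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?';

    // The whole string is a number: numeric string.
    if (preg_match('/^' . $number . '$/D', $s)) {
        return 'numeric';
    }
    // Only a prefix is a number: leading-numeric string.
    if (preg_match('/^' . $number . '/', $s)) {
        return 'leading-numeric';
    }
    return 'non-numeric';
}

echo classify_numeric_string("123"), "\n";      // numeric
echo classify_numeric_string(" 1.23e2"), "\n";  // numeric
echo classify_numeric_string("123abc"), "\n";   // leading-numeric
echo classify_numeric_string("123 "), "\n";     // leading-numeric
echo classify_numeric_string("string"), "\n";   // non-numeric
```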
A fourth way PHP might deal with numeric strings is when using an //integer// string for an array index. | A fourth way PHP might deal with numeric strings is when using an //integer// string for an array index. |
An integer string is stricter than a numeric string as it has the following additional constraints: | An integer string is stricter than a numeric string as it has the following additional constraints: |
* It doesn't accept leading white-spaces | * It doesn't accept leading whitespace |
* It doesn't accept leading zeros (''0'') | * It doesn't accept leading zeros (''0'') |
| |
"03" => "Integer index with leading 0/octal", | "03" => "Integer index with leading 0/octal", |
"2str" => "leading numeric string", | "2str" => "leading numeric string", |
" 1" => "leading white-space", | " 1" => "leading whitespace", |
"5.5" => "Float", | "5.5" => "Float", |
]; | ]; |
string(22) "leading numeric string" | string(22) "leading numeric string" |
[" 1"]=> | [" 1"]=> |
string(19) "leading white-space" | string(19) "leading whitespace" |
["5.5"]=> | ["5.5"]=> |
string(5) "Float" | string(5) "Float" |
var_dump(123 + "string"); // int(123) with E_WARNING "A non-numeric value encountered" | var_dump(123 + "string"); // int(123) with E_WARNING "A non-numeric value encountered" |
</PHP> | </PHP> |
* Increment/Decrement operators, i.e. <php>++</php> and <php>--</php>, e.g.<PHP> | * Increment/decrement operators, i.e. <php>++</php> and <php>--</php>, e.g.<PHP> |
$a = "5"; | $a = "5"; |
var_dump(++$a); // int(6) | var_dump(++$a); // int(6) |
* Bitwise operations, e.g.<PHP> | * Bitwise operations, e.g.<PHP> |
var_dump(123 & "123"); // int(123) | var_dump(123 & "123"); // int(123) |
| var_dump(123 & " 123"); // int(123) |
var_dump(123 & "123 "); // int(123) with E_NOTICE "A non well formed numeric value encountered" | var_dump(123 & "123 "); // int(123) with E_NOTICE "A non well formed numeric value encountered" |
var_dump(123 & "123abc"); // int(123) with E_NOTICE "A non well formed numeric value encountered" | var_dump(123 & "123abc"); // int(123) with E_NOTICE "A non well formed numeric value encountered" |
| |
===== Proposal ===== | ===== Proposal ===== |
Unify the various numeric string modes into a single concept: Numeric characters only with both leading and trailing white-spaces allowed. Any other type of string is non-numeric and will throw <php>TypeError</php>s when used in a numeric context. | Unify the various numeric string modes into a single concept: Numeric characters only with both leading and trailing whitespace allowed. Any other type of string is non-numeric and will throw <php>TypeError</php>s when used in a numeric context. |
| |
This means, all strings which currently emit the <php>E_NOTICE</php> “A non well formed numeric value encountered” will de reclassified into the <php>E_WARNING</php> “A non-numeric value encountered” //except// if the leading-numeric string contained only trailing white-spaces. And the various cases which currently emit an <php>E_WARNING</php> will be promoted to <php>TypeError</php>s. | This means, all strings which currently emit the <php>E_NOTICE</php> “A non well formed numeric value encountered” will be reclassified into the <php>E_WARNING</php> “A non-numeric value encountered” //except// if the leading-numeric string contained only trailing whitespace. And the various cases which currently emit an <php>E_WARNING</php> will be promoted to <php>TypeError</php>s. |
| |
One exception to this is type declarations: they only accept proper numeric strings, so some cases which currently emit an <php>E_NOTICE</php> will result in a <php>TypeError</php>. See below for an example. | One exception to this is type declarations: they only accept proper numeric strings, so some cases which currently emit an <php>E_NOTICE</php> will result in a <php>TypeError</php>. See below for an example. |
| |
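The reclassification described above for arithmetic and bitwise operands can be sketched as a lookup from a string to the diagnostic it would produce under the proposal. `proposed_arithmetic_outcome` is a hypothetical helper written for illustration, its number pattern is a simplifying assumption, and it does not model type declarations, which are stricter as noted above.

```php
<?php
// Hypothetical helper, not part of PHP: reports which diagnostic a string
// operand would produce in an arithmetic context under the proposal.
function proposed_arithmetic_outcome(string $s): string
{
    $ws = '[ \t\n\r\x0b\f]*';
    $number = '[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?';

    // Number with optional leading AND trailing whitespace: numeric string.
    if (preg_match('/^' . $ws . $number . $ws . '$/D', $s)) {
        return 'no diagnostic';
    }
    // Number followed by other characters: formerly E_NOTICE, now E_WARNING.
    if (preg_match('/^' . $ws . $number . '/', $s)) {
        return 'E_WARNING: A non-numeric value encountered';
    }
    // Everything else, including the empty string: formerly E_WARNING.
    return 'TypeError';
}

echo proposed_arithmetic_outcome("123 "), "\n";   // no diagnostic
echo proposed_arithmetic_outcome("123abc"), "\n"; // E_WARNING: A non-numeric value encountered
echo proposed_arithmetic_outcome("string"), "\n"; // TypeError
```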
For string offsets accessed using numeric strings the following changes will be made: | For string offsets accessed using numeric strings the following changes will be made: |
* Leading numeric strings will emit the “Illegal string offset” instead of the “A non well formed numeric value encountered” notice, and continue to evaluate to their respective values. | * Leading numeric strings will emit the “Illegal string offset” warning instead of the “A non well formed numeric value encountered” notice, and continue to evaluate to their respective values. |
* Non-numeric strings which emitted the “Illegal string offset” warning will throw an “Illegal offset type” TypeError. | * Non-numeric strings which emitted the “Illegal string offset” warning will throw an “Illegal offset type” TypeError. |
* A secondary implementation vote will decide if: numeric strings which correspond to well formed floats will remain a warning by emit the more usual “String offset cast occurred” warning instead of the current “Illegal string offset” warning which is being promoted to <php>TypeError</php>, the reason for this is adjusting this behaviour requires some additional boilerplate code in the Engine, as can mostly be seen in this [[https://github.com/php/php-src/pull/5762/commits/788a6963c1343d53dadc23fb2983224be9ba4c04|commit]]. | * There is a secondary implementation vote to decide the following: should numeric strings which correspond to well-formed floats remain a warning (by emitting the same “String offset cast occurred” warning that occurs when a float is used for a string offset), or should the current “Illegal string offset” warning simply be promoted to a <php>TypeError</php>? Our position is that this case should be a TypeError, as it simplifies the implementation and is consistent with the handling of other strings (see this [[https://github.com/php/php-src/pull/5762/commits/897c37727b1ee393f04f57a88fc48d69c3cf0d1d|commit]]). |
| |
| |
foo("123abc"); // TypeError | foo("123abc"); // TypeError |
</PHP> | </PHP> |
* <php>\is_numeric</php> will return <php>true</php> for numeric strings with trailing white-spaces<PHP> | * <php>\is_numeric</php> will return <php>true</php> for numeric strings with trailing whitespace<PHP> |
var_dump(is_numeric("123 ")); // bool(true) | var_dump(is_numeric("123 ")); // bool(true) |
</PHP> | </PHP> |
* String offsets<PHP> | * String offsets<PHP> |
var_dump(123 + "string"); // TypeError | var_dump(123 + "string"); // TypeError |
</PHP> | </PHP> |
* The <php>++</php> and <php>--</php> operators would convert numeric strings with trailing white-space to integers or floats, as appropriate, rather than applying the alphanumeric increment rules<PHP> | * The <php>++</php> and <php>--</php> operators would convert numeric strings with trailing whitespace to integers or floats, as appropriate, rather than applying the alphanumeric increment rules<PHP> |
$d = "5 "; | $d = "5 "; |
var_dump(++$d); // int(6) | var_dump(++$d); // int(6) |
| |
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== |
There are two backward incompatible changes: | There are three backward incompatible changes: |
* Code relying on numerical strings with trailing white-spaces to be considered non-well-formed. | * Code relying on numerical strings with trailing whitespace to be considered non-well-formed. |
* Code with liberal use of leading-numeric strings might need to use explicit type casts. | * Code with liberal use of leading-numeric strings might need to use explicit type casts. |
* Code relying on the fact that <php>''</php> (an empty string) evaluates to <php>0</php> for arithmetic/bitwise operations | * Code relying on the fact that <php>''</php> (an empty string) evaluates to <php>0</php> for arithmetic/bitwise operations. |
| |
The first reason is a precise requirement and therefore should be checked explicitly. A small polyfill to check for the previous <php>is_numeric()</php> behaviour: | The first reason is a precise requirement and therefore should be checked explicitly. A small polyfill to check for the previous <php>is_numeric()</php> behaviour: |
Breaking the second reason will allow various bugs to be caught ahead of time, and the previous behaviour can be obtained by adding explicit casts, e.g.: | Breaking the second reason will allow various bugs to be caught ahead of time, and the previous behaviour can be obtained by adding explicit casts, e.g.: |
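One possible shape for such a check is sketched below. It is an illustration, not necessarily the RFC's own snippet: the function name `is_numeric_without_trailing_whitespace` is hypothetical, and the whitespace set in the pattern is an assumption.

```php
<?php
// Hypothetical helper: behaves like is_numeric() did before this RFC by
// rejecting any string that ends in a whitespace character.
function is_numeric_without_trailing_whitespace(string $s): bool
{
    return is_numeric($s) && !preg_match('/[ \t\n\r\x0b\f]$/D', $s);
}

var_dump(is_numeric_without_trailing_whitespace("123"));  // bool(true)
var_dump(is_numeric_without_trailing_whitespace(" 123")); // bool(true)
var_dump(is_numeric_without_trailing_whitespace("123 ")); // bool(false)
```

Because the trailing-whitespace test is applied on top of <php>is_numeric()</php>, the result is the same whether or not the engine already accepts trailing whitespace.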
<PHP> | <PHP> |
var_dump((int) "2px"); // int(2) | var_dump((int) "2px"); // int(2) |
var_dump((float) "2px"); // float(2) | var_dump((float) "2px"); // float(2) |
var_dump((int) "2.5px"); // int(2) | var_dump((int) "2.5px"); // int(2) |
var_dump((float) "2.5px"); // float(2.5) | var_dump((float) "2.5px"); // float(2.5) |
</PHP> | </PHP> |
| |
The third reason already emitted an <php>E_WARNING</php>, it was considered to special case this to evaluate to <php>0</php>, but this would be inconsistent with how type declarations deal with an empty string, namely throwing a TypeError, therefore a TypeError will also be emitted in this case. This can be mitigated by checking beforehand for an empty string value and change it to <php>0</php>. | The third reason already emitted an <php>E_WARNING</php>. We considered special-casing this to evaluate to <php>0</php>, but this would be inconsistent with how type declarations deal with an empty string, namely throwing a TypeError. Therefore a TypeError will also be emitted in this case. The error can be avoided by explicitly checking for an empty string and changing it to <php>0</php>. |
| |
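The mitigation for the empty string can be as small as a guard applied before the arithmetic. The helper name `empty_string_to_zero` is hypothetical and only illustrates the idea:

```php
<?php
// Hypothetical helper: replaces an empty string with 0 so that it can be
// used safely in arithmetic or bitwise operations.
function empty_string_to_zero($value)
{
    return $value === '' ? 0 : $value;
}

var_dump(5 + empty_string_to_zero(''));   // int(5)
var_dump(5 + empty_string_to_zero('10')); // int(15)
```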
===== Proposed PHP Version ===== | ===== Proposed PHP Version ===== |
===== Future Scope ===== | ===== Future Scope ===== |
* Nikita Popov's [[rfc:string_to_number_comparison|PHP RFC: Saner string to number comparisons]] | * Nikita Popov's [[rfc:string_to_number_comparison|PHP RFC: Saner string to number comparisons]] |
* Adding an E_NOTICE for numerical strings with leading/trailing white-spaces | * Adding an E_NOTICE for numerical strings with leading/trailing whitespace |
* Adding a flag to <php>\is_numeric</php> to accept or reject numerical strings with leading/trailing white-spaces | * Adding a flag to <php>\is_numeric</php> to accept or reject numeric strings with leading/trailing whitespace |
* Align string offset behaviour with array offsets | * Align string offset behaviour with array offsets |
* Promote remaining warnings to Type Errors in PHP 9 | * Promote remaining warnings to Type Errors in PHP 9 |
* Warn on illegal offsets when used within <php>isset()</php> or <php>empty()</php> | * Warn on illegal offsets when used within <php>isset()</php> or <php>empty()</php> |
| |
===== Proposed Voting Choices ===== | ===== Vote ===== |
Per the Voting RFC, there would be a single Yes/No vote requiring a 2/3 majority for the main proposal. And a secondary Yes/No vote requiring a 50%+1 majority for the implementation vote about float strings for strings offsets. | Per the Voting RFC, there is a single Yes/No vote requiring a 2/3 majority for the main proposal. A secondary Yes/No vote requiring a 50%+1 majority will decide whether float strings used as string offsets should continue to produce a warning (with different wording) instead of consistently becoming a TypeError. |
| |
| Primary vote: |
| <doodle title="Accept Saner numeric string RFC proposal" auth="girgias" voteType="single" closed="false"> |
| * Yes |
| * No |
| </doodle> |
| |
| Secondary vote: |
| <doodle title="Should valid float strings for string offsets remain a warning" auth="girgias" voteType="single" closed="false"> |
| * Yes |
| * No |
| </doodle> |
| |
===== Patches and Tests ===== | ===== Patches and Tests ===== |