Differences

This shows you the differences between two versions of the page.

rfc:saner-numeric-strings [2020/07/12 14:30] – Improve grammar theodorejb
rfc:saner-numeric-strings [2020/11/25 12:46] (current) – Add implentation version number girgias
Line 1: Line 1:
 ====== PHP RFC: Saner numeric strings ====== ====== PHP RFC: Saner numeric strings ======
-  * Version: 1.3+  * Version: 1.4
   * Date: 2020-06-28   * Date: 2020-06-28
 +  * Original Author: Andrea Faulds <ajf@ajf.me>
 +  * Original RFC: [[http://wiki.php.net/rfc/trailing_whitespace_numerics|PHP RFC: Permit trailing whitespace in numeric strings]]
   * Author: George Peter Banyard <girgias@php.net>   * Author: George Peter Banyard <girgias@php.net>
-  * Status: Under Discussion+  * Status: Implemented in PHP 8.0
   * First Published at: http://wiki.php.net/rfc/saner-numeric-strings   * First Published at: http://wiki.php.net/rfc/saner-numeric-strings
   * Implementation: https://github.com/php/php-src/pull/5762   * Implementation: https://github.com/php/php-src/pull/5762
Line 12: Line 14:
 A string can be categorised in three ways according to its numeric-ness, as [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#the-string-type|described by the language specification]]: A string can be categorised in three ways according to its numeric-ness, as [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#the-string-type|described by the language specification]]:
  
-  * A //numeric string// is a string containing only a [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#grammar-str-number|number]], optionally preceded by white-space characters. For example, <php>"123"</php> or <php>"  1.23e2"</php>+  * A //numeric string// is a string containing only a [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#grammar-str-number|number]], optionally preceded by whitespace characters. For example, <php>"123"</php> or <php>"  1.23e2"</php>
-  * A //leading-numeric string// is a string that begins with a numeric string but is followed by non-number characters  (including white-space characters). For example, <php>"123abc"</php> or <php>"123 "</php>.+  * A //leading-numeric string// is a string that begins with a numeric string but is followed by non-number characters  (including whitespace characters). For example, <php>"123abc"</php> or <php>"123 "</php>.
   * A //non-numeric string// is a string which is neither a numeric string nor a leading-numeric string.   * A //non-numeric string// is a string which is neither a numeric string nor a leading-numeric string.
  
 A fourth way PHP might deal with numeric strings is when using an //integer// string for an array index. A fourth way PHP might deal with numeric strings is when using an //integer// string for an array index.
 An integer string is stricter than a numeric string as it has the following additional constraints: An integer string is stricter than a numeric string as it has the following additional constraints:
-  * It doesn't accept leading white-spaces+  * It doesn't accept leading whitespace
   * It doesn't accept leading zeros (''0'')   * It doesn't accept leading zeros (''0'')
  
Line 27: Line 29:
     "03" => "Integer index with leading 0/octal",     "03" => "Integer index with leading 0/octal",
     "2str" => "leading numeric string",     "2str" => "leading numeric string",
-    " 1" => "leading white-space",+    " 1" => "leading whitespace",
     "5.5" => "Float",     "5.5" => "Float",
 ]; ];
Line 43: Line 45:
   string(22) "leading numeric string"   string(22) "leading numeric string"
   [" 1"]=>   [" 1"]=>
-  string(19) "leading white-space"+  string(19) "leading whitespace"
   ["5.5"]=>   ["5.5"]=>
   string(5) "Float"   string(5) "Float"
Line 50: Line 52:
  
 This RFC does not affect how array indexes behave, and thus won't mention them again. This RFC does not affect how array indexes behave, and thus won't mention them again.
 +
 +Another aspect which should be noted is that arithmetic/bitwise operators will convert all operands to their numeric/integer equivalent and emit a notice/warning on malformed/invalid numeric string, except for the <php>&</php>, <php>|</php>, and <php>^</php> bitwise operators when both operands are strings and the <php>~</php> operator, in which case it will perform the operation on the ASCII values of the characters that make up the strings and the result will be a string, as per the [[https://www.php.net/manual/en/language.operators.bitwise.php|documentation on bitwise operators]].
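
As a rough illustration of the exception described above (a minimal sketch; the literal values are chosen purely for demonstration), the same operator yields an integer when at least one operand is an integer, and a byte-wise string result when both operands are strings:

```php
<?php
// Integer operands: ordinary bitwise AND on the integer values.
var_dump(12 & 9);       // int(8)

// One integer and one numeric string: the string is converted to int first.
var_dump(12 & "9");     // int(8)

// Both operands are strings: the operation is applied to the byte values
// ('1' is 0x31, '9' is 0x39) and the result has the length of the shorter string.
var_dump("12" & "9");   // string(1) "1"

// Explicit casts force the integer interpretation.
var_dump((int) "12" & (int) "9"); // int(8)
```

Casting both operands is one way to make sure the numeric interpretation is used regardless of the operand types.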
  
 One final behaviour of PHP which needs to be presented is how PHP performs weak comparisons, i.e. a comparison with one of the following binary operators: <php>==</php>, <php>!=</php>, <php><></php>, <php><</php>, <php>></php>, <php><=</php>, and <php>>=</php>, in the string-to-string case and in the string-to-int/float case. One final behaviour of PHP which needs to be presented is how PHP performs weak comparisons, i.e. a comparison with one of the following binary operators: <php>==</php>, <php>!=</php>, <php><></php>, <php><</php>, <php>></php>, <php><=</php>, and <php>>=</php>, in the string-to-string case and in the string-to-int/float case.
Line 99: Line 103:
 var_dump(123 + "string"); // int(123) with E_WARNING "A non-numeric value encountered" var_dump(123 + "string"); // int(123) with E_WARNING "A non-numeric value encountered"
 </PHP> </PHP>
-  * Increment/Decrement operators, i.e. <php>++</php> and <php>--</php>, e.g.<PHP>+  * Increment/decrement operators, i.e. <php>++</php> and <php>--</php>, e.g.<PHP>
 $a = "5"; $a = "5";
 var_dump(++$a); // int(6) var_dump(++$a); // int(6)
Line 115: Line 119:
 var_dump("123" == "123abc"); // bool(false) var_dump("123" == "123abc"); // bool(false)
 </PHP> </PHP>
 +  * Bitwise operations, e.g.<PHP> 
 +var_dump(123 & "123");    // int(123) 
 +var_dump(123 & "  123");  // int(123) 
 +var_dump(123 & "123  ");  // int(123) with E_NOTICE "A non well formed numeric value encountered" 
 +var_dump(123 & "123abc"); // int(123) with E_NOTICE "A non well formed numeric value encountered" 
 +var_dump(123 & "abc");    // int(0) with E_WARNING "A non-numeric value encountered" 
 +</PHP>
  
 ===== The Problem ===== ===== The Problem =====
Line 126: Line 136:
  
 ===== Proposal ===== ===== Proposal =====
-Unify the various numeric string modes into a single concept: Numeric characters only with both leading and trailing white-spaces allowed.+Unify the various numeric string modes into a single concept: Numeric characters only with both leading and trailing whitespace allowed. Any other type of string is non-numeric and will throw <php>TypeError</php>s when used in a numeric context. 
 + 
 +This means, all strings which currently emit the <php>E_NOTICE</php> “A non well formed numeric value encountered” will be reclassified into the <php>E_WARNING</php> “A non-numeric value encountered” //except// if the leading-numeric string contained only trailing whitespace. And the various cases which currently emit an <php>E_WARNING</php> will be promoted to <php>TypeError</php>s. 
 + 
 +One exception to this is type declarations: as they only accept proper numeric strings, some cases which currently emit an <php>E_NOTICE</php> will instead throw a <php>TypeError</php>. See below for an example.
  
-This means, all strings which currently emit the <php>E_NOTICE</php> “A non well formed numeric value encountered” will emit the <php>E_WARNING</php> “A non-numeric value encountered” //except// if the leading-numeric string contained only trailing white-spaces. 
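
The reclassification proposed above can be sketched in userland. This is only an approximation under stated assumptions: ''classify_numeric_string()'' is a hypothetical helper (not part of PHP or of the patch), the regular expression is a hand-written rendering of the number grammar, and the whitespace set (space, tab, newline, carriage return, vertical tab, form feed) is assumed rather than taken from the RFC text.

```php
<?php
// Hypothetical helper, not part of PHP: classifies a string according to the
// categories proposed in this RFC, using a regex approximation of the grammar.
function classify_numeric_string(string $value): string
{
    // Assumed whitespace set: space, \t, \n, \r, vertical tab, form feed.
    $ws  = '[ \t\n\r\x0B\x0C]*';
    // Optional sign, then an integer or decimal, then an optional exponent.
    $num = '[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?';

    if (preg_match('/\A' . $ws . $num . $ws . '\z/', $value) === 1) {
        return 'numeric';          // used silently in numeric contexts
    }
    if (preg_match('/\A' . $ws . $num . '/', $value) === 1) {
        return 'leading-numeric';  // E_WARNING in arithmetic, TypeError in type declarations
    }
    return 'non-numeric';          // TypeError in numeric contexts
}

// Print the category of a few sample strings.
foreach (['123', '  1.23e2', '123  ', '123abc', 'abc', ''] as $s) {
    printf("%-12s => %s\n", var_export($s, true), classify_numeric_string($s));
}
```

Under this sketch a string with only trailing whitespace falls into the numeric category, while the empty string is non-numeric.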
  
 For string offsets accessed using numeric strings the following changes will be made: For string offsets accessed using numeric strings the following changes will be made:
-  * Leading numeric strings will emit the “Illegal string offset” instead of the “A non well formed numeric value encountered” notice, and continue to evaluate to their respective values. +  * Leading numeric strings will emit the “Illegal string offset” warning instead of the “A non well formed numeric value encountered” notice, and continue to evaluate to their respective values. 
-  * Non-numeric strings which emitted the “Illegal string offset” warning will throw an “Illegal offset type” TypeError +  * Non-numeric strings which emitted the “Illegal string offset” warning will throw an “Illegal offset type” TypeError. 
-  * A secondary implementation vote will decide if: numeric strings which correspond to well formed floats will emit the more usual “String offset cast occurred” warning instead of the “Illegal string offset” warning. +  * There is a secondary implementation vote to decide the following: should numeric strings which correspond to well-formed floats remain a warning (by emitting the same “String offset cast occurred” warning that occurs when a float is used for a string offset), or should the current “Illegal string offset” warning simply be promoted to a <php>TypeError</php>? Our position is that this case should be a TypeError, as it simplifies the implementation and is consistent with the handling of other strings (see this [[https://github.com/php/php-src/pull/5762/commits/897c37727b1ee393f04f57a88fc48d69c3cf0d1d|commit]]).
  
  
Line 142: Line 155:
 foo("123abc"); // TypeError foo("123abc"); // TypeError
 </PHP> </PHP>
-  * <php>\is_numeric</php> will return <php>true</php> for numeric strings with trailing white-spaces<PHP> +  * <php>\is_numeric</php> will return <php>true</php> for numeric strings with trailing whitespace<PHP> 
-var_dump(is_numeric("123   "));  // bool(true)+var_dump(is_numeric("123   ")); // bool(true)
 </PHP> </PHP>
   * String offsets<PHP>   * String offsets<PHP>
 $str = 'The world'; $str = 'The world';
-var_dump($str['2str']);   // string(1) "w" with E_WARNING "Illegal string offset '4str'" +var_dump($str['4str']);   // string(1) "w" with E_WARNING "Illegal string offset '4str'" 
-var_dump($str['4.5']);    // string(1) "w" with E_WARNING "String offset cast occurred" if the secondary vote is accepted+var_dump($str['4.5']);    // string(1) "w" with E_WARNING "String offset cast occurred" if the secondary vote is accepted otherwise TypeError
 var_dump($str['string']); // TypeError var_dump($str['string']); // TypeError
 </PHP> </PHP>
   * Arithmetic operations<PHP>   * Arithmetic operations<PHP>
 var_dump(123 + "123   "); // int(246) var_dump(123 + "123   "); // int(246)
-var_dump(123 + "123abc"); // int(123) with E_WARNING "A non-numeric value encountered" +var_dump(123 + "123abc"); // int(246) with E_WARNING "A non-numeric value encountered" 
-var_dump(123 + "string"); // int(123) with E_WARNING "A non-numeric value encountered"+var_dump(123 + "string"); // TypeError
 </PHP> </PHP>
-  * The <php>++</php> and <php>--</php> operators would convert numeric strings with trailing white-space to integers or floats, as appropriate, rather than applying the alphanumeric increment rules<PHP>+  * The <php>++</php> and <php>--</php> operators would convert numeric strings with trailing whitespace to integers or floats, as appropriate, rather than applying the alphanumeric increment rules<PHP>
 $d = "5 "; $d = "5 ";
 var_dump(++$d); // int(6) var_dump(++$d); // int(6)
Line 162: Line 175:
   * String-to-string comparisons<PHP>   * String-to-string comparisons<PHP>
 var_dump("123" == "123   "); // bool(true) var_dump("123" == "123   "); // bool(true)
 +</PHP>
 +  * Bitwise operations, e.g.<PHP>
 +var_dump(123 & "123  ");  // int(123)
 +var_dump(123 & "123abc"); // int(123) with E_WARNING "A non-numeric value encountered"
 +var_dump(123 & "abc");    // TypeError
 </PHP> </PHP>
  
Line 171: Line 189:
  
 ===== Backward Incompatible Changes ===== ===== Backward Incompatible Changes =====
-There are two backward incompatible changes: +There are three backward incompatible changes: 
-  * code relying on numerical strings with trailing white-spaces to be considered non-well-formed +  * Code relying on numerical strings with trailing whitespace to be considered non-well-formed. 
-  * code with liberal use of leading-numerical strings might need to use explicit type casts+  * Code with liberal use of leading-numeric strings might need to use explicit type casts
 +  * Code relying on the fact that <php>''</php> (an empty string) evaluates to <php>0</php> for arithmetic/bitwise operations.
  
 The first reason is a precise requirement and therefore should be checked explicitly. A small poly-fill to check for the previous <php>is_numeric()</php> behaviour: The first reason is a precise requirement and therefore should be checked explicitly. A small poly-fill to check for the previous <php>is_numeric()</php> behaviour:
Line 180: Line 199:
 Breaking the second reason will allow to catch various bugs ahead of time, and the previous behaviour can be obtained by adding explicit casts, e.g.: Breaking the second reason will allow to catch various bugs ahead of time, and the previous behaviour can be obtained by adding explicit casts, e.g.:
 <PHP> <PHP>
-var_dump((int) "2px"); // int(2) +var_dump((int) "2px");     // int(2) 
-var_dump((float) "2px"); // float(2) +var_dump((float) "2px");   // float(2) 
-var_dump((int) "2.5px"); // int(2)+var_dump((int) "2.5px");   // int(2)
 var_dump((float) "2.5px"); // float(2.5) var_dump((float) "2.5px"); // float(2.5)
 </PHP> </PHP>
 +
 +The third reason already emitted an <php>E_WARNING</php>. We considered special-casing this to evaluate to <php>0</php>, but this would be inconsistent with how type declarations deal with an empty string, namely throwing a TypeError. Therefore a TypeError will also be thrown in this case. The error can be avoided by explicitly checking for an empty string and changing it to <php>0</php>.
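
The explicit empty-string check mentioned above could look roughly like this. It is a minimal sketch: ''empty_to_zero()'' is a hypothetical helper name and ''$quantity'' is an illustrative variable, neither of which comes from the RFC.

```php
<?php
// Hypothetical helper: maps an empty string to "0" and leaves other input untouched.
function empty_to_zero(string $input): string
{
    return $input === '' ? '0' : $input;
}

$quantity = '';  // e.g. an optional form field left blank

var_dump(10 + empty_to_zero($quantity)); // int(10)
var_dump(10 + empty_to_zero('5'));       // int(15)
```

Normalising the value once, at the point where input enters the program, keeps the arithmetic itself free of special cases.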
  
 ===== Proposed PHP Version ===== ===== Proposed PHP Version =====
Line 201: Line 222:
 ===== Future Scope ===== ===== Future Scope =====
   * Nikita Popov's [[rfc:string_to_number_comparison|PHP RFC: Saner string to number comparisons]]   * Nikita Popov's [[rfc:string_to_number_comparison|PHP RFC: Saner string to number comparisons]]
-  * Adding an E_NOTICE for numerical strings with leading/trailing white-spaces +  * Adding an E_NOTICE for numerical strings with leading/trailing whitespace 
-  * Adding a flag to <php>\is_numeric</php> to accept or reject numerical strings with leading/trailing white-spaces+  * Adding a flag to <php>\is_numeric</php> to accept or reject numeric strings with leading/trailing whitespace
   * Align string offset behaviour with array offsets   * Align string offset behaviour with array offsets
-  * Promote remaining "Illegal string offset" warnings to Type Errors in PHP 9+  * Promote remaining warnings to Type Errors in PHP 9
   * Warn on illegal offsets when used within <php>isset()</php> or <php>empty()</php>   * Warn on illegal offsets when used within <php>isset()</php> or <php>empty()</php>
  
-===== Proposed Voting Choices ===== +===== Vote ===== 
-Per the Voting RFC, there would be a single Yes/No vote requiring a 2/3 majority.+Per the Voting RFC, there is a single Yes/No vote requiring a 2/3 majority for the main proposal. A secondary Yes/No vote requiring a 50%+1 majority will decide whether float strings used as string offsets should continue to produce a warning (with different wording) instead of consistently becoming a TypeError. 
 + 
 +Primary vote: 
 +<doodle title="Accept Saner numeric string RFC proposal" auth="girgias" voteType="single" closed="true"> 
 +   * Yes 
 +   * No 
 +</doodle> 
 + 
 +Secondary vote: 
 +<doodle title="Should valid float strings for string offsets remain a warning" auth="girgias" voteType="single" closed="true"> 
 +   * Yes 
 +   * No 
 +</doodle>
  
 ===== Patches and Tests ===== ===== Patches and Tests =====
-A pull request for a complete PHP interpreter patch, including test file, can be found here: https://github.com/php/php-src/pull/5762+A pull request for a complete PHP interpreter patch, including test files, can be found here: https://github.com/php/php-src/pull/5762
  
 A language specification patch still needs to be done. A language specification patch still needs to be done.
Line 230: Line 263:
  
 ===== Changelog ===== ===== Changelog =====
 +  * 2020-07-13: Tweak inconsistency in regards to Arithmetic/Bitwise ops
   * 2020-07-10: Major rewrite   * 2020-07-10: Major rewrite
   * 2020-07-02: Explain difference between array and string offsets, and how the RFC will impact string offsets   * 2020-07-02: Explain difference between array and string offsets, and how the RFC will impact string offsets
   * 2020-07-01: Add explicit cast behaviour for leading numeric strings   * 2020-07-01: Add explicit cast behaviour for leading numeric strings
   * 2020-06-28: Initial version   * 2020-06-28: Initial version
rfc/saner-numeric-strings.1594564259.txt.gz · Last modified: 2020/07/12 14:30 by theodorejb