rfc:saner-numeric-strings

Differences

This shows you the differences between two versions of the page.

rfc:saner-numeric-strings [2020/07/14 14:19] – Nits girgias
rfc:saner-numeric-strings [2020/11/25 12:46] (current) – Add implentation version number girgias
Line 2: Line 2:
   * Version: 1.4
   * Date: 2020-06-28
+  * Original Author: Andrea Faulds <ajf@ajf.me>
+  * Original RFC: [[http://wiki.php.net/rfc/trailing_whitespace_numerics|PHP RFC: Permit trailing whitespace in numeric strings]]
   * Author: George Peter Banyard <girgias@php.net>
-  * Status: Under Discussion
+  * Status: Implemented in PHP 8.0
   * First Published at: http://wiki.php.net/rfc/saner-numeric-strings
   * Implementation: https://github.com/php/php-src/pull/5762
Line 12: Line 14:
 A string can be categorised in three ways according to its numeric-ness, as [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#the-string-type|described by the language specification]]:
 
-  * A //numeric string// is a string containing only a [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#grammar-str-number|number]], optionally preceded by white-space characters. For example, <php>"123"</php> or <php>"  1.23e2"</php>
+  * A //numeric string// is a string containing only a [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#grammar-str-number|number]], optionally preceded by whitespace characters. For example, <php>"123"</php> or <php>"  1.23e2"</php>
-  * A //leading-numeric string// is a string that begins with a numeric string but is followed by non-number characters  (including white-space characters). For example, <php>"123abc"</php> or <php>"123 "</php>.
+  * A //leading-numeric string// is a string that begins with a numeric string but is followed by non-number characters  (including whitespace characters). For example, <php>"123abc"</php> or <php>"123 "</php>.
   * A //non-numeric string// is a string which is neither a numeric string nor a leading-numeric string.
 
 A fourth way PHP might deal with numeric strings is when using an //integer// string for an array index.
 An integer string is stricter than a numeric string as it has the following additional constraints:
-  * It doesn't accept leading white-spaces
+  * It doesn't accept leading whitespace
   * It doesn't accept leading zeros (''0'')
 
Line 27: Line 29:
     "03" => "Integer index with leading 0/octal",
     "2str" => "leading numeric string",
-    " 1" => "leading white-space",
+    " 1" => "leading whitespace",
     "5.5" => "Float",
 ];
Line 43: Line 45:
   string(22) "leading numeric string"
   [" 1"]=>
-  string(19) "leading white-space"
+  string(18) "leading whitespace"
   ["5.5"]=>
   string(5) "Float"
Line 101: Line 103:
 var_dump(123 + "string"); // int(123) with E_WARNING "A non-numeric value encountered"
 </PHP>
-  * Increment/Decrement operators, i.e. <php>++</php> and <php>--</php>, e.g.<PHP>
+  * Increment/decrement operators, i.e. <php>++</php> and <php>--</php>, e.g.<PHP>
 $a = "5";
 var_dump(++$a); // int(6)
Line 119: Line 121:
   * Bitwise operations, e.g.<PHP>
 var_dump(123 & "123");    // int(123)
+var_dump(123 & "  123");  // int(123)
 var_dump(123 & "123  ");  // int(123) with E_NOTICE "A non well formed numeric value encountered"
 var_dump(123 & "123abc"); // int(123) with E_NOTICE "A non well formed numeric value encountered"
Line 133: Line 136:
  
 ===== Proposal =====
-Unify the various numeric string modes into a single concept: Numeric characters only with both leading and trailing white-spaces allowed. Any other type of string is non-numeric and will throw <php>TypeError</php>s when used in a numeric context.
+Unify the various numeric string modes into a single concept: Numeric characters only with both leading and trailing whitespace allowed. Any other type of string is non-numeric and will throw <php>TypeError</php>s when used in a numeric context.
 
-This means, all strings which currently emit the <php>E_NOTICE</php> “A non well formed numeric value encountered” will de reclassified into the <php>E_WARNING</php> “A non-numeric value encountered” //except// if the leading-numeric string contained only trailing white-spaces. And the various cases which currently emit an <php>E_WARNING</php> will be promoted to <php>TypeError</php>s.
+This means, all strings which currently emit the <php>E_NOTICE</php> “A non well formed numeric value encountered” will be reclassified into the <php>E_WARNING</php> “A non-numeric value encountered” //except// if the leading-numeric string contained only trailing whitespace. And the various cases which currently emit an <php>E_WARNING</php> will be promoted to <php>TypeError</php>s.
 
 One exception to this are type declarations as they only accept proper numeric strings, thus some <php>E_NOTICE</php> will result in a <php>TypeError</php>. See below for an example.
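As a rough illustration of the unified classification described in the proposal, the following sketch sorts strings into the three categories and probes how an addition reacts to each. It assumes PHP 8.0 or later, where the proposal is implemented. The helper names ''classify_numeric_string'' and ''probe_addition'' and the regular expression are hypothetical and are not part of the RFC or of PHP itself.

```php
<?php
// Hypothetical helpers for illustration only; assumes PHP 8.0 or later.

/**
 * Classify a string as 'numeric', 'leading-numeric' or 'non-numeric'.
 */
function classify_numeric_string(string $s): string
{
    // Under PHP 8, is_numeric() accepts leading and trailing whitespace.
    if (is_numeric($s)) {
        return 'numeric';
    }
    // Optional leading whitespace, then an integer or float literal.
    // Anything may follow, since fully numeric strings were handled above.
    $leadingNumber = '/^[ \t\n\r\x0B\f]*[+-]?(\d+(\.\d*)?|\.\d+)(e[+-]?\d+)?/i';
    if (preg_match($leadingNumber, $s) === 1) {
        return 'leading-numeric';
    }
    return 'non-numeric';
}

/**
 * Report how `1 + $s` behaves: 'ok', 'E_WARNING' or 'TypeError'.
 */
function probe_addition(string $s): string
{
    $warned = false;
    // Record any diagnostic raised by the addition instead of printing it.
    set_error_handler(function (int $errno, string $errstr) use (&$warned): bool {
        $warned = true;
        return true;
    });
    try {
        $unused = 1 + $s;
        return $warned ? 'E_WARNING' : 'ok';
    } catch (TypeError $e) {
        return 'TypeError';
    } finally {
        restore_error_handler();
    }
}

// Print the category and the arithmetic behaviour for a few sample strings.
foreach (['123', ' 123 ', '123abc', 'abc', ''] as $sample) {
    printf(
        "%-10s %-16s %s\n",
        var_export($sample, true),
        classify_numeric_string($sample),
        probe_addition($sample)
    );
}
```

The probe uses a temporary error handler rather than the ''@'' operator so that a diagnostic can be detected without being displayed. Type declarations are deliberately left out of the sketch.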
Line 141: Line 144:
  
 For string offsets accessed using numeric strings the following changes will be made:
-  * Leading numeric strings will emit the “Illegal string offset” instead of the “A non well formed numeric value encountered” notice, and continue to evaluate to their respective values.
+  * Leading numeric strings will emit the “Illegal string offset” warning instead of the “A non well formed numeric value encountered” notice, and continue to evaluate to their respective values.
   * Non-numeric strings which emitted the “Illegal string offset” warning will throw an “Illegal offset type” TypeError.
-  * secondary implementation vote will decide if: numeric strings which correspond to well formed floats will remain a warning by emit the more usual “String offset cast occurred” warning instead of the current “Illegal string offset” warning which is being promoted to <php>TypeError</php>, the reason for this is adjusting this behaviour requires some additional boilerplate code in the Engine, as can mostly be seen in this [[https://github.com/php/php-src/pull/5762/commits/788a6963c1343d53dadc23fb2983224be9ba4c04|commit]].
+  * There is a secondary implementation vote to decide the following: should numeric strings which correspond to well-formed floats remain a warning (by emitting the same “String offset cast occurred” warning that occurs when a float is used for a string offset), or should the current “Illegal string offset” warning simply be promoted to <php>TypeError</php>? Our position is that this case should be a TypeError, as it simplifies the implementation and is consistent with the handling of other strings (see this [[https://github.com/php/php-src/pull/5762/commits/897c37727b1ee393f04f57a88fc48d69c3cf0d1d|commit]]).
  
  
Line 152: Line 155:
 foo("123abc"); // TypeError
 </PHP>
-  * <php>\is_numeric</php> will return <php>true</php> for numeric strings with trailing white-spaces<PHP>
+  * <php>\is_numeric</php> will return <php>true</php> for numeric strings with trailing whitespace<PHP>
-var_dump(is_numeric("123   "));  // bool(true)
+var_dump(is_numeric("123   ")); // bool(true)
 </PHP>
   * String offsets<PHP>
Line 166: Line 169:
 var_dump(123 + "string"); // TypeError
 </PHP>
-  * The <php>++</php> and <php>--</php> operators would convert numeric strings with trailing white-space to integers or floats, as appropriate, rather than applying the alphanumeric increment rules<PHP>
+  * The <php>++</php> and <php>--</php> operators would convert numeric strings with trailing whitespace to integers or floats, as appropriate, rather than applying the alphanumeric increment rules<PHP>
 $d = "5 ";
 var_dump(++$d); // int(6)
Line 186: Line 189:
  
 ===== Backward Incompatible Changes =====
-There are two backward incompatible changes:
+There are three backward incompatible changes:
-  * Code relying on numerical strings with trailing white-spaces to be considered non-well-formed.
+  * Code relying on numerical strings with trailing whitespace to be considered non-well-formed.
   * Code with liberal use of leading-numeric strings might need to use explicit type casts.
-  * Code relying on the fact that <php>''</php> (an empty string) evaluates to <php>0</php> for arithmetic/bitwise operations
+  * Code relying on the fact that <php>''</php> (an empty string) evaluates to <php>0</php> for arithmetic/bitwise operations.
 
 The first reason is a precise requirement and therefore should be checked explicitly. A small poly-fill to check for the previous <php>is_numeric()</php> behaviour:
Line 196: Line 199:
 Breaking the second reason will allow to catch various bugs ahead of time, and the previous behaviour can be obtained by adding explicit casts, e.g.:
 <PHP>
-var_dump((int) "2px"); // int(2)
+var_dump((int) "2px");     // int(2)
-var_dump((float) "2px"); // float(2)
+var_dump((float) "2px");   // float(2)
-var_dump((int) "2.5px"); // int(2)
+var_dump((int) "2.5px");   // int(2)
 var_dump((float) "2.5px"); // float(2.5)
 </PHP>
 
-The third reason already emitted an <php>E_WARNING</php>, it was considered to special case this to evaluate to <php>0</php>, but this would be inconsistent with how type declarations deal with an empty string, namely throwing a TypeError, therefore a TypeError will also be emitted in this case. This can be mitigated by checking beforehand for an empty string value and change it to <php>0</php>.
+The third reason already emitted an <php>E_WARNING</php>. We considered special-casing this to evaluate to <php>0</php>, but this would be inconsistent with how type declarations deal with an empty string, namely throwing a TypeError. Therefore a TypeError will also be emitted in this case. The error can be avoided by explicitly checking for an empty string and changing it to <php>0</php>.
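The mitigations for the first and third backward incompatible changes could look roughly like the sketch below. Both helper names are hypothetical, and the code is an independent illustration rather than the poly-fill from the RFC text. The whitespace set used for trimming is an assumption based on the characters PHP treats as whitespace in numeric strings (space, tab, line feed, carriage return, vertical tab, form feed).

```php
<?php
// Hypothetical helpers for illustration only; not taken from the RFC.

/**
 * Approximate the pre-8.0 is_numeric() result: numeric strings with
 * trailing whitespace are rejected, leading whitespace is still accepted.
 */
function is_numeric_without_trailing_whitespace(string $s): bool
{
    // Assumed whitespace set: space, \t, \n, \r, vertical tab, form feed.
    $whitespace = " \t\n\r\v\f";
    // If rtrim() changes the string, it ended in whitespace.
    return is_numeric($s) && rtrim($s, $whitespace) === $s;
}

/**
 * Map an empty string to 0 before it reaches an arithmetic or bitwise
 * operation; every other string is passed through unchanged.
 */
function empty_string_to_zero(string $s)
{
    return $s === '' ? 0 : $s;
}

var_dump(is_numeric_without_trailing_whitespace("123"));
var_dump(is_numeric_without_trailing_whitespace("123 "));
var_dump(10 + empty_string_to_zero(""));
var_dump(10 + empty_string_to_zero("5"));
```

Because the first helper rejects trailing whitespace itself, it should return the same result whether or not the running PHP version already accepts such strings in ''is_numeric()''.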
  
 ===== Proposed PHP Version =====
Line 219: Line 222:
 ===== Future Scope =====
   * Nikita Popov's [[rfc:string_to_number_comparison|PHP RFC: Saner string to number comparisons]]
-  * Adding an E_NOTICE for numerical strings with leading/trailing white-spaces
+  * Adding an E_NOTICE for numerical strings with leading/trailing whitespace
-  * Adding a flag to <php>\is_numeric</php> to accept or reject numerical strings with leading/trailing white-spaces
+  * Adding a flag to <php>\is_numeric</php> to accept or reject numeric strings with leading/trailing whitespace
   * Align string offset behaviour with array offsets
   * Promote remaining warnings to Type Errors in PHP 9
   * Warn on illegal offsets when used within <php>isset()</php> or <php>empty()</php>
 
-===== Proposed Voting Choices =====
+===== Vote =====
-Per the Voting RFC, there would be a single Yes/No vote requiring a 2/3 majority for the main proposal. And a secondary Yes/No vote requiring a 50%+1 majority for the implementation vote about float strings for strings offsets.
+Per the Voting RFC, there is a single Yes/No vote requiring a 2/3 majority for the main proposal. A secondary Yes/No vote requiring a 50%+1 majority will decide whether float strings used as string offsets should continue to produce a warning (with different wording) instead of consistently becoming a TypeError.
+
+Primary vote:
+<doodle title="Accept Saner numeric string RFC proposal" auth="girgias" voteType="single" closed="true">
+   * Yes
+   * No
+</doodle>
+
+Secondary vote:
+<doodle title="Should valid float strings for string offsets remain a warning" auth="girgias" voteType="single" closed="true">
+   * Yes
+   * No
+</doodle>
 
 ===== Patches and Tests =====
rfc/saner-numeric-strings.1594736373.txt.gz · Last modified: 2020/07/14 14:19 by girgias