rfc:saner-numeric-strings

Differences

This shows you the differences between two versions of the page.

Older revision: rfc:saner-numeric-strings [2020/07/10 02:04] – Fix typo (theodorejb)
Newer revision: rfc:saner-numeric-strings [2020/07/31 12:00] – Close vote (girgias)
Line 1 (old) / Line 1 (new):
 ====== PHP RFC: Saner numeric strings ======
-  * Version: 1.3
+  * Version: 1.4
   * Date: 2020-06-28
+  * Original Author: Andrea Faulds <ajf@ajf.me>
+  * Original RFC: [[http://wiki.php.net/rfc/trailing_whitespace_numerics|PHP RFC: Permit trailing whitespace in numeric strings]]
   * Author: George Peter Banyard <girgias@php.net>
-  * Status: Under Discussion
+  * Status: Voting
   * First Published at: http://wiki.php.net/rfc/saner-numeric-strings
   * Implementation: https://github.com/php/php-src/pull/5762
Line 12 (old) / Line 14 (new):
 A string can be categorised in three ways according to its numeric-ness, as [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#the-string-type|described by the language specification]]:

-  * A //numeric string// is a string containing only a [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#grammar-str-number|number]], optionally preceded by white-space characters. For example, <php>"123"</php> or <php>"  1.23e2"</php>
+  * A //numeric string// is a string containing only a [[https://github.com/php/php-langspec/blob/be010b4435e7b0801737bb66b5bbdd8f9fb51dde/spec/05-types.md#grammar-str-number|number]], optionally preceded by whitespace characters. For example, <php>"123"</php> or <php>"  1.23e2"</php>
-  * A //leading-numeric string// is a string that begins with a numeric string but is followed by non-number characters (including white-space characters). For example, <php>"123abc"</php> or <php>"123 "</php>.
+  * A //leading-numeric string// is a string that begins with a numeric string but is followed by non-number characters (including whitespace characters). For example, <php>"123abc"</php> or <php>"123 "</php>.
   * A //non-numeric string// is a string which is neither a numeric string nor a leading-numeric string.

 A fourth way PHP might deal with numeric strings is when using an //integer// string for an array index.
 An integer string is stricter than a numeric string as it has the following additional constraints:
-  * It doesn't accept leading white-spaces
+  * It doesn't accept leading whitespace
   * It doesn't accept leading zeros (''0'')
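
The three categories above can be approximated with a regular expression. The sketch below uses a hypothetical helper, <php>classify_numeric_string()</php>, which is not part of PHP or of this RFC. It assumes that a number is an optional sign followed by digits with an optional fractional part and an optional exponent, that the allowed whitespace is space, tab, newline, carriage return, vertical tab and form feed, and it models the behaviour //before// this RFC (so <php>"123 "</php> is leading-numeric).

```php
<?php
// Hypothetical helper (not a PHP built-in): approximates the pre-RFC
// classification into numeric, leading-numeric and non-numeric strings.
function classify_numeric_string(string $s): string
{
    // Optional leading whitespace, optional sign, then an integer or decimal
    // number (e.g. "1", "1.", ".5", "1.5") with an optional exponent.
    $pattern = '/^[ \t\n\r\x0B\x0C]*[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?/';

    if (preg_match($pattern, $s, $m) !== 1) {
        return 'non-numeric';
    }
    // The whole string is a number: numeric string.
    if (strlen($m[0]) === strlen($s)) {
        return 'numeric';
    }
    // Only a prefix is a number; other characters (including whitespace) follow.
    return 'leading-numeric';
}

foreach (['123', '  1.23e2', '123abc', '123 ', 'abc', ''] as $s) {
    echo var_export($s, true), ' => ', classify_numeric_string($s), PHP_EOL;
}
```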
  
Line 27 (old) / Line 29 (new):
     "03" => "Integer index with leading 0/octal",
     "2str" => "leading numeric string",
-    " 1" => "leading white-space",
+    " 1" => "leading whitespace",
     "5.5" => "Float",
 ];
Line 43 (old) / Line 45 (new):
   string(22) "leading numeric string"
   [" 1"]=>
-  string(19) "leading white-space"
+  string(18) "leading whitespace"
   ["5.5"]=>
   string(5) "Float"
Line 50 (old) / Line 52 (new):

 This RFC does not affect how array indexes behave, and thus won't mention them again.
+
+Another aspect which should be noted is that arithmetic/bitwise operators convert all operands to their numeric/integer equivalent and emit a notice/warning on malformed/invalid numeric strings. The exceptions are the <php>&</php>, <php>|</php>, and <php>^</php> bitwise operators when both operands are strings, and the <php>~</php> operator when its operand is a string: in these cases the operation is performed on the ASCII values of the characters that make up the strings, and the result is a string, as per the [[https://www.php.net/manual/en/language.operators.bitwise.php|documentation on bitwise operators]].
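
A minimal illustration of that exception, assuming PHP 7 or later (the operand values are arbitrary): with an integer operand the string is converted to a number, whereas with two string operands the operation is applied byte by byte.

```php
<?php
// Integer & string: "3" is converted to int(3), so this is 0b1100 & 0b0011.
var_dump(12 & "3");   // int(0)

// String & string: bytes are combined and the result has the length of the
// shorter operand; '1' (0x31) & '3' (0x33) = 0x31, i.e. "1".
var_dump("12" & "3"); // string(1) "1"

// String | string: 'A' (0x41) | ' ' (0x20) = 0x61, i.e. "a".
var_dump("A" | " ");  // string(1) "a"
```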
  
 One final behaviour of PHP which needs to be presented is how PHP performs weak comparisons, i.e. a comparison with one of the following binary operators: <php>==</php>, <php>!=</php>, <php><></php>, <php><</php>, <php>></php>, <php><=</php>, and <php>>=</php>, in the string-to-string case and in the string-to-int/float case.
Line 86 (old) / Line 90 (new):
 $str = 'The world';
 var_dump($str['4']);      // string(1) "w"
-var_dump($str['03']);     // string(1) " "
+var_dump($str['04']);     // string(1) "w"
-var_dump($str['2str']);   // string(1) "e" with E_NOTICE "A non well formed numeric value encountered"
+var_dump($str['4str']);   // string(1) "w" with E_NOTICE "A non well formed numeric value encountered"
-var_dump($str[' 1']);     // string(1) "h"
+var_dump($str[' 4']);     // string(1) "w"
-var_dump($str['5.5']);    // string(1) "o" with E_WARNING "Illegal string offset '5.5'"
+var_dump($str['4.5']);    // string(1) "w" with E_WARNING "Illegal string offset '4.5'"
 var_dump($str['string']); // string(1) "T" with E_WARNING "Illegal string offset 'string'"
 </PHP>
Line 99 (old) / Line 103 (new):
 var_dump(123 + "string"); // int(123) with E_WARNING "A non-numeric value encountered"
 </PHP>
-  * Increment/Decrement operators, i.e. <php>++</php> and <php>--</php>, e.g.<PHP>
+  * Increment/decrement operators, i.e. <php>++</php> and <php>--</php>, e.g.<PHP>
 $a = "5";
 var_dump(++$a); // int(6)
Line 115 (old) / Line 119 (new):
 var_dump("123" == "123abc"); // bool(false)
 </PHP>
+  * Bitwise operations, e.g.<PHP>
+var_dump(123 & "123");    // int(123)
+var_dump(123 & "  123");  // int(123)
+var_dump(123 & "123  ");  // int(123) with E_NOTICE "A non well formed numeric value encountered"
+var_dump(123 & "123abc"); // int(123) with E_NOTICE "A non well formed numeric value encountered"
+var_dump(123 & "abc");    // int(0) with E_WARNING "A non-numeric value encountered"
+</PHP>
  
 ===== The Problem =====

 The current behaviour of numeric strings has various issues:
-  * numeric strings with leading white-space are considered more numeric than numeric strings with trailing white-space
+  * Numeric strings with leading whitespace are considered more numeric than numeric strings with trailing whitespace.
-  * strings which happen to start with a digit, e.g. hashes, may at times be interpreted as numbers, which can lead to bugs
+  * Strings which happen to start with a digit, e.g. hashes, may at times be interpreted as numbers, which can lead to bugs.
-  * <php>\is_numeric()</php> is misleading, as it will reject values that a weak-mode parameter check will accept
+  * <php>\is_numeric()</php> is misleading, as it will reject values that a weak-mode parameter check will accept.
-  * leading-numeric strings is a rather strange concept and an unintuitive/surprising behaviour.
+  * Leading-numeric strings are a rather strange concept with unintuitive/surprising behaviour.
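
As a concrete illustration of the second point, assuming an arbitrary hash-like value (not taken from this RFC), only the leading digit of the string is used in arithmetic; the rest is discarded with nothing more than a diagnostic.

```php
<?php
// Arbitrary hex string that happens to start with the digit 3.
$hash = '3c59dc048e8850243be8079a5c74d079';

// Only the leading "3" is used as the operand. Before this RFC the rest of the
// string triggers an E_NOTICE ("A non well formed numeric value encountered");
// under this RFC it triggers an E_WARNING ("A non-numeric value encountered").
var_dump($hash + 1); // int(4)
```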
  
 ===== Proposal =====
-Unify the various numeric string modes into a single concept: Numeric characters only with both leading and trailing white-spaces allowed.
+Unify the various numeric string modes into a single concept: numeric characters only, with both leading and trailing whitespace allowed. Any other type of string is non-numeric and will throw <php>TypeError</php>s when used in a numeric context.
+
+This means all strings which currently emit the <php>E_NOTICE</php> “A non well formed numeric value encountered” will be reclassified into the <php>E_WARNING</php> “A non-numeric value encountered”, //except// if the leading-numeric string contained only trailing whitespace, in which case it becomes a numeric string and no diagnostic is emitted. The various cases which currently emit an <php>E_WARNING</php> will be promoted to <php>TypeError</php>s.
+
+One exception to this is type declarations: as they only accept proper numeric strings, some cases which currently emit an <php>E_NOTICE</php> will throw a <php>TypeError</php> instead. See below for an example.

-This means, all strings which currently emit the <php>E_NOTICE</php> “A non well formed numeric value encountered” will emit the <php>E_WARNING</php> “A non-numeric value encountered” //except// if the leading-numeric string contained only trailing white-spaces.
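
A minimal sketch of the resulting three outcomes, assuming PHP 8.0 or later (the release in which this RFC was implemented); <php>describe_addition()</php> is a hypothetical helper used only for illustration.

```php
<?php
// Assumes PHP 8.0+. describe_addition() is a hypothetical helper.
function describe_addition(string $s): string
{
    try {
        // "@" hides the E_WARNING raised for leading-numeric strings;
        // it does not prevent a TypeError from being thrown.
        return var_export(@(1 + $s), true);
    } catch (TypeError $e) {
        return 'TypeError';
    }
}

foreach (['123', ' 123 ', '123abc', 'abc', ''] as $s) {
    echo var_export($s, true), ' => ', describe_addition($s), PHP_EOL;
}
```

Under these assumptions the numeric strings (with or without surrounding whitespace) and the leading-numeric string all yield 124, the last one with a suppressed warning, while the non-numeric and empty strings throw a <php>TypeError</php>.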
  
 For string offsets accessed using numeric strings the following changes will be made:
-  * Leading numeric strings will emit the “Illegal string offset” instead of the “A non well formed numeric value encountered” notice, and continue to evaluate to their respective values.
+  * Leading-numeric strings will emit the “Illegal string offset” warning instead of the “A non well formed numeric value encountered” notice, and continue to evaluate to their respective values.
-  * Non-numeric strings which emitted the “Illegal string offset” warning will throw an “Illegal offset type” TypeError
+  * Non-numeric strings which emitted the “Illegal string offset” warning will throw an “Illegal offset type” TypeError.
-  * secondary implementation vote will decide if: numeric strings which correspond to well formed floats will emit the more usual “String offset cast occurred” warning instead of the “Illegal string offset” warning.
+  * There is a secondary implementation vote to decide the following: should numeric strings which correspond to well-formed floats remain a warning (by emitting the same “String offset cast occurred” warning that occurs when a float is used for a string offset), or should the current “Illegal string offset” warning simply be promoted to a <php>TypeError</php>? Our position is that this case should be a TypeError, as it simplifies the implementation and is consistent with the handling of other strings (see this [[https://github.com/php/php-src/pull/5762/commits/897c37727b1ee393f04f57a88fc48d69c3cf0d1d|commit]]).
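
Code that relies on the lenient conversion can keep it explicitly. A minimal sketch, reusing the <php>'The world'</php> string from the examples in this RFC: an explicit <php>(int)</php> cast truncates the offset without emitting any diagnostic, both before and after this proposal.

```php
<?php
$str = 'The world';

// Explicit casts never warn: (int) '4str' and (int) '4.5' are both 4.
var_dump($str[(int) '4str']); // string(1) "w"
var_dump($str[(int) '4.5']);  // string(1) "w"
```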
  
  
Line 142 (old) / Line 155 (new):
 foo("123abc"); // TypeError
 </PHP>
-  * <php>\is_numeric</php> will return <php>true</php> for numeric strings with trailing white-spaces<PHP>
+  * <php>\is_numeric</php> will return <php>true</php> for numeric strings with trailing whitespace<PHP>
-var_dump(is_numeric("123   "));  // bool(true)
+var_dump(is_numeric("123   ")); // bool(true)
 </PHP>
   * String offsets<PHP>
 $str = 'The world';
-var_dump($str['2str']);   // string(1) "e" with E_WARNING "Illegal string offset '2str'"
+var_dump($str['4str']);   // string(1) "w" with E_WARNING "Illegal string offset '4str'"
-var_dump($str['5.5']);    // string(1) "o" with E_WARNING "String offset cast occurred" if the secondary vote is accepted
+var_dump($str['4.5']);    // string(1) "w" with E_WARNING "String offset cast occurred" if the secondary vote is accepted, otherwise TypeError
 var_dump($str['string']); // TypeError
 </PHP>
   * Arithmetic operations<PHP>
 var_dump(123 + "123   "); // int(246)
-var_dump(123 + "123abc"); // int(123) with E_WARNING "A non-numeric value encountered"
+var_dump(123 + "123abc"); // int(246) with E_WARNING "A non-numeric value encountered"
-var_dump(123 + "string"); // int(123) with E_WARNING "A non-numeric value encountered"
+var_dump(123 + "string"); // TypeError
 </PHP>
-  * The <php>++</php> and <php>--</php> operators would convert numeric strings with trailing white-space to integers or floats, as appropriate, rather than applying the alphanumeric increment rules<PHP>
+  * The <php>++</php> and <php>--</php> operators would convert numeric strings with trailing whitespace to integers or floats, as appropriate, rather than applying the alphanumeric increment rules<PHP>
 $d = "5 ";
 var_dump(++$d); // int(6)
Line 162 (old) / Line 175 (new):
   * String-to-string comparisons<PHP>
 var_dump("123" == "123   "); // bool(true)
+</PHP>
+  * Bitwise operations, e.g.<PHP>
+var_dump(123 & "123  ");  // int(123)
+var_dump(123 & "123abc"); // int(123) with E_WARNING "A non-numeric value encountered"
+var_dump(123 & "abc");    // TypeError
 </PHP>
  
Line 171 (old) / Line 189 (new):

 ===== Backward Incompatible Changes =====
-There are two backward incompatible changes:
+There are three backward incompatible changes:
-  * code relying on numerical strings with trailing white-spaces to be considered non-well-formed
+  * Code relying on numeric strings with trailing whitespace being considered non-well-formed.
-  * code with liberal use of leading-numerical strings might need to use explicit type casts
+  * Code with liberal use of leading-numeric strings might need to use explicit type casts.
+  * Code relying on the fact that <php>''</php> (an empty string) evaluates to <php>0</php> for arithmetic/bitwise operations.

 The first case is a precise requirement and therefore should be checked explicitly. A small polyfill to check for the previous <php>is_numeric()</php> behaviour:
Line 180 (old) / Line 199 (new):
 Breaking code in the second case will allow various bugs to be caught ahead of time, and the previous behaviour can be obtained by adding explicit casts, e.g.:
 <PHP>
-var_dump((int) "2px"); // int(2)
+var_dump((int) "2px");     // int(2)
-var_dump((float) "2px"); // float(2)
+var_dump((float) "2px");   // float(2)
-var_dump((int) "2.5px"); // int(2)
+var_dump((int) "2.5px");   // int(2)
 var_dump((float) "2.5px"); // float(2.5)
 </PHP>
+
+The third case already emits an <php>E_WARNING</php>. We considered special-casing it to evaluate to <php>0</php>, but this would be inconsistent with how type declarations deal with an empty string, namely by throwing a <php>TypeError</php>. Therefore a <php>TypeError</php> will also be thrown in this case. The error can be avoided by explicitly checking for an empty string and changing it to <php>0</php>.
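
One way to perform that check is sketched below; <php>empty_to_zero()</php> is a hypothetical helper name, and the snippet behaves the same before and after this RFC.

```php
<?php
// Hypothetical helper: replaces an empty string with 0 so that it is never
// used directly as an arithmetic operand.
function empty_to_zero(string $value)
{
    return $value === '' ? 0 : $value;
}

var_dump(1 + empty_to_zero(''));   // int(1)
var_dump(1 + empty_to_zero('41')); // int(42)
```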
  
 ===== Proposed PHP Version =====
Line 201 (old) / Line 222 (new):
 ===== Future Scope =====
   * Nikita Popov's [[rfc:string_to_number_comparison|PHP RFC: Saner string to number comparisons]]
-  * Adding an E_NOTICE for numerical strings with leading/trailing white-spaces
+  * Adding an E_NOTICE for numeric strings with leading/trailing whitespace
-  * Adding a flag to <php>\is_numeric</php> to accept or reject numerical strings with leading/trailing white-spaces
+  * Adding a flag to <php>\is_numeric</php> to accept or reject numeric strings with leading/trailing whitespace
   * Align string offset behaviour with array offsets
-  * Promote remaining "Illegal string offset" warnings to Type Errors in PHP 9
+  * Promote remaining warnings to TypeErrors in PHP 9
   * Warn on illegal offsets when used within <php>isset()</php> or <php>empty()</php>
  
-===== Proposed Voting Choices =====
+===== Vote =====
-Per the Voting RFC, there would be a single Yes/No vote requiring a 2/3 majority.
+Per the Voting RFC, there is a single Yes/No vote requiring a 2/3 majority for the main proposal. A secondary Yes/No vote requiring a 50%+1 majority will decide whether float strings used as string offsets should continue to produce a warning (with different wording) instead of consistently becoming a <php>TypeError</php>.
+
+Primary vote:
+<doodle title="Accept Saner numeric string RFC proposal" auth="girgias" voteType="single" closed="true">
+   * Yes
+   * No
+</doodle>
+
+Secondary vote:
+<doodle title="Should valid float strings for string offsets remain a warning" auth="girgias" voteType="single" closed="true">
+   * Yes
+   * No
+</doodle>
  
 ===== Patches and Tests =====
-A pull request for a complete PHP interpreter patch, including test file, can be found here: https://github.com/php/php-src/pull/5762
+A pull request for a complete PHP interpreter patch, including test files, can be found here: https://github.com/php/php-src/pull/5762

 A language specification patch still needs to be done.
Line 230 (old) / Line 263 (new):
  
 ===== Changelog =====
+  * 2020-07-13: Address inconsistency regarding arithmetic/bitwise operations
   * 2020-07-10: Major rewrite
   * 2020-07-02: Explain difference between array and string offsets, and how the RFC will impact string offsets
   * 2020-07-01: Add explicit cast behaviour for leading numeric strings
   * 2020-06-28: Initial version
rfc/saner-numeric-strings.txt · Last modified: 2020/11/25 12:46 by girgias