rfc:coercive_sth

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
rfc:coercive_sth [2015/03/10 11:49]
zeev Minor tweaks
rfc:coercive_sth [2020/08/01 23:52] (current)
carusogabriel Status is "Declined"
Line 1: Line 1:
 ====== PHP RFC: Coercive Types for Function Arguments ====== ====== PHP RFC: Coercive Types for Function Arguments ======
-  * Version: 0.50+  * Version: 0.60
   * Date: 2015-02-27   * Date: 2015-02-27
   * Authors: Zeev Suraski <​zeev@php.net>,​ Francois Laupretre <​francois@php.net>,​ Dmitry Stogov <​dmitry@php.net>​   * Authors: Zeev Suraski <​zeev@php.net>,​ Francois Laupretre <​francois@php.net>,​ Dmitry Stogov <​dmitry@php.net>​
-  * Status: ​Under discussion+  * Status: ​Declined
   * First Published at: http://​wiki.php.net/​rfc/​coercive_sth   * First Published at: http://​wiki.php.net/​rfc/​coercive_sth
- 
-===== Preamble ===== 
- 
-This version is published as '​near-final',​ and with the exception of potential minor modifications,​ is what will be proposed for a vote. 
  
 ===== Background & Summary ===== ===== Background & Summary =====
Line 85: Line 81:
   - Unlike user-land scalar type hints, internal functions will accept nulls as valid scalars. ​ Based on preliminary testing, this is an extremely common use case, most often used in conjunction with uninitialized values. Disallowing it - language-wide for all internal functions - may be too big of a shift. ​ Therefore, internal functions receiving a NULL (non-)value for a scalar will accept it, and convert it to 0 or an empty string in the same way PHP 5 does.  Note that this discrepancy may be resolved in the future - by introducing new options for arguments that would explicitly reject NULL values, in the same manner user-land STH do.  However, since this requires substantial auditing of internal functions - especially ones that have default values but don't explicitly declare themselves as accepting NULLs - it's outside the scope of this RFC and will be revisited for 7.1.  Note that if the [[Nullable Types RFC|https://​wiki.php.net/​rfc/​nullable_types]] is accepted it will further reduce this discrepancy,​ by allowing user-land functions and internal functions the same level of granularity in terms of accepting or rejecting NULL values for function arguments.   - Unlike user-land scalar type hints, internal functions will accept nulls as valid scalars. ​ Based on preliminary testing, this is an extremely common use case, most often used in conjunction with uninitialized values. Disallowing it - language-wide for all internal functions - may be too big of a shift. ​ Therefore, internal functions receiving a NULL (non-)value for a scalar will accept it, and convert it to 0 or an empty string in the same way PHP 5 does.  Note that this discrepancy may be resolved in the future - by introducing new options for arguments that would explicitly reject NULL values, in the same manner user-land STH do.  However, since this requires substantial auditing of internal functions - especially ones that have default values but don't explicitly declare themselves as accepting NULLs - it's outside the scope of this RFC and will be revisited for 7.1.  Note that if the [[Nullable Types RFC|https://​wiki.php.net/​rfc/​nullable_types]] is accepted it will further reduce this discrepancy,​ by allowing user-land functions and internal functions the same level of granularity in terms of accepting or rejecting NULL values for function arguments.
   - As we don't clearly define a date for switching E_DEPRECATE to fatal errors, the RFC states that such decision cannot, in any cases, be made before a delay of 5 years after the first public release of a PHP distribution containing the STH features described here. This statement is voted upon as the rest of the RFC. So, it cannot be violated without a new vote on this specific subject. This statement is provided as a guarantee to the developers that they will have ample time to fix their code.   - As we don't clearly define a date for switching E_DEPRECATE to fatal errors, the RFC states that such decision cannot, in any cases, be made before a delay of 5 years after the first public release of a PHP distribution containing the STH features described here. This statement is voted upon as the rest of the RFC. So, it cannot be violated without a new vote on this specific subject. This statement is provided as a guarantee to the developers that they will have ample time to fix their code.
 +
 +=== Impact on Real World Applications ===
 +
 +The patch has been tested with numerous real world apps and frameworks, to attempt to gauge the impact the changes to the internal functions rules would have:
 +
 +  * Drupal 7 homepage: ​ One new E_DEPRECATED warning, which seems to catch faulty-looking code
 +  * Drupal 7 admin interface (across the all pages): ​ One  new E_DEPRECATED warning, which again seems to catch a real bug – stripslsahes() operating on a boolean.
 +  * Magento 1.9 homepage (w/ Magento'​s huge sample DB):  One new E_DEPRECATED warning, again, seems to be catching a real bug of ‘false’ being fed as argument 1 of in json_decode() – which is expecting a string full of json there.
 +  * WordPress 3.4 homepage: ​ One new E_DEPRECATED warning, again, seems to be catching a real bug of ‘false’ being fed as argument 1 of substr().
 +  * Zend Framework 2 skeleton app:  Zero new E_DEPRECATED warnings.
 +  * Symfony ACME app:  Zero new E_DEPRECATED warnings (across the app).
 +  * PHPUnit: ​ Several E_DEPRECATED issues that were fixed in a matter of hours.
 +
 +The negative impact on real world apps appears to be very, very limited - which is consistent with the premise that the Coercive STH RFC aims to allow the conversions which are common and most likely sensible, and block the ones which are likely faulty - which means we shouldn'​t see too many of those in real world apps.
 +
 +In addition, the patch was tested with numerous unit-test suites; ​ PHP's test suite shows a lot of new errors, however, the majority of them stem from tests purposely designed to check '​insensible'​ conversions (e.g. readgzfile($filename,​ -10.5)), and not code blocks that we're ever likely to bump into in the real world.
 +
 +The Symfony and Zend Framework test suites were also run and showed new deprecation errors; ​ Based on very preliminary analysis, it seems that most of them either fall into the same bucket as the PHP unit tests above (purposely designed to test insensible conversions),​ or seem to point out issues that may translate into real world bugs, and that can be fixed in a relatively small number of changes.
 +
 +All in all, the signal to noise ratio of turning the new coercive rules for the entirety of PHP seems to be very good.
  
 === Difference from PHP 5 === === Difference from PHP 5 ===
Line 92: Line 108:
 ^        ^  Value Type  ^^^^ ^        ^  Value Type  ^^^^
 ^ Hint   ^ boolean ​  ^ int       ^ float      ^  string ​   ^ ^ Hint   ^ boolean ​  ^ int       ^ float      ^  string ​   ^
-^ bool   | Unchanged | Unchanged | Reject ​    ​| ​Reject ​    +^ bool   | Unchanged | Unchanged | Reject ​    ​| ​Unchanged  ​
-^ int    | Reject ​   | Unchanged | Restrict* ​ | Restrict | +^ int    | Reject ​   | Unchanged | Restrict* ​ | Restrict†  | 
-^ float  | Reject ​   | Unchanged | Unchanged ​ | Unchanged ​ |+^ float  | Reject ​   | Unchanged | Unchanged ​ | Restrict† ​ |
 ^ string | Reject ​   | Unchanged | Unchanged ​ | Unchanged ​ | ^ string | Reject ​   | Unchanged | Unchanged ​ | Unchanged ​ |
  
 * Only accept inputs that contain no significant digits after the decimal point. * Only accept inputs that contain no significant digits after the decimal point.
  
-And the changes in the way numeric ​strings are handled :+† Numeric ​strings ​that have non-blank alphanumeric characters (e.g., "7 dogs", "3.14 pizzas"​) ​are no longer accepted.
  
-  * Trailing blanks are supported as before, +Note that in all caseswhen conversion occurs ​its rules are identical ​to those in PHP 5.  PHP 7 will accept fewer types of inputs as valid, but will apply the same conversion rules to the ones that are accepted.
-  * but non-blank trailing characters now cause the data to be rejected ("7 dogs" case) +
- +
-Please note this key point: Compared ​to PHP 5 ruleswe restrict the number of accepted cases but we never change ​the way data is converted. In other words, when a value is accepted, it is handled exactly the same way it would have been in PHP 5.+
  
  
Line 111: Line 124:
 Here are examples of conversions which, while still providing the same results as in PHP 5, now also raise an E_DEPRECATED error : Here are examples of conversions which, while still providing the same results as in PHP 5, now also raise an E_DEPRECATED error :
  
-  false -> int        # No more conversion from bool +  false -> int           ​# No more conversion from bool 
-  true -> string ​     # No more conversion from bool +  true -> string ​        ​# No more conversion from bool 
-  7.5 -> int          # 7.5 cannot be converted to an integer without data loss +  7.5 -> int             ​# 7.5 cannot be converted to an integer without data loss 
-  "​8.2"​ -> int        # "​8.2"​ cannot be converted to an integer without data loss +  "​8.2"​ -> int           ​# "​8.2"​ cannot be converted to an integer without data loss 
-  4.3 -> bool         ​# No more conversion from float to bool +  4.3 -> bool            # No more conversion from float to bool 
-  "​foo"​ -> bool       # No more conversion from string ​to bool +  "7 dogs" -> int        Non-blank trailing ​characters no longer supported 
-  "7 dogs" -> int     ​Trailing ​characters no longer supported +  "3.14 pizzas" -> float Non-blank trailing characters ​no longer supported  
-  " ​ 0" -> int        ​Leading spaces ​no longer supported +
-  "​12 " -> int        # Trailing spaces no longer supported+
  
 ==== Potential Future Changes to Implicit Casting Rules ==== ==== Potential Future Changes to Implicit Casting Rules ====
  
 While outside the scope of this RFC, the introduction of the new coercive-yet-more-restrictive rule-set may be considered for additional areas in PHP, most notably implicit casting. While outside the scope of this RFC, the introduction of the new coercive-yet-more-restrictive rule-set may be considered for additional areas in PHP, most notably implicit casting.
-For example, today, the result of "​Apples"​ + "​Oranges"​ is 0, because the + operator implicitly casts anything into a number. ​ It could be imagined that in the future, the + operator will accept only values that would fit into an int or float STH, and reject ​others. ​ Users would still be able to use permissive explicit casting ($foo = (int) "​Apples";​ would still assign 0 into $foo), but the risk sometimes associated with implicit casting will be eliminated.+For example, today, the result of "​Apples"​ + "​Oranges"​ is 0, because the + operator implicitly casts anything into a number. ​ It could be imagined that in the future, the + operator will accept only values that would fit into an int or float STH, and warn users about others ​(realistically,​ most probably through E_STRICT).  Users would still be able to use permissive explicit casting ($foo = (int) "​Apples";​ would still assign 0 into $foo), but the risk sometimes associated with implicit casting will be eliminated.
  
  
 ===== Ability to add Strict STH in the future ===== ===== Ability to add Strict STH in the future =====
  
-It should be noted that nothing in this RFC conflicts with the ability to add Strict STH in the future. ​ The ability to add a 2nd mode via declare() will always be there. ​ We do believe that demand for it will greatly diminish with the introduction of these scalar type hints, but in case it doesn'​t - there'​ll be no technical blocks preventing us from adding it in the future, even in the 7.x lifetime.+It should be noted that nothing in this RFC conflicts with the ability to add Strict STH in the future. ​ The ability to add a 2nd mode via declare() ​or some other mechanism ​will always be there. ​ We do believe that demand for it will greatly diminish with the introduction of these scalar type hints, but in case it doesn'​t - there'​ll be no technical blocks preventing us from adding it in the future, even in the 7.x lifetime.
  
  
Line 136: Line 148:
 Numerous community members have invested substantial effort into creating another comprehensive RFC, that proposes to introduce STH into PHP [[https://​wiki.php.net/​rfc/​scalar_type_hints_v5|Scalar Type Hints RFC v0.5 ("Dual Mode RFC"​)]]. ​ However, we believe the proposal in this RFC is better, for several different reasons: Numerous community members have invested substantial effort into creating another comprehensive RFC, that proposes to introduce STH into PHP [[https://​wiki.php.net/​rfc/​scalar_type_hints_v5|Scalar Type Hints RFC v0.5 ("Dual Mode RFC"​)]]. ​ However, we believe the proposal in this RFC is better, for several different reasons:
  
-  - **Single Mode.** ​ Even though the Dual Mode RFC presents a novel idea about how to allow developers to choose which mode they'd like to use, and use different modes in different parts of the app, it still introduces the burden of two different modes. ​ Two different rule-sets that need to be learned, may increase the language'​s complexity. ​ Further, the two sets can cause the same functions to behave differently depending on where they'​re being called, and potentially a new class of bugs stemming from developers not realizing which mode they'​re in in a particular file.  This RFC is unaffected by these issues, as it presents a single, composite rule set.+  - **Single Mode.**  Thanks to the fact this RFC proposes a single mode and enables it across the board for any code that's run, you get the benefit of stricter-yet-sensible rules overnight. ​ Thanks to the good signal-to-noise ratio, moving to PHP 7 is likely to help people find real world bugs, without having to proactively enable stricter modes and without having to introduce a lot of changes to their code. 
 +  - **Smaller cognitive burden.**  Even though the Dual Mode RFC presents a novel idea about how to allow developers to choose which mode they'd like to use, and use different modes in different parts of the app, it still introduces the burden of two different modes. ​ Two different rule-sets that need to be learned increase the language'​s complexity. ​ Further, the two sets can cause the same functions to behave differently depending on where they'​re being called, and potentially a new class of bugs stemming from developers not realizing which mode they'​re in in a particular file.  This RFC is unaffected by these issues, as it presents a single, composite rule set.
   - **Too strict may lead to too lax.** In the Dual Mode RFC, when in Strict mode, in many cases, functions would reject values that, semantically,​ are acceptable. ​ For example, a "​32"​ (string) value coming back from an integer column in a database table, would not be accepted as valid input for a function expecting an integer. ​ Since semantically the developer is interested in this argument-passing succeeding, they would have the choice of either removing the integer STH altogether, or, more likely, explicitly casting the value into an integer. ​ This would have the opposite of the desired outcome of strict STHs - as explicit casts ($foo = (int) $foo;) always succeed, and would happily convert "100 dogs", "​Apples"​ and even arrays and booleans into an integer. ​ Further, since already today, internal functions employ coercion rules that are more restrictive than PHP's explicit casting, pushing people towards explicit casting will actually make things **worse** in case developers opt for explicit casting as they pass values in an internal function call.    - **Too strict may lead to too lax.** In the Dual Mode RFC, when in Strict mode, in many cases, functions would reject values that, semantically,​ are acceptable. ​ For example, a "​32"​ (string) value coming back from an integer column in a database table, would not be accepted as valid input for a function expecting an integer. ​ Since semantically the developer is interested in this argument-passing succeeding, they would have the choice of either removing the integer STH altogether, or, more likely, explicitly casting the value into an integer. ​ This would have the opposite of the desired outcome of strict STHs - as explicit casts ($foo = (int) $foo;) always succeed, and would happily convert "100 dogs", "​Apples"​ and even arrays and booleans into an integer. ​ Further, since already today, internal functions employ coercion rules that are more restrictive than PHP's explicit casting, pushing people towards explicit casting will actually make things **worse** in case developers opt for explicit casting as they pass values in an internal function call. 
   - **Smooth integration with Data Sources**. ​ PHP uses strings extensively across the language, and in most cases, data sources always feed data into PHP as strings. ​ PHP applications rely extensively on internal type juggling to convert that string-based data according to the needed context. ​ Strict zval.type based STH effectively eliminates this behavior, moving the burden of worrying about type conversion to the user.  The solution proposed in this RFC allows code that relies on type coercion to Just Work when the values are sensible, but fail (and appropriately warn the developer) otherwise.   - **Smooth integration with Data Sources**. ​ PHP uses strings extensively across the language, and in most cases, data sources always feed data into PHP as strings. ​ PHP applications rely extensively on internal type juggling to convert that string-based data according to the needed context. ​ Strict zval.type based STH effectively eliminates this behavior, moving the burden of worrying about type conversion to the user.  The solution proposed in this RFC allows code that relies on type coercion to Just Work when the values are sensible, but fail (and appropriately warn the developer) otherwise.
 +  - **Forward compatibility for internal function calls**. ​ Codebases which will be tested & improved to work on PHP 7, can run without a problem on PHP 5 and benefit from the improved code quality.
  
  
Line 163: Line 176:
   * **Static Analysis** - Analyzing code without running it, attempting to derive conclusions about security, performance,​ etc.   * **Static Analysis** - Analyzing code without running it, attempting to derive conclusions about security, performance,​ etc.
  
-===== Proposed Voting Choices ​===== +===== Vote ===== 
-The voting choices ​would be yes/no.+The voting choices ​are yes (in favor for accepting this RFC for PHP 7) or no (against it).
 The RFC proposes a very substantial change to PHP's coercion rules, which may evolve to affect implicit typing in the future. The RFC proposes a very substantial change to PHP's coercion rules, which may evolve to affect implicit typing in the future.
-It absolutely requires a 2/3 majority, with the hope of reaching as close as possible to consensus.+It absolutely requires a 2/3 majority, with the hope of reaching as close as possible to consensus. ​ The vote starts on March 11th, and will end two weeks later, on March 25th. 
 + 
 +<doodle title="​coercive_sth"​ auth="​zeev"​ voteType="​single"​ closed="​true">​ 
 +   * Yes 
 +   * No 
 +</​doodle>​
  
 ===== Patches and Tests ===== ===== Patches and Tests =====
 https://​github.com/​php/​php-src/​pull/​1125/​files https://​github.com/​php/​php-src/​pull/​1125/​files
rfc/coercive_sth.1425988159.txt.gz · Last modified: 2017/09/22 13:28 (external edit)