rfc:typecheckingstrictandweak

Differences

This shows you the differences between two versions of the page.

rfc:typecheckingstrictandweak [2010/01/08 21:11] – external edit 127.0.0.1
rfc:typecheckingstrictandweak [2012/03/26 13:33] – [Request for Comments: Strict and weak parameter type checking] add link to discussion on internals danielc
Line 2: Line 2:
   * Version: 0.5
   * Date: 2009-06-03
-  * Author: Lukas Smith <smith@pooteeweet.org>
+  * Author: Lukas Smith <smith@pooteeweet.org>, Zeev Suraski <zeev@php.net>
   * Status: In discussion
   * First Published at: http://wiki.php.net/rfc/typechecking
+  * Discussion: http://thread.gmane.org/gmane.comp.php.devel/61324
  
-This RFC is to provide a proposal for both weak and strict parameter type checking for function/method parameters and to explain why providing only strict type checking would be a mistake.
+This RFC proposes auto-converting parameter type checking for function/method parameters and describes the disadvantages of introducing strict scalar type hinting to PHP.
  
 ===== Introduction =====
Line 13: Line 14:
 ===== Why is strict type checking problematic? =====
  
-Strict type checking does have the advantage that subtle bugs will be noticed more quickly and that function/method signatures will become more self-documenting and therefore more expressive. Doing these type checks based on the signature also means less code and better performance than hand-coding the validation.
+PHP's type system was designed from the ground up so that scalars auto-convert depending on the context. That feature became an inherent property of the language, and with a couple of exceptions the internal type of a scalar value is not exposed to end users. The most important exception is the === operator; however, this operator is used in very specific situations, and only in the context of comparisons. While there are other exceptions (e.g. gettype()), in the vast majority of scenarios in PHP, scalar types auto-convert to the necessary type depending on the context.
  
-Proponents of [[rfc:typecheckingstrictonly|only providing strict type checking]] say that for the most part variables are defined with the proper type unless they come from an outside source, which usually requires validation anyway, which is a perfect opportunity to type cast.
+For that reason, developers - even seasoned ones - will feel very comfortable sending the string "123" to a function that semantically expects an integer. If they know how PHP works internally, they rely on the fact that the function will auto-convert the type to an integer. If they don't (and many don't), they don't even think about the fact that their "123" is a string. It's a meaningless implementation detail.
  
-That is, to define a variable that contains a boolean, a developer will probably write "$is_foo = true" and not "$is_foo = 0". While this may be true, it does mean that developers using such strict type checking APIs now require that users understand data types, which beginning developers currently do not necessarily need to.
+For these reasons, strict type checking is an alien concept to PHP. It goes against PHP's type system by making an implementation detail (zval.type) much more of a front-stage actor.
  
-Furthermore, developers quite often need to parse content out of strings and pass it to other methods. With strict type checking one is now forced to explicitly type cast. While it's certainly doable, it's also additional work that needs to be done while writing the code ("$foo_int = (int)substr($bar, 3, 10)"). Then again, some might argue that this makes the code clearer.
+In addition, strict type checking puts the burden of validating input on the callers of an API, instead of the API itself. Since functions are typically designed to be called numerous times, requiring the user to do the necessary conversions on the input before calling the function is counterintuitive and inefficient. It makes much more sense, and it's also much more efficient, to make the conversions the responsibility of the called function instead. It's also more likely that the author of the function, the one choosing to use scalar type hints in the first place, would be more knowledgeable about PHP's types than those using the API.
  
-It also means that users of such strictly typed APIs will tend to simply cast and, due to laziness (PHP is used for rapid development after all), might forgo validating first whether the content is really what they expected. Without type checking the burden would be with the developer providing the API. Since it's usually expected that an API is used fairly often, it seems illogical to move this burden to the API users. Moreover, a new kind of bug will be introduced through overuse of casts instead of the hand-coded parameter validation that is currently necessary. This could lead to even higher bug rates.
+Finally, strict type checking is inconsistent with the way internal (C-based) functions typically behave. For example, strlen(123) returns 3, exactly like strlen('123'). sqrt('9') also returns 3, exactly like sqrt(9). Why would userland (PHP-based) functions behave any differently?
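The strlen() and sqrt() behaviour mentioned above can be checked with a few lines. This is only a minimal sketch, assuming the default (non-strict) mode of a current PHP release; the variable names are illustrative.

```php
<?php
// Internal (C-based) functions coerce scalar arguments to the type they expect.
$lenFromInt     = strlen(123);   // the integer 123 is converted to the string "123"
$lenFromString  = strlen('123');
$rootFromString = sqrt('9');     // the string "9" is converted to a float
$rootFromInt    = sqrt(9);

var_dump($lenFromInt === $lenFromString);   // bool(true)
var_dump($rootFromString === $rootFromInt); // bool(true)
```

Note that sqrt() returns the float 3.0 in both cases, so the two results are identical in type as well as value.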
  
-As for outside sources needing validation: this is not always the case, as most people trust that the data returned from a database is in the expected format, even though for most RDBMS it will always be returned as a string. The same applies to configuration files, which if defined in something other than PHP code will most likely only return strings, but whose values will usually not be validated.
+Proponents of strict type hinting often argue that input coming from end users (forms) should be filtered and sanitized anyway, and that this makes for a great opportunity to do the necessary type conversions. While that may be true, it covers a small subset of type checking scenarios. For example, it doesn't cover input coming from 'trusted' sources like a database or files. It also doesn't account for the many developers who are simply unaware of PHP's internal type system, or who presently don't see the need to explicitly do type conversions even if they do sanitize their input. Not to mention those who don't sanitize their input at all.
-===== Introducing weak type checking =====
+===== Introducing 'weak' or auto-converting type hinting =====
  
-In Ilia's recent [[http://news.php.net/php.internals/44573|strict type checking proposal]], he did include a "numeric" and a "scalar" data type, which tried to reduce the above noted issues with strict type checking. The "numeric" type would behave similarly to the "is_numeric()" function in that it would not check the type, but would also accept a string with only numbers or a float ([[http://php.net/is_numeric|see the documentation]] for the exact definition). In the same way "scalar" would simply check that the parameter is not an array, object or resource.
+The proposed solution implements a 'weaker' kind of type hinting, which arguably is more consistent with the rest of PHP's type system.
+Instead of validating the zval.type property only, it uses rules in line with the spirit of PHP and its auto-conversion system to look into the value in question and determine whether it 'makes sense' in the required context. If it does, it will be converted to the required type (if it isn't already of that type); if it doesn't, an error will be generated.
  
-However, it does not cover all specific data types. Moreover, "numeric" is not a known data type and is also significantly longer to type than "int". As a result it seems likely that "int" will be used by many developers even where "numeric" would suffice. Therefore [[http://news.php.net/php.internals/44619|a new concept was introduced]] to simply allow syntax to define whether the check should be strict or weak.
+For example, consider a function getUserById() that expects an integer value. With [[http://news.php.net/php.internals/44573|strict type hinting]], if you feed it $id, which happens to hold a piece of data from the database with the string value "42", it will be rejected. With auto-converting type hinting, PHP will determine that $id is a string that has an integer format, and that it is therefore suitable to be fed into getUserById(). It will then convert the value to an integer and pass it on to getUserById(). That means that getUserById() can rely on **always** getting its input as an integer, while the caller still has the luxury of sending non-integer but integer-formatted input to it.
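The getUserById() scenario can be approximated in userland today. The sketch below is not the proposed engine behaviour: to_int_or_fail() is a hypothetical helper, the body of getUserById() is a stand-in, and an exception is used where the RFC speaks of an error being generated. Integer overflow of very long digit strings is not handled.

```php
<?php
// Hypothetical helper: emulates an auto-converting "int" hint in userland.
function to_int_or_fail($value): int
{
    if (is_int($value)) {
        return $value;
    }
    // Accept only strings made of an optional sign followed by digits.
    if (is_string($value) && preg_match('/^[+-]?\d+\z/', $value) === 1) {
        return (int) $value;
    }
    // Assumption: an exception stands in for the error the RFC describes.
    throw new InvalidArgumentException('Value is not integer-formatted');
}

// Hypothetical API function; the lookup itself is replaced by a stand-in.
function getUserById($id): string
{
    $id = to_int_or_fail($id);   // from here on $id is always an integer
    return 'user #' . $id;
}

echo getUserById('42'), "\n";    // prints "user #42" (string from a database)
echo getUserById(42), "\n";      // prints "user #42" (integer passed as-is)
try {
    getUserById('abc');
} catch (InvalidArgumentException $e) {
    echo 'rejected: ', $e->getMessage(), "\n";
}
```

The conversion lives inside the called function, so every caller benefits from it without having to cast first, which is the division of responsibility argued for above.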
  
-A weak check would examine the content of the variable in a way that would be stricter than the standard type juggling. If the weak check passes, the value would be type cast appropriately. If the weak check fails, it would trigger an E_RECOVERABLE_ERROR just as in the strict case.
+The key advantage of the proposed solution is that there's less burden on those calling APIs (fail only when really necessary). It should be noted that most coding time is spent consuming existing APIs, not creating new ones. Furthermore, it's consistent with the rest of PHP in the sense that most of PHP does not care about exactly matching zval types, and, perhaps most importantly, it does not require everyone to become intimately familiar with PHP's type system.
  
-Here is a short list of examples to illustrate the weak type checking. Note that just like the current array/object hints, a NULL is only allowed if the parameter defaults to NULL.
+Furthermore, weak type hinting may be a step on the way to creating generic type casting magic methods along the lines of %%__toString()%%, allowing objects to auto-convert to scalar types as necessary (TBD).
  
 +===== Option (1): current type juggling rules with E_STRICT on data loss =====
 +
 +The auto-conversion would follow the current type juggling rules. However, in case of a cast that leads to data loss (like casting '123abc' to an integer, yielding 123), an E_STRICT notice would be raised.
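A rough userland approximation of Option (1) for the integer case is sketched below. juggle_to_int() is a hypothetical name. Userland code cannot raise E_STRICT, so E_USER_NOTICE stands in for it, and the round-trip comparison used to detect data loss is an assumption that is stricter than necessary (it also flags strings such as '012'). Only scalar input is considered.

```php
<?php
// Hypothetical helper: casts with the ordinary (int) rules and reports data loss.
function juggle_to_int($value): int
{
    $int = (int) $value;
    // Data loss: a string that does not round-trip (e.g. "123abc"),
    // or a float with a fractional part (e.g. 12.34).
    $lossy = (is_string($value) && (string) $int !== $value)
          || (is_float($value) && (float) $int !== $value);
    if ($lossy) {
        // Assumption: E_USER_NOTICE stands in for the proposed E_STRICT.
        trigger_error('Data loss while converting to int', E_USER_NOTICE);
    }
    return $int;
}

var_dump(juggle_to_int('12'));   // int(12), no notice
```

Calling juggle_to_int('123abc') still returns 123, but a notice is emitted first, so the call succeeds while the data loss becomes visible.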
 +
 +For reference, here's the current behavior of zend_parse_parameters, used in most internal functions.
 +
 +^ value                     ^ string ^ float  ^ int    ^ bool   ^ array  ^
 +^ true (boolean)            | pass   | pass   | pass   | pass   | fail   |
 +^ false (boolean)           | pass   | pass   | pass   | pass   | fail   |
 +^ 0 (integer)               | pass   | pass   | pass   | pass   | fail   |
 +^ 1 (integer)               | pass   | pass   | pass   | pass   | fail   |
 +^ 12 (integer)              | pass   | pass   | pass   | pass   | fail   |
 +^ 12 (double)               | pass   | pass   | pass   | pass   | fail   |
 +^ 12.34 (double)            | pass   | pass   | pass   | pass   | fail   |
 +^ 'true' (string)           | pass   | fail   | fail   | pass   | fail   |
 +^ 'false' (string)          | pass   | fail   | fail   | pass   | fail   |
 +^ '0' (string)              | pass   | pass   | pass   | pass   | fail   |
 +^ '1' (string)              | pass   | pass   | pass   | pass   | fail   |
 +^ '12' (string)             | pass   | pass   | pass   | pass   | fail   |
 +^ '12abc' (string)          | pass   | pass   | pass   | pass   | fail   |
 +^ '12.0' (string)           | pass   | pass   | pass   | pass   | fail   |
 +^ '12.34' (string)          | pass   | pass   | pass   | pass   | fail   |
 +^ 'foo' (string)            | pass   | fail   | fail   | pass   | fail   |
 +^ array   (array)           | fail   | fail   | fail   | fail   | pass   |
 +^ array(0=>12) (array)      | fail   | fail   | fail   | fail   | pass   |
 +^ NULL (NULL)               | pass   | pass   | pass   | pass   | fail   |
 +^ %%''%% (string)               | pass   | fail   | fail   | pass   | fail   |
 +===== Option (2): new type juggling rules with E_STRICT on data loss =====
 +
 +The conversion rules proposed here are slightly stricter than PHP's auto-conversion rules. Mainly, the string "abc" will be rejected as valid input for an integer type-hinted argument rather than being passed on as zero, and values would not auto-convert from/to arrays.
 +
 +An E_STRICT would be raised if due to auto-conversion there would be data loss. So for example "2", 2 as well as 2.5 would convert to a float if one is expected. However 2.5 would not silently convert to an integer if one is expected. Similarly "123abc" would not convert to an integer or float. This might also be a potential approach to type juggling in general in some future version of PHP.
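The rules in the two paragraphs above can be sketched as a pair of userland checks. weak_float() and weak_int() are hypothetical names and not part of any patch. Returning null to mean "conversion would lose data" is an assumption, since the text leaves open whether such a case is a notice or a failure, and relying on is_numeric() means whitespace and exponent forms follow that function's definition.

```php
<?php
// Returns the float value, or null when the input is not float-compatible.
function weak_float($value): ?float
{
    if (is_int($value) || is_float($value)) {
        return (float) $value;
    }
    // is_numeric() rejects strings with trailing garbage such as "123abc".
    if (is_string($value) && is_numeric($value)) {
        return (float) $value;
    }
    return null; // arrays, booleans, "abc", "123abc", ...
}

// Returns the int value, or null when the conversion would lose data.
function weak_int($value): ?int
{
    if (is_int($value)) {
        return $value;
    }
    $float = weak_float($value);
    // Accept only in-range values without a fractional part (2.0 or "2", not 2.5).
    if ($float !== null && $float == floor($float) && abs($float) < PHP_INT_MAX) {
        return (int) $float;
    }
    return null;
}

var_dump(weak_float('2'));     // float(2)
var_dump(weak_int(2.5));       // NULL
var_dump(weak_int('123abc'));  // NULL
```

A caller could turn a null result into an E_STRICT-style notice or a hard failure, depending on which reading of the option is adopted.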
 +
 +Here is a short list of examples to illustrate the weak type hinting. Note that just like the current array/object hints, a NULL is only allowed if the parameter defaults to NULL.
 +
 +(Note the following table should probably be reviewed in light of recent updates to this RFC)
 ^ value                   ^ string ^ float ^ int   ^ numeric ^ scalar ^ bool ^ array ^
 ^ true (boolean)          | fail   | fail  | fail  | fail    | pass   | pass | fail  |
Line 54: Line 92:
 ^ array (0 => 12) (array) | fail   | fail  | fail  | fail    | fail   | fail | pass  |
 ^ NULL (NULL)             | fail   | fail  | fail  | fail    | fail   | fail | fail  |
-^ '' (string)             | pass   | fail  | fail  | fail    | pass   | fail | fail  |
+^ %%''%% (string)         | pass   | fail  | fail  | fail    | pass   | fail | fail  |
-
-Furthermore, weak type checking could also be useful once we have generic type casting support via some magic type cast method along the lines of __toString(). In this case the weak type checking would also allow an object to pass if it provides the relevant casting method, though it would then of course automatically cast the object to the given type.
-
-===== Proposed API =====
-
-<code php>
-
-// "+" denotes strict and "-" denotes weak type checking
-function add_user(+string name, +string phone_number, -int age) { .. }
-
-// "!" denotes strict type checking and "?" denotes weak type checking
-function add_user(string name, !string phone_number, ?int age) { .. }
-
-// "~" denotes weak type checking
-function add_user(string name, string phone_number, ~int age) { .. }
-
-// "()" denotes weak type checking
-function add_user(string name, string phone_number, (int) age) { .. }
 
-// Keep in mind that the "modifier" can be placed either at the start or at the end
-function add_user(string! name, string! phone_number, int? age) { .. }
+===== Option (3): current type juggling rules with E_FATAL on data loss =====
 
-// Furthermore one of the two modifiers could be the default
-// optionally + is the default
-function add_user(+string name, string phone_number, -int age) { .. }
-// optionally - is the default
-function add_user(+string name, +string phone_number, int age) { .. }
+The auto-conversion would follow the current type juggling rules. However, in case of a cast that leads to data loss (like casting '123abc' to an integer, yielding 123), an E_FATAL notice would be raised.
+===== Patch =====
 
-</code>
+  * {{:rfc:auto_converting_type_hinting.diff.txt}} - presently implements auto-converting type hinting without any warning on data loss.
  
 ===== Changelog =====
 +  * restructured to provide 3 options (two with the current type juggling rules and E_STRICT or E_FATAL on data loss, and one with new type juggling rules and E_STRICT on data loss).
rfc/typecheckingstrictandweak.txt · Last modified: 2017/09/22 13:28 by 127.0.0.1