rfc:typecheckingweak

Differences

This shows you the differences between two versions of the page.

rfc:typecheckingweak [2009/07/12 09:37] – More initial work (conversion logic) zeev
rfc:typecheckingweak [2009/07/12 11:18] – Getting closer to 0.2... zeev
Line 52: Line 52:
  
 ^ value                   ^ string       ^ float      ^ int       ^ numeric   ^ bool      ^ ^ value                   ^ string       ^ float      ^ int       ^ numeric   ^ bool      ^
-^ true (boolean)          | <font color="red">fail</font>     | 1.0        | 1         | 1         | //as-is// |+^ true (boolean)          | //fail//     | 1.0        | 1         | 1         | //as-is// |
 ^ false (boolean)         | //fail//     | 0.0        | 0         | 0         | //as-is// | ^ false (boolean)         | //fail//     | 0.0        | 0         | 0         | //as-is// |
 ^ 0 (integer)             | '0'          | 0.0        | //as-is// | //as-is// | false     | ^ 0 (integer)             | '0'          | 0.0        | //as-is// | //as-is// | false     |
Line 72: Line 72:
 ^ array (0 => 12) (array) | //fail//     | //fail//   | //fail//  | //fail//  | //fail//  | ^ array (0 => 12) (array) | //fail//     | //fail//   | //fail//  | //fail//  | //fail//  |
 ^ NULL (NULL)             | empty string | 0.0        | 0         | 0         | false     | ^ NULL (NULL)             | empty string | 0.0        | 0         | 0         | false     |
- 
  
 //fail//  - designates failure, either emitting an error or throwing an exception //fail//  - designates failure, either emitting an error or throwing an exception
Line 78: Line 77:
 //as-is// - designates that the value is passed as-is, without conversion //as-is// - designates that the value is passed as-is, without conversion
  
 +In a nutshell, the conversion logic is quite similar to the one employed by internal functions, with one key difference - it is designed to fail in case of a conversion that is unlikely to 'make sense'.  Specifically, it breaks away from PHP's internal function behavior in two key places:
  
-Several people still have asked to expand array/object type hinting to cover other data types, which mostly ask for similar strict type checking (without any type juggling) as for arrays and objects, while also triggering an E_RECOVERABLE_ERROR for failed checks. However this means that the burden for explicit type casting is now on the user of the function/method. This RFC tries to address this issue. +  - String to int/float conversions - these will fail unless the string 'looks like an integer' or 'looks like a float'.
-===== Why is strict type checking problematic? ===== +  - Non-numeric strings cannot be converted to booleans.
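
The two rules above can be illustrated with a small userland sketch. It is only an approximation under stated assumptions, not the RFC's implementation: the helper names weak_to_int() and weak_to_bool() are hypothetical, the regular expression is one guess at what 'looks like an integer' means, treating a numeric string as true when it is non-zero is an assumption, and any type whose table row is not shown here is rejected purely to keep the sketch short.

<code php>
<?php
// Hypothetical helpers that mimic the conversion rules in userland.
// They are illustrations only and do not exist in PHP.

function weak_to_int($value)
{
    if (is_int($value)) {
        return $value;                  // passed as-is
    }
    if (is_bool($value) || $value === null) {
        return (int) $value;            // true => 1, false => 0, NULL => 0
    }
    // Assumption: "looks like an integer" = optional sign followed by digits.
    if (is_string($value) && preg_match('/\A[+-]?[0-9]+\z/', $value) === 1) {
        return (int) $value;
    }
    // Everything else fails in this sketch (arrays, other strings, other types).
    throw new InvalidArgumentException('value cannot be converted to int');
}

function weak_to_bool($value)
{
    if (is_bool($value)) {
        return $value;                  // passed as-is
    }
    if (is_int($value) || $value === null) {
        return (bool) $value;           // 0 => false, NULL => false
    }
    // Assumption: numeric strings are accepted and are true when non-zero.
    if (is_string($value) && is_numeric($value)) {
        return (float) $value != 0.0;
    }
    // Non-numeric strings (and everything else) cannot become booleans.
    throw new InvalidArgumentException('value cannot be converted to bool');
}

var_dump(weak_to_int('12'));            // int(12)

try {
    weak_to_int('12abc');               // does not look like an integer
} catch (InvalidArgumentException $e) {
    echo 'fail: ', $e->getMessage(), "\n";
}
</code>

Throwing an exception is just one of the two failure modes the //fail// legend allows; emitting an error would serve the sketch equally well.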
- +
-Strict type checking does have the advantage that subtle bugs will be noticed more quickly and that function/method signatures will become yet more self documenting and therefore more expressive. Also doing these type checks based on the signature also means less code and better performance over having to hand code the validation +
- +
-Proponents of [[rfc:typecheckingstrictonly|only providing strict type checking]] say that for the most part variables are defined with the proper type unless they come from an outside source, which usually requires validation anyways, which is a perfect opportunity to type cast. +
- +
-That is, to define a variable that contains a boolean, a developer will probably write "$is_foo = true" and not "$is_foo = 0". While this may be true, it does mean that developers providing such strict type checking APIs now require that users understand data types, which beginning developers currently do not necessarily need to. +
- +
-Furthermore, quite often developers need to parse content out of strings and pass it to other methods. With strict type checking one is now forced to explicitly type cast. While it's certainly doable, it's also additional work that needs to be done while writing the code ("$foo_int = (int)substr($bar, 3, 10)"). Then again, some might argue that this makes the code clearer. +
- +
-It also means that users of such strictly typed APIs will tend to simply cast and, due to laziness (PHP is used for rapid development after all), might forgo first validating that the content is really what they expected. Without type checking the burden would be with the developer providing the API. Since it is usually expected that an API is used fairly often, it seems illogical to move this burden to the API users. Moreover, a new kind of bug will be introduced through overuse of casts in place of the hand-coded parameter validation that is currently necessary. This could lead to even higher bug rates. +
- +
-As for outside sources needing validation: this is not always the case, as most people do trust that the data returned from a database is in the expected format, even though for most RDBMS it will always be returned as a string. The same applies to configuration files, which, if defined in something other than PHP code, will most likely only return strings, but whose values will usually not be validated. +
-===== Introducing weak type checking ===== +
- +
-In Ilia's recent [[http://news.php.net/php.internals/44573|strict type checking proposal]], he did include a "numeric" and a "scalar" data type, which tried to reduce the above noted issues with strict type checking. The "numeric" type would behave similarly to the "is_numeric()" function in that it would not check the type, but would also accept a string with only numbers or a float ([[http://php.net/is_numeric|see the documentation]] for the exact definition). In the same way, "scalar" would simply check if the parameter is not an array, object or resource. +
- +
-However it does not cover all specific data types. Moreover "numeric" is not a known data type and is also significantly longer to type than "int". As a result it seems likely that "int" will be used by many developers even where "numeric" would suffice. As a result [[http://news.php.net/php.internals/44619|a new concept was introduced]] to simply allow a syntax to define if the check should be strict or weak. +
- +
-A weak check would examine the content of the variable in a way that would be more strict than the standard type juggling. If the weak check passes, the value would be type cast appropriately. If the weak check fails, it would trigger an E_RECOVERABLE_ERROR just as in the strict case. +
- +
-Here is a short list of examples to illustrate the weak type checking. Note that just like the current array/object hints, a NULL is only allowed if the parameter defaults to NULL. +
- +
-^ value                   ^ string ^ float ^ int   ^ numeric ^ scalar ^ bool ^ array ^ +
-^ true (boolean)          | fail   | fail  | fail  | fail    | pass   | pass | fail  | +
-^ false (boolean)         | fail   | fail  | fail  | fail    | pass   | pass | fail  | +
-^ 0 (integer)             | fail   | pass  | pass  | pass    | pass   | pass | fail  | +
-^ 1 (integer)             | fail   | pass  | pass  | pass    | pass   | pass | fail  | +
-^ 12 (integer)            | fail   | pass  | pass  | pass    | pass   | fail | fail  | +
-^ 12 (double)             | fail   | pass  | fail  | pass    | pass   | fail | fail  | +
-^ 12.34 (double)          | fail   | pass  | fail  | pass    | pass   | fail | fail  | +
-^ 'true' (string)         | pass   | fail  | fail  | fail    | pass   | fail | fail  | +
-^ 'false' (string)        | pass   | fail  | fail  | fail    | pass   | fail | fail  | +
-^ '0' (string)            | pass   | fail  | fail  | pass    | pass   | pass | fail  | +
-^ '1' (string)            | pass   | fail  | fail  | pass    | pass   | pass | fail  | +
-^ '12' (string)           | pass   | fail  | fail  | pass    | pass   | fail | fail  | +
-^ '12abc' (string)        | pass   | fail  | fail  | fail    | pass   | fail | fail  | +
-^ '12.0' (string)         | pass   | fail  | fail  | pass    | pass   | fail | fail  | +
-^ '12.34' (string)        | pass   | fail  | fail  | pass    | pass   | fail | fail  | +
-^ 'foo' (string)          | pass   | fail  | fail  | fail    | pass   | fail | fail  | +
-^ array () (array)        | fail   | fail  | fail  | fail    | fail   | fail | pass  | +
-^ array (0 => 12) (array) | fail   | fail  | fail  | fail    | fail   | fail | pass  | +
-^ NULL (NULL)             | fail   | fail  | fail  | fail    | fail   | fail | fail  | +
-^ '' (string)             | pass   | fail  | fail  | fail    | pass   | fail | fail  |+
  
-Furthermore, weak type checking could also be useful once we have generic type casting support via some magic type cast method along the lines of __toString(). In this case the weak type checking would also allow an object to pass if it provides the relevant casting method, though it would then of course automatically cast the object to the given type. 
  
-===== Proposed API =====+===== Benefits =====
  
-<code php>+There are numerous benefits to introducing type-checking for scalar types in PHP:
  
-// "+" denotes strict and "-" denotes weak type checking +  - **Simplification of parameter sanitizing**.  The need for explicit casting of arguments ($arg = (int) $arg;) or for conditional type-check failures (if (!is_numeric($arg)) {...}) will be much reduced, and may be eliminated. 
-function add_user(+string name, +string phone_number, -int age) { .. }+  - **Code readability**.  Reading the implementation code may be easier with the clear knowledge that an argument is of a certain type. 
 +  - **Clearer contract between caller and callee**.  From the function signature alone, the caller can tell what kind of value the called function expects. 
 +  - **Optimization**.  Because typed arguments are always ensured to be of the declared type, it may be possible to use this information to perform certain opcode-level optimizations. 
 +  - **Security**.  In certain cases, using typed arguments may help discover and prevent security issues.
  
-// "!" denotes strict type checking and "?" denotes weak type checking 
-function add_user(string name, !string phone_number, ?int age) { .. } 
  
-// "~" denotes weak type checking +===== Comparison with Strict Typing =====
-function add_user(string name, string phone_number, ~int age) { .. }+
  
-// "()" denotes weak type checking +The main 'contender' to this weak typing RFC is the Strict Typing RFC, which is based on a strict comparison of the zval.type value.  As such, it introduces entirely new semantics to PHP, especially around parameter passing.  Today, zval.type is used only by a handful of functions (is_int() et al, gettype()) and by the identity operator.  The former are used much more rarely than their more 'lax' siblings (is_numeric()), which are typically more appropriate, while the latter is typically used for specialized cases, e.g. when dealing with a function that returns an integer and having to tell that result apart from boolean false.  It is therefore argued that extending zval.type-based checks into parameter passing - a centerpiece of the language - will inadvertently change the theme of the language, and the 'lax' type checking behavior expected from it today.
-function add_user(string name, string phone_number, (int) age) { .. }+
  
-// Keep in mind that the "modifier" can either be placed at the start or end +In that context, it's important to mention that the two most common sources for data going into PHP - input data (_GET, _POST, etc.) and data coming from the database - are almost exclusively typed as strings.  While some do type conversion during the input sanitizing phase, that is not always the case, especially with data coming from the database.  Strict Typing is inherently incompatible with this concept, in the sense that it assumes the underlying data type (zval.type) is identical to the semantics of the value.  This is not to say that the two cannot be used together, but they are a pretty bad fit.
-function add_user(string! name, string! phone_number, int? age) { .. }+
  
-// Furthermore one of the two modifiers could be the default +Furthermore, it is important to notice that the sole difference between Strict Typing and this proposed solution has to do with what happens **outside** the scope of the function with typed arguments.  In other words, all the benefits for the function code itself (readability, code reduction, optimization, etc.) are 100% identical.  Only the semantics of what happens during the parameter-passing stage differ.
-// optionally + is the default +
-function add_user(+string name, string phone_number, -int age) { .. }+
-// optionally - is the default +
-function add_user(+string name, +string phone_number, int age) { .. }+
  
-</code>+Arguably, because Strict Typing is likely to cause a lot of 'false positives' (failures in pieces of code that actually have nothing wrong with them), these are likely to be 'solved' by explicit casting at the function call.  Since PHP's casting will happily convert just about any type to any other type, that approach would be inferior to the one proposed here, which is more likely to encourage code without explicit casting and therefore help weed out more issues and bugs.
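
The point about explicit casting can be seen in a short sketch. The function set_age() and the variable $input are hypothetical names used only for illustration. The is_int() check stands in for a strict type check, and filter_var() with FILTER_VALIDATE_INT is used merely as an existing, stricter-than-cast conversion for contrast; it is not the conversion logic this RFC proposes.

<code php>
<?php
// Hypothetical callee that accepts real integers only, emulating a strict check.
function set_age($age)
{
    if (!is_int($age)) {
        throw new InvalidArgumentException('age must be an integer');
    }
    return $age;
}

$input = 'abc';     // e.g. a malformed request parameter, always a string

// The explicit cast satisfies the strict check, but the bad input is
// silently turned into 0 and the problem goes unnoticed.
var_dump(set_age((int) $input));                    // int(0)

// A conversion that refuses strings that do not look like integers
// surfaces the problem instead of hiding it.
var_dump(filter_var($input, FILTER_VALIDATE_INT));  // bool(false)
var_dump(filter_var('12', FILTER_VALIDATE_INT));    // int(12)
</code>

The cast makes the call succeed with a value the caller never intended, which is the kind of bug a failing conversion is meant to expose.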
  
 ===== Changelog ===== ===== Changelog =====
rfc/typecheckingweak.txt · Last modified: 2017/09/22 13:28 by 127.0.0.1