rfc:typecheckingweak

rfc:typecheckingweak [2009/07/12 09:37] – More initial work (conversion logic) zeev
rfc:typecheckingweak [2017/09/22 13:28] (current) – external edit 127.0.0.1
Line 1: Line 1:
-====== Request for Comments: Weak parameter type checking and type ensurance  ======
+====== Request for Comments: Parameter Type Enforcement ======
   * Version: 0.1
   * Date: 2009-07-12
-  * Author: Zeev Suraski <zeev@zend.com> Lukas Smith <smith@pooteeweet.org>
+  * Author: Zeev Suraski <zeev@zend.com>, Guillaume Rossolini <g.rossolini@gmail.com>, Lukas Smith <smith@pooteeweet.org>
   * Status: In discussion
   * First Published at: http://wiki.php.net/rfc/typechecking
 
 This RFC suggests a system designed to enable functions to denote designated types for arguments, and conversion rules (as well as pass/fail criteria) for when calling code passes arguments of different types.
-The rationale behind using such 'weak' type checking system (in general, and in particular when compared with 'strict' type checking) is also discussed.
+The rationale behind using such type-enforcement system (in general, and in particular when compared with 'strict' type checking) is also discussed.
 
 ===== Background =====
 
 Circa 2002, 'type hints' were added to the then-in-development PHP 5.0.  Type hints (possibly misnamed) were designed to allow functions and methods to denote the specific kinds of objects they can handle - primarily to accommodate PHP's much more complex, advanced OO system.  Adding these type hints meant that the growing number of functions designed to work on specialized objects would not have to spend the first few lines of their implementation verifying is_a relationships - but could do that easily within the function signature.
-The possibility of supporting type hints for 'native' PHP types was discussed shortly afterwards;  It was decided against it primarily on the premise that scalar types in PHP convert on-the-fly depending on the context, and there's no logic behind forcing their type at the calling stage.  One notable exception was 'array' - for which support was added, with the rationale being that functions which expect array arguments, would probably find any other type quite useless.
+The possibility of supporting type hints for 'native' PHP types was discussed shortly afterwards;  Consensus was not reached and it never made it to the language, primarily on the premise that scalar types in PHP convert on-the-fly depending on the context and there's no logic behind forcing their type at the calling stage.  One notable exception was 'array' - for which support was added, with the rationale being that functions which expect array arguments, would probably find any other type quite useless.
 
 ===== Introduction =====
Line 32: Line 32:
  
 <code php>
-function foo(int x) {}
-function bar(x, y, float z) {}
-function baz(int x, float y, string z) {}
-function foobar(int &x) {}
+function foo(int $x) {}
+function bar($x, string $y) {}
+function baz(int $x, float $y, string $z) {}
+function foobar(int &$x) {}
 </code>
 
 Once a function argument has been designated a scalar type hint - the function author is completely relieved of any further checks and conversions, and is assured that his or her code will always be supplied with an argument of the designated type.
+
+<code php>
+foo(100);               // will succeed silently
+foo(3.14);              // argument will be truncated to 3(int) before being passed to foo()
+foo('19');              // argument will be converted to 19(int) before being passed to foo()
+foo('hey!');            // will fail
+bar(123, 'yo');         // success
+bar('whatever', 17.5);  // argument will be converted to a string '17.5' before being passed to bar()
+foobar(17.5);           // will fail (a literal value cannot be passed by reference)
+$x=17.5;  foobar($x);   // $x will be converted to 17(int), and then passed to foobar(); $x remains 17(int) after the call to foobar()
+</code>
  
 During the parameter passing stage, PHP will ensure that values passed as arguments tagged with type requirements - are actually of that designated type.  The following algorithm will be employed:
Line 46: Line 57:
   - Emit an error or throw an exception.
 
-**Note:**  In step 2, if the argument is designated as a pass-by-reference argument - the conversion will apply to the variable being passed.  This is consistent with the expectation that arguments passed by reference may be modified by the function they're sent to.
 
 
 ===== Conversion Logic =====
  
-^ value                   ^ string       ^ float      ^ int       ^ numeric   ^ bool      ^
-^ true (boolean)          | <font color="red">fail</font>     | 1.0        | 1         | 1         | //as-is// |
-^ false (boolean)         | //fail//     | 0.0        | 0         | 0         | //as-is// |
-^ 0 (integer)             | '0'          | 0.0        | //as-is// | //as-is// | false     |
-^ 1 (integer)             | '1'          | 1.0        | //as-is// | //as-is// | true      |
-^ 12 (integer)            | '12'         | 12.0       | //as-is// | //as-is// | true      |
-^ 12.0 (double)           | '12.0'       | //as-is//  | 12        | //as-is// | true      |
-^ 12.34 (double)          | '12.34'      | //as-is//  | 12        | //as-is// | true      |
-^ 'true' (string)         | //as-is//    | //fail//   | //fail//  | //fail//  | //fail//  |
-^ 'false' (string)        | //as-is//    | //fail//   | //fail//  | //fail//  | //fail//  |
-^ '0' (string)            | //as-is//    | 0.0        | 0         | 0         | false     |
-^ '1' (string)            | //as-is//    | 1.0        | 1         | 1         | true      |
-^ '12' (string)           | //as-is//    | 12.0       | 12        | 12        | true      |
-^ '12abc' (string)        | //as-is//    | //fail//   | //fail//  | //fail//  | //fail//  |
-^ '12.0' (string)         | //as-is//    | 12.0       | 12        | 12.0      | true      |
-^ '12.34' (string)        | //as-is//    | 12.34      | 12        | 12.34     | true      |
-^ 'foo' (string)          | //as-is//    | //fail//   | //fail//  | //fail//  | //fail//  |
-^ empty string (TBD)      | //as-is//    | //fail//   | //fail//  | //fail//  | //fail//  |
-^ array () (array)        | //fail//     | //fail//   | //fail//  | //fail//  | //fail//  |
-^ array (0 => 12) (array) | //fail//     | //fail//   | //fail//  | //fail//  | //fail//  |
-^ NULL (NULL)             | empty string | 0.0        | 0         | 0         | false     |
+^ value                   ^ string       ^ float       ^ int        ^ numeric    ^ bool       ^
+^ true (boolean)          | //fail//     | 1.0         | 1          | 1          | //as-is//  |
+^ false (boolean)         | //fail//     | 0.0         | 0          | 0          | //as-is//  |
+^ 0 (integer)             | '0'          | 0.0         | //as-is//  | //as-is//  | false      |
+^ 1 (integer)             | '1'          | 1.0         | //as-is//  | //as-is//  | true       |
+^ 12 (integer)            | '12'         | 12.0        | //as-is//  | //as-is//  | true       |
+^ 12.0 (double)           | '12.0'       | //as-is//   | 12         | //as-is//  | true       |
+^ 12.34 (double)          | '12.34'      | //as-is//   | 12         | //as-is//  | true       |
+^ 'true' (string)         | //as-is//    | //fail//    | //fail//   | //fail//   | //fail//   |
+^ 'false' (string)        | //as-is//    | //fail//    | //fail//   | //fail//   | //fail//   |
+^ '0' (string)            | //as-is//    | 0.0         | 0          | 0          | false      |
+^ '1' (string)            | //as-is//    | 1.0         | 1          | 1          | true       |
+^ '12' (string)           | //as-is//    | 12.0        | 12         | 12         | true       |
+^ '0xA' (string)          | //as-is//    | 10.0        | 10         | 10         | true       |
+^ '12abc' (string)        | //as-is//    | //fail//    | //fail//   | //fail//   | //fail//   |
+^ '12.0' (string)         | //as-is//    | 12.0        | 12         | 12.0       | true       |
+^ '12.34' (string)        | //as-is//    | 12.34       | 12         | 12.34      | true       |
+^ 'foo' (string)          | //as-is//    | //fail//    | //fail//   | //fail//   | //fail//   |
+^ empty string (TBD)      | //as-is//    | //fail//    | //fail//   | //fail//   | //fail//   |
+^ array () (array)        | //fail//     | //fail//    | //fail//   | //fail//   | //fail//   |
+^ array (0 => 12) (array) | //fail//     | //fail//    | //fail//   | //fail//   | //fail//   |
+^ NULL (NULL)             | empty string | 0.0         | 0          | 0          | false      |
+^ object                  | //fail++//   | //fail++//  | //fail++// | //fail++// | //fail++// |
  
+//as-is//  - designates that the value is passed as-is, without conversion
 
-//fail//  - designates failure, either emitting an error or throwing an exception
+//fail//   - designates failure, either emitting an error or throwing an exception
 
-//as-is// - designates that the value is passed as-is, without conversion
+//fail++// - fail, unless a matching conversion function exists (e.g. __toString()) - in which case it will be called and used
  
  
-Several people still have asked to expand array/object type hinting to cover other data types, which mostly ask for similar strict type checking (without any type juggling) as for arrays and objects, while also triggering an E_RECOVERABLE_ERROR for failed checks. However this means that the burden for explicit type casting is now on the user of the function/method. This RFC tries to address this issue.
+**Note:**  'scalar' and 'array' type hints remain unchanged - an array typed argument will only accept arrays, and will otherwise fail;  A scalar typed argument will accept any kind of scalar argument, but will fail on objects and arrays.
-===== Why is strict type checking problematic? =====
+
 
-Strict type checking does have the advantage that subtle bugs will be noticed more quickly and that function/method signatures will become yet more self documenting and therefore more expressive. Also doing these type checks based on the signature also means less code and better performance over having to hand code the validation.
+In a nutshell, the conversion logic is quite similar to the one employed by internal functions, with one key difference - it is designed to fail in case of a conversion that is unlikely to 'make sense'.  Specifically, it breaks away from PHP's internal function behavior in two key places:
 
-Proponents of [[rfc:typecheckingstrictonly|only providing strict type checking]] say that for the most part variables are defined with the proper type unless they come from an outside source, which usually requires validation anyways, which is perfect opportunity to type cast.
+  - String to int/float conversions - these will fail unless the string 'looks like an integer' or 'looks like a float'.
+  - Non-numeric strings cannot be converted to booleans.
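
The first of the two rules above can be approximated in userland. The sketch below only illustrates the 'int' column of the conversion table: ''enforce_int()'' is a hypothetical helper name, not something this RFC or PHP defines, and it leans on ''is_numeric()'', whose notion of a numeric string is close to but not identical with the table (for instance, PHP 7 and later do not treat '0xA' as numeric, so that row is not reproduced here).

<code php>
<?php
// Hypothetical emulation of the "int" column of the conversion table.
// enforce_int() is an illustrative name, not part of the RFC.
function enforce_int($value)
{
    if (is_int($value)) {
        return $value;                  // as-is
    }
    if (is_bool($value) || is_float($value) || $value === null) {
        return (int) $value;            // true => 1, 12.34 => 12, NULL => 0
    }
    if (is_string($value) && is_numeric($value)) {
        return (int) (float) $value;    // '12' => 12, '12.34' => 12
    }
    // Non-numeric strings, arrays and objects are rejected.
    throw new InvalidArgumentException('Cannot convert value to int');
}
</code>

A plain ''(int)'' cast would turn 'foo' into 0 without complaint; the helper throws instead, which is the 'safety net' the proposal describes.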
  
-That is to define a variable that contains a boolean, developer will probably do "$is_foo = true" and not "$is_foo = 0". While this may be true, it does means that developers using such strict type checking API's now require that users understand data types, which currently beginning developers do not necessarily need to.
 
-Furthermore quite often developers need to parse content out of strings and pass this to other methods. With strict type checking one is now forced to explicitly type cast. While its certainly doable, its also additional work that needs to be done while writing the code ("$foo_int = (int)substr($bar, 3, 10)"). Then again some might argue that this makes the code clearer.
+===== Benefits =====
 
-It also means that users of such strict typed API's will tend to simply cast and due to laziness (PHP is used for rapid development after all) might forgo validating first if the content is really what they expected. Without type checking the burden would be with the developer providing the API. Since its usually expected that an API is fairly often, it seems illogic to move this burden to the API users. More over due to this, a new kind of bug will be introduced due to over use of cast instead of hand coded parameter validation as is currently necessary. This could lead to even higher bug rates.
+There are numerous benefits to introducing type-checking for scalar types in PHP:
 
-As for outside sources needing validation. This is not always the case as most people do trust that the data returned from a database is in the expected format, even though for most RDBMS it will always be returned as a string. Same applies to configuration files, which if defined in something else than PHP code will most likely only return strings, but who's values will usually not be validated.
+  - **Simplification of parameter sanitizing**.  The need for explicitly casting arguments ($arg = (int) $arg;) or conditional type-check failures (if (!is_numeric($arg)) {...}) will be much reduced, and may be eliminated.
-===== Introducing weak type checking =====
+  - **Code readability**.  Reading the implementation code may be easier with the clear knowledge that an argument is of a certain type.
+  - **Clearer contract between caller and callee**.  By the function signature alone - it will be possible for the caller to know what kind of value is expected by the called function.
+  - **IDE enablement**.  IDEs will be able to have better insight into the behavior of the code, and potentially translate it into better tooling.
+  - **Optimization**.  Using the information about typed arguments, and the fact they are always ensured to be of that type - it may be possible to use this information to perform certain opcode-level optimizations.
+  - **Security**.  In certain cases, using typed arguments may help discover and prevent security issues.
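
The first benefit can be made concrete with a small before/after sketch. Both function names are hypothetical and chosen for illustration only. The second function uses the signature syntax this RFC proposes; PHP 7 and later accept that syntax, though their coercion rules differ in detail from the conversion table in this document.

<code php>
<?php
// Without a typed signature: the body has to check and cast by hand.
function set_age_manual($age)
{
    if (!is_numeric($age)) {
        throw new InvalidArgumentException('age must be numeric');
    }
    return (int) $age;
}

// With a typed signature: the check and the conversion happen
// during parameter passing, before the body runs.
function set_age_typed(int $age)
{
    return $age;
}
</code>

Called with '42', both return the integer 42; called with 'hey!', the first throws from its own check while the second is rejected at the call boundary.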
  
-In Ilia's recent [[http://news.php.net/php.internals/44573|strict type checking proposal]], he did include a "numeric" and a "scalar" data type, which tried to reducing the above noted issues with strict type checking. The "numeric" type would behave similar to the "is_numeric()" function in that it would not check the type, but would also accept a string with only numbers or a float ([[http://php.net/is_numeric|see the documentation]] for the exact definition). In the same way "scalar" would simply check if the parameter is not an array, object or resource.
 
-However it does not cover all specific data types. Moreover "numeric" is not a known data type and is also significantly longer to type than "int". As a result it seems likely that "int" will be used by many developers even where "numeric" would suffice. As a result [[http://news.php.net/php.internals/44619|a new concept was introduced]] to simply allow a syntax to define if the check should be strict or weak.
+===== Comparison with Strict Typing =====
 
-A weak check would examine the content of the variable in way that would be more strict than the standard type juggling. If the weak check passes, the value would be type casted appropriately. If the weak check fails it would trigger an E_RECOVERABLE_ERROR just as in the strict case.
+The main 'contender' to this RFC is the Strict Typing RFC.  Unlike Type Enforcement, Strict Typing is based on a strict comparison of the zval.type value.  As such, it introduces an entirely new semantics to PHP, especially around parameter passing.  Today, the zval.type is used only by a handful of functions (is_int() et al, gettype()), and the identity operator.  These functions are much more rarely used than their more 'lax' siblings (is_numeric()) which are typically more appropriate;  While the identity operator is typically used for specialized cases, e.g. when dealing with a function returning an integer, and having to tell boolean false apart.  It is therefore argued that extending a zval.type-based checks into parameter passing - a center-piece of the language - will inadvertently change the theme of the language, and the expected 'lax' type checking behavior expected from it today.
 
-Here is a short list of examples to illustrate the weak type checking. Note that just like the current array/object hints, a NULL is only allowed if the parameter defaults to NULL.
+In that context, it's important to mention that the two most common sources for data going into PHP - input data (_GET, _POST, etc.) and data coming from external resources (e.g. databases, config files, memcached, etc.) - are almost exclusively typed as strings.  While some do type conversion during the input sanitizing phase - that is not always the case, especially with data coming from the database.  Strict Typing is inherently incompatible with this concept, in the sense that it assumes the underlying data type (zval.type) is identical to the semantics of the value.  It does not come to say that the two cannot be used together - but they are a pretty bad fit.
 
-^ value                   ^ string ^ float ^ int   ^ numeric ^ scalar ^ bool ^ array ^
+Furthermore - it is important to notice that the sole difference between Strict Typing and this proposed solution has to do with what happens **outside** the scope of the type-argumented function.  In other words - all the benefits for the function code itself (readability, code reduction, optimization, etc.) are 100.0% identical.  The semantics of what happens during the parameter-passing stage is what's different.
-^ true (boolean)          | fail   | fail  | fail  | fail    | pass   | pass | fail  |
-^ false (boolean)         | fail   | fail  | fail  | fail    | pass   | pass | fail  |
-^ 0 (integer)             | fail   | pass  | pass  | pass    | pass   | pass | fail  |
-^ 1 (integer)             | fail   | pass  | pass  | pass    | pass   | pass | fail  |
-^ 12 (integer)            | fail   | pass  | pass  | pass    | pass   | fail | fail  |
-^ 12 (double)             | fail   | pass  | fail  | pass    | pass   | fail | fail  |
-^ 12.34 (double)          | fail   | pass  | fail  | pass    | pass   | fail | fail  |
-^ 'true' (string)         | pass   | fail  | fail  | fail    | pass   | fail | fail  |
-^ 'false' (string)        | pass   | fail  | fail  | fail    | pass   | fail | fail  |
-^ '0' (string)            | pass   | fail  | fail  | pass    | pass   | pass | fail  |
-^ '1' (string)            | pass   | fail  | fail  | pass    | pass   | pass | fail  |
-^ '12' (string)           | pass   | fail  | fail  | pass    | pass   | fail | fail  |
-^ '12abc' (string)        | pass   | fail  | fail  | fail    | pass   | fail | fail  |
-^ '12.0' (string)         | pass   | fail  | fail  | pass    | pass   | fail | fail  |
-^ '12.34' (string)        | pass   | fail  | fail  | pass    | pass   | fail | fail  |
-^ 'foo' (string)          | pass   | fail  | fail  | fail    | pass   | fail | fail  |
-^ array () (array)        | fail   | fail  | fail  | fail    | fail   | fail | pass  |
-^ array (0 => 12) (array) | fail   | fail  | fail  | fail    | fail   | fail | pass  |
-^ NULL (NULL)             | fail   | fail  | fail  | fail    | fail   | fail | fail  |
-^ '' (string)             | pass   | fail  | fail  | fail    | pass   | fail | fail  |
 
-Further more weak type checking could also be useful once we have generic type casting support via some magic type cast method along the lines of __toString(). In this case the weak type checking would also allow an object to pass if it provides the relevant casting method, though it would then of course automatically cast the object to the given type.
+Interestingly, the benefits from both Strict Typing and Type Enforcement typing are quite similar - primarily since they are virtually identical as far as the called-function is concerned, and only differ in the semantics of the parameter-passing phase.  The same benefits mentioned above are mostly all relevant to Strict Typing as well.  If any, Type Enforcement holds an edge in code readability and reliability as far as the calling code is concerned.  Because Strict Typing is likely to cause a lot of 'false positives', i.e. - failures in pieces of code that actually have nothing wrong in them - it is also likely that these would be solved by explicit casting during the function call;  Since PHP's casting will happily convert just about any type to any other type - this solution would be inferior to the solution proposed here - that is more likely to encourage code without explicit casting and therefore help weed out more issues and bugs.
-
-===== Proposed API =====
  
 <code php>
+function baz(int $x, float $y, string $z) {}
 
-// "+' denotes strict and "-" denotes weak type checking
-function add_user(+string name, +string phone_number, -int age) { .. }
-
-// "!" denotes strict type checking and "?" denotes weak type checking
-function add_user(string name, !string phone_number, ?int age) { .. }
-
-// "~" denotes weak type checking
-function add_user(string name, string phone_number, ~int age) { .. }
-
-// "()" denotes weak type checking
-function add_user(string name, string phone_number, (int) age) { .. }
-
-// Keep in mind that the "modifier" can either be placed at the start or end
-function add_user(string! name, string! phone_number, int? age) { .. }
-
-// Furthermore one of the two modifiers could be the default
-// optionally + is the default
-function add_user(+string name, string phone_number, -int age) { .. }
-// optionally - is the default
-function add_user(+string name, +string phone_number, int age) { .. }
+// Strict type checking
+baz((int) $_GET['x'], (float) $_GET['y'], (string) $_GET['z']);  // explicit conversion required, even 'illogical' conversions will be applied without warning
 
+// Type enforcement
+baz($_GET['x'], $_GET['y'], $_GET['z']);  // on-the-fly conversion, with 'safety net' against illogical conversion
 </code>
  
 ===== Changelog ===== ===== Changelog =====
rfc/typecheckingweak.1247391463.txt.gz · Last modified: 2017/09/22 13:28 (external edit)