rfc:typecheckingweak

Request for Comments: Parameter Type Enforcement

This RFC suggests a system designed to enable functions to denote designated types for arguments, and conversion rules (as well as pass/fail criteria) for when calling code passes arguments from different types. The rationale behind using such type-enforcement system (in general, and in particular when compared with 'strict' type checking) is also discussed.

Background

Circa 2002, 'type hints' were added to the then-in-development PHP 5.0. Type hints (possibly misnamed) were designed to allow functions and methods denote the specific kinds of objects they can handle - primarily to accommodate for PHP's much more complex, advanced OO system. Adding these type hints meant that the growing number of functions designed to work on specialized objects, would not have to spend the first few lines of their implementation verifying is_a relationships - but could do that easily within the function signature. The possibility of supporting type hints for 'native' PHP types was discussed shortly afterwords; Consensus was not reached and it never made it to the language, primarily on the premise that scalar types in PHP convert on-the-fly depending on the context and there's no logic behind forcing their type at the calling stage. One notable exception was 'array' - for which support was added, with the rationale being that functions which expect array arguments, would probably find any other type quite useless.

Introduction

Recently, the case for having ways for functions to designate what kind of scalar values (e.g. int, float, string) they expect has been brought up again for discussion. It appears that while in many cases PHP's scalar auto-conversion is sufficient for 'hiding' this bit of complexity from functions, there are other cases where functions want to force specific types on their arguments - for a variety of reasons (sanitizing, coding style, readability, reflection, etc.). Presently, there is overwhelming support to add a mechanism to PHP that would enable developers to automatically sanitize the types of function arguments - and several methods for doing that have been suggested. This RFC focuses on a mechanism modeled closely after the same type-conversion mechanism used by internal functions with few modifications.

Goals

  1. Satisfy the key requirements required from the mechanism
  2. Minimize the amount of new semantics introduced to PHP
    1. To retain consistency
    2. To keep PHP's learning curve shallow
  3. Suggest a mechanism that can be implemented without a severe impact on performance

Proposed Solution

Conceptually, user functions will be able to denote that they are expecting a specific type of scalar value, using syntax similar to that of class type hinting. This notation will be optional; If absent - the existing behavior will continue.

function foo(int $x) {}
function bar($x, string $y) {}
function baz(int $x, float $y, string $z) {}
function foobar(int &$x) {}

Once a function argument has been designated a scalar type hint - the function author is completely relieved of any further checks and conversions, and is assured that his or hers code will always be supplied with an argument of the designated type.

foo(100);               // will succeed silently
foo(3.14);              // argument will be trimmed to 3(int) before being passed to foo()
foo('19');              // argument will be converted to 19(int) before being passed to foo()
foo('hey!');            // will fail
bar(123, 'yo');         // success
bar('whatever', 17.5);  // argument will be converted to a string '17.5' before being passed to bar()
foobar(17.5);           // will fail (scalar value cannot be passed by reference)
$x=17.5;  foobar($x);   // $x will be converted to 17(int), and then passed to foobar(); $x remains 17(int) after the call to foobar()

During the parameter passing stage, PHP will ensure that values passed as arguments tagged with type requirements - are actually of that designated type. The following algorithm will be employed:

  1. Does the value to-be-passed have the type required by the function code? If so, pass it on as-is. If not - move to step 2.
  2. Can the value be converted to the type required by the function (as per the Conversion Logic below)? If so - convert and pass it on. If not - move to step 3.
  3. Emit an error or throw an exception.

Conversion Logic

value string float int numeric bool
true (boolean) fail 1.0 1 1 as-is
false (boolean) fail 0.0 0 0 as-is
0 (integer) '0' 0.0 as-is as-is false
1 (integer) '1' 1.0 as-is as-is true
12 (integer) '12' 12.0 as-is as-is true
12.0 (double) '12.0' as-is 12 as-is true
12.34 (double) '12.34' as-is 12 as-is true
'true' (string) as-is fail fail fail fail
'false' (string) as-is fail fail fail fail
'0' (string) as-is 0.0 0 0 false
'1' (string) as-is 1.0 1 1 true
'12' (string) as-is 12.0 12 12 true
'0xA' (string) as-is 10.0 10 10 true
'12abc' (string) as-is fail fail fail fail
'12.0' (string) as-is 12.0 12 12.0 true
'12.34' (string) as-is 12.34 12 12.34 true
'foo' (string) as-is fail fail fail fail
empty string (TBD) as-is fail fail fail fail
array () (array) fail fail fail fail fail
array (0 ⇒ 12) (array) fail fail fail fail fail
NULL (NULL) empty string 0.0 0 0 false
object fail++ fail++ fail++ fail++ fail++

as-is - designates that the value is passed as-is, without conversion

fail - designates failure, either emitting an error or throwing an exception

fail++ - fail, unless a matching conversion function exists (e.g. __toString()) - in which case it will be called and used

Note: 'scalar' and 'array' type hints remain unchanged - an array typed argument will only accept arrays, and will otherwise fail; A scalar typed argument will accept any kind of scalar argument, but will fail on objects and arrays.

In a nutshell, the conversion logic is quite similar to the one employed by internal functions, with one key difference - it is designed to fail in case of a conversion that is unlikely to 'make sense'. Specifically, it breaks away from PHP's internal function behavior in two key places:

  1. String to int/float conversions - these will fail unless the string 'looks like an integer' or 'looks like a float'.
  2. Non-numeric strings cannot be converted to booleans.

Benefits

There are numerous benefits to introducing type-checking for scalar types in PHP:

  1. Simplication of parameter sanitizing. The need for explicitly casting arguments ($arg = (int) $arg;) or conditional type-check failures (if (!is_numeric($arg)) {…}) will be much reduced, and may be eliminated.
  2. Code readability. Reading the implementation code may be easier with the clear knowledge that an argument is of a certain type.
  3. Clearer contract between caller and callee. By the function signature alone - it will be possible for the caller to know what kind of value is expected by the called function.
  4. IDE enablement. IDEs will be able to have better insight into the behavior of the code, and potentially translate it into better tooling.
  5. Optimization. Using the information about typed arguments, and the fact they are always ensured to be of that type - it may be possible to use this information to perform certain opcode-level optimizations.
  6. Security. In certain cases, using typed arguments may help discover and prevent security issues.

Comparison with Strict Typing

The main 'contender' to this RFC is the Strict Typing RFC. Unlike Type Enforcement, Strict Typing is based on a strict comparison of the zval.type value. As such, it introduces an entirely new semantics to PHP, especially around parameter passing. Today, the zval.type is used only by a handful of functions (is_int() et al, gettype()), and the identity operator. These functions are much more rarely used than their more 'lax' siblings (is_numeric()) which are typically more appropriate; While the identity operator is typically used for specialized cases, e.g. when dealing with a function returning an integer, and having to tell boolean false apart. It is therefore argued that extending a zval.type-based checks into parameter passing - a center-piece of the language - will inadvertently change the theme of the language, and the expected 'lax' type checking behavior expected from it today.

In that context, it's important to mention that the two most common sources for data going into PHP - input data (_GET, _POST, etc.) and data coming from external resources (e.g. databases, config files, memcached, etc.) - are almost exclusively typed as strings. While some do type conversion during the input sanitizing phase - that is not always the case, especially with data coming from the database. Strict Typing is inherently incompatible with this concept, in the sense that it assumes the underlying data type (zval.type) is identical to the semantics of the value. It does not come to say that the two cannot be used together - but they are a pretty bad fit.

Furthermore - it is important to notice that the sole difference between Strict Typing and this proposed solution has to do with what happens outside the scope of the type-argumented function. In other words - all the benefits for the function code itself (readability, code reduction, optimization, etc.) is 100.0% identical. The semantics of what happens during the parameter-passing stage is what's different.

Interestingly, the benefits from both Strict Typing and Type Enforcement typing are quite similar - primarily since they are virtually identical as far as the called-function is concerned, and only differ in the semantics of the parameter-passing phase. The same benefits mentioned above are mostly all relevant to Strict Typing as well. If any, Type Enforcement holds an edge in code readability and reliability as far as the calling code is concerned. Because Strict Typing is likely to cause a lot of 'false positives', i.e. - failures in pieces of code that actually have nothing wrong in them - it is also likely that these would be solved by explicit casting during the function call; Since PHP's casting will happily convert just about any type to any other type - this solution would be inferior to the solution proposed here - that is more likely to encourage code without explicit casting and therefore help weed out more issues and bugs.

function baz(int $x, float $y, string $z) {}
 
// Strict type checking
baz((int) $_GET['x'], (float) $_GET['y'], (string) $_GET['z']); //explicit conversion required, even 'illogical' conversions will be applied without warning
 
// Type enforcement
baz($_GET['x'], $_GET['y'], $_GET['z']);  // on-the-fly conversion, with 'safety net' against illogical conversion

Changelog

rfc/typecheckingweak.txt · Last modified: 2011/04/06 12:59 (external edit)