
Request for Comments: Weak parameter type checking and type ensurance

This RFC proposes a system that lets functions declare the types they expect for their arguments, together with conversion rules (and pass/fail criteria) for when calling code passes arguments of a different type. The rationale behind using such a 'weak' type checking system (in general, and in particular when compared with 'strict' type checking) is also discussed.

Background

Circa 2002, 'type hints' were added to the then-in-development PHP 5.0. Type hints (possibly misnamed) were designed to allow functions and methods to denote the specific kinds of objects they can handle, primarily to accommodate PHP's much more complex, advanced OO system. Adding these type hints meant that the growing number of functions designed to work on specialized objects would not have to spend the first few lines of their implementation verifying is_a relationships, but could do that easily within the function signature. The possibility of supporting type hints for 'native' PHP types was discussed shortly afterwards; it was decided against, primarily on the premise that scalar types in PHP convert on-the-fly depending on the context, and there's no logic behind forcing their type at the calling stage. One notable exception was 'array', for which support was added, with the rationale being that functions which expect array arguments would probably find any other type quite useless.

Introduction

Recently, the case for letting functions designate what kind of scalar values (e.g. int, float, string) they expect has been brought up again for discussion. It appears that while in many cases PHP's scalar auto-conversion is sufficient for 'hiding' this bit of complexity from functions, there are other cases where functions want to force specific types on their arguments, for a variety of reasons (sanitizing, coding style, readability, reflection, etc.). Presently, there is overwhelming support for adding a mechanism to PHP that would enable developers to automatically sanitize the types of function arguments, and several methods for doing that have been suggested. This RFC focuses on a mechanism modeled closely after the type-conversion mechanism used by internal functions, with a few modifications.

Goals

  1. Satisfy the key requirements of the mechanism
  2. Minimize the amount of new semantics introduced to PHP
    1. To retain consistency
    2. To keep PHP's learning curve shallow
  3. Suggest a mechanism that can be implemented without a severe impact on performance

Suggested Solution

Conceptually, user functions will be able to denote that they are expecting a specific type of scalar value, using syntax similar to that of class type hinting. This notation will be optional; if absent, the existing behavior will continue.

function foo(int x) {}
function bar(x, y, float z) {}
function baz(int x, float y, string z) {}

Once a function argument has been given a scalar type hint, the function author is relieved of any further checks and conversions, and is assured that their code will always be supplied with an argument of the designated type.

During the parameter passing stage, PHP will ensure that values passed for arguments tagged with type requirements are actually of the designated type. The following algorithm will be employed:

  1. Does the value to-be-passed have the type required by the function code? If so, pass it on as-is. If not - move to step 2.
  2. Can the value be converted to the type required by the function (as per the conversion table below)? If so - convert and pass it on. If not - move to step 3.
  3. Emit an error or throw an exception.
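
The three steps above can be sketched in userland PHP. This is only an illustration: the helper name pass_argument() and the conversion rules inside it are assumptions made for the example (the RFC's own conversion rules are not reproduced here), and an InvalidArgumentException stands in for whichever error or exception the engine would raise.

```php
<?php
// Illustrative sketch only: pass_argument() and its conversion rules are
// assumptions, not part of the RFC text.
function pass_argument($value, string $type)
{
    // Map the hint names used in the RFC to the names gettype() reports.
    $native = ['int' => 'integer', 'float' => 'double', 'string' => 'string'];
    if (!isset($native[$type])) {
        throw new InvalidArgumentException("Unsupported type hint: $type");
    }

    // Step 1: the value already has the required type, pass it on as-is.
    if (gettype($value) === $native[$type]) {
        return $value;
    }

    // Step 2: convert when the (assumed) rules say the value is convertible.
    if ($type === 'int' && is_string($value) && (string) (int) $value === $value) {
        return (int) $value; // only strings that round-trip exactly, e.g. "12"
    }
    if ($type === 'float' && (is_int($value) || (is_string($value) && is_numeric($value)))) {
        return (float) $value;
    }
    if ($type === 'string' && (is_int($value) || is_float($value))) {
        return (string) $value;
    }

    // Step 3: neither the right type nor convertible.
    throw new InvalidArgumentException(
        'Cannot pass ' . gettype($value) . " where $type is required"
    );
}
```

Under these assumed rules, pass_argument('12', 'int') yields the integer 12, while pass_argument('abc', 'int') falls through to step 3 and throws.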

Several people have nevertheless asked to expand array/object type hinting to cover other data types. Most of them ask for the same strict type checking (without any type juggling) that applies to arrays and objects, with an E_RECOVERABLE_ERROR triggered for failed checks. However, this places the burden of explicit type casting on the user of the function/method. This RFC tries to address this issue.

Why is strict type checking problematic?

Strict type checking does have the advantage that subtle bugs will be noticed more quickly, and that function/method signatures become more self-documenting and therefore more expressive. Performing these type checks based on the signature also means less code and better performance than hand-coding the validation.

Proponents of providing only strict type checking say that, for the most part, variables are defined with the proper type unless they come from an outside source. Such input usually requires validation anyway, which is a perfect opportunity to type cast.

That is, to define a variable that contains a boolean, a developer will probably write “$is_foo = true” and not “$is_foo = 0”. While this may be true, it does mean that developers who provide such strictly type-checked APIs now require their users to understand data types, which beginning developers currently do not necessarily need to.

Furthermore, developers quite often need to parse content out of strings and pass it to other methods. With strict type checking one is forced to cast explicitly. While it's certainly doable, it's also additional work that needs to be done while writing the code (“$foo_int = (int)substr($bar, 3, 10)”). Then again, some might argue that this makes the code clearer.

It also means that users of such strictly typed APIs will tend to simply cast and, out of laziness (PHP is used for rapid development after all), might forgo first validating that the content is really what they expected. Without type checking, the burden would be on the developer providing the API. Since an API is usually expected to be used fairly often, it seems illogical to move this burden to the API users. Moreover, a new kind of bug will be introduced through overuse of casts in place of the hand-coded parameter validation that is currently necessary. This could lead to even higher bug rates.
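
To make the casting concern concrete, the sketch below contrasts a bare cast with hand-coded validation. Both function names are hypothetical and exist only for this illustration; the validating variant uses filter_var() with FILTER_VALIDATE_INT as one possible check, not as the approach the RFC prescribes.

```php
<?php
// Hypothetical helpers for illustration; neither name comes from the RFC.

// What a hurried caller of a strictly typed API might write: cast and move on.
function age_by_cast(string $raw): int
{
    return (int) $raw; // "12abc" becomes 12 and "abc" becomes 0, no error raised
}

// Hand-coded validation: reject input that is not a well-formed integer.
function age_by_validation(string $raw): int
{
    $age = filter_var($raw, FILTER_VALIDATE_INT);
    if ($age === false) {
        throw new InvalidArgumentException("Not an integer: '$raw'");
    }
    return $age;
}
```

The cast satisfies a strict int check for any input string, so malformed data such as "12abc" reaches the API unnoticed, whereas the validating variant refuses it.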

As for outside sources needing validation, this is not always the case: most people trust that the data returned from a database is in the expected format, even though most RDBMS always return it as strings. The same applies to configuration files which, if defined in something other than PHP code, will most likely only return strings, but whose values will usually not be validated.

Introducing weak type checking

In Ilia's recent strict type checking proposal, he included a “numeric” and a “scalar” data type, which tried to reduce the issues with strict type checking noted above. The “numeric” type would behave similarly to the “is_numeric()” function in that it would not check the type, but would also accept a string with only numbers, or a float (see the documentation for the exact definition). In the same way, “scalar” would simply check that the parameter is not an array, object or resource.

However, this does not cover all specific data types. Moreover, “numeric” is not a known data type and is also significantly longer to type than “int”. It therefore seems likely that many developers will use “int” even where “numeric” would suffice. As a result, a new concept was introduced: a syntax to define whether the check should be strict or weak.

A weak check would examine the content of the variable in a way that is stricter than the standard type juggling. If the weak check passes, the value would be cast to the appropriate type. If the weak check fails, it would trigger an E_RECOVERABLE_ERROR, just as in the strict case.

Here is a short list of examples to illustrate the weak type checking. Note that just like the current array/object hints, a NULL is only allowed if the parameter defaults to NULL.

value                    string  float  int   numeric  scalar  bool  array
true (boolean)           fail    fail   fail  fail     pass    pass  fail
false (boolean)          fail    fail   fail  fail     pass    pass  fail
0 (integer)              fail    pass   pass  pass     pass    pass  fail
1 (integer)              fail    pass   pass  pass     pass    pass  fail
12 (integer)             fail    pass   pass  pass     pass    fail  fail
12 (double)              fail    pass   fail  pass     pass    fail  fail
12.34 (double)           fail    pass   fail  pass     pass    fail  fail
'true' (string)          pass    fail   fail  fail     pass    fail  fail
'false' (string)         pass    fail   fail  fail     pass    fail  fail
'0' (string)             pass    fail   fail  pass     pass    pass  fail
'1' (string)             pass    fail   fail  pass     pass    pass  fail
'12' (string)            pass    fail   fail  pass     pass    fail  fail
'12abc' (string)         pass    fail   fail  fail     pass    fail  fail
'12.0' (string)          pass    fail   fail  pass     pass    fail  fail
'12.34' (string)         pass    fail   fail  pass     pass    fail  fail
'foo' (string)           pass    fail   fail  fail     pass    fail  fail
array () (array)         fail    fail   fail  fail     fail    fail  pass
array (0 => 12) (array)  fail    fail   fail  fail     fail    fail  pass
NULL (NULL)              fail    fail   fail  fail     fail    fail  fail
'' (string)              pass    fail   fail  fail     pass    fail  fail
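
The rows of this table can be reproduced by a small predicate. The sketch below is inferred from the listed values only: weak_check_passes() is a hypothetical name, and its behaviour for values the table does not list (for example 1.0 against bool) is an assumption.

```php
<?php
// Sketch inferred from the table rows; weak_check_passes() is a hypothetical name.
function weak_check_passes($value, string $type): bool
{
    switch ($type) {
        case 'string':
            return is_string($value);
        case 'float':
            return is_int($value) || is_float($value);
        case 'int':
            return is_int($value);
        case 'numeric':
            return is_numeric($value);
        case 'scalar':
            return is_scalar($value);
        case 'bool':
            // Booleans plus exactly 0, 1, '0' and '1' (strict comparison).
            return is_bool($value) || in_array($value, [0, 1, '0', '1'], true);
        case 'array':
            return is_array($value);
    }
    throw new InvalidArgumentException("Unknown type: $type");
}
```

A real implementation would go on to cast the value after a successful check and raise E_RECOVERABLE_ERROR after a failed one; the predicate covers only the pass/fail decision.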

Furthermore, weak type checking could also be useful once we have generic type casting support via some magic type cast method along the lines of __toString(). In this case, the weak type checking would also allow an object to pass if it provides the relevant casting method, though it would then of course automatically cast the object to the given type.
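
For the string case, where __toString() already exists, the idea might look like the following sketch. The helper weak_string() and the PhoneNumber class are hypothetical, and an InvalidArgumentException stands in for the E_RECOVERABLE_ERROR described above, since userland code cannot raise that error level itself.

```php
<?php
// Sketch only: weak_string() is a hypothetical helper, and __toString() stands
// in for the generic cast method the RFC anticipates.
function weak_string($value): string
{
    if (is_string($value)) {
        return $value;
    }
    if (is_object($value) && method_exists($value, '__toString')) {
        return (string) $value; // passes the check and is cast automatically
    }
    // Stand-in for the E_RECOVERABLE_ERROR the engine would trigger.
    throw new InvalidArgumentException('Value cannot be used as a string');
}

// Hypothetical value object that knows how to cast itself to a string.
final class PhoneNumber
{
    private $digits;

    public function __construct(string $digits)
    {
        $this->digits = $digits;
    }

    public function __toString(): string
    {
        return $this->digits;
    }
}
```

Here a PhoneNumber instance would pass the weak string check and arrive in the function body as a plain string, while an integer would be rejected.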

Proposed API

// "+" denotes strict and "-" denotes weak type checking
function add_user(+string name, +string phone_number, -int age) { .. }
 
// "!" denotes strict type checking and "?" denotes weak type checking
function add_user(string name, !string phone_number, ?int age) { .. }
 
// "~" denotes weak type checking
function add_user(string name, string phone_number, ~int age) { .. }
 
// "()" denotes weak type checking
function add_user(string name, string phone_number, (int) age) { .. }
 
// Keep in mind that the "modifier" can either be placed at the start or end
function add_user(string! name, string! phone_number, int? age) { .. }
 
// Furthermore one of the two modifiers could be the default
// optionally + is the default
function add_user(+string name, string phone_number, -int age) { .. }
// optionally - is the default
function add_user(+string name, +string phone_number, int age) { .. }

Changelog

rfc/typecheckingweak.1247387331.txt.gz · Last modified: 2017/09/22 13:28 (external edit)