rfc:named_params

rfc:named_params [2013/09/06 16:34] – created nikic
rfc:named_params [2022/05/26 15:30] – Add alternatives section iquito
Line 1: Line 1:
-====== PHP RFC: Named Parameters ====== +====== PHP RFC: Stricter implicit boolean coercions ====== 
-  * Version: 0.9 +  * Version: 1.7 
-  * Date: 2013-09-06 +  * Date: 2022-05-16 
-  * Author: Nikita Popov <nikic@php.net>+  * Author: Andreas Leathley, <a.leathley@gmx.net>
   * Status: Under Discussion   * Status: Under Discussion
-  * Proposed for: PHP 5.6 +  * First Published at: http://wiki.php.net/rfc/stricter_implicit_boolean_coercions
  
-===== State of this RFC =====+===== Introduction =====
  
-This is a preliminary RFC for named parameters. Its purpose is to find out if we want to support them in the next PHP version and if so, how the implementation should work. The syntax and behavior described here are just basic ideas that still need to be fleshed out. +When not using strict_types in PHP, scalar type coercions have become less lossy/surprising in the last years - non-number-strings cannot be passed to an int type (leads to a TypeError), floats (or float-strings) with a fractional part cannot be passed to an int type (leads to a deprecation notice since 8.1 because it loses information). The big exception so far are booleans: you can give a typed boolean any value and it will convert any non-zero (and non-empty-string) value to true without any notice.
  
-The implementation that accompanies this proposal is not complete yet. As this is a very complicated feature I do not wish to spend time finishing it without knowing that we actually want this feature.+Some examples where this might lead to unexpected outcomes:
  
-The implementation is based off and includes the [[rfc:variadics]] and [[rfc:argument_unpacking]] RFCs. I think they are essential if we implement named params (otherwise we couldn't have good error handling for unknown named params and no unpacking support at all, unless we break BC in ''call_user_func_array''). The choice of how they should be implemented somewhat depends on whether we want to support named params, so I'm doing this proposal first. +<PHP> 
-   +function toBool(bool $a) 
-===== What are named arguments? =====+{ 
 +  var_dump($a); 
 +}
  
-Named arguments are a way to pass arguments to a function, which makes use of the parameter names rather than the position of the parameters:+toBool('0'); // bool(false) 
 +toBool(-0); // bool(false) 
 +toBool('-0'); // bool(true) 
 +toBool(0.0); // bool(false) 
 +toBool('0.0'); // bool(true) 
 +toBool(0.1); // bool(true) 
 +toBool(-37593); // bool(true) 
 +toBool('inactive'); // bool(true) 
 +toBool('false'); // bool(true) 
 +</PHP>
  
-<code php> +===== Proposal =====
-// Using positional arguments: +
-array_fill(0, 100, 42); +
-// Using named arguments: +
-array_fill(start_index => 0, num => 100, value => 42); +
-</code>+
  
-The order in which the named arguments are passed does not matter. The above example passes them in the same order as they are declared in the function signature, but any other order is possible too: +In coercive typing mode, limit the allowed scalar values for typed boolean arguments, boolean return types and boolean class properties to the following:
  
-<code php> +  * 0 (and -0) integer (= false) 
-array_fill(value => 42, num => 100, start_index => 0); +  * 0.0 (and -0.0) float (= false) 
-</code> +  * "0" string (= false) 
 +  * "" (empty) string (= false) 
 +  * 1 integer (= true) 
 +  * 1.0 float (= true) 
 +  * "1" string (= true)
  
-It is possible to combine named arguments with normal, positional arguments and it is also possible to specify only some of the optional arguments of a function, regardless of their order: +Any other integers, floats and strings are always coerced to true (no behavior change) but will emit an ''E_DEPRECATED'' notice:
  
-<code php> +  * For coercions from string the deprecation notice is: Implicit conversion from string "%s" to trueonly """0" or "1" are allowed 
-htmlspecialchars($string, double_encode => false); +  * For coercions from int the deprecation notice is: Implicit conversion from int %d to trueonly 0 or 1 are allowed 
-// Same as +  * For coercions from float the deprecation notice is: Implicit conversion from float %f to true, only 0 or 1 are allowed
-htmlspecialchars($string, ENT_COMPAT | ENT_HTML401, 'UTF-8', false); +
-</code>+
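
The rule proposed on the new side of this diff can be approximated in userland. The following is only a minimal sketch and is not taken from the RFC patch: the helper name ''describeBoolCoercion'' is hypothetical, it needs PHP 8.0 or later for the union type, and the way the float is printed is an assumption.

<PHP>
// Hypothetical helper for illustration only; it is not part of the RFC patch.
// Returns the notice text the proposal describes for $value, or null when
// $value is one of the seven allowed scalars.
function describeBoolCoercion(int|float|string $value): ?string
{
    // Strict comparison keeps '0.0' and '-0' out of the allowed set,
    // while -0 and -0.0 still match 0 and 0.0.
    if (in_array($value, [0, 1, 0.0, 1.0, '', '0', '1'], true)) {
        return null;
    }
    if (is_string($value)) {
        return sprintf('Implicit conversion from string "%s" to true, only "", "0" or "1" are allowed', $value);
    }
    if (is_int($value)) {
        return sprintf('Implicit conversion from int %d to true, only 0 or 1 are allowed', $value);
    }
    // Assumption: the float is printed using PHP's default string conversion.
    return sprintf('Implicit conversion from float %s to true, only 0 or 1 are allowed', $value);
}
</PHP>

Every scalar that is not in the allowed set coerces to true, which is why the sketch never needs a "to false" message.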
  
-===== What are the benefits of named arguments? =====+These would be the notices generated for the examples in the introduction:
  
-One obvious benefit of named arguments can be seen in the last code sample (using ''htmlspecialchars''): You no longer have to specify all defaults until the one you want to change. Named args allow you to directly overwrite only those defaults that you wish to change. +<PHP> 
 +toBool('0'); 
 +toBool(-0); 
 +toBool('-0'); // Implicit conversion from string "-0" to true, only "", "0" or "1" are allowed 
 +toBool(0.0); 
 +toBool('0.0'); // Implicit conversion from string "0.0" to true, only "", "0" or "1" are allowed 
 +toBool(0.1); // Implicit conversion from float 0.1 to true, only 0 or 1 are allowed 
 +toBool(-37593); // Implicit conversion from int -37593 to true, only 0 or 1 are allowed 
 +toBool('inactive'); // Implicit conversion from string "inactive" to true, only "", "0" or "1" are allowed 
 +toBool('false'); // Implicit conversion from string "false" to true, only "", "0" or "1" are allowed 
 +</PHP>
  
-This is also possible with the [[rfc:skipparams]] RFC, but named args make the intended behavior a lot more clear. Compare: +In the long-term these deprecations should be raised to a warning or to a ''TypeError''. The earliest for that is the next major version (PHP 9.0). At that time there will have been multiple years of experience with these new deprecation notices, making it easier to decide on how to continue and the impact in the PHP ecosystem.
  
-<code php> +===== Rationale =====
-htmlspecialchars($string, default, default, false); +
-// vs +
-htmlspecialchars($string, double_encode => false); +
-</code>+
  
-Seeing the first line you will not know what the ''false'' argument does (unless you happen to know the ''htmlspecialchars'' signature by heart), whereas the ''%%double_encode => false%%'' variant makes the intention clear.+This RFC boils down to these questions:
  
-The benefit of making code self-documenting obviously even applies when you are not skipping optional arguments. E.g. compare the following two lines:+  * Are you losing information when you reduce a value like -375, "false" or NaN to true for a typed boolean? 
 +  * Would you want to know when a value like -375, "false" or NaN is given to a typed boolean in a codebase? 
 +  * How likely is it that such a coercion is unintended? 
 +  * What about other boolean coercions in PHP? (this is handled in the next section)
  
-<code php> +The main motivation for this RFC is to reduce the possibility of errors when using the boolean type in a similar way that you cannot give a typed int a non-number string - if you provide "false" or "49x" to an int argument it will result in a TypeError. It will not be silently coerced to 0 (or 49), as that loses information and can lead to subtle bugs. This RFC does the same thing for boolean types:
-$str->contains("foo", true); +
-// vs +
-$str->contains("foo", caseInsensitive => true); +
-</code>+
  
-Currently you can already get something similar to named arguments by taking an ''$options'' array as parameter, which would be used like this:+  * Avoid losing information when an unusual value is coerced to boolean type 
 +  * Make the boolean type and the type juggling system safer and more consistent 
 +  * Setting up only 7 scalar values as unambiguous boolean values is easy to document and reason about
  
-<code php> +When implementing this feature I found two bugs in php-src tests and the test runner that are most likely typical cases:
-htmlspecialchars($string, ['double_encode' => false]); +
-</code>+
  
-Using an ''$options'' array is not much more verbose at the call-site than named arguments, but it has several drawbacks which make it a lot less practical than actual named args:+  * In the PHP test runner the strings "success" and "failed" were given to a boolean function argument called $status. Maybe that argument was a string previously and changed to a boolean by mistake, but it clearly was a bug that has never been noticed so far. 
 +  * In an IMAP test a boolean argument $simpleMessages always got the string "multipart". I found out that there was another function definition which had the argument $new_mailbox at that position. This was most likely a copy-paste error or the wrong function was looked up when writing the test. 
 +   
 +Changing the type of an argument, return or property in a codebase happens often, and because the boolean type accepts everything with no complaint it makes it easy to miss problems when changing type to bool. In current PHP codebases there are likely a few of these unintended coercions to booleans which would be easy to fix if a developer noticed that an unusual value is coerced to true.
  
-  * The available options are not documented in the signature. You have to look into the code to find out. +While using strict_types is an option to avoid unintended type coercions, the goal of this RFC is to make coercions less error-prone when not using strict_types. Silently coercing "failed" (or -37486, or 0.01) to true seems like an invitation to unexpected behavior. By introducing this deprecation notice users will have the chance of finding surprising boolean coercions in their code while the coercion behavior will remain the same.
-  * Handling ''$options'' requires more code in the implementation, because default values have to be merged and values extracted. Especially if you also want to throw an error if an unknown option is specified things get complicated. +
-  * Something like ''$options'' always needs to be explicitly implemented, whereas named arguments always work. In particular they will also be usable for existing (and internal) functions. All functions will be able to benefit from the readability improvements. +
  
-Lastly, named arguments allow a new sort of variadic function, one which can take not just an ordered list of values, but also a list of key-value pairs. Sample application is the ''%%$db->query()%%'' method, which would now be able to support named parameters too:+===== Other boolean coercions in PHP =====  
  
-<code php> +Typed booleans (arguments, returns, properties) as discussed in this RFC are not the only part of PHP where implicit boolean coercions happen. They also occur in expressions like ''if'', the ternary operator ''?:'', or logical operators ''&&'' ''||''. Whenever an expression in that context is not clearly true or false it is implicitly coerced to true or false.
-// currently possible: +
-$db->query( +
-    'SELECT * from users where firstName = ? AND lastName = ? AND age > ?', +
-    $firstName, $lastName, $minAge +
-); +
-// named args additionally allow: +
-$db->query( +
-    'SELECT * from users where firstName = :firstName AND lastName = :lastName AND age > :minAge', +
-    firstName => $firstName, lastName => $lastName, minAge => $minAge +
-); +
-</code>+
  
-===== Implementation =====+However in these expressions you can use any values and are not restricted to scalar types like with typed booleans:
  
-==== Internally ====+<PHP> 
 +if ($variable) { // identical to if ($variable == true) 
 +  // the $variable in the if statement is coerced in the following way: 
 +  // - true for a string if it is not empty and not '0' 
 +  // - true for an int if it is not zero 
 +  // - true for a float if it is not zero 
 +  // - true for an array if it is not empty 
 +  // - always true for a resource 
 +  // - always true for an object 
 +  // - always false for null 
 +}
  
-Named args are internally passed the same way as other arguments (via the VM stack). They differ in that positional arguments are always passed on top of the stack whereas named arguments can be inserted into the "stack" in any order. Stack positions that are not used contain the value ''NULL''. The argument count that is pushed after the arguments includes the ''NULL'' arguments in the count. +if ($array) { 
 +  // executed for a non-empty array 
 +}
  
-==== Errors ====+toBool($array); // TypeError, must be of type bool, array given 
 +</PHP>
  
-While it is possible to mix positional and named arguments, the named arguments always have to come last. Otherwise a compile error is thrown: +Typed booleans behave differently compared to these expressions because they do not accept arrays, resources, objects and null. Further restricting typed booleans is therefore not a change which makes the language more inconsistent, on the contrary, it could be an opportunity to differentiate these two use cases from each other, as they often have different expectations already:
  
-<code php> +<PHP> 
-strpos(haystack => "foobar", "bar"); +// often used to check if $string is not empty, and it is reasonably clear 
-// Fatal error: Cannot pass positional arguments after named arguments +if ($string) { 
-</code>+  // do something with $string here 
 +}
  
-If a named argument is not known (parameter with that name does not exist) and the function is not variadic (more on that later) a fatal error is thrown: +$obj->boolProperty = $string; // did you want to check if $string is not empty here? 
 +                              // is it a value from a form, API or DB that should be '', '0' or '1'? 
 +                              // or is it a mistake because something is missing? 
 +</PHP>
  
-<code php> +When giving a typed boolean a scalar value you are reducing an int, float or string to a boolean, possibly losing information, and not evaluating an expression where there is follow-up code to do something more as is the case with ''if'', ''?:'' or ''&&'' ''||''. By limiting the values of a typed boolean the previous example becomes less ambiguous:
-strpos(hasytack => "foobar", needle => "bar"); +
-// Fatal error: Unknown named argument $hasytack +
-</code>+
  
-When named arguments are in use, it can happen that the same parameter is set twice. In this case the newer value will overwrite the older one and a warning is thrown: +<PHP> 
 +$obj->boolProperty = $string; // $string must be '', '0' or '1', otherwise we get a deprecation notice 
 +$obj->boolProperty = strlen($string) > 0; // instead check that $string is not empty 
 +</PHP>
  
-<code php> +===== filter extension =====
-function test($a, $b) { var_dump($a, $b); }+
  
-test(1, 1, a => 2); // 2, 1 +The filter extension has its own way to validate booleans (FILTER_VALIDATE_BOOLEAN):
-// Warning: Overwriting already passed parameter 1 ($a) +
-test(a => 1, b => 1, a => 2); // 2, 1 +
-// Warning: Overwriting already passed parameter 1 ($a) +
-</code>+
  
-==== Collecting unknown named arguments ====+  * "1", "true", "on" and "yes" evaluate to true, everything else to false 
 +  * if FILTER_NULL_ON_FAILURE is also used, only "0", "false", "off", "no" and "" evaluate to false, everything else to null 
 +   
 +This behavior is incompatible with how PHP handles boolean coercions, making it impossible to resolve the behaviors without massive BC breaks. But it does add another argument in favor of this RFC - somebody switching from the filter extension to built-in boolean types has a big chance of accidentally introducing behavior changes in their application:
  
-Functions declared as variadic using ''%%...$args%%'' syntax will also collect unknown named arguments into ''$args''. The unknown named arguments will always follow after any positional arguments and will be in the order in which they were passed. +  * PHP converts most values to true, while the filter extension converts these values to false (or null) - for example "success", "false", "off", "5", or "-30" 
 +  * The deprecation notice would make all these occurrences visible and easy to fix 
 +   
 +Usages of FILTER_VALIDATE_BOOLEAN are otherwise not affected by this RFC - that behavior remains unchanged.
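
The mismatch described in this section can be seen by running the same string through both mechanisms. The following is a minimal sketch that assumes the filter extension is loaded; the helper name ''compareBoolHandling'' and its array keys are hypothetical and only serve the illustration.

<PHP>
// Hypothetical helper for illustration only: shows how one input string is
// treated by FILTER_VALIDATE_BOOLEAN and by PHP's own boolean conversion.
function compareBoolHandling(string $input): array
{
    return [
        // true only for "1", "true", "on" and "yes"; false for everything else
        'filter' => filter_var($input, FILTER_VALIDATE_BOOLEAN),
        // same filter, but null for anything that is not a recognised boolean string
        'filter_or_null' => filter_var($input, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE),
        // PHP's conversion: false only for "" and "0" (an explicit cast raises no notice)
        'coercion' => (bool) $input,
    ];
}

var_dump(compareBoolHandling('off'));
var_dump(compareBoolHandling('5'));
</PHP>

For "off" and "5" the filter result and the plain conversion disagree, which is the kind of silent behavior change the deprecation notice is meant to surface.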
  
-Example of the behavior:+===== Considered alternatives =====
  
-<code php> +It was briefly considered to allow more values for typed booleans instead of only 0, 1 and an empty string - for example the string "on". But it would be difficult and a bit arbitrary to determine where to draw the line for possible values, and an important goal of this RFC is for the coercion behavior to be simple and intuitive to understand. 0 and 1 are common alternative values to express a boolean in many programming languages, in databases and in APIs. Other values are not as widely used and would only make the coercion behavior more difficult to understand.
-function test(...$args) { var_dump($args); }+
  
-test(1, 2, 3, a => 'a', b => 'b'); +Another possibility would have been to also change the behavior of boolean coercions, for example coerce the string "false" to false instead of true. Yet this would be quite a substantial BC break with no obvious benefits. With this RFC there will be a deprecation notice when coercing "false" to true in order for such behavior to be noticed instead of having to change it.
-// [1, 2, 3, "a" => "a", "b" => "b"] +
-</code>+
  
-An example usage is the ''%%$db->query()%%'' method already mentioned above, which could now also work with named bound parameters.+===== Implementation notes =====
  
-This feature is known as ''%%**kwargs%%'' in Python.+As this is my first RFC and my first contribution to php-src, I mimicked the code from the "Deprecate implicit non-integer-compatible float to int conversions" RFC (https://github.com/php/php-src/pull/6661). I added some tests and made sure the existing tests still pass. There might be some room for improvements on my implementation though, so any feedback is welcome!
  
-==== Unpacking named arguments ====+===== Backward Incompatible Changes =====
  
-The ''%%foo(...$args)%%'' unpacking syntax from the [[rfc:argument_unpacking]] RFC also supports unpacking named parameters:+The following operations will now emit an ''E_DEPRECATED'' if any scalar value other than "", "0", "1", 0, 1, 0.0, 1.0 is used:
  
-<code php> +  * Assignment to a typed property of type ''bool'' in coercive typing mode 
-$params = ['needle' => 'bar', 'haystack' => 'barfoobar', 'offset' => 3]; +  * Argument for a parameter of type ''bool'' for both internal and userland functions in coercive typing mode 
-strpos(...$params); // int(6) +  * Returning such a value for userland functions declared with a return type of ''bool'' in coercive typing mode
-</code> +
- +
-Any value with a string key is unpacked as a named parameter. Other key types (for arrays only integers) are treated as normal positional arguments. +
- +
-It's possible to unpack both positional and named args in one go, but named arguments have to strictly follow any positional arguments. If a positional argument is encountered after a named argument a warning is thrown and the unpacking operation aborted. +
- +
-==== func_* and call_user_func_array ==== +
- +
-If (due to the usage of named arguments) some arguments are missing (''NULL'' on the stack) the ''func_*'' functions behave as follows: +
- +
-  * ''func_get_args()'' will not include the missing offsets in the resulting array +
-  * ''func_get_arg($n)'' will throw its usual "Argument %ld not passed to function" warning and return ''false'' +
-  * ''func_num_args()'' returns the number of arguments including the ''NULL''s.+
      
-All three functions are also oblivious to the collection of unknown named arguments by variadics. ''func_get_args()'' will not return the collected values and ''func_num_args()'' will not include them in the argument count. +The actual conversion to a boolean value remains unchanged - anything that was coerced to false before will still be coerced to false, and anything coerced to true will still be coerced to true.
-   +
-The ''call_user_func_array'' function will continue behaving exactly as is - it does not support named parameters. Unpacking of named parameters is only supported using the ''%%...$options%%'' syntax. (Adding support to ''call_user_func_array'' would break any code that's passing an array with string keys.)+
  
-Generally: The ''func_*'' and ''call_user_func_array'' functions are designed to stay close to their old behavior. +The following shows typical ways to avoid a deprecation notice:
  
-===== Open questions =====+<PHP> 
 +// Resolution 1: Check for an expected value or range 
 +toBool($number > 0); 
 +toBool($int === 5); 
 +toBool($string === 'success'); 
 +toBool(strlen($string) > 0); 
 +  
 +// Resolution 2: Check for truthiness 
 +toBool($scalar == true); 
 +  
 +// Resolution 3: Explicitly cast the argument 
 +toBool((bool) $scalar); 
 +</PHP>
  
-==== Syntax ====+With the many deprecation notices that appeared in PHP 8.0 and 8.1 there is some wariness if more deprecation notices are worth it. These are the arguments why the RFC author thinks it will be worth it without too much pain:
  
-The current implementation (and proposal) support the following two syntaxes for named parameters: +  * Each individual case is easy to fix, the easiest (but also least useful) is to loosely compare a value to true ($value == true) instead of directly giving the value to a typed bool 
- +  * Most of the coercions that will lead to a deprecation notice are likely to be unintended and the information given in the notice should make it reasonably clear to a developer whether it is a bug and how to fix it 
-<code php> +  * bool arguments for internal functions are usually optional, less numerous and are much more likely to be set by a constant expression than a variable 
-test(foo => "oof", bar => "rab"); +  * deprecation notices do not demand immediate attention, and the "disadvantage" of the high number of deprecation notices with 8.0 and 8.1 should be that most tooling and codebases have gotten more used to dealing with them in their own time and not see them as an immediate call to action
-test("foo" => "oof", "bar" => "rab"); +
-</code> +
- +
-The second syntax is supported in order to allow named arguments where the parameter name is a reserved keyword: +
- +
-<code php> +
-test(array => [1, 2, 3]);   // syntax error +
-test("array" => [1, 2, 3]); // works +
-</code> +
- +
-The choice of this syntax is mostly arbitrary, I didn't put much thought into it. Here are some alternative syntax proposals courtesy of Phil Sturgeon: +
- +
-<code php> +
-// currently implemented: +
-test(foo => "oof", bar => "rab"); +
-test("foo" => "oof", "bar" => "rab"); +
- +
-// suggestions (can use keywords): +
-test($foo => "oof", $bar => "rab"); +
-test(:foo => "oof", :bar => "rab"); +
- +
-// suggestions (cannot use keywords): +
-test(foo = "oof", bar = "rab"); +
-test(foo: "oof", bar: "rab"); +
- +
-// not possible because already valid code: +
-test($foo = "oof", $bar = "rab"); +
-</code> +
- +
-Which one(s) of these we want to support is up to discussion. +
- +
-==== Collection of unknown named args into ...$opts ==== +
- +
-The current implementation / proposal suggests to use the ''%%...$opts%%'' syntax both for positional variadics and for named variadics. Python takes a different approach where the former are collected into ''%%*args%%'' and the latter into ''%%**kwargs%%''. +
- +
-Pro current solution: +
- +
-  * Seems very PHP-like to do it this way, because PHP allows mixing "normal" arrays and dictionaries, which is an option Python does not have. +
      
-Con current solution: +===== Proposed PHP Version =====
- +
-  * Having a separate syntax for capturing unknown named args makes the intention clearer: You don't always want to support **both** positional and named variadics. Separate syntax allows you to enforce one type or the other. +
-   +
-Opinions and arguments how to handle this are welcome. +
- +
-==== Unpacking named args ==== +
- +
-The same question comes up for argument unpacking: Should the ''%%...$foo%%'' notation be used both for unpacking positional and named arguments, or do we want separate ''%%*$foo%%'' and ''%%**$foo%%'' notations? +
- +
-In any case, this decision should mirror the one for the previous question. +
- +
-==== Parameters names are part of the contract ==== +
- +
-Currently parameter names are not part of the contract: In an interface implementation you can rename parameters as much as you like, it won't make a difference to the caller. Named arguments change this. If an inheriting class changes a parameter name calls using named args might fail, thus violating LSP: +
- +
-<code php> +
-interface A { +
- public function test($foo, $bar); +
-} +
- +
-class B implements A { +
- public function test($a, $b) {} +
-}+
  
-$obj = new B;+Next minor version: PHP 8.2.
  
-// Pass params according to A::test() contract +===== Unaffected PHP Functionality =====
-$obj->test(foo => "foo", bar => "bar"); // ERROR! +
-</code>+
  
-If named parameters are introduced, signature validation should include parameter names. Throwing a fatal error (for the interface/class combination) would break backwards compatibility though. We could use some lower error type (warning, notice, strict) for this. +  * Manually casting to boolean will not raise a notice. 
 +  * Strict Type behaviour is unaffected. 
 +  * Implicit boolean expressions (as used in if, ternary, logic operators) are not affected. 
 +  * FILTER_VALIDATE_BOOLEAN in the filter extension is not affected.
  
-===== Patch =====+===== Patches and Tests =====
  
-You can find the diff for the work-in-progress patch here: https://github.com/nikic/php-src/compare/splat...namedParams. The patch is incomplete, dirty and has known bugs. +Patch: https://github.com/php/php-src/pull/8565
  
-Credits: The patch includes some of the work that Stas did for the skipparams RFC. +===== References =====
  
-Work that still needs to be done: +Initial mailing list discussion: <https://externals.io/message/117608> \\ 
 +RFC mailing list discussion: <https://externals.io/message/117732> \\
  
-  * Implement the results of "Open questions" 
-  * Update all arginfos of internal functions to match the documentation (and improve names along the way). The current arginfo structs are hopelessly outdated. I hope that this work can be done mainly automatically. (Note: After named parameters are introduced the argument names are frozen and should not be changed.) 
-  * Make sure that internal functions properly handle skipped arguments. This should work in most cases automatically, but I'm sure that there are quite a few cases where additional adjustments need to be done. Hopefully misbehaving functions can be found through fuzzing. 
rfc/named_params.txt · Last modified: 2022/05/26 15:31 by iquito