
PHP RFC: Named Parameters

  • Version: 0.9
  • Date: 2013-09-06
  • Author: Nikita Popov nikic@php.net
  • Status: Under Discussion
  • Proposed for: PHP 5.6

State of this RFC

This is a preliminary RFC for named parameters. Its purpose is to find out whether we want to support them in the next PHP version and, if so, how the implementation should work. The syntax and behavior described here are just basic ideas that still need to be fleshed out.

The implementation that accompanies this proposal is not complete yet. As this is a very complicated feature, I do not wish to spend time finishing it without knowing that we actually want it.

The implementation is based on, and includes, the variadics and argument_unpacking RFCs. I think they are essential if we implement named params: without them we could neither provide good error handling for unknown named params nor offer any unpacking support at all, unless we break BC in call_user_func_array. How they should be implemented depends somewhat on whether we want to support named params, so I'm doing this proposal first.

What are named arguments?

Named arguments are a way to pass arguments to a function that makes use of the parameter names rather than the positions of the parameters:

// Using positional arguments:
array_fill(0, 100, 42);
// Using named arguments:
array_fill(start_index => 0, num => 100, value => 42);

The order in which the named arguments are passed does not matter. The above example passes them in the same order as they are declared in the function signature, but any other order is possible too:

array_fill(value => 42, num => 100, start_index => 0);

It is possible to combine named arguments with normal, positional arguments, and it is also possible to specify only some of the optional arguments of a function, regardless of their order:

htmlspecialchars($string, double_encode => false);
// Same as
htmlspecialchars($string, ENT_COMPAT | ENT_HTML401, 'UTF-8', false);
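The proposed name => value call syntax is not valid in current PHP, so the binding rules above can only be emulated. The following is a minimal sketch, assuming a hypothetical helper call_with_named() (not part of this RFC or its patch) that takes positional and named arguments as two separate arrays, uses the Reflection API to map each name to its parameter position, and fills skipped optional parameters with their declared defaults. It handles user-defined functions only and leaves out the overwrite warnings and variadic collection described later.

```php
<?php
// Hypothetical helper (not part of the RFC or its patch): emulates the proposed
// binding of named arguments for user-defined functions and closures.
function call_with_named($fn, array $positional, array $named)
{
    $ref  = new ReflectionFunction($fn);
    $args = array_values($positional);          // positional args fill the first slots

    foreach ($ref->getParameters() as $i => $param) {
        if ($param->isVariadic()) {
            break;                              // variadics are not modelled here
        }
        $name = $param->getName();
        if (array_key_exists($name, $named)) {
            $args[$i] = $named[$name];          // named arg goes to its declared slot
            unset($named[$name]);
        } elseif (!array_key_exists($i, $args)) {
            if (!$param->isDefaultValueAvailable()) {
                throw new ArgumentCountError("Missing argument \$$name");
            }
            $args[$i] = $param->getDefaultValue(); // skipped optional parameter
        }
    }
    if ($named) {
        $unknown = array_keys($named);          // names that match no parameter
        throw new InvalidArgumentException('Unknown named argument $' . $unknown[0]);
    }
    ksort($args);                               // restore declaration order
    return $fn(...$args);
}

// Example function with optional parameters.
function greet($greeting, $name = 'World', $punct = '!')
{
    return "$greeting, $name$punct";
}
```

Here call_with_named('greet', ['Hello'], ['punct' => '?']) plays the role of greet("Hello", punct => "?") and returns "Hello, World?".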

What are the benefits of named arguments?

One obvious benefit of named arguments can be seen in the last code sample (using htmlspecialchars): you no longer have to specify every default up to the one you want to change. Named args allow you to directly override only those defaults that you wish to change.

This is also possible with the skipparams RFC, but named args make the intended behavior much clearer. Compare:

htmlspecialchars($string, default, default, false);
// vs
htmlspecialchars($string, double_encode => false);

Seeing the first line, you cannot tell what the false argument does (unless you happen to know the htmlspecialchars signature by heart), whereas the double_encode => false variant makes the intention clear.

The benefit of self-documenting code applies even when you are not skipping optional arguments. For example, compare the following two lines:

$str->contains("foo", true);
// vs
$str->contains("foo", caseInsensitive => true);

You can already get something similar to named arguments today by taking an $options array as a parameter, which would be used like this:

htmlspecialchars($string, ['double_encode' => false]);

Using an $options array is not much more verbose at the call site than named arguments, but it has several drawbacks that make it a lot less practical than actual named args:

  • The available options are not documented in the signature. You have to look into the code to find out.
  • Handling $options requires more code in the implementation, because default values have to be merged and values extracted. Things get especially complicated if you also want to throw an error when an unknown option is specified.
  • An $options parameter always needs to be implemented explicitly, whereas named arguments always work. In particular, they will also be usable with existing (and internal) functions, so all functions will benefit from the readability improvements.
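To illustrate the second drawback, here is roughly what an $options-based variant of htmlspecialchars() has to do before it can do any real work. The wrapper escape_html() is hypothetical (htmlspecialchars() itself takes no options array); it merges the caller's options over the defaults and rejects unknown keys.

```php
<?php
// Hypothetical $options-style wrapper: the merging and validation below is the
// boilerplate that named arguments would make unnecessary.
function escape_html($string, array $options = [])
{
    $defaults = [
        'flags'         => ENT_COMPAT | ENT_HTML401,
        'encoding'      => 'UTF-8',
        'double_encode' => true,
    ];

    // Reject option names that the function does not know about.
    $unknown = array_diff_key($options, $defaults);
    if ($unknown) {
        throw new InvalidArgumentException(
            'Unknown option(s): ' . implode(', ', array_keys($unknown))
        );
    }

    // Caller-supplied values override the defaults.
    $options = array_merge($defaults, $options);

    return htmlspecialchars(
        $string, $options['flags'], $options['encoding'], $options['double_encode']
    );
}
```

escape_html('&amp;', ['double_encode' => false]) returns '&amp;' unchanged, while a misspelled key such as 'double_encoding' throws. None of this code would be needed with named arguments.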

Lastly, named arguments allow a new sort of variadic function, one that can take not just an ordered list of values, but also a list of key-value pairs. A sample application is the $db->query() method, which would then be able to support named parameters too:

// currently possible:
$db->query(
    'SELECT * from users where firstName = ? AND lastName = ? AND age > ?',
    $firstName, $lastName, $minAge
);
// named args additionally allow:
$db->query(
    'SELECT * from users where firstName = :firstName AND lastName = :lastName AND age > :minAge',
    firstName => $firstName, lastName => $lastName, minAge => $minAge
);

Implementation

Internally

Named args are passed internally the same way as other arguments (via the VM stack). The difference is that positional arguments are always pushed in order onto the top of the stack, whereas named arguments can be inserted into their parameter's slot on the “stack” in any order. Stack slots that are not used contain the value NULL, and the argument count that is pushed after the arguments includes these NULL slots.
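The slot layout can be modelled in userland with a plain array. The sketch below is illustrative only: the real patch operates on the VM stack, and build_arg_slots() is a hypothetical name. It assumes the pushed count runs up to the last slot actually used, so trailing optional parameters that were not passed are not counted.

```php
<?php
// Hypothetical userland model of the argument slots (not the real VM code).
// Returns [slots, count]; slots skipped between passed arguments hold NULL.
function build_arg_slots(array $paramNames, array $positional, array $named)
{
    $slots = array_values($positional);          // positional args: first slots
    foreach ($named as $name => $value) {
        $pos = array_search($name, $paramNames, true);
        if ($pos === false) {
            throw new InvalidArgumentException("Unknown named argument \$$name");
        }
        $slots[$pos] = $value;                   // named arg: its declared slot
    }

    // The count covers every slot up to the last one used, NULL holes included.
    $count = $slots ? max(array_keys($slots)) + 1 : 0;
    for ($i = 0; $i < $count; $i++) {
        if (!array_key_exists($i, $slots)) {
            $slots[$i] = null;                   // hole left by a skipped parameter
        }
    }
    ksort($slots);                               // present slots in stack order
    return [$slots, $count];
}
```

For htmlspecialchars($string, double_encode => false), modelled with the parameter names string, flags, encoding and double_encode, this yields the slots [$string, NULL, NULL, false] and a count of 4.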

Errors

While it is possible to mix positional and named arguments, the named arguments always have to come last. Otherwise a compile error is thrown:

strpos(haystack => "foobar", "bar");
// Fatal error: Cannot pass positional arguments after named arguments

If a named argument is not known (a parameter with that name does not exist) and the function is not variadic (more on that later), a fatal error is thrown:

strpos(hasytack => "foobar", needle => "bar");
// Fatal error: Unknown named argument $hasytack

When named arguments are in use, the same parameter can end up being set twice. In this case the newer value overwrites the older one and a warning is emitted:

function test($a, $b) { var_dump($a, $b); }
 
test(1, 1, a => 2); // 2, 1
// Warning: Overwriting already passed parameter 1 ($a)
test(a => 1, b => 1, a => 2); // 2, 1
// Warning: Overwriting already passed parameter 1 ($a)
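The three error rules can be expressed as a small check over the call site. In this hypothetical sketch, check_call() receives the arguments in source order as [name, value] pairs (name is null for positional arguments), because a PHP array literal cannot hold the same key twice. Exceptions stand in for the compile-time and fatal errors, and trigger_error() with E_USER_WARNING stands in for the overwrite warning; the messages mirror the ones shown above.

```php
<?php
// Hypothetical model of the proposed call-site checks (not the real compiler).
// $call lists the arguments in source order as [name, value] pairs, with
// name === null for positional arguments. Returns slot index => value.
function check_call(array $paramNames, array $call)
{
    $slots = [];
    $seenNamed = false;
    foreach ($call as $i => $arg) {
        list($name, $value) = $arg;
        if ($name === null) {
            if ($seenNamed) {
                // a compile-time error in the proposal
                throw new LogicException('Cannot pass positional arguments after named arguments');
            }
            $slots[$i] = $value;             // positional args come first, so $i is the slot
            continue;
        }
        $seenNamed = true;
        $pos = array_search($name, $paramNames, true);
        if ($pos === false) {
            // a fatal error in the proposal (for non-variadic functions)
            throw new InvalidArgumentException("Unknown named argument \$$name");
        }
        if (array_key_exists($pos, $slots)) {
            // the newer value wins, but the caller is warned
            trigger_error(
                sprintf('Overwriting already passed parameter %d ($%s)', $pos + 1, $name),
                E_USER_WARNING
            );
        }
        $slots[$pos] = $value;
    }
    return $slots;
}
```

check_call(['a', 'b'], [[null, 1], [null, 1], ['a', 2]]) returns [2, 1] and emits "Overwriting already passed parameter 1 ($a)", matching the first test() call above.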

Collecting unknown named arguments

Functions declared as variadic using the ...$args syntax will also collect unknown named arguments into $args. The unknown named arguments always follow any positional arguments and appear in the order in which they were passed.

Example of the behavior:

function test(...$args) { var_dump($args); }
 
test(1, 2, 3, a => 'a', b => 'b');
// [1, 2, 3, "a" => "a", "b" => "b"]

An example usage is the $db->query() method already mentioned above, which could now also work with named bound parameters.

This feature is known as **kwargs in Python.
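The collection rule can be sketched as follows. collect_variadic() is a hypothetical model, not part of the patch: $paramNames lists the non-variadic parameters, extra positional arguments are appended first, and every named argument that does not match a declared parameter is appended afterwards under its own string key, in the order it was passed.

```php
<?php
// Hypothetical model of what ...$args would receive under this proposal.
// $paramNames are the declared, non-variadic parameters.
function collect_variadic(array $paramNames, array $positional, array $named)
{
    // Positional arguments beyond the declared parameters come first.
    $args = array_slice(array_values($positional), count($paramNames));

    foreach ($named as $name => $value) {
        if (!in_array($name, $paramNames, true)) {
            $args[$name] = $value;   // unknown name: keep its key, in call order
        }
    }
    return $args;
}
```

collect_variadic([], [1, 2, 3], ['a' => 'a', 'b' => 'b']) gives [1, 2, 3, 'a' => 'a', 'b' => 'b'], the same array as in the test() example; for a query($sql, ...$params) method, the bound values would end up keyed by firstName, lastName and minAge.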

Unpacking named arguments

The foo(...$args) unpacking syntax from the argument_unpacking RFC also supports unpacking named parameters:

$params = ['needle' => 'bar', 'haystack' => 'barfoobar', 'offset' => 3];
strpos(...$params); // int(6)

Any value with a string key is unpacked as a named parameter. Values with other key types (for arrays, this means integer keys) are treated as normal positional arguments.

It's possible to unpack both positional and named args in one go, but named arguments have to strictly follow any positional arguments. If a positional argument is encountered after a named argument, a warning is emitted and the unpacking operation is aborted.
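Splitting an array for unpacking can be sketched like this. split_unpacked() is hypothetical, its warning text is made up (the RFC does not specify one), and it assumes that "aborted" means unpacking stops at the offending entry while the arguments before it are kept.

```php
<?php
// Hypothetical model of the unpacking split: integer keys become positional
// arguments, string keys become named arguments.
function split_unpacked(array $params)
{
    $positional = [];
    $named = [];
    foreach ($params as $key => $value) {
        if (is_string($key)) {
            $named[$key] = $value;           // string key: named argument
        } elseif ($named) {
            // positional after named: warn and stop unpacking here (assumption)
            trigger_error('Cannot unpack positional argument after named arguments', E_USER_WARNING);
            break;
        } else {
            $positional[] = $value;          // integer key: positional argument
        }
    }
    return [$positional, $named];
}
```

For the strpos() example, every key is a string, so all three values are passed as named arguments.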

func_* and call_user_func_array

If some arguments are missing due to the use of named arguments (NULL on the stack), the func_* functions behave as follows:

  • func_num_args() returns the number of arguments including the NULLs.
  • func_get_arg($n) returns the default value for a skipped parameter (or NULL if there is no default value).
  • func_get_args() will use the default values (or NULL in cases where there is no default value) at the missing offsets.

All three functions are also oblivious to the collection of unknown named arguments by variadics. func_get_args() will not return the collected values and func_num_args() will not include them in the argument count.
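The substitution of defaults can be modelled with reflection. In this hypothetical sketch, model_func_get_args() receives the passed arguments keyed by slot (a missing key marks a skipped parameter) together with the argument count as func_num_args() would report it, NULL slots included.

```php
<?php
// Hypothetical model of the proposed func_get_args() behaviour.
// $slots maps slot index => passed value; a missing index is a skipped param.
function model_func_get_args($fn, array $slots, $numArgs)
{
    $params = (new ReflectionFunction($fn))->getParameters();
    $result = [];
    for ($i = 0; $i < $numArgs; $i++) {
        if (array_key_exists($i, $slots)) {
            $result[] = $slots[$i];                       // actually passed
        } elseif (isset($params[$i]) && $params[$i]->isDefaultValueAvailable()) {
            $result[] = $params[$i]->getDefaultValue();   // skipped: use default
        } else {
            $result[] = null;                             // skipped, no default
        }
    }
    return $result;
}
```

For a function f($a, $b = 'B', $c = 'C') called as f(1, c => 3), this gives [1, 'B', 3] with a count of 3.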

The call_user_func_array function continues to behave exactly as before: it does not support named parameters. Unpacking of named parameters is only supported using the ...$options syntax. (Adding support to call_user_func_array would break any code that passes an array with string keys.)

Generally: The func_* and call_user_func_array functions are designed to stay as close as possible to their old behavior.

Open questions

Syntax

The current implementation (and proposal) supports the following two syntaxes for named parameters:

test(foo => "oof", bar => "rab");
test("foo" => "oof", "bar" => "rab");

The second syntax is supported in order to allow named arguments where the parameter name is a reserved keyword:

test(array => [1, 2, 3]);   // syntax error
test("array" => [1, 2, 3]); // works

The choice of this syntax is mostly arbitrary; I didn't put much thought into it. Here are some alternative syntax proposals (most courtesy of Phil Sturgeon):

// currently implemented:
test(foo => "oof", bar => "rab");
test("foo" => "oof", "bar" => "rab");
 
// suggestions (can use keywords):
test($foo => "oof", $bar => "rab");
test(:foo => "oof", :bar => "rab");
test($foo: "oof", $bar: "rab");
 
// suggestions (cannot use keywords):
test(foo = "oof", bar = "rab");
test(foo: "oof", bar: "rab");
 
// not possible because already valid code:
test($foo = "oof", $bar = "rab");

Which one(s) of these we want to support is up to discussion.

Collection of unknown named args into ...$opts

The current implementation / proposal suggests using the ...$opts syntax for both positional and named variadics. Python takes a different approach, where the former are collected into *args and the latter into **kwargs.

Pro current solution:

  • It seems very PHP-like to do it this way, because PHP allows mixing “normal” arrays and dictionaries, which is an option Python does not have.

Con current solution:

  • Having a separate syntax for capturing unknown named args makes the intention clearer: You don't always want to support both positional and named variadics. Separate syntax allows you to enforce one type or the other.

Opinions and arguments how to handle this are welcome.

Unpacking named args

The same question comes up for argument unpacking: Should the ...$foo notation be used both for unpacking positional and named arguments, or do we want separate *$foo and **$foo notations?

In any case, this decision should mirror the one for the previous question.

Signature validation allows changing parameter names

Currently parameter names are not part of the signature contract. When only positional arguments are used, this is quite reasonable: the name of the parameter is irrelevant to the caller. Named arguments change this. If an inheriting class changes a parameter name, calls using named args might fail, thus violating the Liskov substitution principle (LSP):

interface A {
	public function test($foo, $bar);
}
 
class B implements A {
	public function test($a, $b) {}
}
 
$obj = new B;
 
// Pass params according to A::test() contract
$obj->test(foo => "foo", bar => "bar"); // Fatal error: Unknown named argument $foo

If named parameters are introduced, signature validation should make sure that parameter names are not changed. Signature mismatches between an interface and an implementing class usually throw a fatal error, but this is not possible here because of the large BC break it would cause. Instead, we could use a lower error type for this (warning / notice / strict).
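The additional check could look roughly like the following userland sketch. renamed_params() is a hypothetical helper built on ReflectionClass and ReflectionMethod; it compares parameter names position by position and returns messages instead of raising an error, since the error level is still open.

```php
<?php
// Hypothetical userland version of the proposed name check: list every
// parameter that an implementing class renamed relative to its interface.
function renamed_params($interface, $class)
{
    $problems = [];
    foreach ((new ReflectionClass($interface))->getMethods() as $proto) {
        $impl = new ReflectionMethod($class, $proto->getName());
        $implParams = $impl->getParameters();
        foreach ($proto->getParameters() as $i => $param) {
            if (isset($implParams[$i]) && $implParams[$i]->getName() !== $param->getName()) {
                $problems[] = sprintf(
                    '%s::%s(): parameter %d renamed from $%s to $%s',
                    $class, $proto->getName(), $i + 1,
                    $param->getName(), $implParams[$i]->getName()
                );
            }
        }
    }
    return $problems;
}
```

For the A/B example above, it reports both $foo and $bar as renamed.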

To address one specific discussion point relating to this: PHP has left behind the practice of introducing ini settings that change language runtime behavior. As such, making this error controlled by an ini setting is not an option, at least in my eyes.

Patch

You can find the diff for the work-in-progress patch here: https://github.com/nikic/php-src/compare/splat...namedParams. The patch is incomplete, dirty and has known bugs.

Credits: The patch includes some of the work that Stas did for the skipparams RFC.

Work that still needs to be done:

  • Implement the results of “Open questions”
  • Update all arginfos of internal functions to match the documentation (and improve names along the way). The current arginfo structs are hopelessly outdated. I hope that this work can be done mainly automatically. (Note: After named parameters are introduced the argument names are frozen and should not be changed.)
  • Make sure that internal functions properly handle skipped arguments. This should work in most cases automatically, but I'm sure that there are quite a few cases where additional adjustments need to be done. Hopefully misbehaving functions can be found through fuzzing.

Changelog

  • 09.09.2013 - func_get_arg(s) now return default values on skipped parameters.
rfc/named_params.txt · Last modified: 2013/09/09 17:23 by nikic