Anonymous functions in PHP can be verbose, in part due to the need to manually import used variables. This makes code using simple closures hard to read and understand.
Arrow Functions were introduced in PHP 7.4 as an alternative. However, the single-expression limitation can lead to complex one-liners, or makes Arrow Functions unfit in many use-cases that would benefit from a more concise syntax.
This RFC proposes an extension of the Arrow Function syntax supporting multiple statements:
$guests = array_filter($users, fn ($user) { $guest = $repository->findByUserId($user->id); return $guest !== null && in_array($guest->id, $guestsIds); });
Short Closures extend Arrow Functions by allowing multiple statements enclosed in {
and }
instead of a single expression:
fn (parameter_list) { statement_list; }
The statement_list
is a sequence of statements separated by semicolons. A return
statement must be used to return a value.
The syntax and behavior otherwise match those of Arrow Functions.
Like Arrow Functions, Short Closures use auto capture by-value. When a variable used in the Short Closure is defined in the parent scope it will be automatically captured by-value. In the following example the functions $fn1, $fn2, and $fn3 behave the same:
$y = 1; $fn1 = fn ($x) => $x + $y; $fn2 = fn ($x) { return $x + $y; }; $fn3 = function ($x) use ($y) { return $x + $y; };
Explicit capture is not included in the new syntax. It remains available only via the existing long-closure syntax, which only captures explicitly. Earlier versions of this proposal included mixing auto-capture and explicit capture, but it was determined that was too confusing.
The signature accepts the same syntax as that of Arrow Functions:
fn () { } fn ($a, $b) { } fn ($a, ...$args) { } // Variadic parameter fn (int $a): string { } // Type hints fn ($a = 42) { } // Parameter default value fn &($a) { } // Return by-reference fn (&$a) { } // Pass by-reference
The signature must be followed by {
, a statement list, and }
:
fn () { return 1; } fn () { print 1; } fn () { $tmp = $a + $b; return $tmp; }
Note that Short Closures with a multi-statement body do not have an implicit return value. A return
statement must be used to return a value.
The syntax choice here is consistent with other language constructs:
{ ... }
denotes a statement list, without implicit return value.=>
token is followed by an expression in all circumstances. (Arrow Functions, arrays, and match()
.)fn
keyword indicates a function that will auto-capture variables, by-value.function
keyword indicates a function that has no auto-capture.These rules are easily recognizable and learnable by developers.
Arrow Functions were added as an alternative to Anonymous Functions. The latter can be quite verbose, even when they only perform a simple operation. This is due to a large amount of syntactic boilerplate that is needed to manually import used variables with the use
keyword.
While Arrow Functions solve this problem to some extent, the one-expression limit can lead to one-liners with non ideal readability, or can make them unfit for some use-cases. There are ample cases where breaking an expression to 2-3 statements is required or would improve the legibility of the code.
As an example, writing the following code snippet with a single-expression Arrow Function would degrade legibility, but writing it as an Anonymous Function would be cumbersome:
$guests = array_filter($users, fn ($user) { $guest = $repository->findByUserId($user->id); return $guest !== null && in_array($guest->id, $guestsIds); });
Auto capture was first introduced by Arrow Functions.
In the past, there had been reticence about auto-capture that has kept it out of evolutions in closures. Mostly that has boiled down to a few concerns: Implementation difficulties, performance, and debugability.
Implementation difficulties arise from by-reference or by-variable semantics, especially when supporting dynamic means of accessing variables like variable-variables, compact(), or eval(). In this proposal and in Arrow Functions, the implementation difficulties are eliminated by using by-value semantics and requiring dynamically accessed variables to be captured explicitly.
As noted in the benchmarks section, the implementation offered here has effectively no performance impact either way.
In the majority of cases where closures are used in practice, the code involved is short enough that debugging is not hampered by automatic capture. They are usually only a few lines long, easily small enough to fit into a developer's short term memory while reading it. What variables are captured is visually self-evident.
Potential confusing behavior is further mitigated by PHP's (correct) use of by-value capture, which minimizes the potential for inadvertent confusing changes to values from closures.
Furthermore, as noted PHP is unusual in requiring explicit capture. The only other language that does so is C++. Most languages get along fine without that extra step.
For those few cases in which, for whatever reason, the developer is concerned about auto-capture reducing debugability or about accidental capture, the existing explicit-only syntax remains valid and unchanged.
Using variables from the parent block is not unusual in PHP. We do it all the time in loops.
In the following example, the loop uses three variables from the parent block. We have learned to recognize that what follows a foreach
, for
, or while
keyword can do that.
$guests = []; foreach ($users as $user) { $guest = $repository->findByUserId($user->id); if ($guest !== null && in_array($guest->id, $guestsIds)) { $guests[] = $guest; } }
In the following example, the function uses two variables from the parent block, which should not be more surprising than with a loop once we have learned that what follows a fn
keyword can do that, like we did with foreach
.
$guests = array_filter($users, fn ($user) { $guest = $repository->findByUserId($user->id); return $guest !== null && in_array($guest->id, $guestsIds); });
However the comparison stops here. These two examples do not behave equally with regard to side effects: Variable assignments to the $guest
and $user
variables in the loop can be observed after the loop, but the same is not true with the Short Closure.
It is important to note that the default capture mode in Anonymous Functions, Arrow Functions, and Short Closures is by-value. This purposefully differs from the semantics commonly found in other programming languages.
A by-value capture means that it is not possible to modify any variables from the outer scope:
$a = 1; $f = fn () { $a++; // Has no effect outside of the function $tmp = $a + 1; // Has no effect outside of the function return $tmp; }; print $a; // prints "1" $f(); print $a; // prints "1" (again)
Conversely, the outer scope cannot modify variables in the function:
$a = 1; $f = fn () { print $a; }; $f(); // prints "1" $a = 2; $f(); // prints "1" (again)
Because variables are bound by-value, the confusing behaviors often associated with closures do not exist. As an example, the following code snippet demonstrates such a behavior in JavaScript:
// JavaScript var fns = []; for (var i = 0; i < 3; i++) { fns.push(function() { console.log(i); }); } for (var k in fns) { var fn = fns[k]; fn(); // Prints "3", "3", "3" }
In PHP the behavior is intuitive and less confusing:
// PHP $fns = []; for ($i = 0; $i < 3; $i++) { $fns[] = fn () { print $i; }; } foreach ($fns as $fn) { $fn(); // Prints "0", "1", "2" }
In JavaScript the same output can be obtained by declaring i
with the let
keyword. Using the var
keyword, and loops, is largely discouraged. However i
is still captured by-variable (not to be confused with by-value), so the anonymous functions can still modify the value of i
. A different behavior can be obtained with the const
keyword.
In PHP, the variable is captured by-value, thus entirely avoiding the confusion.
Of course, functions can have side-effects when accessing mutable values such as objects or resources. The following example demonstrates this:
$d = new DateTime(); $fn1 = fn () { $d->modify('+ 1 day'); // Has an effect on the object bound to $d }; $fn2 = function () use ($d) { $d->modify('+ 1 day'); // Has an effect on the object bound to $d }; $fn3 = function (DateTime $d) { $d->modify('+ 1 day'); // Has an effect on the object bound to $d };
The RFC inherits the auto-capture semantics of Arrow Functions. These semantics can be stated as follows:
Short Closures can access a snapshot of the variable bindings of their declaring scope by accessing variables literally. The snapshot is taken when the function is declared. Assignments to variables do not have an effect on the declaring scope.
This can also be stated as follows:
Short Closures can read variables of their declaring scope by accessing variables literally. The values of these variables are the ones that were bound to them at function declaration. Assignments to variables do not have an effect on the declaring scope.
This is implemented by binding the value of the declaring scope variables to local variables in the function. This is referred to as capture in this RFC.
This RFC leaves unspecified which variables are captured, as long as these semantics are maintained.
A naive approach would capture all the variables that are accessed literally by the closure. This will commonly capture variables that are not necessary to maintain these semantics. In the following example, the variable $tmp
would be captured although this is not necessary because it is always assigned before being read (remember that variable assignments do not have an effect outside of the closure).
$tmp = 5; fn () { $tmp = foo(); bar($tmp); return $tmp; }
This approach would result in a waste of memory or CPU usage.
The implementation proposed in this RFC prevents this by attempting to capture the smallest possible set of variables necessary to maintain these semantics. In practice, Short Closures end up capturing the same set of variables that Anonymous Functions with a manually curated capture list would have captured. This was observed on the PHPStan code base by converting all Anonymous Functions to Short Closures, and looking at which variables were automatically captured after that.
These implementation details are irrelevant for most purposes, as they do not have an effect on the behavior of the program, apart from the marginal cases listed in the next subsection. However, the exact behavior can be defined as follows:
global
static
unset()
This optimization is not applied to Arrow Functions because variable bindings are unusual in these functions.
As long as the semantics are maintained, whether a variable is captured or not is largely irrelevant for most purposes, and can be observed only in marginal cases. These cases are listed here.
compact()
function, whose use is largely discouraged in modern PHP, can only see variables that are captured.use
list.The capture analysis used in this RFC will only capture the variables that may be read before being assigned by the function. This uses the Optimizer's implementation of live-variable analysis.
This maintains the semantics described earlier, so an understanding of these semantics is enough to reason about Short Closures.
In benchmarks, the implementation in the 1.0 version of this RFC showed a notable CPU and memory increase when using auto-capturing multi-statement closure in some cases.
The 2.0 version, proposed here, has only marginal impact compared to PHP 8.1, well within the margin of error for profiling tools. In some cases the profiling run shows the Short Closure version being slightly more performant, which is likely just random test jitter between runs. We therefore conclude that the performance impact of this approach is effectively zero.
The capture analysis approach described above makes Short Closures as efficient as Anonymous Functions.
For more benchmark details, see: https://gist.github.com/arnaud-lb/d9adfbf786ce9e37c7157b0f9c8d8e13
The existing Anonymous Function syntax remains valid, and there is no intent to deprecate it.
There has been related discussion of multi-line expressions, specifically in the context of match()
arms. We considered whether multi-line expressions made sense as an alternative approach, but decided against it as that introduces considerably more edge cases both syntactically and in the engine.
As a side benefit, the syntax proposed here does offer a somewhat round-about way to have a multi-line match()
arm. This is not a deliberate feature of the RFC, but more of a convenient side-effect noted here for completeness.
$b = ...; $c = ...; $ret = match ($a) { 1, 3, 5 => (fn () { $val = $a * $b; return $val * $c; })(), 2, 4, 6 => (fn () { $val = $a + $b; return $val + $c; })(), };
While sub-optimal, it may be sufficient for the few times that a multi-statement match()
arm is needed.
As far as we are aware, only two languages in widespread use require variables to be explicitly closed over: PHP and C++. All other major languages capture implicitly, as is proposed here.
Languages commonly capture by-variable (not to be confused with by-value) or by reference. In practice this can lead to confusing effects, especially in loops. For that reason, PHP defaults to capturing by-value, which avoids this problem. This is discussed above in this RFC, as well as in Arrow Functions.
The first discussion 1 around Anonymous Functions was objected to because of the lack of closures: It would be unusual for anonymous functions to not support closures, which would surprise users and limit the usefulness of the construct. At the same time, objections against closures cited implementation difficulties and performance issues, as well as potential complexity or pitfalls most commonly found in other programming languages.
In the same and subsequent discussions 2 3 a solution was proposed to use explicit capture with a new keyword, lexical
, close in many aspects to the global
keyword. Alternative syntaxes were later proposed that would allow to choose between by-reference and by-value capture, ultimately leading to the current use($x)
syntax.
It is unclear whether this was chosen because of technical concerns or concerns over semantics. Objections focusing on semantics appear to have been based on those most commonly found in other programming languages. These semantics differ significantly from what is proposed here. For instance, objections cite the possibility of a kind of side-effects that would not exist with by-value capture. Discussions do not appear to have occurred in the light of by-value semantics.
The Short Closures 1.0 RFC was declined for three main reasons 4: The syntax, the lack of type declarations, and implicit capture. Objections to implicit closures appear to be based on semantics that do not exist in the current RFC.
The Arrow Functions 2.0 RFC was accepted with a large majority. Compared to the Short Closures 1.0 RFC, it addressed the syntax and type hints concerns, limited the body to only one expression, and kept implicit closure by-value.
A few people suggested implementing the same functionality via a different syntax, that is, basing it on the long-closure syntax with a use(*)
or use(...)
syntax to indicate “capture everything that makes sense” rather than building on the short-closure syntax which already “captures everything that makes sense.”
The resulting behavior in either case would be identical, making it a largely aesthetic or philosophical distinction. The authors felt that the more compact syntax is preferable, for several reasons:
fn()
syntax for a number of years now, and should be sufficiently familiar with the concept of auto-capture.function
keyword without a use
statement at all would be a semantic BC break, which is not acceptable.=>
for {}
.
For those reasons, the authors went with the fn()
-derived syntax shown here.
None.
PHP 8.2.
None.
Existing function syntaxes continues to work precisely as they do now. Only new combinations are possible.
These are some possible future extensions, but the authors don't necessarily endorse them.
It would be possible to extend the Short Closure syntax to allow an explicit use list:
$fn = fn () use ($a, &$b) { };
One anticipated use-case is to selectively capture some variables by-reference.
There are at least two possible variations of this extension. In one of them, the use list is merged with auto-capture, so that explicit uses and auto-capture can coexist on the same function. In another the use list disables auto-capture on the function.
This RFC initially proposed the first possibility. This is not included in the current version because this appeared to create confusion.
This RFC proposes an optimized auto-capture. It would be possible to apply this optimization to Arrow Functions as well, but this would be a breaking change in some rare cases.
This is not included in this RFC because most Arrow Functions would not benefit from this.
This is a simple Yes/No vote, requiring 2/3 to pass. Vote ends on 15 July 2022.
Pull Request: https://github.com/php/php-src/pull/8330
After the project is implemented, this section should contain
2.0: Updated for new patch; reduced discussion of short-function RFC and related topics; expanded discussion of the capture rules and noted benchmarks showing minimal performance impact; renamed to “Short Closures 2.0”