PHP RFC: Auto-capturing multi-statement closures

Version: 2.0
Date: 2022-05-25
Author: Nuno Maduro (enunomaduro@gmail.com)
Author: Larry Garfield (larry@garfieldtech.com)
Author: Arnaud Le Blanc (arnaud.lb@gmail.com)
Status: In Discussion
First Published at: http://wiki.php.net/rfc/auto-capture-closure

Introduction

Closures (also known as lambdas or anonymous functions) have become increasingly powerful and useful in recent versions of PHP. They currently come in two forms, long and short. Unfortunately, these two syntaxes have different, mutually incompatible benefits. This RFC proposes a syntax for closures that combines the benefits of both for those situations where that is warranted.

// As of 8.1:
 
$y = 1;
 
$fn1 = fn($x) => $x + $y; // auto-capture + single expression
 
$fn2 = function ($x) use ($y): int { // manual-capture + statement list
   // ...
 
   return $x + $y;
};

The proposed syntax combines the auto-capture and multi-line capabilities into a single syntax:

$fn3 = fn ($x): int { // auto-capture + statement list
    // ...
 
    return $x + $y;
};

Proposal

Background

As of PHP 8.1, the following syntaxes around functions have the following meaning:

// A named, globally available function.
// No variables are auto-captured from the environment.
// The body is a statement list, with possibly a return statement.
function foo($a, $b, $c): int {
  return $a * $b * $c;
}
 
// An anonymous, locally available function.
// Variables are explicitly captured lexically. 
// The body is a statement list, with possibly a return statement.
$foo = function ($a, $b) use ($c) {
  return $a * $b * $c;
};
 
// An anonymous, locally available function.
// Variables are auto-captured lexically.
// The body is a single-expression, whose value is returned.
$foo = fn($a, $b): int => $a * $b * $c;

That is, a function may be named or local/anonymous, auto-capture or not, and a statement list or single expression. Three binary properties give eight possible combinations (2 × 2 × 2), of which only three are currently supported.

The declined Short Functions RFC sought to add one additional combination: named, no-capture, single-expression.

This RFC seeks to add a different combination: anonymous, auto-capture, statement list.

The remaining combinations would be:

named function, auto-capture, statement list - This is of little use in practice as there is nothing to auto-capture, except potentially global variables.
named function, auto-capture, expression - Ibid.
anonymous function, manual-capture, expression - While this form would be possible to add, its use cases are limited. The existing short-closure syntax is superior in nearly all cases.

None of these additional variants are included in this RFC.

Auto-capture multi-statement closures

Specifically, this RFC adds the following syntax:

// An anonymous, locally available function.
// Variables are auto-captured lexically.
// The body is a statement list, with possibly a return statement.
$c = 1;
$foo = fn($a, $b): int {
  $val = $a * $b;
  return $val * $c;
};

The syntax choice here leads to the following consistent syntactic meanings:

The => symbol always means “evaluates to the expression on the right,” in all circumstances. (Named functions, anonymous functions, arrays, and match().)
{ ... } denotes a statement list, potentially ending in a return.
The function keyword indicates a function that has no auto-capture.
The fn keyword indicates a function that will auto-capture variables, by value.
A function with a name is declared globally at compile time. A function without a name is declared locally as a closure at runtime.

These rules are easily recognizable and learnable by developers.

Variable capture

Auto-capture semantics

We propose auto-capture semantics that are intuitive and that do not impact performance. These semantics are similar to those of arrow functions.

Auto-capturing multi-statement closures can access all variables in their declaring scope with the variable access syntax (e.g. $var):

$a = 1;
$b = 2;
$f = fn () {
    print $a + $b;
};
 
$f(); // prints "3"

Accessed variables are bound by value at the time of the function declaration:

$a = 1;
$f = fn () {
    print $a;
};
 
$f();     // prints "1"
$a = 2;
$f();     // prints "1" (again)

$a = 1;
$f = fn () {
    $a++;
};
 
print $a; // prints "1"
$f();
print $a; // prints "1" (again)

Because variables are bound by value, the potential for “spooky action at a distance” is minimized. Captured scalar values changed inside a closure will not “leak” to other parts of the code. Objects captured inside a closure may have changes that propagate, depending on the object, but that is no different from objects used in any other function or object, and developers are used to being aware of that potential.

This is the behavior of long closures with explicit capture and of arrow functions.
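
For comparison, the following sketch shows the same by-value binding using only the two closure syntaxes that already exist. It is illustrative only; the variable names are arbitrary and it should run unchanged on PHP 7.4 or later.

```php
<?php
$a = 1;

// Long closure: $a is copied when the closure is created.
$long = function () use ($a) {
    return $a;
};

// Arrow function: $a is auto-captured, also by value.
$arrow = fn () => $a;

// Rebinding the outer variable after creation is not seen by either closure.
$a = 2;

$longResult = $long();
$arrowResult = $arrow();

echo $longResult, ' ', $arrowResult, PHP_EOL; // prints "1 1"
```

The closure syntax proposed in this RFC is meant to behave the same way as both of these.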

For performance reasons, only the variables that are directly accessed with the variable access syntax in the closure are captured. This excludes dynamic means of accessing variables, such as the variable-variable syntax. This matches arrow functions.
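
The existing arrow-function behavior that this rule mirrors can be seen in the sketch below. It uses only current syntax; the variable names are chosen for illustration, and the `??` fallback is there only to avoid an undefined-variable warning.

```php
<?php
$a = 1;
$name = 'a';

// $a appears literally in the body, so it is captured.
$direct = fn () => $a;

// Only $name appears literally. The variable it names ($a) is reached
// dynamically and is therefore not captured.
$indirect = fn () => $$name ?? 'not captured';

$directResult = $direct();
$indirectResult = $indirect();

echo $directResult, PHP_EOL;   // prints "1"
echo $indirectResult, PHP_EOL; // prints "not captured"
```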

Additionally, variables that are always assigned by the closure before being read are not captured, since this is not needed. This differs from arrow functions.

We can express these semantics more succinctly like this: Auto-capturing multi-statement closures capture at least all the variables that are directly accessed by the closure.

The “at least” part has only a marginal effect aside from performance, and is not relevant for most programs. Whether a variable is captured or not may only be observed through reflection, or through object destructors (because capturing may impact the exact moment at which they are called).
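
As a rough illustration of observing capture through reflection, the sketch below inspects two arrow functions on PHP 8.1 with ReflectionFunction::getStaticVariables(), which reports a closure's bound variables. The variable names are arbitrary. The second closure shows the current arrow-function behavior of capturing a variable that is always assigned before it is read, which is the case the analysis proposed here is intended to skip.

```php
<?php
$used = 1;
$unused = 2;
$tmp = 'outer';

// Reads $used only; $unused never appears in the body.
$f = fn () => $used;

// Assigns $tmp before reading it, so the outer value is never needed.
$g = fn () => [$tmp = 'inner', $tmp][1];

$capturedByF = array_keys((new ReflectionFunction($f))->getStaticVariables());
$capturedByG = array_keys((new ReflectionFunction($g))->getStaticVariables());
$resultOfG = $g();

echo json_encode($capturedByF), PHP_EOL; // prints ["used"]
echo json_encode($capturedByG), PHP_EOL; // prints ["tmp"] on PHP 8.1
echo $resultOfG, PHP_EOL;                // prints "inner"
```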

Explicit capture

The proposed syntax supports explicit capture with the use keyword. Auto-capture and explicit capture can coexist in the same function declaration.

$c = 1;
fn () use ($a, &$b) {
    return $a + $b + $c; // $a, $b, and $c are captured
                         // $b is captured by reference
}

This allows auto-capturing multi-statement closures to match long closures in functionality. Without this, it could be necessary to switch back and forth between the auto-capturing syntax and the long closure syntax when capturing by reference is needed.
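
The by-reference behavior being preserved is the one long closures already have. A minimal sketch in existing syntax, with illustrative variable names:

```php
<?php
$step = 5;
$total = 0;

// $step is copied; $total is bound by reference, so writes reach the outer scope.
$add = function () use ($step, &$total): void {
    $total += $step;
};

$add();
$add();
$step = 100; // Has no effect on the copy held by the closure.
$add();

echo $total, PHP_EOL; // prints "15"
```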

We expect that explicitly capturing by value will be rare in practice.

Implementation details

Auto-capturing all variables directly accessed by a closure body will commonly capture too many variables. In the following example, the variable $tmp would be captured although this is not necessary because it is always assigned before being read (remember that variable assignments do not have an effect outside of the closure).

fn() {
    $tmp = foo();
    bar($tmp);
    return $tmp;
}

This can lead to additional CPU or memory usage, as shown in the benchmarks linked later in this section.

Version 2.0 of this RFC, proposed here, avoids this by capturing only variables that may be read by the function before being assigned.

In practice, auto-capturing multi-statement closures end up capturing the same set of variables as a long closure with explicit capture would have captured. This was verified on the PHPStan code base by converting all closures to auto-capturing multi-statement closures, and observing which variables were captured.

This optimization makes auto-capturing multi-statement closures as efficient as long closures with explicit capture.

In profiling, the implementation in the 1.0 version of this RFC showed a notable CPU and memory increase when using auto-capturing multi-statement closures in some cases. The 2.0 version, proposed here, has only marginal impact compared to PHP 8.1, well within the margin of error for profiling tools. In some cases the profiling run shows the auto-capture version being slightly more performant, which is likely just random test jitter between runs. We therefore conclude that the performance impact of this approach is effectively zero.

For more details, see: https://gist.github.com/arnaud-lb/d9adfbf786ce9e37c7157b0f9c8d8e13

Capture analysis, the process of choosing which variables to capture, is based on live-variable analysis. This reuses the Optimizer's existing implementation of live-variable analysis. We use this to conservatively find the variables for which a path exists in the function's code in which the variable may be read before being assigned. These variables are the minimum set we need to capture.
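
The idea of “read before being assigned” can be sketched in a few lines. The following is a deliberately simplified illustration, not the engine's implementation: it handles straight-line code only (no branches or loops), and both the function name variablesReadBeforeAssigned() and the reads/writes array shape are hypothetical, invented for this example. It walks the statements backwards, applying the usual liveness rule: live-in = (live-out minus writes) plus reads.

```php
<?php
/**
 * Hypothetical helper, for illustration only (not part of php-src).
 *
 * @param array $statements List of ['reads' => string[], 'writes' => string[]]
 * @return string[] Sorted names of variables that are read before being assigned
 */
function variablesReadBeforeAssigned(array $statements): array
{
    $live = [];
    foreach (array_reverse($statements) as $statement) {
        // A write satisfies any later read of the same variable...
        foreach ($statement['writes'] as $name) {
            unset($live[$name]);
        }
        // ...while a read requires a value from an earlier point.
        foreach ($statement['reads'] as $name) {
            $live[$name] = true;
        }
    }
    $names = array_keys($live);
    sort($names);
    return $names;
}

// Models: fn() { $tmp = foo(); bar($tmp); return $tmp; }
$tmpExample = [
    ['reads' => [], 'writes' => ['tmp']],
    ['reads' => ['tmp'], 'writes' => []],
    ['reads' => ['tmp'], 'writes' => []],
];

// Models: fn($a, $b) { $val = $a * $b; return $val * $c; }
// Parameters are treated as assignments made on entry.
$valExample = [
    ['reads' => [], 'writes' => ['a', 'b']],
    ['reads' => ['a', 'b'], 'writes' => ['val']],
    ['reads' => ['val', 'c'], 'writes' => []],
];

$tmpCaptures = variablesReadBeforeAssigned($tmpExample);
$valCaptures = variablesReadBeforeAssigned($valExample);

echo json_encode($tmpCaptures), PHP_EOL; // prints []
echo json_encode($valCaptures), PHP_EOL; // prints ["c"]
```

A real analysis must also account for control flow, where a variable is assigned on only some paths. That is why the text above speaks of conservatively finding variables for which any such path exists.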

This retains the semantics described in the previous section, so an understanding of these semantics is enough to reason about auto-capturing multi-statement closures.

Changes to arrow functions

This RFC proposes to change the capture analysis of arrow functions to match the one described here, both for consistency and performance reasons.

This change can break existing code in a very rare case, described in the Backward Incompatible Changes section.

Why add another function mechanism?

Long closures in PHP can be quite verbose, even when they only perform a simple operation. This is due to the large amount of syntactic boilerplate needed to manually import used variables with the use keyword.

While one-line arrow functions solve this problem to some extent, there are ample cases that require a 2-3 statement body. That is still short enough that the chance of a developer confusing in-function and out-of-function variables is very remote, but the burden of manually closing over 3-4 variables is relatively high.

One example is a class method with multiple arguments that simply returns a closure using all of those arguments. Listing every argument again with the use keyword is entirely redundant and pointless.

Then there are often use-cases with array_filter() and similar functions where the use() just adds visual noise to what the code actually means.
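
A small illustration of that noise, written in today's syntax. The data and variable names are made up for the example, and str_starts_with() requires PHP 8.0 or later. Because the callback needs two statements, an arrow function is not an option, so every outer variable has to be repeated in use().

```php
<?php
$minLength = 3;
$prefix = 'php';
$words = ['php-src', 'PHP RFC', 'go', 'phpstan', 'rust'];

$matches = array_filter($words, function (string $word) use ($minLength, $prefix): bool {
    $normalized = strtolower(trim($word));
    return strlen($normalized) >= $minLength && str_starts_with($normalized, $prefix);
});

// array_filter() preserves keys, so reindex before printing.
$matches = array_values($matches);

echo json_encode($matches), PHP_EOL; // prints ["php-src","PHP RFC","phpstan"]
```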

The trend in PHP in recent years has been toward more compact but still readable syntax that eliminates redundancy. Property promotion, arrow functions, the nullsafe operator, and similar recent well-received additions demonstrate this trend. This RFC seeks to continue that trend to make PHP more pleasant to write while still being just as clear to read.

Methods

As methods cannot be anonymous, there are no impacts on methods from this RFC.

What about long-closures?

The existing multi-line closure syntax remains valid, and there is no intent to deprecate it.

Multi-line expressions

There has been related discussion of multi-line expressions, specifically in the context of match() arms. We considered whether multi-line expressions made sense as an alternative approach, but decided against it as that introduces considerably more edge cases both syntactically and in the engine.

As a side benefit, the syntax proposed here does offer a somewhat round-about way to have a multi-line match() arm. This is not a deliberate feature of the RFC, but more of a convenient side-effect noted here for completeness.

$b = ...;
$c = ...;
$ret = match ($a) {
  1, 3, 5 => (fn() {
    $val = $a * $b;
    return $val * $c;
  })(),
  2, 4, 6 => (fn() {
    $val = $a + $b;
    return $val + $c;
  })(),
};

While sub-optimal, it may be sufficient for the few times that a multi-statement match() arm is needed.

Examples

Closures are often used to “wrap” some behavior in other behavior. One example provided by Mark Randall is for a throw-aware buffer. The following is actual code he wrote:

$x = function () use ($to, $library, $thread, $author, $title, $library_name, $top_post) {
// ...
};

From Mark: “That was just to get those variables inside a callback that could be invoked inside a throw-aware buffering helper.”

Another similar example is for wrapping behavior in a transaction. Often, that is done by passing a callable to an inTransaction() method or similar.

public function savePost($user, $date, $title, $body, $tags) {
  return $this->db->inTransaction(function() use ($user, $date, $title, $body, $tags) {
    $this->db->query(...);
    $this->db->query(...);
    return $this->db->lastInsertId();
  });
}

In this case, the list of used variables is entirely redundant and pointless, much like the boilerplate that constructor property promotion eliminated. (Though admittedly, the difference there was greater.)

Comparison to other languages

As far as we are aware, only two languages in widespread use require variables to be explicitly closed over: PHP and C++. All other major languages capture implicitly, as is proposed here.

Many languages tend to capture by variable (that is, by reference) rather than by value. In practice this can lead to surprising effects, especially in loops.
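
The loop surprise can be reproduced in PHP today by opting into by-reference capture. The sketch below uses only existing syntax and illustrative names. It contrasts closures that share the loop variable with closures that copy its value, the latter being the behavior this RFC proposes for auto-capture.

```php
<?php
$byReference = [];
$byValue = [];

for ($i = 0; $i < 3; $i++) {
    // Shares the loop variable itself: sees whatever $i holds when called.
    $byReference[] = function () use (&$i) {
        return $i;
    };
    // Copies the value $i has at the moment the closure is created.
    $byValue[] = fn () => $i;
}

$seenByReference = array_map(fn ($f) => $f(), $byReference);
$seenByValue = array_map(fn ($f) => $f(), $byValue);

echo implode(',', $seenByReference), PHP_EOL; // prints "3,3,3"
echo implode(',', $seenByValue), PHP_EOL;     // prints "0,1,2"
```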

Backward Incompatible Changes

Changing capture analysis in arrow functions may break existing code in very rare cases. Impacted functions are those that access a variable indirectly before assigning the same variable, like this:

$var = 'a';
fn () => $$var && $a = 1

Occurrences of this should be rare because assignments in arrow functions have no effect on the outer scope.
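
To make the affected pattern concrete, here is how it evaluates on PHP 8.1, assuming $a is defined in the outer scope (the value 10 is arbitrary). Both $var and $a appear literally in the body, so both are currently captured, and $$var resolves to the captured copy of $a.

```php
<?php
$a = 10;
$var = 'a';

// Parsed as: $$var && ($a = 1)
$f = fn () => $$var && $a = 1;

$result = $f();

var_dump($result); // bool(true) on PHP 8.1: $$var reads the captured 10
var_dump($a);      // int(10): the assignment inside the closure does not leak
```

Under the capture analysis proposed here, $a is assigned before any direct read, so it would presumably no longer be captured, and the indirect read through $$var would then find no such variable.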

Proposed PHP Version(s)

PHP 8.2.

Open Issues

None.

Unaffected PHP Functionality

Existing function syntax continues to work precisely as it does now. Only new combinations are possible.

Future Scope

The proposal section detailed three additional possible combinations of function functionality that are not included here. While it is not likely that they have much use, the pattern here clearly lays out what they would be were a future RFC to try and implement them.

Specifically, they would be:

// Global scope
$c = 1;
 
fn foo($a, $b): int {
  $val = $a * $b;
  return $val * $c;
}
 
fn foo($a, $b): int => $a * $b * $c;
 
$foo = function($a, $b) use ($c): int => $a * $b * $c;

Those versions are not included in this RFC.

Proposed Voting Choices

This is a simple Yes/No vote, requiring 2/3 to pass.

Patches and Tests

Pull Request: https://github.com/php/php-src/pull/8330

Implementation

After the project is implemented, this section should contain

the version(s) it was merged into
a link to the git commit(s)
a link to the PHP manual entry for the feature
a link to the language specification section (if any)

References

PHP RFC: Short Functions

Changelog

2.0: Updated for new patch; reduced discussion of short-function RFC and related topics; expanded discussion of the capture rules and noted benchmarks showing minimal performance impact