PHP RFC: Auto-capturing multi-statement closures

  • Version: 2.0
  • Date: 2022-05-25
  • Author: Nuno Maduro (enunomaduro@gmail.com)
  • Author: Larry Garfield (larry@garfieldtech.com)
  • Author: Arnaud Le Blanc (arnaud.lb@gmail.com)
  • Status: In Discussion

Introduction

Closures (also known as lambdas or anonymous functions) have become increasingly powerful and useful in recent versions of PHP. They currently come in two forms, long and short. Unfortunately, these two syntaxes have different, mutually incompatible benefits. This RFC proposes a syntax for closures that combines the benefits of both for those situations where that is warranted.

// As of 8.1:
 
$y = 1;
 
$fn1 = fn($x) => $x + $y; // auto-capture + single expression
 
$fn2 = function ($x) use ($y): int { // manual-capture + statement list
   // ...
 
   return $x + $y;
};

The proposed syntax combines the auto-capture and multi-line capabilities into a single syntax:

$fn3 = fn ($x): int { // auto-capture + statement list
    // ...
 
    return $x + $y;
};

Why add another function mechanism?

Long closures in PHP can be quite verbose, even when they only perform a simple operation. This is due to the large amount of syntactic boilerplate needed to manually import used variables with the use keyword.

While one-line arrow functions solve this problem to some extent, there are ample cases that require a 2-3 statement body. That is still short enough that the chance of a developer confusing in-function and out-of-function variables is very remote, but the burden of manually closing over 3-4 variables is relatively high.

One example is a class method with multiple arguments that simply returns a closure using all of those arguments. Listing every argument again in a use clause is entirely redundant.

There are also frequent use cases with array_filter() and similar functions where the use clause just adds visual noise that obscures what the code actually means.

The trend in PHP in recent years has been toward more compact but still readable syntax that eliminates redundancy. Property promotion, arrow functions, the nullsafe operator, and similar recent well-received additions demonstrate this trend. This RFC seeks to continue that trend to make PHP more pleasant to write while still being just as clear to read.

Examples

Closures are often used to “wrap” some behavior in other behavior. One example provided by Mark Randall is for a throw-aware buffer. The following is actual code he wrote:

$x = function () use ($to, $library, $thread, $author, $title, $library_name, $top_post) {
// ...
};

From Mark: “That was just to get those variables inside a callback that could be invoked inside a throw-aware buffering helper.”

Another similar example is for wrapping behavior in a transaction. Often, that is done by passing a callable to an inTransaction() method or similar.

public function savePost($user, $date, $title, $body, $tags) {
  return $this->db->inTransaction(function() use ($user, $date, $title, $body, $tags) {
    $this->db->query(...);
    $this->db->query(...);
    return $this->db->lastInsertId();
  });
}

In this case, the used variable listing is entirely redundant and pointless, much the same as constructor property promotion eliminated entirely redundant boilerplate. (Though admittedly, the difference there was greater.)

This code could, with this RFC, be simplified to the following, which is no less debuggable, more readable, and less typing to produce:

public function savePost($user, $date, $title, $body, $tags) {
  return $this->db->inTransaction(fn() {
    $this->db->query(...);
    $this->db->query(...);
    return $this->db->lastInsertId();
  });
}

As noted, inline callbacks may also need to capture multiple variables for only a short operation.

/** @var Product[] */
$arr = [ ... ];
 
$wantApproved = true;
$size = 'L';
 
$filtered = array_filter($arr, function ($item) use ($wantApproved, $size): bool {
  if ($wantApproved) {
    return $item->isApproved();
  } else if ($size) {
    return $item->size() == $size;
  } else {
    return false;
  }
});

In this case, again, the explicit use clause offers no clarity, only visual noise. This RFC allows it to be simplified to:

/** @var Product[] */
$arr = [ ... ];
 
$wantApproved = true;
$size = 'L';
 
$filtered = array_filter($arr, fn ($item): bool {
  if ($wantApproved) {
    return $item->isApproved();
  } else if ($size) {
    return $item->size() == $size;
  } else {
    return false;
  }
});

The majority of closure uses fall into these kinds of categories.

Proposal

Auto-capture multi-statement closures

Specifically, this RFC adds the following syntax:

// An anonymous, locally available function.
// Variables are auto-captured lexically.
// The body is a statement list, possibly containing a return statement.
$c = 1;
$foo = fn(int $a, int $b):int {
  $val = $a * $b;
  return $val * $c;
};

The syntax choice here leads to the following consistent syntactic meanings:

  • The => symbol always means “evaluates to the expression on the right,” in all circumstances. (Named functions, anonymous functions, arrays, and match().)
  • { ... } denotes a statement list, potentially ending in a return.
  • The function keyword indicates a function that has no auto-capture.
  • The fn keyword indicates a function that will auto-capture variables, by value.
  • A function with a name is declared globally at compile time. A function without a name is declared locally as a closure at runtime.

These rules are easily recognizable and learnable by developers.

The use keyword may still be used with auto-capturing closures if desired, to support capturing by reference or to capture variables to use in a variable-variable expression.

$c = 1;
$foo = fn($a, $b) use (&$c):int {
  $val = $a * $b;
  return $val * $c;
};

In practice, we anticipate the use keyword to be rarely used.

Explicit capture

The proposed syntax supports explicit capture with the use keyword. Auto-capture and explicit capture can coexist in the same function declaration.

$c = 1;
fn () use ($a, &$b) {
    return $a + $b + $c; // $a is explicitly captured by value
                         // $b is explicitly captured by reference
                         // $c is auto-captured by value
};

This allows auto-capturing multi-statement closures to match long closures in functionality. Without this, it could be necessary to switch back and forth between the auto-capturing syntax and the long closure syntax when capturing by reference is needed.

We expect that explicitly capturing by value will be rare in practice.
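As a minimal sketch of the difference between the two capture modes that the use clause controls, the following example relies only on the existing long-closure syntax, so it runs on current PHP versions. The variable names are illustrative and not part of the proposal.

```php
<?php
$count = 0;
$total = 0;

// $count is captured by value, $total by reference.
$add = function (int $n) use ($count, &$total): void {
    $count++;      // changes only the closure's private copy
    $total += $n;  // changes the outer $total
};

$add(5);
$add(7);

print $count; // prints "0"
print $total; // prints "12"
```

Under this RFC, only the by-reference entry would still need to be listed; the by-value capture would happen automatically.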

Auto-capture semantics

The auto-capture semantics presented here are designed to be intuitive and have negligible performance impact.

Auto-capturing multi-statement closures can access all variables in their declaring scope with the variable access syntax (e.g. $var):

$a = 1;
$b = 2;
$f = fn () {
    print $a + $b;
};
 
$f(); // prints "3"

Accessed variables are bound by value at the time of the function declaration:

$a = 1;
$f = fn () {
    print $a;
};
 
$f();     // prints "1"
$a = 2;
$f();     // prints "1" (again)

$a = 1;
$f = fn () {
    $a++;
};
 
print $a; // prints "1"
$f();
print $a; // prints "1" (again)

Because variables are bound by value, the potential for “spooky action at a distance” is minimized. Captured scalar values changed inside a closure will not “leak” to other parts of the code. Objects captured inside a closure may have changes that propagate, depending on the object, but that is no different from objects used in any other function or object, and developers are used to being aware of that potential.

This is the behavior of long closures with explicit capture and of arrow functions.
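For comparison, here is a short sketch of the same by-value behavior using only the two existing syntaxes, which can be run on PHP 8.1. It also shows the object case mentioned above: the captured handle is a copy, but it still refers to the same object. The names $arrow, $long, $touch and $counter are illustrative.

```php
<?php
$a = 1;
$counter = new stdClass();
$counter->hits = 0;

$arrow = fn () => $a;                          // auto-capture by value
$long  = function () use ($a) { return $a; };  // explicit capture by value
$touch = function () use ($counter): void {    // the object handle is copied,
    $counter->hits++;                          // but it refers to the same object
};

$a = 2;    // happens after capture, so neither closure sees it
$touch();

print $arrow();       // prints "1"
print $long();        // prints "1"
print $counter->hits; // prints "1"
```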

For performance reasons, only the variables that are directly accessed with the variable access syntax in the closure are auto-captured. This excludes dynamic means of accessing variables, such as the variable-variable syntax. This matches arrow functions.
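The existing arrow-function behavior that this rule mirrors can be sketched as follows. The example uses isset() so that the missing variable is reported as a boolean rather than as a warning; the variable names are illustrative.

```php
<?php
$a = 1;
$name = 'a';

$direct  = fn () => isset($a);      // $a appears literally, so it is captured
$dynamic = fn () => isset($$name);  // only $name is captured; $a is not visible

var_dump($direct());  // bool(true)
var_dump($dynamic()); // bool(false)
```

When a variable must be reachable through a variable-variable expression, it can be listed explicitly in a use clause.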

Additionally, variables that are always assigned by the closure before being read are not captured, since capturing them is unnecessary. This differs from arrow functions (which rarely assign to a variable, so that situation does not come up).

We can express these semantics more succinctly like this: Auto-capturing multi-statement closures capture at least all the variables that are directly accessed by the closure.

The “at least” part has only marginal effect aside from performance, and is not relevant for most programs. Whether a variable is captured or not may only be observed through reflection, or through object destructors (because capturing may impact the exact moment at which they are called).
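As a sketch of how captured variables can be inspected through reflection, the example below uses ReflectionFunction::getClosureUsedVariables(), which exists as of PHP 8.1. It is shown with an existing long closure so that it runs today; what the method would report for the proposed syntax depends on the capture analysis and is not asserted here.

```php
<?php
$a = 1;
$b = 2;

$f = function () use ($a, $b) {
    return $a + $b;
};

// Returns an array of captured variables, keyed by variable name.
$captured = (new ReflectionFunction($f))->getClosureUsedVariables();

print implode(',', array_keys($captured)); // prints "a,b"
```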

Implementation details

Auto-capturing all variables directly accessed by a closure body will commonly capture too many variables. In the following example, the variable $tmp would be captured although this is not necessary because it is always assigned before being read (remember that variable assignments do not have an effect outside of the closure).

$tmp = 5;
fn() {
    $tmp = foo();
    bar($tmp);
    return $tmp;
};

A naive capture mechanism would unnecessarily capture $tmp, resulting in wasted memory usage.

Capture analysis, the process of choosing which variables to capture, is based on live-variable analysis. This reuses the Optimizer's existing implementation of live-variable analysis. We use this to conservatively find the variables for which a path exists in the function's code in which the variable may be read before being assigned. These variables are the minimum set we need to capture.
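The idea can be illustrated with a toy model. This is a simplified sketch, not the engine's implementation: it handles only a straight-line list of statements (no branches or loops), and the array representation with 'def' and 'use' keys as well as the function name liveAtEntry() are assumptions made for this example. The real analysis operates on the compiled function and its control flow.

```php
<?php
/**
 * Toy live-variable analysis for a straight-line statement list.
 * Each statement lists the variables it assigns ('def') and reads ('use').
 * Returns the variables that are read before being assigned.
 */
function liveAtEntry(array $statements): array
{
    $live = [];
    // Walk backwards: live-in = (live-out minus defs) plus uses.
    foreach (array_reverse($statements) as $stmt) {
        $live = array_diff($live, $stmt['def']);
        $live = array_unique(array_merge($live, $stmt['use']));
    }
    sort($live);
    return $live;
}

// Models the body: $tmp = foo(); bar($tmp); return $tmp * $c;
$body = [
    ['def' => ['tmp'], 'use' => []],
    ['def' => [],      'use' => ['tmp']],
    ['def' => [],      'use' => ['tmp', 'c']],
];

$captured = liveAtEntry($body);
print implode(',', $captured); // prints "c"
```

In this model $tmp is not live at function entry because every read is preceded by an assignment, so only $c would need to be captured.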

In practice, auto-capturing multi-statement closures end up capturing the same set of variables as a long closure with explicit capture would have captured. This was verified on the PHPStan code base by converting all closures to auto-capturing multi-statement closures and observing which variables were captured.

This retains the semantics described in the previous section, so an understanding of these semantics is enough to reason about auto-capturing multi-statement closures.

Benchmarks

In benchmarks, the implementation in the 1.0 version of this RFC showed a notable CPU and memory increase when using auto-capturing multi-statement closures in some cases.

The 2.0 version, proposed here, has only marginal impact compared to PHP 8.1, well within the margin of error for profiling tools. In some cases the profiling run shows the auto-capture version being slightly more performant, which is likely just random test jitter between runs. We therefore conclude that the performance impact of this approach is effectively zero.

The capture analysis approach described above makes auto-capturing multi-statement closures as efficient as long closures with explicit capture.

For more benchmark details, see: https://gist.github.com/arnaud-lb/d9adfbf786ce9e37c7157b0f9c8d8e13

Methods

As methods cannot be anonymous, there are no impacts on methods from this RFC.

What about long-closures?

The existing all-explicit multi-line closure syntax remains valid, and there is no intent to deprecate it.

Multi-line expressions

There has been related discussion of multi-line expressions, specifically in the context of match() arms. We considered whether multi-line expressions made sense as an alternative approach, but decided against it as that introduces considerably more edge cases both syntactically and in the engine.

As a side benefit, the syntax proposed here does offer a somewhat round-about way to have a multi-line match() arm. This is not a deliberate feature of the RFC, but more of a convenient side-effect noted here for completeness.

$b = ...;
$c = ...;
$ret = match ($a) {
  1, 3, 5 => (fn() {
    $val = $a * $b;
    return $val * $c;
  })(),
  2, 4, 6 => (fn() {
    $val = $a + $b;
    return $val + $c;
  })(),
};

While sub-optimal, it may be sufficient for the few times that a multi-statement match() arm is needed.

Comparison to other languages

As far as we are aware, only two languages in widespread use require variables to be explicitly closed over: PHP and C++. All other major languages capture implicitly, as is proposed here.

Many languages tend to capture by reference. In practice this can lead to surprising effects, especially in loops. For that reason, PHP defaults to capturing by value, which avoids this problem.
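The loop problem can be reproduced in current PHP by opting into by-reference capture explicitly. The following sketch contrasts it with the default by-value behavior of arrow functions; the variable names are illustrative.

```php
<?php
$byValue = [];
$byReference = [];

for ($i = 0; $i < 3; $i++) {
    $byValue[] = fn () => $i;                               // snapshot of $i
    $byReference[] = function () use (&$i) { return $i; };  // shares $i
}

$valueResults = array_map(fn ($f) => $f(), $byValue);
$referenceResults = array_map(fn ($f) => $f(), $byReference);

print implode(',', $valueResults);     // prints "0,1,2"
print implode(',', $referenceResults); // prints "3,3,3"
```

Every by-reference closure observes the final value of the loop variable, which is the surprising effect that by-value capture avoids.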

Counter-arguments

In the past, there has been reticence about auto-capture that has kept it out of previous evolutions in closures. Mostly that has boiled down to two concerns: performance and debuggability.

As noted above in the benchmarks section, the implementation offered here has effectively no performance impact either way.

In the majority of cases where closures are used in practice, the code involved is short enough that debugging is not hampered by implicit capture. Such closures are usually only a few lines long, small enough to fit into a developer's short-term memory while reading them. Which variables are captured is visually self-evident. In many cases, such as in the examples above, an entire named function/method body is just wrapping context around a closure, such that repeating the named function's arguments in a use block is simply redundant.

Potentially confusing behavior is further mitigated by PHP's (correct) use of by-value capture, which minimizes the potential for inadvertent bizarre changes to values from closures.

Furthermore, as noted, PHP is unusual in requiring explicit capture. The only other language that does so is C++. Most languages get along fine without that extra step.

For those few cases in which, for whatever reason, the developer is concerned about auto-capture reducing debuggability or about accidental capture, the existing explicit-only syntax remains valid and unchanged.

Backward Incompatible Changes

None.

Proposed PHP Version(s)

PHP 8.2.

Open Issues

None.

Unaffected PHP Functionality

Existing function syntaxes continue to work precisely as they do now. Only new combinations are possible.

Proposed Voting Choices

This is a simple Yes/No vote, requiring a 2/3 majority to pass.

Patches and Tests

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged into
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature
  4. a link to the language specification section (if any)

References

Changelog

2.0: Updated for new patch; reduced discussion of short-function RFC and related topics; expanded discussion of the capture rules and noted benchmarks showing minimal performance impact

rfc/auto-capture-closure.1653664309.txt.gz · Last modified: 2022/05/27 15:11 by crell