rfc:auto-capture-closure

This is an old revision of the document!


PHP RFC: Auto-capturing multi-statement closures

  • Version: 2.0
  • Date: 2022-05-25
  • Author: Nuno Maduro (enunomaduro@gmail.com), Larry Garfield (larry@garfieldtech.com), Arnaud Le Blanc (arnaud.lb@gmail.com)
  • Status: In Discussion

Introduction

Closures (also known as lambdas or anonymous functions), have become increasingly powerful and useful in PHP in recent versions. In their current form they have two versions, long and short. Unfortunately, these two syntaxes have different, mutually-incompatible benefits. This RFC proposes a syntax for closures that combines the benefits of both for those situations where that is warranted.

// As of 8.1:
 
$y = 1;
 
$fn1 = fn($x) => $x + $y; // auto-capture + single expression
 
$fn2 = function ($x) use ($y): int { // manual-capture + statement list
   // ...
 
   return $x + $y;
};

The proposed syntax combines the auto-capture and multi-line capabilities into a single syntax:

$fn3 = fn ($x): int { // auto-capture + statement list
    // ...
 
    return $x + $y;
};

Proposal

Background

As of PHP 8.1, the following syntaxes around functions have the following meaning:

// A named, globally available function.
// No variables are auto-captured from the environment.
// The body is a statement list, with possibly a return statement.
function foo($a, $b, $c): int {
  return $a * $b * $c;
}
 
// An anonymous, locally available function.
// Variables are explicitly captured lexically. 
// The body is a statement list, with possibly a return statement.
$foo = function ($a, $b) use ($c) {
  return $a * $b * $c;
};
 
// An anonymous, locally available function.
// Variables are auto-captured lexically.
// The body is a single-expression, whose value is returned.
$foo = fn($a, $b): int => $a * $b * $c;

That is, a function may be named or local/anonymous, auto-capture or not, and a statement list or single expression. That means there are 8 possible combinations of properties, of which only three are currently supported.

The declined Short Functions RFC sought to add one additional combination: named, no-capture, single-expression.

This RFC seeks to add a different combination: anonymous, auto-capture, statement list.

The remaining combinations would be:

  • named function, auto-capture, statement list - This is of little use in practice as there is nothing to auto-capture, except potentially global variables.
  • named function, auto-capture, expression - Ibid.
  • anonymous function, manual-capture, expression - While this form would be possible to add, its use cases are limited. The existing short-closure syntax is superior in nearly all cases.

None of these additional variants are included in this RFC.

Auto-capture multi-statement closures

Specifically, this RFC adds the following syntax:

// An anonymous, locally available function.
// Variables are auto-captured lexically.
// The body is a statement list, with possibly a return statement;
$c = 1;
$foo = fn($a, $b):int {
  $val = $a * $b;
  return $val * $c;
};

The syntax choice here leads to the following consistent syntactic meanings:

  • The => symbol always means “evaluates to the expression on the right,” in all circumstances. (Named functions, anonymous functions, arrays, and match().)
  • { ... } denotes a statement list, potentially ending in a return.
  • The function keyword indicates a function that has no auto-capture.
  • The fn keyword indicates a function that will auto-capture variables, by value.
  • A function with a name is declared globally at compile time. A function without a name is declared locally as a closure at runtime.

These rules are easily recognizable and learnable by developers.

Variable capture

In the current implementation, variables defined within an auto-capturing long closure that are also defined externally to the closure will be captured by value. Variables outside the closure that are not referenced inside the closure will not be captured and thus have no CPU or memory implication.

In profiling, the implementation in the 1.0 version of this RFC showed a notable CPU and memory increase when using auto-capture long closures. The 2.0 version, proposed here, has only marginal impact, well within the margin of error for profiling tools. In some cases the profiling run shows the auto-capture version being slightly more performant, which is likely just random test jitter between runs. We therefore conclude that the performance impact of this approach is effectively zero.

For more details, see: https://gist.github.com/arnaud-lb/d9adfbf786ce9e37c7157b0f9c8d8e13

Because variables are always auto-captured by value, the potential for “spooky action at a distance” is minimized. Captured scalar values changed inside a closure will not “leak” to other parts of the code. Objects captured inside a closure may have changes that propagate, depending on the object, but that is no different than objects used in any other function or object, and developers are used to being aware of that potential.

Why add another function mechanism?

Long Closures in PHP can be quite verbose, even when they only perform a simple operation. This is due to a large amount of syntactic boilerplate that is needed in “long closures” to manually import used variables with the use keyword.

While one-line arrow functions solve this problem to some extent, there are ample cases that require a 2-3 statement body. That is still short enough that the chances of a developer confusing in-function and out-of-function variables is very remote, but the burden of manually closing over 3-4 variables is relatively high.

One example is when you are within a class method with multiple arguments and you want to simply return a closure that uses all the arguments, using the “use” keyword to list all the arguments is entirely redundant and pointless.

Then there are often use-cases with array_filter() and similar functions where the use() just adds visual noise to what the code actually means.

The trend in PHP in recent years has been toward more compact but still readable syntax that eliminates redundancy. Property promotion, arrow functions, the nullsafe operator, and similar recent well-received additions demonstrate this trend. This RFC seeks to continue that trend to make PHP more pleasant to write while still being just as clear to read.

Methods

As methods cannot be anonymous, there are no impacts on methods from this RFC.

What about long-closures?

The existing multi-line closure syntax remains valid, and there is no intent to deprecate it. It is likely to become less common in practice, but it still has two use cases where it will be necessary:

  • When it is desirable to capture variables explicitly, such as to avoid name collision.
  • When it is desirable to capture a variable by reference. Such use case are rare but do exist.
// This remains the only way to capture by reference.
$c = 1;
$f = function($a, $b) use (&$c) {
  $c = $a * $b;
};

Multi-line expressions

There has been related discussion of multi-line expressions, specifically in the context of match() arms. We considered whether multi-line expressions made sense as an alternative approach, but decided against it as that introduces considerably more edge cases both syntactically and in the engine.

As a side benefit, the syntax proposed here does offer a somewhat round-about way to have a multi-line match() arm. This is not a deliberate feature of the RFC, but more of a convenient side-effect noted here for completeness.

$b = ...;
$c = ...;
$ret = match ($a) {
  1, 3, 5 => (fn() {
    $val = $a * $b;
    return $val * $c;
  })(),
  2, 4, 6 => (fn() {
    $val = $a + $b;
    return $val + $c;
  })(),
};

While sub-optimal, it may be sufficient for the few times that a multi-statement match() arm is needed.

Examples

Closures are often used to “wrap” some behavior in other behavior. One example provided by Mark Randall is for a throw-aware buffer. The following is actual code he wrote:

$x = function () use ($to, $library, $thread, $author, $title, $library_name, $top_post) {
// ...
};

From Mark: “That was just to get those variables inside a callback that could be invoked inside a throw-aware buffering helper.”

Another similar example is for wrapping behavior in a transaction. Often, that is done by passing a callable to an inTransaction() method or similar.

public function savePost($user, $date, $title, $body, $tags) {
  return $this->db->inTransaction(function() use ($user, $date, $title, $body, $tags) {
    $this->db->query(...);
    $this->db->query(...);
    return $this->db->lastInsertId();
  });
}

In this case, the used variable listing is entirely redundant and pointless, much the same as constructor property promotion eliminated entirely redundant boilerplate. (Though admittedly, the difference there was greater.)

Comparison to other languages

As far as we are aware, only two languages in widespread use require variables to be explicitly closed over: PHP and C++. All other major languages capture implicitly, as is proposed here.

Backward Incompatible Changes

None.

Proposed PHP Version(s)

PHP 8.2.

Open Issues

None.

Unaffected PHP Functionality

Existing function syntax continues to work precisely as it does now. Only new combinations are possible.

Future Scope

The proposal section detailed three additional possible combinations of function functionality that are not included here. While it is not likely that they have much use, the pattern here clearly lays out what they would be were a future RFC to try and implement them.

Specifically, they would be:

// Global scope
$c = 1;
 
fn foo($a, $b): int {
  $val = $a * $b;
  return $val * $c;
}
 
fn foo($a, $b): int => $a * $b * $c;
 
$foo = function($a, $b) use ($c): int => $a * $b * $c;

Those versions are not included in this RFC.

Proposed Voting Choices

This is a simple Yes/No vote, requiring 2/3 to pass.

Patches and Tests

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged into
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature
  4. a link to the language specification section (if any)

References

Changelog

2.0: Updated for new patch; reduced discussion of short-function RFC and related topics; expanded discussion of the capture rules and noted benchmarks showing minimal performance impact

rfc/auto-capture-closure.1653502134.txt.gz · Last modified: 2022/05/25 18:08 by crell