
PHP RFC: Auto-capturing multi-statement closures

Introduction

Closures (also known as lambdas or anonymous functions) have become increasingly powerful and useful in recent versions of PHP. They currently come in two forms, long and short. Unfortunately, the two syntaxes have different, mutually incompatible benefits. This RFC proposes a closure syntax that combines the benefits of both, for situations where that is warranted.

// As of 8.0:
 
$y = 1;
 
$fn1 = fn($x) => $x + $y; // auto-capture + single expression
 
$fn2 = function ($x) use ($y): int { // manual-capture + statement list
   // ...
 
   return $x + $y;
};

The proposed syntax combines the auto-capture and multi-line capabilities into a single syntax:

$fn3 = fn ($x): int { // auto-capture + statement list
    // ...
 
    return $x + $y;
};

This RFC has also been designed in concert with the Short Functions RFC, so that the syntax choices of the two RFCs complement each other in a logical, predictable fashion, as described below.

Proposal

Background

As of PHP 8.0, the function syntaxes below have the following meanings:

// A named, globally available function.
// No variables are auto-captured from the environment.
// The body is a statement list, with possibly a return statement.
function foo($a, $b, $c): int {
  return $a * $b * $c;
}
 
// An anonymous, locally available function.
// Variables are explicitly captured lexically. 
// The body is a statement list, with possibly a return statement.
$foo = function ($a, $b) use ($c) {
  return $a * $b * $c;
};
 
// An anonymous, locally available function.
// Variables are auto-captured lexically.
// The body is a single expression, whose value is returned.
$foo = fn($a, $b): int => $a * $b * $c;

That is, a function may be named or anonymous (local), auto-capturing or not, and have either a statement list or a single expression as its body. These three binary properties give 2 × 2 × 2 = 8 possible combinations, of which only three are currently supported.

The Short Functions RFC seeks to add one additional combination: named, no-capture, single-expression.

This RFC seeks to add a different combination: anonymous, auto-capture, statement list.

The remaining combinations would be:

  • named function, auto-capture, statement list - This is of little use in practice as there is nothing to auto-capture, except potentially global variables.
  • named function, auto-capture, expression - The same applies: there is nothing to auto-capture except potentially global variables.
  • anonymous function, manual-capture, expression - While this form would be possible to add, its use cases are limited. The existing short-closure syntax is superior in nearly all cases.

Auto-capture multi-statement closures

Specifically, this RFC adds the following syntax:

// An anonymous, locally available function.
// Variables are auto-captured lexically.
// The body is a statement list, with possibly a return statement.
$c = 1;
$foo = fn($a, $b): int {
  $val = $a * $b;
  return $val * $c;
};

The syntax choice here, in combination with the Short Functions RFC, leads to the following consistent syntactic meanings:

  • The => sigil always means “evaluates to the expression on the right,” in all circumstances. (Named functions, anonymous functions, arrays, and match().)
  • { ... } denotes a statement list, potentially ending in a return.
  • The function keyword indicates a function that has no auto-capture.
  • The fn keyword indicates a function that will auto-capture variables, by value.
  • A function with a name is declared globally at compile time. A function without a name is declared locally as a closure at runtime.

These rules are easily recognizable and learnable by developers.
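The "by value" part of the fn rule can already be observed with the existing single-expression arrow functions. The following is a minimal sketch using only current PHP (7.4+) syntax; the variable names are illustrative. This RFC proposes that the block-bodied form capture in the same way.

```php
<?php
// Arrow functions already auto-capture outer variables by value.
$y = 1;
$addY = fn(int $x): int => $x + $y; // $y is copied when the closure is created

$y = 100;          // later changes to the outer $y are not seen by $addY
$first = $addY(1); // 2, computed from the captured value 1

// Writing to a captured variable inside the closure only changes the copy.
$counter = 0;
$bump = fn() => ++$counter;
$second = $bump(); // 1: the closure's copy was incremented
$third = $bump();  // 1 again: every call starts from the captured value
// The outer $counter is still 0.
```

Because capture is by value, an auto-capturing closure cannot modify variables in the enclosing scope.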

Methods

As methods cannot be anonymous, this RFC has no impact on methods. The Short Functions RFC does address methods, and does so in a way that is completely consistent with the syntactic rules defined above.

What about long closures?

The existing multi-line closure syntax remains valid, and there is no intent to deprecate it. It is likely to become less common in practice, but it still has two use cases where it will be necessary:

  • When it is desirable to capture variables explicitly, such as to avoid name collision.
  • When it is desirable to capture a variable by reference. Such use cases are rare but do exist.

// This remains the only way to capture by reference.
$c = 1;
$f = function($a, $b) use (&$c) {
  $c = $a * $b;
};
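To make the difference concrete, the sketch below captures one variable by value and one by reference in the same long closure. It uses only existing syntax; the names $byValue and $byRef are illustrative.

```php
<?php
$byValue = 1;
$byRef = 1;

$f = function (int $a, int $b) use ($byValue, &$byRef): void {
    $byValue = $a * $b; // changes only the closure's private copy
    $byRef = $a * $b;   // writes through to the outer variable
};

$f(2, 3);
// The outer $byValue is still 1; the outer $byRef is now 6.
```

An auto-capturing closure behaves like the $byValue case for every variable it uses, which is why the explicit use (&$var) form remains necessary.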

Multi-line expressions

There has been related discussion of multi-line expressions, specifically in the context of match() arms. We considered whether multi-line expressions made sense as an alternative approach, but decided against them because they would introduce considerably more edge cases, both syntactically and in the engine.

As a side benefit, the syntax proposed here does offer a somewhat roundabout way to write a multi-line match() arm. This is not a deliberate feature of the RFC, but rather a convenient side effect.

$b = ...;
$c = ...;
$ret = match ($a) {
  1, 3, 5 => (fn() {
    $val = $a * $b;
    return $val * $c;
  })(),
  2, 4, 6 => (fn() {
    $val = $a + $b;
    return $val + $c;
  })(),
};

While sub-optimal, it may be sufficient for the few times that a multi-statement match() arm is needed.
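For comparison, the same immediately-invoked pattern can be written today with long closures, at the cost of repeating the use list in every arm. This is a sketch in PHP 8.0 syntax; the concrete values of $a, $b, and $c are assumptions chosen for illustration.

```php
<?php
$a = 3;
$b = 4;
$c = 5;

$ret = match ($a) {
    // Only the matching arm is evaluated, so only one closure is created and called.
    1, 3, 5 => (function () use ($a, $b, $c) {
        $val = $a * $b;
        return $val * $c;
    })(),
    2, 4, 6 => (function () use ($a, $b, $c) {
        $val = $a + $b;
        return $val + $c;
    })(),
};
// With the values above, the first arm matches and $ret is 60.
```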

Examples

Closures are often used to “wrap” some behavior in other behavior. One example provided by Mark Randall is for a throw-aware buffer. The following is actual code he wrote:

$x = function () use ($to, $library, $thread, $author, $title, $library_name, $top_post) {
// ...
};

From Mark: “That was just to get those variables inside a callback that could be invoked inside a throw-aware buffering helper.”
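The helper itself is not shown in the original. As a rough sketch of what such a throw-aware buffering helper might look like, assuming it is built on PHP's output buffering functions, consider the following. The function name captureOutput() and the sample variables are hypothetical.

```php
<?php
// Hypothetical helper: runs $callback with output buffering enabled and
// discards the buffered output if the callback throws.
function captureOutput(callable $callback): string
{
    ob_start();
    try {
        $callback();
        return (string) ob_get_clean();
    } catch (\Throwable $e) {
        ob_end_clean(); // drop partial output before rethrowing
        throw $e;
    }
}

$title = 'Hello';
$author = 'Example Author';

// With today's syntax, every variable used in the body must be listed.
$html = captureOutput(function () use ($title, $author) {
    echo "<h1>{$title}</h1>";
    echo "<p>by {$author}</p>";
});
```

Every variable the callback reads has to appear in the use list, which is the repetition the quoted code illustrates.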

Another similar example is for wrapping behavior in a transaction. Often, that is done by passing a callable to an inTransaction() method or similar.

public function savePost($user, $date, $title, $body, $tags) {
  return $this->db->inTransaction(function() use ($user, $date, $title, $body, $tags) {
    $this->db->query(...);
    $this->db->query(...);
    return $this->db->lastInsertId();
  });
}

In this case, the use list is entirely redundant. Removing it eliminates boilerplate in much the same way that constructor property promotion did. (Though admittedly, the savings there were much greater.)
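For readers unfamiliar with the pattern, an inTransaction() method might be sketched as below. FakeDb is a hypothetical in-memory stand-in that only records calls, not a real database API, and the example uses current PHP 8.0 syntax.

```php
<?php
// Hypothetical stand-in for a database connection; it only logs what happens.
final class FakeDb
{
    public array $log = [];

    // Runs $work between BEGIN and COMMIT, rolling back if it throws.
    public function inTransaction(callable $work): mixed
    {
        $this->log[] = 'BEGIN';
        try {
            $result = $work();
            $this->log[] = 'COMMIT';
            return $result;
        } catch (\Throwable $e) {
            $this->log[] = 'ROLLBACK';
            throw $e;
        }
    }

    public function query(string $statement): void
    {
        $this->log[] = $statement; // a log entry, not real SQL
    }
}

$db = new FakeDb();
$title = 'First post';

// Today, $db and $title must both be named in the use list.
$result = $db->inTransaction(function () use ($db, $title) {
    $db->query("save post: {$title}");
    return 42; // stands in for a generated ID
});
```

The callback exists only to delimit the transaction, so listing its captured variables by hand adds no information.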

Comparison to other languages

As far as we are aware, only two languages in widespread use require variables to be explicitly closed over: PHP and C++. All other major languages capture implicitly, as is proposed here.

Backward Incompatible Changes

None.

Proposed PHP Version(s)

PHP 8.1.

Open Issues

None.

Unaffected PHP Functionality

Existing function syntax continues to work precisely as it does now. Only new combinations are possible.

Future Scope

The Proposal section described three further combinations of function properties that are not included here. While they are unlikely to be of much use, the pattern established here makes clear what they would look like were a future RFC to try to implement them.

Specifically, they would be:

// Global scope
$c = 1;
 
fn foo($a, $b): int {
  $val = $a * $b;
  return $val * $c;
}
 
fn foo($a, $b): int => $a * $b * $c;
 
$foo = function($a, $b) use ($c): int => $a * $b * $c;

Those versions are not included in this RFC.

Proposed Voting Choices

This is a simple Yes/No vote, requiring a 2/3 majority to pass.

Patches and Tests

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged into
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature
  4. a link to the language specification section (if any)

References

Rejected Features

Keep this updated with features that were discussed on the mail lists.

rfc/auto-capture-closure.txt · Last modified: 2021/04/16 11:19 by seld