rfc:auto-capture-closure
Differences
This shows you the differences between two versions of the page.
rfc:auto-capture-closure [2021/04/16 11:19] seld Fix typo |
rfc:auto-capture-closure [2022/07/02 13:12] (current) imsop update "Vote" heading |
Line 1: | Line 1: | ||
- | ====== PHP RFC: Auto-capturing multi-statement closures | + | ====== PHP RFC: Short Closures 2.0 ====== |
- | * Version: | + | * Version: |
- | * Date: 2021-03-22 | + | * Date: 2022-05-25 |
- | * Author: Nuno Maduro (enunomaduro@gmail.com), Larry Garfield (larry@garfieldtech.com) | + | * Author: Nuno Maduro (enunomaduro@gmail.com) |
- | * Status: In Discussion | + | * Author: |
+ | * Author: Arnaud Le Blanc (arnaud.lb@gmail.com) | ||
+ | * Status: In Voting | ||
* First Published at: http:// | * First Published at: http:// | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Closures (also known as lambdas or anonymous | + | Anonymous |
+ | |||
+ | [[rfc: | ||
+ | |||
+ | This RFC proposes | ||
<code php> | <code php> | ||
- | // As of 8.0: | + | $guests = array_filter($users, |
+ | $guest = $repository-> | ||
+ | return $guest !== null && in_array($guest-> | ||
+ | }); | ||
+ | </code> | ||
+ | ===== Proposal ===== | ||
+ | |||
+ | Short Closures extend Arrow Functions by allowing multiple statements enclosed in '' | ||
+ | |||
+ | <code php> | ||
+ | fn (parameter_list) { | ||
+ | statement_list; | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | The '' | ||
+ | |||
+ | The syntax and behavior otherwise match those of Arrow Functions. | ||
+ | |||
+ | ==== Auto capture by-value ==== | ||
+ | |||
+ | Like Arrow Functions, Short Closures use auto capture by-value. When a variable used in the Short Closure is defined in the parent scope, it is automatically captured by-value. In the following example, the functions $fn1, $fn2, and $fn3 behave the same: | ||
+ | |||
+ | <code php> | ||
$y = 1; | $y = 1; | ||
- | $fn1 = fn($x) => $x + $y; // auto-capture + single expression | + | $fn1 = fn ($x) => $x + $y; |
- | $fn2 = function | + | $fn2 = fn ($x) { |
- | // ... | + | return $x + $y; |
+ | }; | ||
- | | + | $fn3 = function ($x) use ($y) { |
+ | | ||
}; | }; | ||
</ | </ | ||
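The `$fn2` block form only parses with this RFC's patch. The sketch below therefore checks just the `$fn1`/`$fn3` equivalence, using syntax that PHP 7.4+ already accepts. The input values are arbitrary.

```php
<?php
// $fn1 captures $y implicitly; $fn3 names it explicitly in `use`.
// Both capture by value, so they compute the same results.
$y = 1;

$fn1 = fn ($x) => $x + $y;
$fn3 = function ($x) use ($y) {
    return $x + $y;
};

$inputs = [0, 1, 41];
$results1 = array_map($fn1, $inputs); // [1, 2, 42]
$results3 = array_map($fn3, $inputs); // [1, 2, 42]
```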
- | The proposed | + | ==== No explicit capture ==== |
+ | |||
+ | Explicit capture is not included in the new syntax. It remains available only via the existing long-closure syntax, which only captures explicitly. | ||
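For example, capturing by reference still requires the long form with `use (&$var)`. Here is a minimal sketch in current PHP, with illustrative variable names. An auto-capturing function only changes its own copy.

```php
<?php
// By-reference capture: only possible with an explicit use list.
$count = 0;
$increment = function () use (&$count) {
    $count++;   // writes through the reference to the outer $count
};
$increment();
$increment();
// $count is now 2

// Auto-capture (here an Arrow Function) is by-value: only the copy changes.
$total = 0;
$tryIncrement = fn () => $total++;
$tryIncrement();
// $total is still 0
```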
+ | |||
+ | ==== Syntax ==== | ||
+ | |||
+ | The signature accepts the same syntax | ||
<code php> | <code php> | ||
- | $fn3 = fn ($x): int { // auto-capture + statement list | + | fn () { } |
- | // ... | + | fn ($a, $b) { } |
+ | fn ($a, ...$args) { } // Variadic parameter | ||
+ | fn (int $a): string | ||
+ | fn ($a = 42) { } // Parameter default value | ||
+ | fn &($a) { } // Return by-reference | ||
+ | fn (&$a) { } | ||
+ | </ | ||
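Arrow Functions already accept these signature forms. Block bodies need this RFC's patch, so the sketch below exercises the forms with `=>` bodies instead. The variable names are illustrative.

```php
<?php
$noArgs   = fn () => 'none';
$typed    = fn (int $a): string => str_repeat('x', $a);
$variadic = fn ($a, ...$args) => $a + count($args);  // $args collects the rest
$default  = fn ($a = 42) => $a;
$byRef    = fn (&$a) => $a++;  // post-increments the caller's variable

$n = 1;
$byRef($n);  // $n is now 2
```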
- | | + | The signature must be followed by '' |
- | }; | + | |
+ | <code php> | ||
+ | fn () { return | ||
+ | fn () { print 1; } | ||
+ | fn () { | ||
+ | | ||
+ | | ||
+ | } | ||
</ | </ | ||
- | This RFC has also been designed in concert | + | Note that Short Closures |
- | ===== Proposal ===== | + | The syntax choice here is consistent with other language constructs: |
- | ==== Background ==== | + | * '' |
+ | * Conversely, the '' | ||
+ | * The '' | ||
+ | * The '' | ||
- | As of PHP 8.0, the following syntax around functions has the following | + | These rules are easily recognizable and learnable by developers. |
+ | |||
+ | ===== Why extend Arrow Functions? ===== | ||
+ | |||
+ | Arrow Functions were added as an alternative to Anonymous Functions. The latter can be quite verbose, even when they only perform a simple operation. This is due to a large amount | ||
+ | |||
+ | While Arrow Functions solve this problem to some extent, the one-expression limit can lead to one-liners with non-ideal readability, | ||
+ | |||
+ | As an example, writing | ||
<code php> | <code php> | ||
+ | $guests = array_filter($users, | ||
+ | $guest = $repository-> | ||
+ | return $guest !== null && in_array($guest-> | ||
+ | }); | ||
+ | </ | ||
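The sketch below shows the trade-off in current PHP (8.0+). The Arrow Function has to squeeze the lookup and the check into one expression. The Anonymous Function can use two statements, but it must list every outer variable. `GuestRepository`, `findByUserId()`, and the data are hypothetical stand-ins, not the RFC's actual `$repository` API.

```php
<?php
// Hypothetical repository, for illustration only.
final class GuestRepository
{
    public function __construct(private array $guestsByUserId) {}

    public function findByUserId(int $userId): ?array
    {
        return $this->guestsByUserId[$userId] ?? null;
    }
}

$repository = new GuestRepository([
    1 => ['name' => 'Ada', 'role' => 'admin'],
    3 => ['name' => 'Bob', 'role' => 'viewer'],
]);
$allowedRoles = ['admin', 'editor'];
$userIds = [1, 2, 3];

// Arrow Function: lookup and check forced into a single expression.
$viaArrow = array_filter(
    $userIds,
    fn ($id) => in_array($repository->findByUserId($id)['role'] ?? null, $allowedRoles, true)
);

// Anonymous Function: two statements, but an explicit capture list.
$viaClosure = array_filter($userIds, function ($id) use ($repository, $allowedRoles) {
    $guest = $repository->findByUserId($id);
    return $guest !== null && in_array($guest['role'], $allowedRoles, true);
});
// Both yield [0 => 1]: array_filter preserves the original keys.
```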
- | // A named, globally available function. | + | ===== Discussion on auto-capture ===== |
- | // No variables are auto-captured | + | |
- | // The body is a statement list, with possibly | + | Auto capture was first introduced by Arrow Functions. |
- | function foo($a, $b, $c): int { | + | |
+ | In the past, reticence about auto-capture kept it out of earlier evolutions of closures. | ||
+ | |||
+ | Implementation difficulties arise from by-reference or by-variable semantics, especially when supporting dynamic means of accessing | ||
+ | |||
+ | As noted in the benchmarks section, the implementation offered here has effectively no performance impact either way. | ||
+ | |||
+ | In the majority of cases where closures are used in practice, the code involved | ||
+ | |||
+ | Potentially confusing behavior is further mitigated by PHP' | ||
+ | |||
+ | Furthermore, | ||
+ | |||
+ | For those few cases in which, for whatever reason, the developer is concerned about auto-capture reducing debuggability or about accidental capture, the existing explicit-only syntax remains valid and unchanged. | ||
+ | |||
+ | ==== Using variables from the parent block ==== | ||
+ | |||
+ | Using variables from the parent block is not unusual in PHP. We do it all the time in loops. | ||
+ | |||
+ | In the following example, the loop uses three variables from the parent block. We have learned to recognize that what follows | ||
+ | |||
+ | <code php> | ||
+ | $guests = []; | ||
+ | foreach ($users as $user) { | ||
+ | $guest = $repository-> | ||
+ | if ($guest !== null && in_array($guest-> | ||
+ | $guests[] = $guest; | ||
+ | } | ||
} | } | ||
+ | </ | ||
- | // An anonymous, locally available | + | In the following example, the function |
- | // Variables are explicitly captured lexically. | + | |
- | // The body is a statement list, with possibly | + | <code php> |
- | $foo = function | + | $guests |
- | return $a * $b * $c; | + | $guest = $repository-> |
+ | return | ||
+ | }); | ||
+ | </ | ||
+ | |||
+ | However, the comparison stops here. These two examples do not behave equally with regard to side effects: Variable assignments to the '' | ||
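Current PHP already shows this scoping difference. A `foreach` body runs in the enclosing scope, whereas a function body gets its own. The variable names in the sketch below are illustrative.

```php
<?php
$items = ['a', 'b', 'c'];

foreach ($items as $item) {
    $last = $item;           // assigned in the enclosing scope
}
// $last === 'c' (and $item === 'c') after the loop

$collect = function (array $items): string {
    foreach ($items as $item) {
        $lastInside = $item; // local to the function
    }
    return $lastInside;
};
$result = $collect($items);
// $result === 'c', but $lastInside is not defined out here
```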
+ | |||
+ | ==== Capture is by-value, no unintended side-effects ==== | ||
+ | |||
+ | It is important to note that the default capture mode in Anonymous Functions, Arrow Functions, and Short Closures is by-value. This purposefully differs from the semantics commonly found in other programming languages. | ||
+ | |||
+ | A by-value capture means that it is not possible to modify any variables from the outer scope: | ||
+ | |||
+ | <code php> | ||
+ | $a = 1; | ||
+ | $f = fn () { | ||
+ | $a++; // Has no effect outside of the function | ||
+ | $tmp = $a + 1; // Has no effect outside of the function | ||
+ | return | ||
}; | }; | ||
- | // An anonymous, locally available function. | + | print $a; // prints " |
- | // Variables are auto-captured lexically. | + | $f(); |
- | // The body is a single-expression, | + | print $a; // prints " |
- | $foo = fn($a, $b): int => $a * $b * $c; | + | |
</ | </ | ||
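The example above uses the proposed block syntax. A runnable equivalent uses today's explicit by-value `use` list, which the RFC describes as having the same capture semantics. Each call starts again from the captured value.

```php
<?php
$a = 1;
$f = function () use ($a) {
    $a++;           // modifies the function's own copy only
    $tmp = $a + 1;  // $tmp is local to the function
    return $tmp;
};

$first = $f();   // 3: the copy starts at 1, becomes 2, then 2 + 1
$second = $f();  // 3 again: the increment did not persist
// $a is still 1, and $tmp is undefined in this scope
```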
- | That is, a function | + | Conversely, the outer scope cannot modify variables in the function: |
- | The [[rfc: | + | <code php> |
+ | $a = 1; | ||
+ | $f = fn () { | ||
+ | print $a; | ||
+ | }; | ||
- | This RFC seeks to add a different combination: | + | $f(); // prints " |
+ | $a = 2; | ||
+ | $f(); // prints " | ||
+ | </ | ||
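The snapshot rule can be checked with an existing Arrow Function, which uses the same by-value auto-capture. Each function keeps the value from the moment it was created.

```php
<?php
$a = 1;
$f = fn () => $a;

$a = 2;
$seen = $f();     // 1: value captured when $f was created

$g = fn () => $a; // a new function takes a fresh snapshot
$seenNow = $g();  // 2
```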
- | The remaining combinations would be: | + | Because variables are bound by-value, the confusing behaviors often associated with closures do not exist. As an example, the following code snippet demonstrates such a behavior in JavaScript: |
- | * named function, auto-capture, | + | <code javascript> |
- | * named function, auto-capture, expression - Ibid. | + | // JavaScript |
- | * anonymous function, manual-capture, | + | var fns = []; |
+ | for (var i = 0; i < 3; i++) { | ||
+ | fns.push(function() { | ||
+ | console.log(i); | ||
+ | }); | ||
+ | } | ||
+ | for (var k in fns) { | ||
+ | var fn = fns[k]; | ||
+ | fn(); // Prints " | ||
+ | } | ||
+ | </ | ||
- | ==== Auto-capture multi-statement closures ==== | + | In PHP the behavior is intuitive and less confusing: |
- | Specifically, this RFC adds the following syntax: | + | <code php> |
+ | // PHP | ||
+ | $fns = []; | ||
+ | for ($i = 0; $i < 3; $i++) { | ||
+ | $fns[] = fn () { | ||
+ | print $i; | ||
+ | }; | ||
+ | } | ||
+ | foreach ($fns as $fn) { | ||
+ | $fn(); // Prints " | ||
+ | } | ||
+ | </ | ||
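The loop example can be reproduced with Arrow Functions, which capture the same way. For contrast, the sketch also uses an explicit by-reference `use (&$j)`. That variant reproduces the JavaScript `var` behavior: every function reads the final value of the shared variable.

```php
<?php
// By-value capture: each function keeps the $i of its own iteration.
$byValue = [];
for ($i = 0; $i < 3; $i++) {
    $byValue[] = fn () => $i;
}
$valueResults = array_map(fn ($fn) => $fn(), $byValue);        // [0, 1, 2]

// By-reference capture: all functions share $j, which ends at 3.
$byReference = [];
for ($j = 0; $j < 3; $j++) {
    $byReference[] = function () use (&$j) {
        return $j;
    };
}
$referenceResults = array_map(fn ($fn) => $fn(), $byReference); // [3, 3, 3]
```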
+ | In JavaScript the same output can be obtained by declaring '' | ||
+ | |||
+ | In PHP, the variable is captured by-value, thus entirely avoiding the confusion. | ||
+ | |||
+ | Of course, functions can have side-effects when accessing mutable values such as objects or resources. The following example demonstrates this: | ||
<code php> | <code php> | ||
- | // An anonymous, locally available function. | + | $d = new DateTime(); |
- | // Variables are auto-captured lexically. | + | |
- | // The body is a statement list, with possibly a return statement; | + | $fn1 = fn () { |
- | $c = 1; | + | $d-> |
- | $foo = fn($a, $b):int { | + | }; |
- | $val = $a * $b; | + | |
- | | + | $fn2 = function () use ($d) { |
+ | $d-> | ||
+ | }; | ||
+ | |||
+ | $fn3 = function (DateTime | ||
+ | $d-> | ||
}; | }; | ||
</ | </ | ||
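Capturing an object by value copies the object handle, not the object. Mutating methods therefore still act on the shared instance. The sketch below demonstrates this with an Arrow Function and arbitrary dates. For contrast, it also shows an immutable object, whose `modify()` returns a new instance instead.

```php
<?php
$d = new DateTime('2022-01-01');
$fn1 = fn () => $d->modify('+1 day');  // mutates the shared DateTime
$fn1();
// $d now reads 2022-01-02

$im = new DateTimeImmutable('2022-01-01');
$fn2 = fn () => $im->modify('+1 day'); // returns a new instance
$next = $fn2();
// $im still reads 2022-01-01; $next reads 2022-01-02
```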
- | The syntax choice here, in combination with the short-functions RFC, leads to the following consistent syntactic meanings: | + | ===== Auto-capture semantics ===== |
- | * The '' | + | The RFC inherits |
- | * '' | + | |
- | * The '' | + | |
- | * The '' | + | |
- | * A function with a name is declared globally at compile time. A function without a name is declared locally | + | |
- | These rules are easily recognizable and learnable | + | > Short Closures can access a snapshot of the variable bindings of their declaring scope by accessing variables literally. The snapshot is taken when the function is declared. Assignments to variables do not have an effect on the declaring scope. |
- | ==== Methods ==== | + | This can also be stated as follows: |
- | As methods cannot be anonymous, there are no impacts on methods from this RFC. The short-functions RFC does address methods, and does so in a way that is completely consistent with the syntactic rules defined above. | + | > Short Closures can read variables of their declaring scope by accessing variables literally. The values of these variables are the ones that were bound to them at function declaration. Assignments to variables do not have an effect on the declaring scope. |
- | ==== What about long-closures? | + | This is implemented by binding the value of the declaring scope variables to local variables in the function. This is referred to as //capture// in this RFC. |
- | The existing multi-line closure syntax remains valid, and there is no intent to deprecate it. It is likely to become less common in practice, but it still has two use cases where it will be necessary: | + | This RFC leaves unspecified which variables are captured, as long as these semantics are maintained. |
- | * When it is desirable to capture variables | + | ==== Optimization ==== |
- | * When it is desirable to capture a variable | + | |
+ | A naive approach would capture | ||
<code php> | <code php> | ||
- | // This remains the only way to capture by reference. | + | $tmp = 5; |
- | $c = 1; | + | fn () { |
- | $f = function($a, $b) use (&$c) { | + | |
- | $c = $a * $b; | + | bar($tmp); |
- | }; | + | |
+ | } | ||
</ | </ | ||
- | ==== Multi-line expressions ==== | + | This approach would result in a waste of memory or CPU usage. |
+ | |||
+ | The implementation proposed in this RFC prevents this by attempting to capture the smallest possible set of variables necessary to maintain these semantics. In practice, Short Closures end up capturing the same set of variables that Anonymous Functions with a manually curated capture list would have captured. This was observed on the PHPStan code base by converting all Anonymous Functions to Short Closures, and looking at which variables were automatically captured after that. | ||
+ | |||
+ | These implementation details are irrelevant for most purposes, as they do not have an effect on the behavior of the program, apart from the marginal cases listed in the next subsection. However, the exact behavior can be defined as follows: | ||
+ | |||
+ | * If there is a possibility that a variable may be read by the function before binding it, it is captured | ||
+ | * When inspecting the code, the following operations are assumed to always bind a variable without reading it: | ||
+ | * Variable assignments | ||
+ | * Variable assignments by reference | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * This excludes assignments to object properties (they never bind the variable), assignments to array dimensions (they read the variable) | ||
+ | * In all other situations in which a variable is used, it is assumed that it is read | ||
+ | |||
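To make the rules above concrete, here is a toy model that handles straight-line code only. A variable is captured if it is read before the function binds it. Statements are hypothetical `['reads' => ..., 'binds' => ...]` records, and reads happen before binds (as in `$x = $x + 1`). This is not the Optimizer-based analysis the patch uses: that analysis works on compiled code and also handles branches and loops.

```php
<?php
/**
 * Hypothetical helper: returns the variables read before being bound.
 * Straight-line code only; branches and loops are not modeled.
 *
 * @param list<array{reads: list<string>, binds: list<string>}> $statements
 * @return list<string>
 */
function variablesToCapture(array $statements): array
{
    $bound = [];     // variables definitely assigned so far
    $captured = [];  // variables read before any assignment

    foreach ($statements as $statement) {
        foreach ($statement['reads'] as $name) {
            if (!isset($bound[$name])) {
                $captured[$name] = true;
            }
        }
        foreach ($statement['binds'] as $name) {
            $bound[$name] = true;
        }
    }

    return array_keys($captured);
}

// Models a body such as `$tmp = foo(); bar($tmp); return $x + $tmp;`:
// $tmp is assigned before any read, so only $x would be captured.
$captured = variablesToCapture([
    ['reads' => [],            'binds' => ['tmp']],
    ['reads' => ['tmp'],       'binds' => []],
    ['reads' => ['x', 'tmp'],  'binds' => []],
]);
// $captured === ['x']
```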
+ | This optimization is not applied to Arrow Functions because variable bindings are unusual in these functions. | ||
+ | |||
+ | ==== Observable effects of capture ==== | ||
+ | |||
+ | As long as the semantics are maintained, whether a variable is captured or not is largely irrelevant for most purposes, and can be observed only in marginal cases. These cases are listed here. | ||
+ | |||
+ | * When debugging: Whether a variable is captured or not may be visible in the list of variables in scope in debuggers. Captured variables are local variables in the Closure, initialized to the captured value.\\ \\ | ||
+ | * Via reflection: Captured variables will be visible in ReflectionFunction.\\ \\ | ||
+ | * Via dynamic variable access: Means to access variables dynamically, | ||
+ | * Via destructors: | ||
+ | * Via resource usage: Capturing too much could increase memory or CPU usage. The optimized capture used in this RFC prevents this. It ends up capturing the same variables that would have been captured by a manually curated '' | ||
+ | |||
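Two of these effects can be observed in current PHP, as the sketch below shows. Captured variables appear in `ReflectionFunction::getStaticVariables()`, and a captured object stays alive as long as the function holding it. The sketch assumes Arrow Functions report implicit captures the same way as an explicit `use` list does. `Tracker` is a hypothetical class.

```php
<?php
// Reflection: captured variables are listed as static variables.
$y = 1;
$explicit = function () use ($y) { return $y; };
$implicit = fn () => $y;
$explicitVars = (new ReflectionFunction($explicit))->getStaticVariables(); // ['y' => 1]
$implicitVars = (new ReflectionFunction($implicit))->getStaticVariables(); // ['y' => 1]

// Destructors: a captured object lives as long as the function holding it.
final class Tracker  // hypothetical class that logs its destruction
{
    public static array $log = [];

    public function __destruct()
    {
        self::$log[] = 'destroyed';
    }
}

$obj = new Tracker();
$holder = function () use ($obj) { return $obj; };
unset($obj);                        // the function still holds the object
$afterUnsetObj = Tracker::$log;     // []
unset($holder);                     // last reference released
$afterUnsetHolder = Tracker::$log;  // ['destroyed']
```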
+ | ==== Implementation details ==== | ||
+ | |||
+ | The capture analysis used in this RFC will only capture the variables that may be read before being assigned by the function. This uses the Optimizer' | ||
+ | |||
+ | This maintains the semantics described earlier, so an understanding of these semantics is enough to reason about Short Closures. | ||
+ | |||
+ | ===== Benchmarks ===== | ||
+ | |||
+ | In benchmarks, the implementation in the 1.0 version of this RFC showed a notable CPU and memory increase when using auto-capturing multi-statement closures in some cases. | ||
+ | |||
+ | The 2.0 version, proposed here, has only marginal impact compared to PHP 8.1, well within the margin of error for profiling tools. In some cases the profiling run shows the Short Closure version being slightly more performant, which is likely just random test jitter between runs. We therefore conclude that the performance impact of this approach is effectively zero. | ||
+ | |||
+ | The capture analysis approach described above makes Short Closures as efficient as Anonymous Functions. | ||
+ | |||
+ | For more benchmark details, see: https:// | ||
+ | |||
+ | ===== What about Anonymous Functions? ===== | ||
+ | |||
+ | The existing Anonymous Function syntax remains valid, and there is no intent to deprecate it. | ||
+ | |||
+ | ===== Multi-line expressions | ||
There has been related discussion of multi-line expressions, | There has been related discussion of multi-line expressions, | ||
- | As a side benefit, the syntax proposed here does offer a somewhat round-about way to have a multi-line '' | + | As a side benefit, the syntax proposed here does offer a somewhat round-about way to have a multi-line '' |
<code php> | <code php> | ||
Line 130: | Line 324: | ||
$c = ...; | $c = ...; | ||
$ret = match ($a) { | $ret = match ($a) { | ||
- | 1, 3, 5 => (fn() { | + | 1, 3, 5 => (fn () { |
$val = $a * $b; | $val = $a * $b; | ||
return $val * $c; | return $val * $c; | ||
})(), | })(), | ||
- | 2, 4, 6 => (fn() { | + | 2, 4, 6 => (fn () { |
$val = $a + $b; | $val = $a + $b; | ||
return $val + $c; | return $val + $c; | ||
Line 143: | Line 337: | ||
While sub-optimal, | While sub-optimal, | ||
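For comparison, the same pattern can be written today (PHP 8.0+, for `match`) using immediately invoked long closures with explicit capture lists. The values of `$a`, `$b`, and `$c` are illustrative.

```php
<?php
$a = 3;
$b = 4;
$c = 10;

$ret = match ($a) {
    1, 3, 5 => (function () use ($a, $b, $c) {
        $val = $a * $b;
        return $val * $c;
    })(),
    2, 4, 6 => (function () use ($a, $b, $c) {
        $val = $a + $b;
        return $val + $c;
    })(),
};
// $ret === 120 (3 * 4 = 12, then 12 * 10)
```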
- | ==== Examples | + | ===== Comparison to other languages ===== |
- | Closures | + | As far as we are aware, only two languages in widespread use require variables |
- | <code php> | + | Languages commonly capture by-variable |
- | $x = function | + | |
- | // ... | + | |
- | }; | + | |
- | </ | + | |
- | From Mark: "That was just to get those variables inside a callback that could be | + | ===== History ===== |
- | invoked inside a throw-aware buffering helper." | + | |
- | Another similar example is for wrapping behavior in a transaction. Often, that is done by passing a callable to an '' | + | The first discussion [[https:// |
- | <code php> | + | In the same and subsequent discussions [[https:// |
- | public function savePost($user, $date, $title, $body, $tags) { | + | |
- | return $this->db-> | + | |
- | $this-> | + | |
- | $this-> | + | |
- | return $this-> | + | |
- | }); | + | |
- | } | + | |
- | </ | + | |
- | In this case, the '' | + | It is unclear whether |
- | ==== Comparison | + | The [[rfc: |
- | As far as we are aware, only two languages in widespread | + | The [[rfc: |
+ | |||
+ | ===== Alternative implementations ===== | ||
+ | |||
+ | A few people suggested implementing the same functionality via a different syntax, that is, basing it on the long-closure syntax with a '' | ||
+ | |||
+ | The resulting behavior in either case would be identical, making it a largely aesthetic or philosophical distinction. The authors felt that the more compact syntax is preferable, for several reasons: | ||
+ | |||
+ | - The longer form introduces more visual noise to achieve the same result. | ||
+ | - PHP developers have been using the '' | ||
+ | - With the improved | ||
+ | - Using the longer '' | ||
+ | - If converting from a single-line short lambda to a two-line closure, switching to the long-form syntax is more work than just switching '' | ||
+ | |||
+ | For those reasons, the authors went with the '' | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
Line 180: | Line 375: | ||
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
- | PHP 8.1. | + | PHP 8.2. |
===== Open Issues ===== | ===== Open Issues ===== | ||
Line 188: | Line 383: | ||
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== | ||
- | Existing function | + | Existing function |
===== Future Scope ===== | ===== Future Scope ===== | ||
- | The proposal section detailed three additional possible combinations of function functionality that are not included here. While it is not likely that they have much use, the pattern here clearly lays out what they would be were a future RFC to try and implement | + | These are some possible future extensions, but the authors don't necessarily endorse |
- | Specifically, | + | ==== Explicit use list on Short Closures ==== |
+ | |||
+ | It would be possible to extend the Short Closure syntax to allow an explicit use list: | ||
<code php> | <code php> | ||
- | // Global scope | + | $fn = fn () use ($a, &$b) { |
- | $c = 1; | + | }; |
+ | </ | ||
- | fn foo($a, $b): int { | + | One anticipated use-case is to selectively capture some variables by-reference. |
- | $val = $a * $b; | + | |
- | return $val * $c; | + | |
- | } | + | |
- | fn foo($a, $b): int => $a * $b * $c; | + | There are at least two possible variations of this extension. In one of them, the use list is merged with auto-capture, |
- | $foo = function($a, | + | This RFC initially proposed the first variation. It is not included in the current version because it appeared to cause confusion. |
- | </ | + | |
- | Those versions are //not// included in this RFC. | + | ==== Optimize Arrow Functions ==== |
- | ===== Proposed Voting Choices ===== | + | This RFC proposes an optimized auto-capture. It would be possible to apply this optimization to Arrow Functions as well, but this would be a breaking change in some rare cases. |
- | This is a simple Yes/No vote, requiring 2/3 to pass. | + | This is not included in this RFC because most Arrow Functions would not benefit from this. |
+ | |||
+ | ===== Vote ===== | ||
+ | |||
+ | This is a simple Yes/No vote, requiring 2/3 to pass. Vote ends on 15 July 2022. | ||
+ | |||
+ | |||
+ | <doodle title=" | ||
+ | * Yes | ||
+ | * No | ||
+ | </ | ||
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
- | Pull Request: https:// | + | Pull Request: https:// |
===== Implementation ===== | ===== Implementation ===== | ||
Line 228: | Line 432: | ||
===== References ===== | ===== References ===== | ||
- | [[rfc: | + | * [[rfc: |
+ | * [[rfc: | ||
+ | * [[rfc: | ||
+ | |||
+ | ===== Changelog ===== | ||
+ | |||
+ | 2.0: Updated for new patch; reduced discussion of short-function RFC and related topics; expanded discussion of the capture rules and noted benchmarks showing minimal performance impact; renamed to "Short Closures 2.0" | ||
- | ===== Rejected Features ===== | ||
- | Keep this updated with features that were discussed on the mail lists. |
rfc/auto-capture-closure.1618571948.txt.gz · Last modified: 2021/04/16 11:19 by seld