rfc:auto-capture-closure
Differences
This shows you the differences between two versions of the page.
rfc:auto-capture-closure [2022/05/27 15:11] crell Rework to put the arguments earlier |
rfc:auto-capture-closure [2022/07/01 12:41] (current) lbarnaud |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== PHP RFC: Auto-capturing multi-statement closures | + | ====== PHP RFC: Short Closures 2.0 ====== |
* Version: 2.0 | * Version: 2.0 | ||
* Date: 2022-05-25 | * Date: 2022-05-25 | ||
Line 10: | Line 10: | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Closures (also known as lambdas or anonymous | + | Anonymous |
- | <code php> | + | [[rfc:arrow_functions_v2|Arrow Functions]] were introduced in PHP 7.4 as an alternative. However, the single-expression limitation can lead to complex one-liners, or make Arrow Functions unfit for many use-cases that would benefit from a more concise syntax. |
- | // As of 8.1: | + | |
- | $y = 1; | + | This RFC proposes an extension of the Arrow Function syntax supporting multiple statements: |
- | $fn1 = fn($x) => $x + $y; // auto-capture + single expression | + | <code php> |
- | + | $guests | |
- | $fn2 = function ($x) use ($y): int { // manual-capture + statement list | + | $guest = $repository-> |
- | // ... | + | return $guest !== null && in_array($guest-> |
- | + | }); | |
- | return $x + $y; | + | |
- | }; | + | |
</ | </ | ||
- | The proposed syntax combines the auto-capture and multi-line capabilities into a single syntax: | + | ===== Proposal ===== |
- | <code php> | + | Short Closures extend Arrow Functions by allowing multiple statements enclosed in '' |
- | $fn3 = fn ($x): int { // auto-capture + statement list | + | |
- | // ... | + | |
- | return $x + $y; | + | <code php> |
- | }; | + | fn (parameter_list) { |
+ | statement_list; | ||
+ | } | ||
</ | </ | ||
- | ==== Why add another function mechanism? ==== | + | The '' |
- | Long Closures in PHP can be quite verbose, even when they only perform a simple operation. This is due to a large amount | + | The syntax and behavior otherwise match those of Arrow Functions. |
- | While one-line arrow functions solve this problem to some extent, there are ample cases that require a 2-3 statement body. That is still short enough that the chances of a developer confusing in-function and out-of-function variables is very remote, but the burden of manually closing over 3-4 variables is relatively high. | + | ==== Auto capture by-value ==== |
- | One example is when you are within a class method with multiple arguments and you want to simply return a closure that uses all the arguments, using the “use” keyword to list all the arguments | + | Like Arrow Functions, Short Closures |
- | Then there are often use-cases with '' | + | <code php> |
+ | $y = 1; | ||
- | The trend in PHP in recent years has been toward more compact but still readable syntax that eliminates redundancy. | + | $fn1 = fn ($x) => $x + $y; |
- | ==== Examples ==== | + | $fn2 = fn ($x) { |
+ | return $x + $y; | ||
+ | }; | ||
- | Closures are often used to " | + | $fn3 = function ($x) use ($y) { |
- | + | | |
- | <code php> | + | |
- | $x = function () use ($to, $library, $thread, $author, $title, $library_name, | + | |
- | // ... | + | |
}; | }; | ||
</ | </ | ||
- | From Mark: "That was just to get those variables inside a callback that could be | + | ==== No explicit capture ==== |
- | invoked inside a throw-aware buffering helper." | + | |
- | Another similar example | + | Explicit capture |
+ | |||
+ | ==== Syntax ==== | ||
+ | |||
+ | The signature accepts the same syntax as that of Arrow Functions: | ||
<code php> | <code php> | ||
- | public function savePost($user, $date, $title, $body, $tags) { | + | fn () { } |
- | | + | fn ($a, $b) { } |
- | $this-> | + | fn ($a, ...$args) { } // Variadic parameter |
- | $this-> | + | fn (int $a): string { } // Type hints |
- | | + | fn ($a = 42) { } // Parameter default value |
- | }); | + | fn &($a) { } // Return by-reference |
- | } | + | fn (&$a) { } // Pass by-reference |
</ | </ | ||
- | In this case, the '' | + | The signature must be followed by '' |
- | + | ||
- | This code could, with this RFC, be simplified to the following, which is no less debuggable, more readable, and less typing to produce: | + | |
<code php> | <code php> | ||
- | public function savePost($user, $date, $title, $body, $tags) { | + | fn () { return 1; } |
- | | + | fn () { print 1; } |
- | | + | fn () { |
- | $this-> | + | $tmp = $a + $b; |
- | return $this-> | + | return $tmp; |
- | }); | + | |
} | } | ||
</ | </ | ||
- | As noted, inline callbacks may also need to capture multiple variables for only a short operation. | + | Note that Short Closures with a multi-statement body do not have an implicit return value. A '' |
- | <code php> | + | The syntax choice here is consistent with other language constructs: |
- | /** @var Product[] */ | + | |
- | $arr = [ ... ]; | + | |
- | $wantApproved = true; | + | * '' |
- | $size = 'L' | + | * Conversely, the '' |
+ | * The '' | ||
+ | * The '' | ||
- | $filtered = array_filter($arr, | + | These rules are easily recognizable and learnable by developers. |
- | if ($wantApproved) { | + | |
- | return $item-> | + | |
- | } else if ($size) { | + | |
- | return $item-> | + | |
- | } else { | + | |
- | return false; | + | |
- | } | + | |
- | }); | + | |
- | </ | + | |
- | In this case, again, the explicit '' | + | ===== Why extend Arrow Functions? ===== |
- | <code php> | + | Arrow Functions were added as an alternative to Anonymous Functions. The latter can be quite verbose, even when they only perform a simple operation. This is due to a large amount of syntactic boilerplate that is needed to manually import used variables with the '' |
- | /** @var Product[] */ | + | |
- | $arr = [ ... ]; | + | |
- | $wantApproved = true; | + | While Arrow Functions solve this problem to some extent, the one-expression limit can lead to one-liners with non-ideal readability, |
- | $size = ' | + | |
- | $filtered | + | As an example, writing the following code snippet with a single-expression Arrow Function would degrade legibility, but writing it as an Anonymous Function would be cumbersome: |
- | if ($wantApproved) { | + | |
- | | + | <code php> |
- | } else if ($size) { | + | $guests |
- | return $item->size() == $size; | + | $guest = $repository->findByUserId($user-> |
- | } else { | + | return $guest !== null && in_array($guest->id, $guestsIds); |
- | return false; | + | |
- | } | + | |
}); | }); | ||
</ | </ | ||
- | The majority of closures users fall into these type of categories. | + | ===== Discussion on auto-capture ===== |
- | ===== Proposal ===== | + | Auto capture was first introduced by Arrow Functions. |
- | ==== Auto-capture | + | In the past, there had been reticence about auto-capture |
- | Specifically, this RFC adds the following syntax: | + | Implementation difficulties arise from by-reference or by-variable semantics, especially when supporting dynamic means of accessing variables like variable-variables, |
- | <code php> | + | As noted in the benchmarks section, the implementation offered here has effectively no performance impact either way. |
- | // An anonymous, locally available function. | + | |
- | // Variables are auto-captured lexically. | + | |
- | // The body is a statement list, with possibly a return statement; | + | |
- | $c = 1; | + | |
- | $foo = fn(int $a, int $b):int { | + | |
- | $val = $a * $b; | + | |
- | return $val * $c; | + | |
- | }; | + | |
- | </ | + | |
- | The syntax choice here leads to the following consistent syntactic meanings: | + | In the majority of cases where closures are used in practice, the code involved is short enough that debugging is not hampered by automatic capture. |
- | * The '' | + | Potential confusing behavior is further mitigated by PHP's (correct) use of by-value capture, |
- | * '' | + | |
- | * The '' | + | |
- | * The '' | + | |
- | * A function with a name is declared globally at compile time. A function without a name is declared locally as a closure at runtime. | + | |
- | These rules are easily recognizable and learnable by developers. | + | Furthermore, |
- | The '' | + | For those few cases in which, for whatever reason, the developer is concerned about auto-capture reducing debuggability |
- | + | ||
- | <code php> | + | |
- | $c = 1; | + | |
- | $foo = fn($a, $b) use (& | + | |
- | $val = $a * $b; | + | |
- | return $val * $c; | + | |
- | }; | + | |
- | </ | + | |
- | In practice, we anticipate | + | ==== Using variables from the parent block ==== |
- | ==== Explicit capture ==== | + | Using variables from the parent block is not unusual in PHP. We do it all the time in loops. |
- | The proposed syntax supports explicit capture with the '' | + | In the following example, the loop uses three variables from the parent block. We have learned to recognize that what follows a '' |
<code php> | <code php> | ||
- | $c = 1; | + | $guests |
- | fn () use ($a, &$b) { | + | foreach |
- | | + | $guest = $repository-> |
- | // $b is explicitly captured by reference | + | if ($guest !== null && in_array($guest->id, $guestsIds)) { |
- | // | + | $guests[] = $guest; |
+ | } | ||
} | } | ||
</ | </ | ||
- | This allows auto-capturing multi-statement closures to match long closures in functionality. Without this, it could be necessary to switch back and forth between | + | In the following example, the function uses two variables from the parent block, which should not be more surprising than with a loop once we have learned that what follows a '' |
- | We expect that explicitly capturing by value will be rare in practice. | + | <code php> |
+ | $guests = array_filter($users, | ||
+ | $guest = $repository-> | ||
+ | return $guest !== null && in_array($guest-> | ||
+ | }); | ||
+ | </ | ||
- | ==== Auto-capture semantics ==== | + | However, the comparison stops here. These two examples do not behave equally with regard to side effects: Variable assignments to the '' |
- | The auto-capture semantics presented here are designed to be intuitive and have negligible performance impact. | + | ==== Capture is by-value, no unintended side-effects ==== |
- | Auto-capturing multi-statement closures can access all variables | + | It is important to note that the default capture mode in Anonymous Functions, Arrow Functions, and Short Closures is by-value. This purposefully differs from the semantics commonly found in other programming languages. |
+ | |||
+ | A by-value capture means that it is not possible to modify any variables | ||
<code php> | <code php> | ||
$a = 1; | $a = 1; | ||
- | $b = 2; | ||
$f = fn () { | $f = fn () { | ||
- | | + | $a++; // Has no effect outside of the function |
+ | $tmp = $a + 1; // Has no effect outside of the function | ||
+ | return | ||
}; | }; | ||
- | $f(); // prints "3" | + | print $a; // prints " |
+ | $f(); | ||
+ | print $a; // prints "1" | ||
</ | </ | ||
- | Accessed variables are bound //by value// at the time of the function | + | Conversely, |
<code php> | <code php> | ||
Line 213: | Line 186: | ||
$f(); // prints " | $f(); // prints " | ||
</ | </ | ||
+ | |||
+ | Because variables are bound by-value, the confusing behaviors often associated with closures do not exist. As an example, the following code snippet demonstrates such a behavior in JavaScript: | ||
+ | |||
+ | <code javascript> | ||
+ | // JavaScript | ||
+ | var fns = []; | ||
+ | for (var i = 0; i < 3; i++) { | ||
+ | fns.push(function() { | ||
+ | console.log(i); | ||
+ | }); | ||
+ | } | ||
+ | for (var k in fns) { | ||
+ | var fn = fns[k]; | ||
+ | fn(); // Prints " | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | In PHP the behavior is intuitive and less confusing: | ||
<code php> | <code php> | ||
- | $a = 1; | + | // PHP |
- | $f = fn () { | + | $fns = []; |
- | $a++; | + | for ($i = 0; $i < 3; $i++) { |
+ | $fns[] = fn () { | ||
+ | print $i; | ||
+ | }; | ||
+ | } | ||
+ | foreach ($fns as $fn) { | ||
+ | $fn(); // Prints " | ||
+ | } | ||
+ | </ | ||
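The by-value behavior described here can be checked today without the proposed syntax. The following is a minimal sketch, assuming an Anonymous Function with an explicit ''use'' list as a stand-in for a Short Closure (both capture by-value according to this RFC); the ''$results'' array is only there for illustration.

```php
<?php
// Each closure captures the value $i had at the moment the closure was created.
$fns = [];
for ($i = 0; $i < 3; $i++) {
    $fns[] = function () use ($i) {
        return $i;
    };
}

// Calling the closures later does not see the final value of $i.
$results = [];
foreach ($fns as $fn) {
    $results[] = $fn();
}

print implode(',', $results); // prints "0,1,2"
```

The loop variable ends at 3, yet none of the closures observe it, because each one holds its own copy.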
+ | |||
+ | In JavaScript the same output can be obtained by declaring '' | ||
+ | |||
+ | In PHP, the variable is captured by-value, thus entirely avoiding the confusion. | ||
+ | |||
+ | Of course, functions can have side-effects when accessing mutable values such as objects or resources. The following example demonstrates this: | ||
+ | <code php> | ||
+ | $d = new DateTime(); | ||
+ | |||
+ | $fn1 = fn () { | ||
+ | $d-> | ||
}; | }; | ||
- | print $a; // prints " | + | $fn2 = function () use ($d) { |
- | $f(); | + | $d-> |
- | print $a; // prints " | + | }; |
+ | |||
+ | $fn3 = function | ||
+ | $d-> | ||
+ | }; | ||
</ | </ | ||
- | Because variables are bound by value, the potential for " | + | ===== Auto-capture semantics ===== |
- | This is the behavior of long closures with explicit | + | The RFC inherits |
- | For performance reasons, only the variables that are directly accessed with the variable | + | > Short Closures can access a snapshot of the variable |
- | Additionally, | + | This can also be stated as follows: |
- | We can express | + | > Short Closures |
- | The "at least" part has only marginal effect aside from performance, | + | This is implemented by binding the value of the declaring scope variables to local variables in the function. This is referred to as //capture// in this RFC. |
- | ==== Implementation details | + | This RFC leaves unspecified which variables are captured, as long as these semantics are maintained. |
+ | |||
+ | ==== Optimization | ||
- | Auto-capturing | + | A naive approach would capture //all// the variables |
<code php> | <code php> | ||
$tmp = 5; | $tmp = 5; | ||
- | fn() { | + | fn () { |
$tmp = foo(); | $tmp = foo(); | ||
bar($tmp); | bar($tmp); | ||
Line 250: | Line 266: | ||
</ | </ | ||
- | A naive capture mechanism | + | This approach |
- | Capture analysis, | + | The implementation proposed in this RFC prevents this by attempting to capture |
- | In practice, auto-capturing multi-statement closures end up capturing the same set of variables | + | These implementation details are irrelevant for most purposes, as they do not have an effect |
- | This retains | + | * If there is a possibility that a variable may be read by the function before binding it, it is captured |
+ | * When inspecting the code, the following operations are assumed to always bind a variable without reading it: | ||
+ | * Variable assignments | ||
+ | * Variable assignments by reference | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * This excludes assignments to object properties (they never bind the variable), assignments to array dimensions (they read the variable) | ||
+ | * In all other situations in which a variable is used, it is assumed that it is read | ||
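The rules above can be illustrated with an explicit-use equivalent. This is a sketch under the assumption that a Short Closure whose body first assigns ''$tmp'' and reads ''$a'' behaves like the Anonymous Function below; the variable names are hypothetical.

```php
<?php
$a = 10;
$tmp = 5;

// $a is read before any assignment in the body, so it has to be captured.
// $tmp is always assigned before it is read, so capturing the outer $tmp
// would have no observable effect on the result.
$fn = function () use ($a) {
    $tmp = $a * 2;
    return $tmp + $a;
};

$result = $fn();

print $result; // prints "30"
print "\n";
print $tmp;    // prints "5": the outer $tmp is untouched
```

Whether or not the outer ''$tmp'' were captured, the function would return the same value, which is why skipping it preserves the semantics.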
- | ==== Benchmarks ==== | + | This optimization is not applied to Arrow Functions because variable bindings are unusual in these functions. |
- | In benchmarks, the implementation in the 1.0 version | + | ==== Observable effects |
- | The 2.0 version, proposed here, has only marginal | + | As long as the semantics are maintained, whether a variable is captured or not is largely irrelevant for most purposes, and can be observed |
- | The capture | + | * When debugging: Whether a variable is captured or not may be visible in the list of variables in scope in debuggers. Captured variables are local variables in the Closure, initialized to the captured value.\\ \\ |
+ | * Via reflection: Captured variables will be visible in ReflectionFunction.\\ \\ | ||
+ | * Via dynamic variable access: Means to access variables dynamically, | ||
+ | * Via destructors: | ||
+ | * Via resource usage: Capturing too much could increase memory or CPU usage. | ||
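The reflection case can be observed with existing closures. The sketch below assumes ''ReflectionFunction::getStaticVariables()'' as the inspection method (the RFC text names only ReflectionFunction) and uses an Anonymous Function and an Arrow Function, since the proposed syntax is not available.

```php
<?php
$x = 1;
$unused = 2;

$explicit = function () use ($x) {
    return $x + 1;
};
$arrow = fn () => $x + 1;

// Captured variables are reported as the closure's bound (static) variables.
$explicitVars = (new ReflectionFunction($explicit))->getStaticVariables();
$arrowVars = (new ReflectionFunction($arrow))->getStaticVariables();

print implode(',', array_keys($explicitVars)); // prints "x"
print "\n";
print implode(',', array_keys($arrowVars));    // prints "x"
```

In both cases ''$unused'' is absent from the list, because neither function body refers to it.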
- | For more benchmark | + | ==== Implementation |
- | ==== Methods ==== | + | The capture analysis used in this RFC will only capture the variables that may be read before being assigned by the function. This uses the Optimizer' |
- | As methods cannot be anonymous, there are no impacts on methods from this RFC. | + | This maintains the semantics described earlier, so an understanding of these semantics is enough to reason about Short Closures. |
+ | |||
+ | ===== Benchmarks ===== | ||
+ | |||
+ | In benchmarks, the implementation in the 1.0 version of this RFC showed a notable CPU and memory increase when using auto-capturing multi-statement closures in some cases. |
+ | |||
+ | The 2.0 version, proposed here, has only marginal impact compared to PHP 8.1, well within the margin of error for profiling tools. In some cases the profiling run shows the Short Closure version being slightly more performant, which is likely just random test jitter between runs. We therefore conclude that the performance impact of this approach is effectively zero. | ||
+ | |||
+ | The capture analysis approach described above makes Short Closures as efficient as Anonymous Functions. | ||
+ | |||
+ | For more benchmark details, see: https:// | ||
- | ==== What about long-closures? ==== | + | ===== What about Anonymous Functions? ===== |
- | The existing | + | The existing |
- | ==== Multi-line expressions ==== | + | ===== Multi-line expressions |
There has been related discussion of multi-line expressions, | There has been related discussion of multi-line expressions, | ||
Line 286: | Line 324: | ||
$c = ...; | $c = ...; | ||
$ret = match ($a) { | $ret = match ($a) { | ||
- | 1, 3, 5 => (fn() { | + | 1, 3, 5 => (fn () { |
$val = $a * $b; | $val = $a * $b; | ||
return $val * $c; | return $val * $c; | ||
})(), | })(), | ||
- | 2, 4, 6 => (fn() { | + | 2, 4, 6 => (fn () { |
$val = $a + $b; | $val = $a + $b; | ||
return $val + $c; | return $val + $c; | ||
Line 299: | Line 337: | ||
While sub-optimal, | While sub-optimal, | ||
- | ==== Comparison to other languages ==== | + | ===== Comparison to other languages |
As far as we are aware, only two languages in widespread use require variables to be explicitly closed over: PHP and C++. All other major languages capture implicitly, as is proposed here. | As far as we are aware, only two languages in widespread use require variables to be explicitly closed over: PHP and C++. All other major languages capture implicitly, as is proposed here. | ||
- | Many languages tend to capture by reference. In practice this can lead to surprising | + | Languages commonly |
- | ==== Counter-arguments | + | ===== History ===== |
- | In the past, there has been reticence about auto-capture that has kept it out of previous evolutions in closures. Mostly that has boiled down to two concerns: Performance | + | The first discussion [[https:// |
- | As noted above in the benchmarks section, the implementation offered here has effectively no performance impact either way. | + | In the same and subsequent discussions [[https:// |
- | In the majority | + | It is unclear whether this was chosen because |
- | Potential confusing behavior is further mitigated by PHP's (correct) use of by-value capture, which minimizes | + | The [[rfc: |
- | Furthermore, | + | The [[rfc: |
- | For those few cases in which, for whatever reason, the developer is concerned about auto-capture | + | ===== Alternative implementations ===== |
+ | |||
+ | A few people suggested implementing the same functionality via a different syntax, that is, basing it on the long-closure syntax with a '' | ||
+ | |||
+ | The resulting behavior in either case would be identical, making it a largely aesthetic or philosophical distinction. The authors felt that the more compact syntax is preferable, for several reasons: | ||
+ | |||
+ | - The longer form introduces more visual noise to achieve the same result. | ||
+ | - PHP developers have been using the '' | ||
+ | - With the improved | ||
+ | | ||
+ | - If converting from a single line short-lambda to a 2 line closure, switching to the long-form | ||
+ | |||
+ | For those reasons, the authors went with the '' | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
Line 334: | Line 384: | ||
Existing function syntaxes continue to work precisely as they do now. Only new combinations are possible. | |
+ | |||
+ | ===== Future Scope ===== | ||
+ | |||
+ | These are some possible future extensions, but the authors don't necessarily endorse them. | ||
+ | |||
+ | ==== Explicit use list on Short Closures ==== | ||
+ | |||
+ | It would be possible to extend the Short Closure syntax to allow an explicit use list: | ||
+ | |||
+ | <code php> | ||
+ | $fn = fn () use ($a, &$b) { | ||
+ | }; | ||
+ | </ | ||
+ | |||
+ | One anticipated use-case is to selectively capture some variables by-reference. | ||
+ | |||
+ | There are at least two possible variations of this extension. In one of them, the use list is merged with auto-capture, | ||
+ | |||
+ | This RFC initially proposed the first possibility. It is not included in the current version because it appeared to create confusion. |
+ | |||
+ | ==== Optimize Arrow Functions ==== | ||
+ | |||
+ | This RFC proposes an optimized auto-capture. It would be possible to apply this optimization to Arrow Functions as well, but this would be a breaking change in some rare cases. | ||
+ | |||
+ | It is not included in this RFC because most Arrow Functions would not benefit from it. |
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
This is a simple Yes/No vote, requiring 2/3 to pass. | This is a simple Yes/No vote, requiring 2/3 to pass. | ||
+ | |||
+ | |||
+ | <doodle title=" | ||
+ | * Yes | ||
+ | * No | ||
+ | </ | ||
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
Line 351: | Line 432: | ||
===== References ===== | ===== References ===== | ||
- | [[rfc: | + | * [[rfc: |
+ | * [[rfc: | ||
+ | * [[rfc: | ||
===== Changelog ===== | ===== Changelog ===== | ||
- | 2.0: Updated for new patch; reduced discussion of short-function RFC and related topics; expanded discussion of the capture rules and noted benchmarks showing minimal performance impact | + | 2.0: Updated for new patch; reduced discussion of short-function RFC and related topics; expanded discussion of the capture rules and noted benchmarks showing minimal performance impact; renamed to "Short Closures 2.0" |
rfc/auto-capture-closure.1653664309.txt.gz · Last modified: 2022/05/27 15:11 by crell