rfc:auto-capture-closure
Differences
This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
rfc:auto-capture-closure [2022/05/26 12:50] lbarnaud |
rfc:auto-capture-closure [2022/07/02 13:12] (current) imsop update "Vote" heading |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== PHP RFC: Auto-capturing multi-statement closures | + | ====== PHP RFC: Short Closures 2.0 ====== |
* Version: 2.0 | * Version: 2.0 | ||
* Date: 2022-05-25 | * Date: 2022-05-25 | ||
Line 5: | Line 5: | ||
* Author: Larry Garfield (larry@garfieldtech.com) | * Author: Larry Garfield (larry@garfieldtech.com) | ||
* Author: Arnaud Le Blanc (arnaud.lb@gmail.com) | * Author: Arnaud Le Blanc (arnaud.lb@gmail.com) | ||
- | * Status: In Discussion | + | * Status: In Voting |
* First Published at: http:// | * First Published at: http:// | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Closures (also known as lambdas or anonymous | + | Anonymous |
- | <code php> | + | [[rfc:arrow_functions_v2|Arrow Functions]] were introduced in PHP 7.4 as an alternative. However, the single-expression limitation can lead to complex one-liners, or make Arrow Functions unfit for many use cases that would benefit from a more concise syntax. |
- | // As of 8.1: | + | |
- | $y = 1; | + | This RFC proposes an extension of the Arrow Function syntax supporting multiple statements: |
- | $fn1 = fn($x) => $x + $y; // auto-capture + single expression | + | <code php> |
+ | $guests | ||
+ | $guest | ||
+ | return | ||
+ | }); | ||
+ | </code> | ||
- | $fn2 = function ($x) use ($y): int { // manual-capture + statement list | + | ===== Proposal ===== |
- | // ... | + | |
- | return $x + $y; | + | Short Closures extend Arrow Functions by allowing multiple statements enclosed in '' |
- | }; | + | |
+ | <code php> | ||
+ | fn (parameter_list) { | ||
+ | statement_list; | ||
+ | } | ||
</ | </ | ||
- | The proposed | + | The '' |
+ | |||
+ | The syntax | ||
+ | |||
+ | ==== Auto capture by-value ==== | ||
+ | |||
+ | Like Arrow Functions, Short Closures use auto capture | ||
<code php> | <code php> | ||
- | $fn3 = fn ($x): int { // auto-capture | + | $y = 1; |
- | // ... | + | |
+ | $fn1 = fn ($x) => $x + $y; | ||
+ | $fn2 = fn ($x) { | ||
+ | return $x + $y; | ||
+ | }; | ||
+ | |||
+ | $fn3 = function ($x) use ($y) { | ||
return $x + $y; | return $x + $y; | ||
}; | }; | ||
</ | </ | ||
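As a minimal sketch of the by-value capture described above, the snippet below uses only syntax that exists in PHP 7.4 and later: an Arrow Function and an Anonymous Function with ''use''. The multi-statement ''fn () { }'' form proposed by this RFC is deliberately not used, and the variable names are illustrative. It shows that both forms see the value ''$y'' had when the closure was created.

```php
<?php
$y = 1;

// Arrow Function: $y is captured automatically, by value, at creation.
$fn1 = fn ($x) => $x + $y;

// Anonymous Function: $y is captured explicitly, also by value.
$fn3 = function ($x) use ($y) {
    return $x + $y;
};

// Rebinding $y afterwards does not affect either closure.
$y = 100;

$r1 = $fn1(1);
$r3 = $fn3(1);
var_dump($r1, $r3); // int(2) int(2)
```

Both calls return 2 because each closure holds its own copy of the value 1, not a link to the outer ''$y''.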
- | ===== Proposal ===== | + | ==== No explicit capture |
- | ==== Background ==== | + | Explicit capture is not included in the new syntax. |
- | As of PHP 8.1, the following syntaxes around functions have the following meaning: | + | ==== Syntax ==== |
+ | |||
+ | The signature accepts the same syntax as that of Arrow Functions: | ||
<code php> | <code php> | ||
+ | fn () { } | ||
+ | fn ($a, $b) { } | ||
+ | fn ($a, ...$args) { } // Variadic parameter | ||
+ | fn (int $a): string { } // Type hints | ||
+ | fn ($a = 42) { } // Parameter default value | ||
+ | fn &($a) { } // Return by-reference | ||
+ | fn (&$a) { } // Pass by-reference | ||
+ | </ | ||
- | // A named, globally available function. | + | The signature must be followed by '' |
- | // No variables are auto-captured from the environment. | + | |
- | // The body is a statement list, with possibly a return | + | <code php> |
- | function foo($a, $b, $c): int { | + | fn () { return |
- | | + | fn () { print 1; } |
+ | fn () { | ||
+ | $tmp = $a + $b; | ||
+ | return | ||
} | } | ||
+ | </ | ||
- | // An anonymous, locally available function. | + | Note that Short Closures with a multi-statement |
- | // Variables are explicitly captured lexically. | + | |
- | // The body is a statement | + | |
- | $foo = function ($a, $b) use ($c) { | + | |
- | | + | |
- | }; | + | |
- | // An anonymous, locally available | + | The syntax choice here is consistent with other language constructs: |
- | // Variables are auto-captured lexically. | + | |
- | // The body is a single-expression, | + | * '' |
- | $foo = fn($a, $b): int => $a * $b * $c; | + | * Conversely, the '' |
+ | * The '' | ||
+ | * The '' | ||
+ | |||
+ | These rules are easily recognizable and learnable by developers. | ||
+ | |||
+ | ===== Why extend Arrow Functions? ===== | ||
+ | |||
+ | Arrow Functions were added as an alternative to Anonymous Functions. | ||
+ | |||
+ | While Arrow Functions solve this problem to some extent, the one-expression | ||
+ | |||
+ | As an example, writing the following code snippet with a single-expression Arrow Function would degrade legibility, but writing it as an Anonymous Function would be cumbersome: | ||
+ | |||
+ | <code php> | ||
+ | $guests | ||
+ | $guest | ||
+ | return | ||
+ | }); | ||
</ | </ | ||
- | That is, a function may be named or local/ | + | ===== Discussion on auto-capture |
- | The declined [[rfc: | + | Auto capture |
- | This RFC seeks to add a different combination: | + | In the past, there had been reticence about auto-capture |
- | The remaining combinations would be: | + | Implementation difficulties arise from by-reference or by-variable semantics, especially when supporting dynamic means of accessing variables like variable-variables, |
- | * named function, auto-capture, | + | As noted in the benchmarks section, the implementation offered here has effectively no performance impact either way. |
- | * named function, auto-capture, | + | |
- | * anonymous function, manual-capture, | + | |
- | None of these additional variants | + | In the majority |
- | ==== Auto-capture | + | Potentially confusing behavior is further mitigated by PHP's (correct) use of by-value capture, which minimizes the risk of inadvertent changes to values from within closures. |
- | Specifically, this RFC adds the following | + | Furthermore, |
+ | |||
+ | For those few cases in which, for whatever reason, the developer is concerned about auto-capture reducing debuggability or about accidental capture, the existing explicit-only | ||
+ | |||
+ | ==== Using variables from the parent block ==== | ||
+ | |||
+ | Using variables from the parent block is not unusual in PHP. We do it all the time in loops. | ||
+ | |||
+ | In the following example, the loop uses three variables from the parent block. We have learned to recognize that what follows a '' | ||
<code php> | <code php> | ||
- | // An anonymous, locally available function. | + | $guests = []; |
- | // Variables are auto-captured lexically. | + | foreach ($users as $user) { |
- | // The body is a statement list, with possibly a return statement; | + | $guest |
- | $c = 1; | + | if ($guest !== null && in_array($guest-> |
- | $foo = fn($a, $b):int { | + | $guests[] |
- | $val = $a * $b; | + | } |
- | | + | } |
- | }; | + | |
</ | </ | ||
- | The syntax choice here leads to the following | + | In the following |
- | * The '' | + | <code php> |
- | * '' | + | $guests = array_filter($users, fn ($user) { |
- | * The '' | + | |
- | * The '' | + | |
- | * A function with a name is declared globally at compile time. A function without a name is declared locally as a closure at runtime. | + | }); |
- | + | </ | |
- | These rules are easily recognizable and learnable by developers. | + | |
- | === Variable | + | However, the comparison stops here. These two examples do not behave equally with regard to side effects: |
- | == Auto-capture semantics | + | ==== Capture is by-value, no unintended side-effects ==== |
- | We propose auto-capture | + | It is important to note that the default |
- | Auto-capturing multi-statement closures can access all variables | + | A by-value capture means that it is not possible to modify any variables |
<code php> | <code php> | ||
$a = 1; | $a = 1; | ||
- | $b = 2; | ||
$f = fn () { | $f = fn () { | ||
- | | + | $a++; // Has no effect outside of the function |
+ | $tmp = $a + 1; // Has no effect outside of the function | ||
+ | return | ||
}; | }; | ||
- | $f(); // prints "3" | + | print $a; // prints " |
+ | $f(); | ||
+ | print $a; // prints "1" | ||
</ | </ | ||
- | Accessed variables are bound //by value// at the time of the function | + | Conversely, |
<code php> | <code php> | ||
Line 133: | Line 186: | ||
$f(); // prints " | $f(); // prints " | ||
</ | </ | ||
- | <code php> | ||
- | $a = 1; | ||
- | $f = fn () { | ||
- | $a++; | ||
- | }; | ||
- | print $a; // prints " | + | Because variables are bound by-value, the confusing behaviors often associated with closures do not exist. As an example, the following code snippet demonstrates such a behavior in JavaScript: |
- | $f(); | + | |
- | print $a; // prints | + | <code javascript> |
+ | // JavaScript | ||
+ | var fns = []; | ||
+ | for (var i = 0; i < 3; i++) { | ||
+ | fns.push(function() { | ||
+ | console.log(i); | ||
+ | }); | ||
+ | } | ||
+ | for (var k in fns) { | ||
+ | var fn = fns[k]; | ||
+ | fn(); // Prints | ||
+ | } | ||
</ | </ | ||
- | Because variables bound by value, | + | In PHP the behavior |
- | This is the behavior of long closures with explicit capture and of arrow functions. | + | <code php> |
+ | // PHP | ||
+ | $fns = []; | ||
+ | for ($i = 0; $i < 3; $i++) { | ||
+ | $fns[] = fn () { | ||
+ | print $i; | ||
+ | }; | ||
+ | } | ||
+ | foreach ($fns as $fn) { | ||
+ | $fn(); // Prints " | ||
+ | } | ||
+ | </ | ||
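The loop behavior can already be observed with a single-expression Arrow Function, which shares the by-value capture discussed here. The following is a small sketch under that assumption; ''$seen'' is a helper name introduced only for this illustration.

```php
<?php
$fns = [];
for ($i = 0; $i < 3; $i++) {
    // Each closure captures the value of $i at the time it is created.
    $fns[] = fn () => $i;
}

$seen = [];
foreach ($fns as $fn) {
    $seen[] = $fn();
}
print implode(',', $seen); // prints "0,1,2"
```

Each closure returns the value ''$i'' had in its own iteration, because no closure shares the loop variable itself.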
- | For performance reasons, only the variables that are directly accessed | + | In JavaScript |
- | Additionally, variables that are always assigned by the closure before being read are not captured, | + | In PHP, the variable is captured |
- | We can express these semantics more succinctly like this: Auto-capturing multi-statement closures capture at least all the variables that are directly accessed by the closure. | + | Of course, functions |
+ | <code php> | ||
+ | $d = new DateTime(); | ||
- | The "at least" part has only marginal effect aside from performance, | + | $fn1 = fn () { |
+ | $d-> | ||
+ | }; | ||
- | == Explicit capture == | + | $fn2 = function () use ($d) { |
- | + | | |
- | The proposed syntax supports explicit capture with the '' | + | }; |
- | <code php> | + | $fn3 = function |
- | $c = 1; | + | $d-> |
- | fn () use ($a, &$b) { | + | }; |
- | | + | |
- | // $b is captured by reference | + | |
- | } | + | |
</ | </ | ||
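A short sketch of the object case, again using an existing Arrow Function rather than the proposed syntax. The ''stdClass'' counter and the names ''$counter'' and ''$increment'' are assumptions made for the example.

```php
<?php
$counter = new stdClass();
$counter->n = 0;

// $counter is captured by value, but the value is an object handle,
// so the closure mutates the same object the outer scope sees.
$increment = fn () => ++$counter->n;

$increment();
$increment();
print $counter->n; // prints "2"
```

The variable binding is copied, while the object it refers to is not, so property changes made inside the closure remain visible outside.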
- | This allows | + | ===== Auto-capture semantics ===== |
+ | |||
+ | The RFC inherits the auto-capture semantics of Arrow Functions. These semantics can be stated as follows: | ||
+ | |||
+ | > Short Closures can access a snapshot of the variable bindings of their declaring scope by accessing variables literally. The snapshot is taken when the function is declared. Assignments | ||
+ | |||
+ | This can also be stated as follows: | ||
+ | |||
+ | > Short Closures can read variables of their declaring scope by accessing variables literally. The values of these variables are the ones that were bound to them at function declaration. Assignments to variables do not have an effect on the declaring scope. | ||
+ | |||
+ | This is implemented | ||
- | We expect that explicitly capturing by value will be rare in practice. | + | This RFC leaves unspecified which variables are captured, as long as these semantics are maintained. |
- | == Implementation details | + | ==== Optimization ==== |
- | Auto-capturing | + | A naive approach would capture //all// the variables |
<code php> | <code php> | ||
- | fn() { | + | $tmp = 5; |
+ | fn () { | ||
$tmp = foo(); | $tmp = foo(); | ||
bar($tmp); | bar($tmp); | ||
Line 184: | Line 266: | ||
</ | </ | ||
- | This can lead to additional | + | This approach would waste memory and CPU time. |
- | The version 2.0 of this RFC, proposed here, avoids | + | The implementation proposed in this RFC prevents |
- | In practice, auto-capturing multi-statement closures end up capturing the same set of variables | + | These implementation details are irrelevant for most purposes, as they do not have an effect |
- | This optimization makes auto-capturing multi-statement closures as efficient as long closures with explicit capture. | + | * If there is a possibility that a variable may be read by the function before binding it, it is captured |
+ | * When inspecting the code, the following operations are assumed to always bind a variable without reading it: | ||
+ | * Variable assignments | ||
+ | * Variable assignments by reference | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * This excludes assignments to object properties (they never bind the variable), assignments to array dimensions (they read the variable) | ||
+ | * In all other situations in which a variable is used, it is assumed that it is read | ||
- | In profiling, the implementation | + | This optimization is not applied to Arrow Functions because variable bindings are unusual |
- | For more details, see: https:// | + | ==== Observable effects of capture ==== |
- | Capture analysis, | + | As long as the semantics are maintained, whether a variable |
- | This retains | + | * When debugging: Whether a variable is captured or not may be visible in the list of variables in scope in debuggers. Captured variables are local variables |
+ | * Via reflection: Captured variables will be visible in ReflectionFunction.\\ \\ | ||
+ | * Via dynamic variable access: Means to access variables dynamically, | ||
+ | * Via destructors: | ||
+ | * Via resource usage: Capturing too much could increase memory or CPU usage. The optimized capture used in this RFC prevents this. It ends up capturing | ||
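The reflection point can be illustrated with an existing Anonymous Function that uses explicit capture, as a stand-in. How Short Closures would appear under this RFC's patch is not shown here; the sketch only assumes the standard ''ReflectionFunction::getStaticVariables()'' method, and the variable names are illustrative.

```php
<?php
$y = 1;
$z = 2; // never used by the closure below

$fn = function ($x) use ($y) {
    return $x + $y;
};

// For closures, getStaticVariables() includes the captured variables.
$captured = (new ReflectionFunction($fn))->getStaticVariables();
var_dump($captured);
```

Only ''$y'' is listed, since ''$z'' was never captured. This is the kind of difference a debugger or reflection-based tool could observe.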
- | == Changes to arrow functions | + | ==== Implementation details ==== |
- | This RFC proposes to change the capture analysis | + | The capture analysis |
- | There is a very rare case of breaking change, described in the breaking changes section. | + | This maintains |
- | ==== Why add another function mechanism? | + | ===== Benchmarks ===== |
- | Long Closures in PHP can be quite verbose, even when they only perform a simple operation. This is due to a large amount | + | In benchmarks, the implementation in the 1.0 version |
- | While one-line arrow functions solve this problem | + | The 2.0 version, proposed here, has only marginal impact compared |
- | One example is when you are within a class method with multiple arguments and you want to simply return a closure that uses all the arguments, using the “use” keyword to list all the arguments is entirely redundant and pointless. | + | The capture analysis approach described above makes Short Closures as efficient as Anonymous Functions. |
- | Then there are often use-cases with '' | + | For more benchmark details, see: https:// |
- | The trend in PHP in recent years has been toward more compact but still readable syntax that eliminates redundancy. | + | ===== What about Anonymous Functions? ===== |
- | ==== Methods ==== | + | The existing Anonymous Function syntax remains valid, and there is no intent to deprecate it. |
- | As methods cannot be anonymous, there are no impacts on methods from this RFC. | + | ===== Multi-line expressions |
- | + | ||
- | ==== What about long-closures? | + | |
- | + | ||
- | The existing multi-line closure syntax remains valid, and there is no intent to deprecate it. | + | |
- | + | ||
- | ==== Multi-line expressions ==== | + | |
There has been related discussion of multi-line expressions, | There has been related discussion of multi-line expressions, | ||
Line 236: | Line 324: | ||
$c = ...; | $c = ...; | ||
$ret = match ($a) { | $ret = match ($a) { | ||
- | 1, 3, 5 => (fn() { | + | 1, 3, 5 => (fn () { |
$val = $a * $b; | $val = $a * $b; | ||
return $val * $c; | return $val * $c; | ||
})(), | })(), | ||
- | 2, 4, 6 => (fn() { | + | 2, 4, 6 => (fn () { |
$val = $a + $b; | $val = $a + $b; | ||
return $val + $c; | return $val + $c; | ||
Line 249: | Line 337: | ||
While sub-optimal, | While sub-optimal, | ||
- | ==== Examples | + | ===== Comparison to other languages ===== |
- | Closures | + | As far as we are aware, only two languages in widespread use require variables |
- | <code php> | + | Languages commonly capture by-variable |
- | $x = function | + | |
- | // ... | + | |
- | }; | + | |
- | </ | + | |
- | From Mark: "That was just to get those variables inside a callback that could be | + | ===== History ===== |
- | invoked inside a throw-aware buffering helper." | + | |
- | Another similar example is for wrapping behavior in a transaction. Often, that is done by passing a callable to an '' | + | The first discussion [[https:// |
- | <code php> | + | In the same and subsequent discussions [[https:// |
- | public function savePost($user, $date, $title, $body, $tags) { | + | |
- | return $this->db-> | + | |
- | $this-> | + | |
- | $this-> | + | |
- | return $this-> | + | |
- | }); | + | |
- | } | + | |
- | </ | + | |
- | In this case, the '' | + | It is unclear whether |
- | ==== Comparison | + | The [[rfc: |
- | As far as we are aware, only two languages in widespread use require variables to be explicitly closed over: PHP and C++. All other major languages capture implicitly, as is proposed here. | + | The [[rfc:arrow_functions_v2|Arrow Functions 2.0]] RFC was accepted with a large majority. Compared to the Short Closures 1.0 RFC, it addressed the syntax and type hints concerns, limited the body to only one expression, and kept implicit by-value capture. |
- | Many languages tend to capture by variable. In practice this can lead to surprising effects, especially in loops. | + | ===== Alternative implementations ===== |
- | ===== Backward Incompatible Changes ===== | + | A few people suggested implementing the same functionality via a different syntax, that is, basing it on the long-closure syntax with a '' |
- | Changing capture analysis | + | The resulting behavior |
- | <code php> | + | - The longer form introduces more visual noise to achieve the same result. |
- | $var = 'a'; | + | - PHP developers have been using the '' |
- | fn () => $$var && $a = 1; | + | - With the improved capture logic, many of the arguments for the explicit capture syntax go away. |
- | </code> | + | - Using the longer '' |
+ | - If converting from a single-line short-lambda to a two-line closure, switching to the long-form syntax is more work than just switching '' | ||
- | Occurrences of this should be rare because assignments in arrow functions have no effect on the outer scope. | + | For those reasons, |
+ | |||
+ | ===== Backward Incompatible Changes ===== | ||
+ | |||
+ | None. | ||
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
Line 303: | Line 383: | ||
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== | ||
- | Existing function | + | Existing function |
===== Future Scope ===== | ===== Future Scope ===== | ||
- | The proposal section detailed three additional possible combinations of function functionality that are not included here. While it is not likely that they have much use, the pattern here clearly lays out what they would be were a future RFC to try and implement | + | These are some possible future extensions, but the authors don't necessarily endorse |
- | Specifically, | + | ==== Explicit use list on Short Closures ==== |
+ | |||
+ | It would be possible to extend the Short Closure syntax to allow an explicit use list: | ||
<code php> | <code php> | ||
- | // Global scope | + | $fn = fn () use ($a, &$b) { |
- | $c = 1; | + | }; |
+ | </ | ||
- | fn foo($a, $b): int { | + | One anticipated use-case is to selectively capture some variables by-reference. |
- | $val = $a * $b; | + | |
- | return $val * $c; | + | |
- | } | + | |
- | fn foo($a, $b): int => $a * $b * $c; | + | There are at least two possible variations of this extension. In one of them, the use list is merged with auto-capture, |
- | $foo = function($a, | + | This RFC initially proposed the first possibility. It is not included in the current version because it appeared to create confusion. |
- | </ | + | |
- | Those versions are //not// included in this RFC. | + | ==== Optimize Arrow Functions ==== |
- | ===== Proposed Voting Choices ===== | + | This RFC proposes an optimized auto-capture. It would be possible to apply this optimization to Arrow Functions as well, but this would be a breaking change in some rare cases. |
- | This is a simple Yes/No vote, requiring 2/3 to pass. | + | This is not included in this RFC because most Arrow Functions would not benefit from it. |
+ | |||
+ | ===== Vote ===== | ||
+ | |||
+ | This is a simple Yes/No vote, requiring 2/3 to pass. Vote ends on 15 July 2022. | ||
+ | |||
+ | |||
+ | <doodle title=" | ||
+ | * Yes | ||
+ | * No | ||
+ | </ | ||
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
Line 343: | Line 432: | ||
===== References ===== | ===== References ===== | ||
- | [[rfc: | + | * [[rfc: |
+ | * [[rfc: | ||
+ | * [[rfc: | ||
===== Changelog ===== | ===== Changelog ===== | ||
- | 2.0: Updated for new patch; reduced discussion of short-function RFC and related topics; expanded discussion of the capture rules and noted benchmarks showing minimal performance impact | + | 2.0: Updated for new patch; reduced discussion of short-function RFC and related topics; expanded discussion of the capture rules and noted benchmarks showing minimal performance impact; renamed to "Short Closures 2.0" |
rfc/auto-capture-closure.1653569407.txt.gz · Last modified: 2022/05/26 12:50 by lbarnaud