Differences
This shows you the differences between two versions of the page.
rfc:implicit_move_optimisation [2023/05/13 16:53] – structure, future scope, impact nielsdos | rfc:implicit_move_optimisation [2023/05/14 16:24] (current) – spell out ETL nielsdos | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Implicit Move Optimisation ====== | ====== PHP RFC: Implicit Move Optimisation ====== | ||
- | * Version: 0.1 | + | * Version: 0.2.2 |
* Date: 2023-05-13 | * Date: 2023-05-13 | ||
* Author: Niels Dossche (nielsdos), dossche.niels@gmail.com | * Author: Niels Dossche (nielsdos), dossche.niels@gmail.com | ||
Line 26: | Line 26: | ||
</ | </ | ||
- | The code example shows an array being passed to " | + | The code example shows an array being passed to " |
- | While this example may not highlight the significance of the copy cost, there are scenarios where it does matter, such as processing large data batches (e.g., | + | While this example may not highlight the significance of the copy cost, there are scenarios where it does matter, such as processing large data batches (e.g., |
+ | |||
+ | A similar case where this optimisation is beneficial is for variables which are passed to functions and then never used again in the caller. | ||
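As a minimal sketch of that second case: the function names `append_marker` and `build` below are hypothetical and do not come from the RFC. The variable `$data` is passed by value and never read again in the caller, which is the pattern the proposal targets. The observable result is the same with or without the optimisation; only the internal copy would be avoided.

```php
<?php
// Hypothetical helper (not from the RFC): receives the array by value
// and appends to its own view of it.
function append_marker(array $items): array {
    // Without a move, the caller's $data still references the same array,
    // so this write has to separate (copy) the array first.
    $items[] = 'marker';
    return $items;
}

function build(): array {
    $data = range(1, 5);
    // $data is never used again in build() after this call,
    // so it is a candidate for being moved instead of copied.
    return append_marker($data);
}

var_dump(count(build())); // int(6)
```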
===== Proposal ===== | ===== Proposal ===== | ||
- | This will be implemented as an addition to the existing DFA optimisation pass. | + | We will first give some brief background information and then dive into the technical details of the proposal. |
- | ==== RC1 functions ==== | + | === Background === | ||
+ | |||
+ | During optimisation, | ||
+ | |||
+ | <code php> | ||
+ | function main(array $array) { | ||
+ | $a = $array; | ||
+ | $a = array_merge($a, | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | This results in the following SSA code, with the type information removed for clarity. | ||
+ | |||
+ | < | ||
+ | main: | ||
+ | ; # | ||
+ | ; #1.CV1($a) NOVAL | ||
+ | 0000 # | ||
+ | 0001 #3.CV1($a) = QM_ASSIGN # | ||
+ | 0002 INIT_FCALL 2 112 string(" | ||
+ | 0003 SEND_VAR #3.CV1($a) 1 | ||
+ | 0004 SEND_VAL array(...) 2 | ||
+ | 0005 #5.V3 = DO_ICALL | ||
+ | 0006 ASSIGN #3.CV1($a) -> #6.CV1($a) NOVAL #5.V3 | ||
+ | 0007 RETURN null | ||
+ | </ | ||
+ | |||
+ | Let's break it down. | ||
+ | A variable that's actually in the code, like " | ||
+ | Every time " | ||
+ | |||
+ | === Technical Details === | ||
+ | |||
+ | This optimisation, | ||
+ | |||
+ | The implementation pull request introduces this optimisation by setting a special flag on the opcodes for SEND_VAR and SEND_VAR_EX, | ||
+ | |||
+ | When do we set that flag? The SSA optimisation pipeline already detects when an SSA variable is never used after its definition. This happens using the NOVAL flag. We use this information, | ||
+ | |||
+ | Because the optimisation unsets the PHP variable, we need a new SSA definition for the affected PHP variable. The SSA code from above will look like this with this optimisation: | ||
+ | |||
+ | < | ||
+ | main: | ||
+ | ; # | ||
+ | ; #1.CV1($a) NOVAL | ||
+ | 0000 # | ||
+ | 0001 #3.CV1($a) = QM_ASSIGN # | ||
+ | 0002 INIT_FCALL 2 112 string(" | ||
+ | 0003 SEND_VAR #3.CV1($a) -> #4.CV1($a) NOVAL 1 | ||
+ | 0004 SEND_VAL array(...) 2 | ||
+ | 0005 #5.V3 = DO_ICALL | ||
+ | 0006 ASSIGN #4.CV1($a) NOVAL -> #6.CV1($a) NOVAL #5.V3 | ||
+ | 0007 RETURN null | ||
+ | </ | ||
+ | |||
+ | As we can see above, there is now a redefinition for " | ||
==== Constraints ==== | ==== Constraints ==== | ||
+ | |||
+ | There are situations where this optimisation cannot be applied. Specifically, | ||
+ | |||
+ | The optimisation is not applied if the caller uses one of the following: | ||
+ | * Variadic arguments | ||
+ | * " | ||
+ | * Variable variables | ||
+ | * " | ||
+ | * Try - catch/ | ||
+ | |||
+ | Furthermore, | ||
+ | It is also not applied on arguments in the **caller**. This is because it could influence backtraces from exceptions or influence the result of " | ||
+ | |||
+ | <code php> | ||
+ | function boom(array $x) { | ||
+ | throw new Exception(" | ||
+ | } | ||
+ | |||
+ | function test(array $x) { | ||
+ | boom($x); | ||
+ | } | ||
+ | |||
+ | function main() { | ||
+ | test(range(1, | ||
+ | } | ||
+ | |||
+ | main(); | ||
+ | </ | ||
+ | |||
+ | If we were to allow the optimisation on **caller** arguments, this would give the following output: | ||
+ | < | ||
+ | PHP Fatal error: | ||
+ | Stack trace: | ||
+ | #0 test.php(8): | ||
+ | #1 test.php(12): | ||
+ | #2 test.php(15): | ||
+ | #3 {main} | ||
+ | thrown in test.php on line 4 | ||
+ | </ | ||
+ | |||
+ | Notice that the argument for " | ||
==== Examples ==== | ==== Examples ==== | ||
+ | |||
+ | Not only userland functions can benefit from this optimisation, | ||
+ | Some builtin functions already avoid a copy if the array argument has a reference count of 1. One example of a builtin function that already does this is " | ||
+ | |||
+ | <code php> | ||
+ | function array_merge_loop($n) { | ||
+ | $a = range(1, 3); | ||
+ | for ($i = 0; $i < $n; ++$i) { | ||
+ | $a = array_merge($a, | ||
+ | } | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | For 30000 iterations **without** the optimisation, | ||
+ | |||
+ | Other examples of builtin functions that are optimised for a reference count of 1 are " | ||
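To measure the loop above on a given build, a small timing harness along these lines could be used. This is only a sketch under assumptions: the function name `merge_loop_sketch`, the appended array `[4, 5, 6]`, and the iteration count are illustrative choices, and the harness makes no claim about what timings to expect.

```php
<?php
// Hypothetical benchmark sketch; the appended values and the iteration
// count are assumptions made for illustration only.
function merge_loop_sketch(int $n): array {
    $a = range(1, 3);
    for ($i = 0; $i < $n; ++$i) {
        // $a is overwritten by the call's result, so the old value
        // passed as the first argument is never read again.
        $a = array_merge($a, [4, 5, 6]);
    }
    return $a;
}

$start = hrtime(true);                      // monotonic time in nanoseconds
$result = merge_loop_sketch(1000);
$elapsedMs = (hrtime(true) - $start) / 1e6; // convert to milliseconds

printf("%d elements in %.3f ms\n", count($result), $elapsedMs);
```

Running the same script with and without the optimised opcache build would show whether the copy cost matters for a particular workload.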
+ | |||
+ | ==== Alternatives ==== | ||
+ | |||
+ | There are two alternative solutions to the problem outlined in this proposal: references, or explicit moves. I will discuss each of them briefly and explain why they are not ergonomic solutions. | ||
+ | |||
+ | === References === | ||
+ | |||
+ | You might wonder why we cannot just use references instead to prevent copies. References are less ergonomic for developers because they are rarely used, and it can be surprising to developers that functions use references as an optimisation. Surprising behaviour can lead to bugs. Furthermore, | ||
+ | |||
+ | For ergonomics, take a look at the following code example: | ||
+ | |||
+ | <code php> | ||
+ | function my_function(array & | ||
+ | $my_array[] = 3; | ||
+ | var_dump($my_array); | ||
+ | } | ||
+ | |||
+ | function example_ref() { | ||
+ | $array = [0, 1, 2]; | ||
+ | my_function($array); | ||
+ | } | ||
+ | |||
+ | function example_copy() { | ||
+ | $array = [0, 1, 2]; | ||
+ | $array_copy = array_merge($array, | ||
+ | my_function($array_copy); | ||
+ | var_dump($array); | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | In the function " | ||
+ | |||
+ | <code php> | ||
+ | function my_function(array $my_array) { | ||
+ | $my_array[] = 3; | ||
+ | var_dump($my_array); | ||
+ | } | ||
+ | |||
+ | function example_ref() { | ||
+ | $array = [0, 1, 2]; | ||
+ | my_function($array); | ||
+ | } | ||
+ | |||
+ | function example_copy() { | ||
+ | $array = [0, 1, 2]; | ||
+ | my_function($array); | ||
+ | var_dump($array); | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | === Explicit Moves === | ||
+ | |||
+ | An alternative is creating a new keyword " | ||
+ | An optimisation that is applied automatically is more ergonomic and automatically benefits existing PHP code. | ||
==== Risks ==== | ==== Risks ==== | ||
+ | |||
+ | Every optimisation that gets added to PHP has the risk of introducing breakages. In particular, the type inference needs to be changed to reflect the move of a variable. Type inference is heavily used in the JIT engine, so there' | ||
+ | |||
+ | The SSA construction change does introduce a **compile-time** performance decrease for WordPress because the optimiser performs extra work. However, the optimisation improves the **run-time** performance of WordPress slightly, which is what actually matters. | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | What breaks, and what is the justification | + | |
+ | There' | ||
+ | |||
+ | <code php> | ||
+ | class Foo { | ||
+ | public function __destruct() { | ||
+ | var_dump(1); | ||
+ | } | ||
+ | } | ||
+ | |||
+ | function test(array $x) { | ||
+ | unset($x[0]); | ||
+ | } | ||
+ | |||
+ | function main() { | ||
+ | $x = [new Foo]; | ||
+ | test($x); | ||
+ | var_dump(2); | ||
+ | } | ||
+ | |||
+ | main(); | ||
+ | </ | ||
+ | |||
+ | Without this optimisation this will first output 2, and then 1. But **with** the optimisation this will output 1 first, and then 2. This is because with the optimisation the array " | ||
+ | It is hard to predict what the impact will be on real-world applications. Executing code with side-effects which needs to happen in a specific order is dangerous as-is, because even in current PHP no such guarantees are made if objects have cycles | ||
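Code that depends on a particular order can make that order explicit rather than relying on when a destructor happens to run. The following is a sketch only; the class `Tracked` and its `release()` method are hypothetical names, not part of the RFC. Because `main()` keeps its own handle in `$foo`, the object stays alive across the call regardless of whether the array argument is copied or moved.

```php
<?php
// Hypothetical class (not from the RFC) with an explicit, idempotent
// release step instead of relying on destructor timing.
class Tracked {
    private bool $released = false;

    public function release(): void {
        if (!$this->released) {
            $this->released = true;
            echo "released\n";
        }
    }

    public function __destruct() {
        // Safety net: a no-op if release() was already called.
        $this->release();
    }
}

function test(array $x): void {
    unset($x[0]);
}

function main(): void {
    $foo = new Tracked;
    $x = [$foo];
    test($x);           // $foo still holds the object, so it is not destroyed here
    echo "after test\n";
    $foo->release();    // the side effect happens at a point the code chooses
}

main();
```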
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
Line 57: | Line 253: | ||
==== To Opcache ==== | ==== To Opcache ==== | ||
- | This proposal implements an extra optimisation in the DFA pass in the optimiser. There needed | + | This proposal implements an extra optimisation in the DFA pass in the optimiser. There need to be changes to SCCP, DFG, type inference, and SSA construction to account for the behaviour change in SEND_VAR. |
==== New Constants ==== | ==== New Constants ==== |
rfc/implicit_move_optimisation.1683996825.txt.gz · Last modified: 2023/05/13 16:53 by nielsdos