rfc:implicit_move_optimisation

Differences

This shows you the differences between two versions of the page.

rfc:implicit_move_optimisation [2023/05/13 22:26] – better intro to the optimiser nielsdosrfc:implicit_move_optimisation [2023/05/14 16:24] (current) – spell out ETL nielsdos
Line 1: Line 1:
 ====== PHP RFC: Implicit Move Optimisation ====== ====== PHP RFC: Implicit Move Optimisation ======
-  * Version: 0.1.2+  * Version: 0.2.2
   * Date: 2023-05-13   * Date: 2023-05-13
   * Author: Niels Dossche (nielsdos), dossche.niels@gmail.com   * Author: Niels Dossche (nielsdos), dossche.niels@gmail.com
Line 28: Line 28:
 The code example shows an array being passed to "my_function", which modifies the array and returns the modified result. However, due to the copy-on-write (CoW) mechanism, the variable "$my_array" is copied within "my_function" because the array's reference count is greater than 1 and it is modified. In this example, the copy is unnecessary because the result always overwrites the original array. The purpose of this RFC is to propose an optimisation for avoiding such unnecessary copies. The code example shows an array being passed to "my_function", which modifies the array and returns the modified result. However, due to the copy-on-write (CoW) mechanism, the variable "$my_array" is copied within "my_function" because the array's reference count is greater than 1 and it is modified. In this example, the copy is unnecessary because the result always overwrites the original array. The purpose of this RFC is to propose an optimisation for avoiding such unnecessary copies.
  
-While this example may not highlight the significance of the copy cost, there are scenarios where it does matter, such as processing large data batches (e.g., ETL-like processes) or having more complex data manipulation functions. The reason for having a separate "main()" function in the example will become clear later in the proposal.+While this example may not highlight the significance of the copy cost, there are scenarios where it does matter, such as processing large data batches (e.g., Extract-Transform-Load processes) or having more complex data manipulation functions. The reason for having a separate "main()" function in the example will become clear later in the proposal.
  
 A similar case where this optimisation is beneficial is for variables which are passed to functions and then never used again in the caller. A similar case where this optimisation is beneficial is for variables which are passed to functions and then never used again in the caller.
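The two patterns described above can be sketched as follows. This is an illustration only: "my_function" follows the name used in the text, while "overwrite_example", "sum_with_extra" and "last_use_example" are hypothetical names, and the arrays are built with "range()" purely so that they are created at run time.

<code php>
<?php
declare(strict_types=1);

// Same shape as the function described in the text: modify the argument, return the result.
function my_function(array $array): array
{
    $array[1] = 4; // the caller still references this array, so copy-on-write separates it here
    return $array;
}

// Overwrite pattern: the result always replaces the original variable.
function overwrite_example(int $n): array
{
    $my_array = range(0, $n);
    $my_array = my_function($my_array);
    return $my_array;
}

// Hypothetical helper for the "never used again" pattern.
function sum_with_extra(array $values): int
{
    $values[] = 10; // again a write to an array that the caller still holds
    return array_sum($values);
}

function last_use_example(int $n): int
{
    $values = range(1, $n);
    return sum_with_extra($values); // $values is not used in this function after the call
}

var_dump(overwrite_example(2));
var_dump(last_use_example(3));
</code>

In both callers the original array is not needed once the call has started, which is the situation in which the copy made by the callee is wasted work.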
Line 63: Line 63:
 </code> </code>
  
-Let's break it down. A variable that's actually in the code, like "$array" and "$a" is called a Compiled Variable (CV). CV0 refers to "$array" and CV1 regers to "$a". As an SSA variable may only be assigned once, but there are multiple assignments to "$a", "$a" is numbered in the SSA code. We have: "#0.CV0($array)""#2.CV0($array)" for "$array"; and "#1.CV1($a)", "#3.CV1($a)", "#6.CV1($a)" for "$a". Every time "$a" is modified, the counter after the pound symbol will increase and from that point on the SSA variable to use for "$a" is the one with the incremented counter. This is called a redefinition of "$a". In particular, the PHP code `$a = array_merge($a, [4]);results in using the SSA variable "#3.CV1($a)" and the assignment results in a new SSA variable "#6.CV1($a)".+Let's break it down. 
 +A variable that's actually in the code, like "$array" and "$a", is called a Compiled Variable (CV). CV0 refers to "$array" and CV1 refers to "$a". As an SSA variable may only be assigned once, but there are multiple assignments to "$a", "$a" is numbered in the SSA code. We have: "#0.CV0($array)" and "#2.CV0($array)" for "$array"; and "#1.CV1($a)", "#3.CV1($a)", and "#6.CV1($a)" for "$a". 
 +Every time "$a" is modified, the SSA variable needs to be "redefined". The counter after the pound symbol will increase and from that point on the SSA variable we use for "$a" is the one with the incremented counter. In particular, the PHP code "$a = array_merge($a, [4]);" results in using the SSA variable "#3.CV1($a)" and the assignment results in a new SSA variable "#6.CV1($a)".
  
 === Technical Details === === Technical Details ===
  
-This optimisation, known as "implicit move," will be added to the existing optimisation pipeline as an SSA-based optimization. The term "implicit" indicates that this optimization occurs automatically: it eliminates the need for PHP developers to take any specific action. The “move” refers to the fact that the lifetime and data is moved into the called function. The optimisation is only applied to local variables, because properties and array dimensions are not represented as SSA variables.+This optimisation, known as "implicit move," will be added to the existing optimisation pipeline as an SSA-based optimisation. The term "implicit" indicates that this optimisation occurs automatically: it eliminates the need for PHP developers to take any specific action. The “move” refers to the fact that the lifetime and data are moved into the called function. The optimisation is only applied to local variables, because properties and array dimensions are not represented as SSA variables.
  
-The implementation pull request introduces this optimisation by setting a special flag on the oplines for SEND_VAR and SEND_VAR_EX, indicating the possibility of an implicit move. A specialized VM handler is implemented to handle cases where this flag is set. It passes the array/string as an argument without creating a copy and it unsets the original variable. As a result, the reference count of the array/string remains unchanged. If the data's reference count is 1, it remains as such when the callee executes. Therefore, the need for a copy (even when the data is modified) is avoided.+The implementation pull request introduces this optimisation by setting a special flag on the opcodes for SEND_VAR and SEND_VAR_EX, indicating the possibility of an implicit move. A specialized VM handler is implemented to handle cases where this flag is set. It passes the array/string as an argument without creating a copy and it unsets the original variable. As a result, the reference count of the array/string remains unchanged. If the data's reference count is 1, it remains as such when the callee executes. Therefore, the need for a copy (even when the data is modified) is avoided.
  
-When do we set that flag? The SSA optimisation pipeline already detects when an SSA variable is never used after its definition. We use this information, along with alias information, to determine when to set the flag. Furthermore, the flag can only be set if the variable may be an array or may be a string.+When do we set that flag? The SSA optimisation pipeline already detects when an SSA variable is never used after its definition. This happens using the NOVAL flag. We use this information, along with alias information, to determine when to set the flag. Furthermore, the flag can only be set if the variable may be an array or may be a string.
 + 
 +Because the optimisation unsets the PHP variable, we need a new SSA definition for the affected PHP variable. The SSA code from above will look like this with this optimisation: 
 + 
 +<code> 
 +main: 
 +     ; #0.CV0($array) NOVAL 
 +     ; #1.CV1($a) NOVAL 
 +0000 #2.CV0($array) = RECV 1 
 +0001 #3.CV1($a) = QM_ASSIGN #2.CV0($array) 
 +0002 INIT_FCALL 2 112 string("array_merge")
 +0003 SEND_VAR #3.CV1($a) -> #4.CV1($a) NOVAL 1 
 +0004 SEND_VAL array(...) 2 
 +0005 #5.V3 = DO_ICALL 
 +0006 ASSIGN #4.CV1($a) NOVAL -> #6.CV1($a) NOVAL #5.V3 
 +0007 RETURN null 
 +</code> 
 + 
 +As we can see above, there is now a redefinition for "$a" ("#4.CV1($a)") in the SEND_VAR opcode. The optimisation pass sees that "#4.CV1($a)" has the NOVAL flag, which indicates its result is never used again. It also knows that "$a" may be an array, and "$a" is not an argument of "main". Therefore, the special flag is set on the SEND_VAR opcode.
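To make the conditions above more concrete, here is a rough PHP-level sketch of which calls would and would not match them. It is a reading of the rules as stated in this section, not a statement about what the implementation does in every case, and all function and class names are hypothetical. The code behaves the same with or without the optimisation.

<code php>
<?php
declare(strict_types=1);

// Mirrors "main" from the SSA listing: the old value of $a is overwritten by the call result.
function overwrite_local(array $array): array
{
    $a = $array;
    $a = array_merge($a, [4]); // old $a is never read after being sent: matches the conditions
    return $a;
}

// $a is read again after the call, so the sent value is still in use: no move expected.
function used_again(array $array): array
{
    $a = $array;
    $b = array_merge($a, [4]);
    return [$a, $b];
}

// The sent variable is an argument of the function itself, which the text excludes.
function sends_argument(array $array): array
{
    return array_merge($array, [4]);
}

// Properties are not represented as SSA variables, so they are out of scope as well.
final class Holder
{
    public array $items = [1, 2, 3];

    public function append(): void
    {
        $this->items = array_merge($this->items, [4]);
    }
}
</code>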
  
 ==== Constraints ==== ==== Constraints ====
Line 136: Line 156:
 ==== Alternatives ==== ==== Alternatives ====
  
-There are two alternative solutions to the problem outlined in this proposal: references, or explicit moves. I will discuss each of them briefly and explain why I think they are not ergonomic solutions.+There are two alternative solutions to the problem outlined in this proposal: references, or explicit moves. I will discuss each of them briefly and explain why they are not ergonomic solutions.
  
 === References === === References ===
  
-You might wonder why we cannot just use references instead to prevent copies. It is my opinion that references are less ergonomic for developers. Furthermore, builtin functions such as "array_merge", "array_replace", etc., don't take a reference so those functions wouldn't be able to avoid a copy.+You might wonder why we cannot just use references instead to prevent copies. References are less ergonomic for developers because they are rarely used, and it can be surprising to developers that functions use references as an optimisation. Surprising behaviour can lead to bugs. Furthermore, builtin functions such as "array_merge", "array_replace", etc., don't take a reference so those functions wouldn't be able to avoid a copy.
  
 For ergonomics, take a look at the following code example: For ergonomics, take a look at the following code example:
Line 152: Line 172:
 function example_ref() { function example_ref() {
     $array = [0, 1, 2];     $array = [0, 1, 2];
-    my_function($array); // No copy, want to change the array+    my_function($array); // No copy, we want to change the array
 } }
  
Line 163: Line 183:
 </code> </code>
  
-In the function "example_ref", want to modify "$array" in-place, while in "example_copy" want the original array to stay the same. With references have to explicitly copy the array if want to keep the original. If we instead have an optimisation that can detect when it can modify in-place, then we can avoid references and the code becomes simpler (while still avoiding a copy):+In the function "example_ref", we want to modify "$array" in-place, while in "example_copy" we want the original array to stay the same. With references, we have to explicitly copy the array if we want to keep the original. If we instead have an optimisation that can detect when it can modify in-place, then we can avoid references and the code becomes simpler (while still avoiding a copy):
  
 <code php> <code php>
Line 173: Line 193:
 function example_ref() { function example_ref() {
     $array = [0, 1, 2];     $array = [0, 1, 2];
-    my_function($array); // No copy, want to change the array+    my_function($array); // No copy, we want to change the array
 } }
  
Line 186: Line 206:
  
 An alternative is creating a new keyword "move", which the programmer can place manually in their code to perform the move optimisation manually. Ilija has prototyped this in the past, independently from this proposal. An alternative is creating a new keyword "move", which the programmer can place manually in their code to perform the move optimisation manually. Ilija has prototyped this in the past, independently from this proposal.
-It is my opinion that having an optimisation applied automatically is more ergonomic and automatically benefits existing PHP code.+An optimisation that is applied automatically is more ergonomic and automatically benefits existing PHP code.
  
 ==== Risks ==== ==== Risks ====
  
-Every optimisation that gets added to PHP has the risk of introducing breakages. In particular, the type inference needs to be changed to reflect the move of a variable. Type inference is heavily used in the JIT engine, so there's a concern that this optimisation might cause bugs in the JIT. I haven't seen such bugs yet, and I would be surprised if there are because the type inference only differs if the SSA variable is never used again.+Every optimisation that gets added to PHP has the risk of introducing breakages. In particular, the type inference needs to be changed to reflect the move of a variable. Type inference is heavily used in the JIT engine, so there's a concern that this optimisation might cause bugs in the JIT. The type inference only affects SSA variables that are not used again, so it should not have an impact on the JIT.
  
 The SSA construction change does introduce a **compile-time** performance decrease for WordPress because the optimiser performs extra work. However, the optimisation improves the **run-time** performance of WordPress slightly, which is what actually matters. The SSA construction change does introduce a **compile-time** performance decrease for WordPress because the optimiser performs extra work. However, the optimisation improves the **run-time** performance of WordPress slightly, which is what actually matters.
Line 219: Line 239:
  
 Without this optimisation this will first output 2, and then 1. But **with** the optimisation this will output 1 first, and then 2. This is because with the optimisation the array "$x" is destroyed after the call to "test" finishes as its lifetime was transferred to "test". Without this optimisation this will first output 2, and then 1. But **with** the optimisation this will output 1 first, and then 2. This is because with the optimisation the array "$x" is destroyed after the call to "test" finishes as its lifetime was transferred to "test".
-It is hard to predict what the impact will be on real-world applications. In my opinion executing code with side-effects which needs to happen in a specific order is dangerous as-is, because even in current PHP no such guarantees are made if objects have cycles for example.+It is hard to predict what the impact will be on real-world applications. Executing code with side effects that need to happen in a specific order is dangerous as-is, because even in current PHP no such guarantees are made, for example when objects have cycles.
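The remark about cycles can be illustrated with a small sketch that is independent of this proposal. The class "Node" and the function "make_cycle" are hypothetical names. On a default configuration, the destructors of objects in a reference cycle are expected to run only once the cycle collector runs, not when the last variable goes out of scope.

<code php>
<?php
declare(strict_types=1);

final class Node
{
    /** @var string[] names in the order their destructors ran */
    public static array $destroyed = [];

    public ?Node $peer = null;

    public function __construct(private string $name) {}

    public function __destruct()
    {
        self::$destroyed[] = $this->name;
    }
}

function make_cycle(): void
{
    $a = new Node('a');
    $b = new Node('b');
    $a->peer = $b;
    $b->peer = $a; // $a and $b now keep each other alive
}

make_cycle(); // both locals are gone, but the cycle still holds the objects
$before = count(Node::$destroyed);

gc_collect_cycles(); // the cycle collector frees the objects and runs the destructors
$after = count(Node::$destroyed);

printf("destructors run before collection: %d, after: %d\n", $before, $after);
</code>

Code that relies on destructors firing at a particular point is therefore already sensitive to when the collector happens to run.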
  
 ===== Proposed PHP Version(s) ===== ===== Proposed PHP Version(s) =====
rfc/implicit_move_optimisation.1684016782.txt.gz · Last modified: 2023/05/13 22:26 by nielsdos