rfc:structs-v2

Differences

This shows you the differences between two versions of the page.

rfc:structs-v2 [2024/04/23 15:50] – Reflection ilutov rfc:structs-v2 [2024/04/24 20:33] (current) ilutov
Line 33: Line 33:
 ===== The problem ===== ===== The problem =====
  
-Classes are commonly used to model data in PHP. Such classes have many names (data transfer objects, plain old php objects, records, etc.). This allows the developer to describe the shape of the data, thus documenting it and improving developer experience in IDEs.+Classes are commonly used to model data in PHP. Such classes have many names (data transfer objects, plain old php objects, records, etc.). This allows the developer to describe the shape of the data, thus documenting it and improving developer experience in IDEs over arrays.
  
 Using classes for data comes with one significant downside: Objects are passed by reference, rather than by value. When dealing with mutable data, this makes it very easy to shoot yourself in the foot by exposing mutations to places that don't expect to see them. Using classes for data comes with one significant downside: Objects are passed by reference, rather than by value. When dealing with mutable data, this makes it very easy to shoot yourself in the foot by exposing mutations to places that don't expect to see them.
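
The aliasing described above can be reproduced with today's class semantics. This is only a minimal sketch: the ''Position'' class is a hypothetical stand-in and is not taken from the RFC's own example.

<code php>
// Hypothetical data class used only for this illustration.
final class Position
{
    public function __construct(public int $x, public int $y) {}
}

$a = new Position(0, 10);
$b = $a;   // copies the object handle, not the object itself

$b->y--;   // mutates the single shared object

var_dump($a->y); // int(9): the change is visible through $a as well
</code>

Both variables refer to the same object, so code holding ''$a'' observes a mutation it never asked for.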
Line 57: Line 57:
 $shapes = createShapes(); $shapes = createShapes();
  
-function applyGravity() { +// Apply gravity
-    foreach ($shapes as $shape) { +
-        /* We're not physicists. :P */ +
-        $shape->position->y--; +
-    } +
-+
- +
-applyGravity($shape); +
 foreach ($shapes as $shape) { foreach ($shapes as $shape) {
 +    /* We're not physicists. :P */
 +    $shape->position->y--;
     var_dump($shape->position);     var_dump($shape->position);
 } }
Line 95: Line 89:
  
 //Conceptually//, ''$circle->position'' and ''$square->position'' are distinct objects at the end of this function. ''applyGravity()'' can no longer influence multiple references to ''position''. This completely avoids the "spooky action at a distance" problem. //Conceptually//, ''$circle->position'' and ''$square->position'' are distinct objects at the end of this function. ''applyGravity()'' can no longer influence multiple references to ''position''. This completely avoids the "spooky action at a distance" problem.
 +
 +At first glance, it doesn't seem like that would avoid useless copies. In reality, it works somewhat differently, but the details are not too important for now. It will be explained in more detail in the CoW chapter.
  
 ====== Growable data structures ====== ====== Growable data structures ======
Line 161: Line 157:
 ===== CoW 🐄 ===== ===== CoW 🐄 =====
  
-But wait, this sounds familiar.+But wait:
  
 <blockquote> <blockquote>
 What's the solution? ''position'' needs to be copied, but where? We can either copy it in ''createShapes()'' so that each shape has its own distinct position ... Unfortunately, either can lead to useless copies. What's the solution? ''position'' needs to be copied, but where? We can either copy it in ''createShapes()'' so that each shape has its own distinct position ... Unfortunately, either can lead to useless copies.
 +
 +...
 +
 +Like arrays, strings and other value types, structs are //conceptually// copied when assigned to a variable, or when passed to a function.
  
 <cite>This RFC, minutes ago</cite> <cite>This RFC, minutes ago</cite>
 </blockquote> </blockquote>
  
-You may assume that structs come with the same slowdown as creating a copy for each assignment of an object. However, structs have a cool trick up their sleeves: Copy-on-write, or CoW for short. CoW is already used for both arrays and strings, so this is not a new concept to the PHP engine. PHP tracks the reference count for each allocation such as objects, arrays and strings. When value types are modified, PHP checks if the reference count is >1, and if so, it copies the element before performing a modification.+This solution doesn't sound like it would solve the presented problem. You may assume that structs come with the same slowdown as creating a copy for each assignment of an object. However, structs have a cool trick up their sleeves: Copy-on-write, or CoW for short. CoW is already used for both arrays and strings, so this is not a new concept to the PHP engine. PHP tracks the reference count for each allocation such as objects, arrays and strings. When value types are modified, PHP checks if the reference count is >1, and if so, it copies the element before performing a modification.
  
 <code php> <code php>
Line 276: Line 276:
  
 Only mutating methods can and must be called using the ''!()'' syntax. Calling mutating methods with ''()'', or non-mutating methods with ''!()'' results in a runtime error. Only mutating methods can and must be called using the ''!()'' syntax. Calling mutating methods with ''()'', or non-mutating methods with ''!()'' results in a runtime error.
 +
 +Similarly, classes trying to implement ''mutating'' methods will result in a compile error.
  
 TODO: Check if we can enforce ''mutating'' at compile-time, anytime ''$this'' is fetched with RW (assignments, calling of mutating methods, fetching references). TODO: Check if we can enforce ''mutating'' at compile-time, anytime ''$this'' is fetched with RW (assignments, calling of mutating methods, fetching references).
Line 312: Line 314:
 } }
  
-$vector = new Vector([[1], [2], [3]]);e+$vector = new Vector([[1], [2], [3]]);
 $vector->values[0][0] *= 2; $vector->values[0][0] *= 2;
 </code> </code>
Line 337: Line 339:
 TODO: This is actually broken currently. TODO: This is actually broken currently.
  
-This modification is //not// considered mutating, because the object may change from some other place anyway. Structs behave closer to objects, so interior mutation is not allowed.+This modification is //not// considered mutating, because the object may change from some other place anyway. Structs behave closer to arrays, so interior mutation is not allowed.
  
 <code php> <code php>
Line 350: Line 352:
 <code php> <code php>
 $bigNum1 = new BigNum(1); $bigNum1 = new BigNum(1);
-$bigNum2 = new BigNum(1);+$bigNum2 = $bigNum1;
  
 $reflection = new ReflectionProperty(BigNum::class, 'value'); $reflection = new ReflectionProperty(BigNum::class, 'value');
-$reflection->setValue($bigNum, 2);+$reflection->setValue($bigNum2, 2); 
 + 
 +// Desired behavior 
 +var_dump($bigNum1, $bigNum2); // 1, 2
 </code> </code>
  
-To work properly, ''ReflectionProperty::setValue()'' would need to accept a reference for the ''$objectOrValue'' property. This change would break existing cases where ''$objectOrValue'' is a temporary value (e.g. the result of a function call). There's also the special ''@prefer-ref'' annotation that is only available for internal functions. If the value can be passed by reference, it is. Otherwise, it is passed by value. This solution works well, but breaks userland overrides of ''ReflectionProperty::setValue()'' with no possibility of mitigation, because ''@prefer-ref'' is not available in userland.+For this to work properly, ''ReflectionProperty::setValue()'' would need to accept a reference for the ''$objectOrValue'' parameter. This is because internal functions are assumed not to mutate struct objects that they accept by value, since the copy could not be written back to the original variable. Making ''$objectOrValue'' by-reference would break existing code where ''$objectOrValue'' is a temporary value (e.g. the result of a function call). There's also the special ''@prefer-ref'' annotation that is only available for internal functions. If the value can be passed by reference, it is. Otherwise, it is passed by value. This solution works well, but breaks userland overrides of ''ReflectionProperty::setValue()'' with no possibility of mitigation, because ''@prefer-ref'' is not available in userland.
  
 For this reason, I have opted to throw when passing a struct object to ''ReflectionProperty::setValue()'' for the time being. For this reason, I have opted to throw when passing a struct object to ''ReflectionProperty::setValue()'' for the time being.
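
To make the "temporary value" case concrete, here is a minimal sketch of existing code that passes a function call result as the first argument of ''ReflectionProperty::setValue()''. It uses an ordinary class, not a struct; ''Counter'' and ''sharedCounter()'' are hypothetical names introduced only for this illustration.

<code php>
// Hypothetical ordinary class (not a struct).
final class Counter
{
    public int $value = 0;
}

// Hypothetical helper: always returns the same shared instance.
function sharedCounter(): Counter
{
    static $instance = null;
    return $instance ??= new Counter();
}

$reflection = new ReflectionProperty(Counter::class, 'value');

// The first argument is a temporary (a function call result), not a variable.
$reflection->setValue(sharedCounter(), 2);

var_dump(sharedCounter()->value); // int(2)
</code>

This call shape works today because the object is accepted by value. It is the kind of code that a by-reference ''$objectOrValue'' would affect.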
Line 363: Line 368:
  
 Inheritance is currently not allowed for structs. Structs are mainly targeted at data modelling, which should prefer composition over inheritance. There are currently no known technical issues with inheritance for structs, but we may want to be cautious when introducing them, and carefully consider the plethora of subtle semantic nuances. Inheritance is currently not allowed for structs. Structs are mainly targeted at data modelling, which should prefer composition over inheritance. There are currently no known technical issues with inheritance for structs, but we may want to be cautious when introducing them, and carefully consider the plethora of subtle semantic nuances.
 +
 +Implementing interfaces is allowed, however. Interface methods may be declared ''mutating'', which is enforced when implementing the interface method. Such methods can only be implemented by structs, not by classes.
 +
 +===== Hashing =====
 +
 +''SplObjectStorage'' allows using objects as keys. For structs, these semantics are not too useful, because the object id changes unpredictably. Instead, the lookup should be based on the object's properties. However, as hashing is a complicated topic, this will be postponed to a separate RFC. For now, struct objects are not allowed as keys in ''SplObjectStorage'' or ''WeakMap''.
 +
 +===== Move semantics =====
 +
 +There are still some cases where useless copies occur.
 +
 +<code php>
 +function doubled($bigNum) {
 +    $bigNum->value *= 2;
 +    return $bigNum;
 +}
 +
 +$bigNum = new BigNum(1);
 +$bigNum = doubled($bigNum);
 +</code>
 +
 +In this case, copying ''$bigNum'' before passing it to ''doubled'' is actually unnecessary, as it is immediately overwritten anyway. The ownership of ''$bigNum'' could thus be "moved" to ''doubled()''. Knowing when exactly this is safe is tough, because it depends on whether ''doubled()'' can throw exceptions, and whether ''$bigNum'' is the sole reference to the struct object before the function call.
 +
 +One could implement such move semantics by hand.
 +
 +<code php>
 +function move(&$value) {
 +    $moved = $value;
 +    $value = null;
 +    return $moved;
 +}
 +
 +$bigNum = new BigNum(1);
 +$bigNum = doubled(move($bigNum));
 +</code>
 +
 +Essentially, this code sets ''$bigNum'' to ''null'' before passing the value to ''doubled()'', making ''doubled()'' the sole owner of the value. However, if ''doubled()'' fails for one reason or another, the value of ''$bigNum'' is lost.
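
The failure case can be sketched with an array, which is already a value type in current PHP, standing in for a struct. The ''failing()'' function is hypothetical and exists only to simulate a callee that throws.

<code php>
function move(&$value) {
    $moved = $value;
    $value = null;
    return $moved;
}

// Hypothetical callee that always fails, standing in for doubled().
function failing(array $values): array {
    throw new RuntimeException('doubling failed');
}

$values = [1, 2, 3];

try {
    // move() runs first and sets $values to null; the assignment
    // on the left never happens because failing() throws.
    $values = failing(move($values));
} catch (RuntimeException $e) {
    // Nothing to restore from: the moved value was owned by the callee.
}

var_dump($values); // NULL
</code>

After the exception, the caller holds ''null'' instead of the original data, which is why a hand-written ''move()'' is only safe when the callee cannot fail.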
 +
 +There were some attempts to implement implicit move semantics, namely https://github.com/php/php-src/pull/11166. We may try to pursue this further.
  
 ===== Performance ===== ===== Performance =====
  
-TODO+Assignment to a property now needs to check whether the object is a struct object, and then clone it. This change was necessary in various code paths. In my benchmarks, this led to a small slowdown of 0.07%, whether you use structs or not. The benchmark was performed on Symfony Demo, with Opcache.
  
 ===== Backwards incompatible changes ===== ===== Backwards incompatible changes =====
  
-TODO +''struct'' needs to become a keyword in this RFC. However, ''struct'' will only be considered a keyword when it is followed by another identifier, excluding ''extends'' and ''implements''. This is the same approach used for the [[https://wiki.php.net/rfc/enumerations#backward_incompatible_changes|enum RFC]], and thus completely avoids backwards incompatible changes.
- +
-===== Future scope =====+
  
-  - Hashing for ''SplObjectStorage''.+There are no other backwards incompatible changes.
  
 ===== Vote ===== ===== Vote =====
rfc/structs-v2.1713887405.txt.gz · Last modified: 2024/04/23 15:50 by ilutov