rfc:comprehensions

Differences

This shows you the differences between two versions of the page.

rfc:comprehensions [2019/03/10 21:22] – created crell
rfc:comprehensions [2019/04/05 01:10] (current) – Revise explanation of generator choice. crell
Line 1: Line 1:
 ====== PHP RFC: Generator comprehensions ======

-* Version: 0.1
-* Date: 2019-03-10
-* Author: Larry Garfield, larry@garfieldtech.com
-* Status: Draft
-* First published at: http://wiki.php.net/rfc/comprehensions
+  * Version: 0.1
+  * Date: 2019-03-10
+  * Author: Larry Garfield, larry@garfieldtech.com
+  * Status: Draft
+  * First published at: http://wiki.php.net/rfc/comprehensions

 ===== Introduction =====
Line 31: Line 31:
 </code>

-In both cases, '%%'$gen%%'' is now a generator that will produce double the odd values of $list.  However, the first case uses 38 characters (with spaces) vs 94 characters (with spaces), and is easily compacted onto a single line as opposed to 7.
+In both cases, ''%%$gen%%'' is now a generator that will produce double the odd values of $list.  However, the first case uses 38 characters (with spaces) vs 94 characters (with spaces), and is easily compacted onto a single line as opposed to 7.

 ===== Proposal =====
Line 39: Line 39:
 The general form of a comprehension is:

+<code>
 '[' ('for' <iterable expression> 'as' $key '=>' $value ('if' <condition>)?)+ (yield <expression>)? ']'
+</code>

 That is, one or more for-if clauses in which the if statement is optional, optionally followed by a ''%%yield%%'' keyword and a single expression.  The entire expression is wrapped in square brackets.
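
As a rough sketch in current PHP, a comprehension with two for-if clauses and a ''%%yield%%'' expression can be read as nested ''%%foreach%%'' loops inside an immediately invoked generator closure. The sample data and the variable names (''%%$table%%'', ''%%$row%%'', ''%%$value%%'') are assumptions chosen for illustration, and the comprehension in the comment is the proposed syntax, not something PHP accepts today.

<code php>
<?php
// Hypothetical input: a small two-dimensional table.
$table = [
    [1, 2, 3],
    [4, 5, 6],
];

// Hand-written equivalent of the proposed (not yet valid) comprehension:
//   [for $table as $row for $row as $value if $value % 2 yield $value * 10]
$gen = (function () use ($table) {
    foreach ($table as $row) {
        foreach ($row as $value) {
            if ($value % 2) {
                yield $value * 10;
            }
        }
    }
})();

// Expand the generator, ignoring keys.
$result = iterator_to_array($gen, false);
// $result is [10, 30, 50]
</code>

Each additional for-if clause adds one more level of nesting, which is the verbosity the comprehension form is meant to remove.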
Line 75: Line 77:
 A comprehension is whitespace insensitive. It may be broken out to multiple lines if it aids readability with no semantic impact.

-The following examples show a comprehension and the equivalent inline generator.  In each case the semantic behavior of '"%%$result%%'' is identical for both versions, but the comprehension syntax is shorter and easier to comprehend (pun intended).
+The following examples show a comprehension and the equivalent inline generator.  In each case the semantic behavior of ''%%$result%%'' is identical for both versions, but the comprehension syntax is shorter and easier to comprehend (pun intended).

 <code php>
Line 123: Line 125:
 ];

-// Whitespace is irrelevant, so breaking it out like this is totally fine if it aids readability.
+// Whitespace is irrelevant, so breaking it
+// out like this is totally fine if it aids readability.
 $result = [for $table as $num => $row if $num %2 ==0
     for $row as $col => $value if $col >= 3
Line 157: Line 160:
  
   - In context the for is unambiguously being used in a foreach-style way, thus there is no confusion.
-  - The '"%%for%%'' keyword is used by both Python and Javascript, the languages with the most similar existing syntax.  (See below.)
+  - The ''%%for%%'' keyword is used by both Python and Javascript, the languages with the most similar existing syntax.  (See below.)
   - The point of comprehensions is a compact yet expressive syntax.  Given the above two points, using ''%%foreach%%'' would add nothing except four additional characters.
  
Line 168: Line 171:
   - In most cases it doesn't matter either way. The result will be put into a foreach() loop and that will be the end of it.
   - Cases where it does matter are where the list is especially large, or especially expensive to generate and only selected values will be used.  In those cases a generator is superior as it minimizes the memory and CPU usage (respectively) needed to represent values.
-  - If an actual array is desired, converting a generator to an array is a trivial call to ''%%iterator_to_array()%%''.  Converting an array to an iterator, while technically easy, has no benefit aside from compatibility with other iterators.  Returning a generator, therefore, offers the most benefit with the fewest limitations.
+  - If an actual array is desired, converting a generator to an array is a trivial call to ''%%iterator_to_array()%%''.  Converting an array to an iterator, while technically easy, has no benefit aside from compatibility with other iterators.
+  - That is, a greedy-list value can be composed out of a lazy-list value and an expansion operation.  However, a lazy-list value cannot be composed from a greedy-list.  That means since both are valuable, the one that provides both via syntactic composition is the superior approach.
   - A compact syntax to produce a generator allows for some nifty functional programming techniques that until now have been verbose to implement for non-array iterators.
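
The lazy-versus-greedy point in the list above can be illustrated with a small sketch in current PHP. The counter ''%%$computed%%'' and the one-million upper bound are assumptions added only to make the laziness observable; they are not part of the RFC.

<code php>
<?php
// Counts how many values the generator actually computes.
$computed = 0;

// A generator over one million squares; nothing runs until it is iterated.
$squares = (function () use (&$computed) {
    for ($n = 1; $n <= 1000000; $n++) {
        $computed++;
        yield $n * $n;
    }
})();

// Lazy use: take only the first three values, then stop.
$firstThree = [];
foreach ($squares as $sq) {
    $firstThree[] = $sq;
    if (count($firstThree) === 3) {
        break;
    }
}
// $firstThree is [1, 4, 9] and $computed is 3, not 1000000.

// Greedy use: a (small) generator expands to an array in one call.
$small = (function () {
    foreach ([1, 2, 3] as $n) {
        yield $n * $n;
    }
})();
$asArray = iterator_to_array($small);
// $asArray is [1, 4, 9]
</code>

Going the other way is not possible: once a full array has been built, the cost of computing every element has already been paid.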
  
Line 199: Line 203:
 The common default "is truth-y" use of ''%%array_filter()%%'' with no callback specified would be easily expressed as:

+<code php>
 $result = [for $list as $x if $x];
+</code>
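
For comparison, the same truthiness filter can be written today either with ''%%array_filter()%%'' or as a hand-written generator. This is only a sketch: the sample ''%%$list%%'' is an assumption, and whether the proposed comprehension would preserve the original keys is not something the sketch takes a position on, so it compares values only.

<code php>
<?php
// Hypothetical input mixing truthy and falsy values.
$list = [0, 1, '', 'a', null, 2, false];

// Existing approach: array_filter() with no callback drops falsy values.
$filtered = array_filter($list);

// Hand-written generator corresponding to [for $list as $x if $x].
$gen = (function () use ($list) {
    foreach ($list as $x) {
        if ($x) {
            yield $x;
        }
    }
})();

// array_filter() keeps the original keys (1, 3, 5); the generator yields
// fresh sequential keys, so only the values are compared here.
$same = array_values($filtered) === iterator_to_array($gen, false);
// $same is true
</code>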
  
 ==== array_map() ====
Line 207: Line 213:
  
 $result = array_map(function ($x) {
-  $x * 2;
+  return $x * 2;
 }, $list);
  
Line 225: Line 231:
 $list = array_combine(range('a', 'j'), range(1, 10));

-// array_map() itself cannot produce an array with dynamically defined keys so is omitted.
+// array_map() itself cannot produce an array
+// with dynamically defined keys so is omitted.

 $result = (function() use ($list) {
Line 239: Line 246:
  
 <code php>
-$list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
+$list = range(1, 10);

-// In practice you'd almost always just use a foreach() rather than this monstrosity, but I include it for completeness.
-$result = array_filter(array_map(function ($x) {
-  $x * 2;
-}, $list), function($x) {
+// In practice you'd almost always just use a
+// foreach() rather than this monstrosity,
+// but I include it for completeness.
+$result = array_map(function($x) {
+  return $x * 2;
+}, array_filter($list, function($x) {
   return $x % 2;
-});
+}));

 $result = (function() use ($list) {
   foreach ($list as $x) {
-    If ($x % 2) {
+    if ($x % 2) {
       yield $x * 2;
     }
Line 276: Line 285:
 </code>

-Because a generator implements Iterator, we can call '"%%current()%%'' on it to return the first/current item that would be produced.  The generator itself can be discarded with no further computation expense.
+Because a generator implements Iterator, we can call ''%%current()%%'' on it to return the first/current item that would be produced.  The generator itself can be discarded with no further computation expense.
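
A minimal sketch of that behavior in current PHP follows. The predicate ''%%$isLarge%%'' and its call counter ''%%$checks%%'' are hypothetical stand-ins for an expensive test, added so the early stop is visible.

<code php>
<?php
$list = [3, 8, 12, 5, 20];

// Hypothetical stand-in for an expensive predicate; counts its invocations.
$checks = 0;
$isLarge = function ($x) use (&$checks) {
    $checks++;
    return $x > 10;
};

// Generator corresponding to [for $list as $x if $isLarge($x)].
$gen = (function () use ($list, $isLarge) {
    foreach ($list as $x) {
        if ($isLarge($x)) {
            yield $x;
        }
    }
})();

// current() runs the generator only as far as its first yield.
$first = $gen->current();
// $first is 12 and $checks is 3, so 5 and 20 were never tested.
</code>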
  
 ==== any() ====
Line 330: Line 339:
 Numerous languages include a comprehension syntax of some form (https://en.wikipedia.org/wiki/Comparison_of_programming_languages_(list_comprehension)).

-Two of the languages PHP developers are most likely to also use, JavaScript and Python, feature a very similar syntax.  That is not surprising as Javascript (according to Wikipedia) borrowed its syntax from Python, modifying it to the typical syntax Javascript uses for for-each constructs.  This RFC proposes to do the same, modifying the syntax to fit PHP for-each conventions instead.
-
-Note that in Python 2.x list comprehensions produce a complete list.  In Python 3.x they produce a generator that will, in turn, produce a complete list.  That change has been a source of incompatibility between Python 2.x and 3.x code.  This RFC proposes using generators exclusively for comprehensions.
+The syntax proposed here was initially based on Python's syntax, modified to be more easily handled by PHP's parser and follow more conventional PHP syntax ordering.

 If a more terse syntax that is still lexer-friendly can be proposed that may be adopted instead of the syntax proposed here.
 +
+Note that in Python, list comprehensions (square brackets) produce a complete list in both 2.x and 3.x, while generator expressions (parentheses) produce a lazy generator.  What changed in Python 3.x is that built-ins such as map() and filter() return lazy iterators instead of lists, which has been a source of incompatibility between Python 2.x and 3.x code.  This RFC proposes using generators exclusively for comprehensions.
  
 ===== Comparison to other proposals =====

-The "sort lambda" or "arrow function" RFC has also been discussed in the past.  While the authors of this RFC support both, they should not be viewed as competing but as complementary as they serve different purposes.  While arrow functions would improve the readability of the examples above over their current counterparts, they still would not offer as clean and readable a syntax for the cases where Comprehensions are well suited.  They also would not address the array-or-iterable question for ''%%array_map()%%'' and ''%%array_filter()%%''.  Consider this example from above:
+The "short lambda" or "arrow function" RFC has also been discussed in the past.  While the authors of this RFC support both, they should not be viewed as competing but as complementary as they serve different purposes.  While arrow functions would improve the readability of the examples above over their current counterparts, they still would not offer as clean and readable a syntax for the cases where Comprehensions are well suited.  They also would not address the array-or-iterable question for ''%%array_map()%%'' and ''%%array_filter()%%''.  Consider this example from above:

 <code php>
-$result = array_filter(array_map(function ($x) {
-  $x * 2;
-}, $list), function($x) {
+$result = array_map(function($x) {
+  return $x * 2;
+}, array_filter($list, function($x) {
   return $x % 2;
-});
+}));

 $result = [for $list as $x if $x % 2 yield $x * 2];
Line 351: Line 360:
  
 The arrow function equivalent would be:
+Which, while unquestionably an improvement over the array_map/array_filter status quo, is still substantially more verbose and hard to read than the proposed Comprehension.

 <code php>
Line 359: Line 369:
 </code>
  
-Which, while unquestionably an improvement over the array_map/array_filter status quo, is still substantially more verbose and hard to read than the proposed Comprehension.
+Or potentially:
+
+<code php>
+$result = (fn() => foreach($list as $x) if ($x % 2) yield $x * 2)();
+</code>
+
+Either is definitely an improvement over the array_map/array_filter status quo, but even the more compact version is longer and entails considerably more syntax salad than a dedicated comprehension syntax.
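
To make the comparison concrete, here is one way the ''%%array_map()%%''/''%%array_filter()%%'' form could be written with arrow functions. This sketch assumes PHP 7.4 or later, where the ''%%fn%%'' syntax is available; it is an illustration, not the example from the RFC text. The generator closure beside it is the hand-written equivalent of the comprehension.

<code php>
<?php
$list = range(1, 10);

// Assumes PHP 7.4+: filter the odd values, then double them.
$viaArrows = array_map(
    fn($x) => $x * 2,
    array_filter($list, fn($x) => $x % 2)
);

// Generator closure corresponding to
//   [for $list as $x if $x % 2 yield $x * 2]
$viaGenerator = (function () use ($list) {
    foreach ($list as $x) {
        if ($x % 2) {
            yield $x * 2;
        }
    }
})();

// array_filter() keeps the original keys (0, 2, 4, ...), so normalize
// them before comparing against the generator's values.
$same = array_values($viaArrows) === iterator_to_array($viaGenerator, false);
// $same is true; both produce [2, 6, 10, 14, 18]
</code>

The arrow-function form still builds two intermediate arrays, whereas the generator produces each value on demand.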
  
 That said, there are ample other cases where arrow functions would be useful so the adoption of this RFC should in no way be seen to detract from their benefit.
Line 384: Line 400:
 $gen = [for $array as $x : int];
 foreach ($gen as $val) {
-  // A TypeError would be thrown on the 3rd value, as it's not an int.
+  // A TypeError would be thrown on the 3rd value,
+  // as it's not an int.
 }
 </code>
Line 395: Line 412:
 $run = [for $products as $p yield save($p)];

-// iterator_to_array() will result in an array of return values from save_entity(). Depending on the data set this could be quite large, and must be allocated even if not saved.
+// iterator_to_array() will result in an array of return
+// values from save_entity(). Depending on the data
+// set this could be quite large, and must be allocated
+// even if not saved.
 iterator_to_array($run);

-// An empty foreach() will simply discard the return values, but is rather clumsy.
+// An empty foreach() will simply discard the return values,
+// but is rather clumsy.
 foreach ($run as $val);
 </code>

 It would be preferable to introduce a new function or language construct that can take an arbitrary generator and "run it out", discarding the results.  Such an operator would be a "nice to have" but is not a requirement of this RFC.
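
In userland, such a "run it out" helper might look like the sketch below. The function name ''%%drain()%%'' is hypothetical (it is not part of PHP or of this RFC), and ''%%$save%%'' is a stand-in for the ''%%save()%%'' call in the example above.

<code php>
<?php
/**
 * Hypothetical helper: iterates any iterable to completion and
 * discards every value it produces.
 */
function drain(iterable $items): void
{
    foreach ($items as $_) {
        // Intentionally empty: only the side effects matter.
    }
}

// Hypothetical stand-in for save(): records what was "saved".
$saved = [];
$save = function ($p) use (&$saved) {
    $saved[] = $p;
    return true;
};

$products = ['apple', 'pear', 'plum'];

// Generator corresponding to [for $products as $p yield save($p)].
$run = (function () use ($products, $save) {
    foreach ($products as $p) {
        yield $save($p);
    }
})();

drain($run);
// $saved is ['apple', 'pear', 'plum']; no array of return values was built.
</code>

A built-in function or operator would do the same job without the userland loop, which is the "nice to have" described above.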
+
+===== Implementation =====
+
+Sara Golemon has written a proof of concept that demonstrates an approximate implementation:
+
+https://github.com/php/php-src/compare/master...sgolemon:list.comp
+
+It is currently incomplete as it lacks auto-capture and requires an explicit ''%%use%%'' statement.  Collaborators wishing to finish the implementation and/or assist with a terser syntax are most welcome.

 ===== Backward Incompatible Changes =====
Line 411: Line 440:
  
 PHP 7.4
- 
  
  
rfc/comprehensions.txt · Last modified: 2019/04/05 01:10 by crell