rfc:generators
no way to compare when less than two revisions
Differences
This shows you the differences between two versions of the page.
Previous revisionLast revision | |||
— | rfc:generators [2013/04/06 11:27] – Note that cloning support was removed nikic | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Request for Comments: Generators ====== | ||
+ | * Date: 2012-06-05 | ||
+ | * Author: Nikita Popov < | ||
+ | * Status: Implemented | ||
+ | ===== Introduction ===== | ||
+ | |||
+ | Generators provide an easy, boilerplate-free way of implementing iterators. | ||
+ | |||
+ | As an example, consider how you would implement the '' | ||
+ | |||
+ | <code php> | ||
+ | function getLinesFromFile($fileName) { | ||
+ | if (!$fileHandle = fopen($fileName, | ||
+ | return; | ||
+ | } | ||
+ | | ||
+ | $lines = []; | ||
+ | while (false !== $line = fgets($fileHandle)) { | ||
+ | $lines[] = $line; | ||
+ | } | ||
+ | | ||
+ | fclose($fileHandle); | ||
+ | | ||
+ | return $lines; | ||
+ | } | ||
+ | |||
+ | $lines = getLinesFromFile($fileName); | ||
+ | foreach ($lines as $line) { | ||
+ | // do something with $line | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | The main disadvantage of this kind of code is evident: It will read the whole file into a large array. Depending | ||
+ | on how big the file is, this can easily hit the memory limit. This is not what you usually want. Instead you | ||
+ | want to get the lines one by one. This is what iterators are perfect for. | ||
+ | |||
+ | Sadly implementing iterators requires an insane amount of boilerplate code. E.g. consider this iterator variant | ||
+ | of the above function: | ||
+ | |||
+ | <code php> | ||
+ | class LineIterator implements Iterator { | ||
+ | protected $fileHandle; | ||
+ | | ||
+ | protected $line; | ||
+ | protected $i; | ||
+ | | ||
+ | public function __construct($fileName) { | ||
+ | if (!$this-> | ||
+ | throw new RuntimeException(' | ||
+ | } | ||
+ | } | ||
+ | | ||
+ | public function rewind() { | ||
+ | fseek($this-> | ||
+ | $this-> | ||
+ | $this->i = 0; | ||
+ | } | ||
+ | | ||
+ | public function valid() { | ||
+ | return false !== $this-> | ||
+ | } | ||
+ | | ||
+ | public function current() { | ||
+ | return $this-> | ||
+ | } | ||
+ | | ||
+ | public function key() { | ||
+ | return $this-> | ||
+ | } | ||
+ | | ||
+ | public function next() { | ||
+ | if (false !== $this-> | ||
+ | $this-> | ||
+ | $this-> | ||
+ | } | ||
+ | } | ||
+ | | ||
+ | public function __destruct() { | ||
+ | fclose($this-> | ||
+ | } | ||
+ | } | ||
+ | |||
+ | $lines = new LineIterator($fileName); | ||
+ | foreach ($lines as $line) { | ||
+ | // do something with $line | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | As you can see a very simple piece of code can easily become very complicated when turned into an | ||
+ | iterator. Generators solve this problem and allow you to implement iterators in a very straightforward | ||
+ | manner: | ||
+ | |||
+ | <code php> | ||
+ | function getLinesFromFile($fileName) { | ||
+ | if (!$fileHandle = fopen($fileName, | ||
+ | return; | ||
+ | } | ||
+ | | ||
+ | while (false !== $line = fgets($fileHandle)) { | ||
+ | yield $line; | ||
+ | } | ||
+ | | ||
+ | fclose($fileHandle); | ||
+ | } | ||
+ | |||
+ | $lines = getLinesFromFile($fileName); | ||
+ | foreach ($lines as $line) { | ||
+ | // do something with $line | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | The code looks very similar to the array-based implementation. The main difference is that instead of pushing | ||
+ | values into an array the values are '' | ||
+ | |||
+ | Generators work by passing control back and forth between the generator and the calling code: | ||
+ | |||
+ | When you first call the generator function ('' | ||
+ | but nothing of the code is actually executed. Instead the function directly returns a '' | ||
+ | '' | ||
+ | loop: | ||
+ | |||
+ | Whenever the '' | ||
+ | hits a '' | ||
+ | |||
+ | Generator methods, together with the '' | ||
+ | classes too: | ||
+ | |||
+ | <code php> | ||
+ | class Test implements IteratorAggregate { | ||
+ | protected $data; | ||
+ | | ||
+ | public function __construct(array $data) { | ||
+ | $this-> | ||
+ | } | ||
+ | | ||
+ | public function getIterator() { | ||
+ | foreach ($this-> | ||
+ | yield $key => $value; | ||
+ | } | ||
+ | // or whatever other traversation logic the class has | ||
+ | } | ||
+ | } | ||
+ | |||
+ | $test = new Test([' | ||
+ | foreach ($test as $k => $v) { | ||
+ | echo $k, ' => ', $v, " | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Generators can also be used the other way around, i.e. instead of producing values they can also consume them. When | ||
+ | used in this way they are often referred to as enhanced generators, reverse generators or coroutines. | ||
+ | |||
+ | Coroutines are a rather advanced concept, so it very hard to come up with not too contrived an short examples. | ||
+ | For an introduction see an example [[https:// | ||
+ | If you want to know more, I highly recommend checking out [[http:// | ||
+ | on this subject]]. | ||
+ | |||
+ | ===== Specification ===== | ||
+ | |||
+ | ==== Recognition of generator functions ==== | ||
+ | |||
+ | Any function which contains a '' | ||
+ | |||
+ | The initial implementation required that generator functions are marked with an asterix modifier ('' | ||
+ | the advantage that generators are more explicit and also allows for yield-less coroutines. | ||
+ | |||
+ | The automatic detection was chosen over the asterix modifier for the following reasons: | ||
+ | |||
+ | * There is an existing generator implementation in HipHop PHP, which uses automatic-detection. Using the asterix modifier would break compatibility. | ||
+ | * All existing generator implementations in other language (that I know of) also use automatic detection. This includes Python, JavaScript 1.7 and C#. The only exception to this is the generator support as defined by ECMAScript Harmony, but I know no browser that actually implements it in the defined way. | ||
+ | * The syntax for by-reference yielding looks very ugly: '' | ||
+ | * yield-less coroutines are a very narrow use case and are also possible with automatic-detection using a code like '' | ||
+ | |||
+ | ==== Basic behavior ==== | ||
+ | |||
+ | When a generator function is called the execution is suspended immediately after parameter binding and a '' | ||
+ | is returned. | ||
+ | |||
+ | The '' | ||
+ | |||
+ | <code php> | ||
+ | final class Generator implements Iterator { | ||
+ | void rewind(); | ||
+ | bool valid(); | ||
+ | mixed current(); | ||
+ | mixed key(); | ||
+ | void next(); | ||
+ | | ||
+ | mixed send(mixed $value); | ||
+ | mixed throw(Exception $exception); | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | If the generator is not yet at a '' | ||
+ | '' | ||
+ | hit. | ||
+ | |||
+ | Consider this example: | ||
+ | |||
+ | <code php> | ||
+ | function gen() { | ||
+ | echo ' | ||
+ | yield ' | ||
+ | echo ' | ||
+ | } | ||
+ | |||
+ | // Initial call does not output anything | ||
+ | $gen = gen(); | ||
+ | |||
+ | // Call to current() resumes the generator, thus " | ||
+ | // Then the yield expression is hit and the string " | ||
+ | // as the result of current() and then echo' | ||
+ | echo $gen-> | ||
+ | |||
+ | // Execution of the generator is resumed again, thus echoing " | ||
+ | $gen-> | ||
+ | </ | ||
+ | |||
+ | A nice side-effect of this behavior is that coroutines do not have to be primed with a '' | ||
+ | required in Python and also the reason why coroutines in Python usually use some kind of decorator that automatically primes the coroutine.) | ||
+ | |||
+ | Apart from the above the '' | ||
+ | |||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | |||
+ | ==== Yield syntax ==== | ||
+ | |||
+ | The newly introduced '' | ||
+ | |||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | |||
+ | The return value of the '' | ||
+ | |||
+ | To avoid ambiguities the first two '' | ||
+ | |||
+ | <code php> | ||
+ | // these three are statements, so they don't need parenthesis | ||
+ | yield $key => $value; | ||
+ | yield $value; | ||
+ | yield; | ||
+ | |||
+ | // these are expressions, | ||
+ | $data = (yield $key => $value); | ||
+ | $data = (yield $value); | ||
+ | |||
+ | // to avoid strange (yield) syntax the parenthesis are not required here | ||
+ | $data = yield; | ||
+ | </ | ||
+ | |||
+ | If '' | ||
+ | |||
+ | <code php> | ||
+ | call(yield $value); | ||
+ | // instead of | ||
+ | call((yield $value)); | ||
+ | |||
+ | if (yield $value) { ... } | ||
+ | // instead of | ||
+ | if ((yield $value)) { ... } | ||
+ | </ | ||
+ | |||
+ | The only exception is the '' | ||
+ | |||
+ | <code php> | ||
+ | array(yield $key => $value) | ||
+ | // can be either | ||
+ | array((yield $key) => $value) | ||
+ | // or | ||
+ | array((yield $key => $value)) | ||
+ | </ | ||
+ | |||
+ | Python also has parentheses requirements for expression-use of '' | ||
+ | |||
+ | See also the [[# | ||
+ | |||
+ | ==== Yielding keys ==== | ||
+ | |||
+ | The languages that currently implement generators don't have support for yielding keys (only values). This though is just a side-effect | ||
+ | as these languages don't support keys in iterators in general. | ||
+ | |||
+ | In PHP on the other hand keys are explicitly part of the iteration process and it thus does not make sense to not add | ||
+ | key-yielding support. The syntax could be analogous to that of '' | ||
+ | |||
+ | <code php> | ||
+ | yield $key => $value; | ||
+ | </ | ||
+ | |||
+ | Furthermore generators need to generate keys even if no key was explicitly yielded. In this case it seems reasonable to behave | ||
+ | the same as arrays do: Start with the key '' | ||
+ | current auto-key is explicitly yielded, then that will be used as the starting point for new auto-keys. All other yielded keys | ||
+ | do not affect the auto-key mechanism. | ||
+ | |||
+ | <code php> | ||
+ | function gen() { | ||
+ | yield ' | ||
+ | yield ' | ||
+ | yield ' | ||
+ | yield ' | ||
+ | yield 10 => ' | ||
+ | yield ' | ||
+ | } | ||
+ | |||
+ | foreach (gen() as $key => $value) { | ||
+ | echo $key, ' => ', $value, " | ||
+ | } | ||
+ | |||
+ | // outputs: | ||
+ | 0 => a | ||
+ | 1 => b | ||
+ | key => c | ||
+ | 2 => d | ||
+ | 10 => e | ||
+ | 11 => f | ||
+ | </ | ||
+ | | ||
+ | This is the same behavior that arrays have (i.e. if '' | ||
+ | be same). The only difference occurs when the generator yield non-integer, | ||
+ | the are not. | ||
+ | |||
+ | ==== Yield by reference ==== | ||
+ | |||
+ | Generators can also yield by values by reference. To do so the ''&'' | ||
+ | for return by reference. | ||
+ | |||
+ | This for example allows you to create classes with by-ref iteration behavior (which is something that is completely impossible with | ||
+ | normal iterators): | ||
+ | |||
+ | <code php> | ||
+ | class DataContainer implements IteratorAggregate { | ||
+ | protected $data; | ||
+ | | ||
+ | public function __construct(array $data) { | ||
+ | $this-> | ||
+ | } | ||
+ | | ||
+ | public function & | ||
+ | foreach ($this-> | ||
+ | yield $key => $value; | ||
+ | } | ||
+ | } | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | The class can then be iterated using by-ref '' | ||
+ | |||
+ | <code php> | ||
+ | $dataContainer = new DataContainer([1, | ||
+ | foreach ($dataContainer as & | ||
+ | $value *= -1; | ||
+ | } | ||
+ | |||
+ | // $this-> | ||
+ | </ | ||
+ | |||
+ | Only generators specifying the ''&'' | ||
+ | |||
+ | ==== Sending values ==== | ||
+ | |||
+ | Values can be sent into a generator using the '' | ||
+ | of the current '' | ||
+ | the return value of '' | ||
+ | |||
+ | Values are always sent by-value. The reference modifier ''&'' | ||
+ | |||
+ | A simple example of sending values: Two (interchangeable) logging implementations: | ||
+ | |||
+ | <code php> | ||
+ | function echoLogger() { | ||
+ | while (true) { | ||
+ | echo 'Log: ' . yield . " | ||
+ | } | ||
+ | } | ||
+ | |||
+ | function fileLogger($fileName) { | ||
+ | $fileHandle = fopen($fileName, | ||
+ | while (true) { | ||
+ | fwrite($fileHandle, | ||
+ | } | ||
+ | } | ||
+ | |||
+ | $logger = echoLogger(); | ||
+ | // or | ||
+ | $logger = fileLogger(__DIR__ . '/ | ||
+ | |||
+ | $logger-> | ||
+ | $logger-> | ||
+ | </ | ||
+ | |||
+ | ==== Throwing into the generator ==== | ||
+ | |||
+ | Exceptions can be thrown into the generator using the '' | ||
+ | context and then resume the generator. It is roughly equivalent to replacing the current '' | ||
+ | resuming then. If the generator is already closed the exception will be thrown in the callers context instead (which is equivalent to replacing | ||
+ | the '' | ||
+ | other exception is thrown). | ||
+ | |||
+ | An example of the functionality: | ||
+ | |||
+ | <code php> | ||
+ | function gen() { | ||
+ | echo " | ||
+ | try { | ||
+ | yield; | ||
+ | } catch (Exception $e) { | ||
+ | echo " | ||
+ | } | ||
+ | echo " | ||
+ | } | ||
+ | |||
+ | $gen = gen(); | ||
+ | $gen-> | ||
+ | $gen-> | ||
+ | // and " | ||
+ | </ | ||
+ | |||
+ | ==== Rewinding a generator ==== | ||
+ | |||
+ | Rewinding to some degree goes against the concept of generators, as they are mainly intended as one-time data sources that are not | ||
+ | supposed to be iterated another time. On the other hand, most generators probably *are* rewindable and it might make sense to allow | ||
+ | it. One could argue though that rewinding a generator is really bad practice (especially if the generator is doing some expensive | ||
+ | calculation). Allowing it to rewind would look like it is a cheap operation, just like with arrays. Also rewinding (as in jumping | ||
+ | back to the execution context state at the initial call to the generator) can lead to unexpected behavior, e.g. in the following case: | ||
+ | |||
+ | <code php> | ||
+ | function getSomeStuff(PDOStatement $stmt) { | ||
+ | foreach ($stmt as $row) { | ||
+ | yield doSomethingWith($row); | ||
+ | } | ||
+ | } | ||
+ | </ | ||
+ | | ||
+ | Here rewinding would simply result in an empty iterator as the result set is already depleted. | ||
+ | |||
+ | For the above reasons generators will not support rewinding. The '' | ||
+ | |||
+ | <code php> | ||
+ | $gen = createSomeGenerator(); | ||
+ | |||
+ | // the rewind() call foreach is doing here is okay, because | ||
+ | // the generator is before the first yield | ||
+ | foreach ($gen as $val) { ... } | ||
+ | |||
+ | // the rewind() call of a second foreach loop on the other hand | ||
+ | // throws an exception | ||
+ | foreach ($gen as $val) { ... } | ||
+ | </ | ||
+ | |||
+ | So basically calling '' | ||
+ | |||
+ | ==== Cloning a generator ==== | ||
+ | |||
+ | Generators cannot be cloned. | ||
+ | |||
+ | Support for cloning was included in the initial version, but removed in PHP 5.5 Beta 3 due to implementational difficulties, | ||
+ | |||
+ | ==== Closing a generator ==== | ||
+ | |||
+ | When a generator is closed it frees the suspended execution context (as well as all other held variables). After it has been closed | ||
+ | '' | ||
+ | |||
+ | A generator can be closed in two ways: | ||
+ | |||
+ | * Reaching a '' | ||
+ | * Removing all references to the generator object. In this case the generator will be closed as part of the garbage collection process. | ||
+ | |||
+ | If the generator contains (relevant) '' | ||
+ | allowed to use '' | ||
+ | |||
+ | The following resources are destructed while closing a generator: | ||
+ | |||
+ | * The current execution context ('' | ||
+ | * Stack arguments for the generator call, and the additional execution context which is used to manage them. | ||
+ | * The currently active symbol table (or the compiled variables if no symbol table is in use). | ||
+ | * The current '' | ||
+ | * If the generator is closed during a method call, the object which the method is invoked on ('' | ||
+ | * If the generator is closed during a call, the arguments pushed to the stack. | ||
+ | * Any '' | ||
+ | * The current generator key and value | ||
+ | |||
+ | Currently it can happen that temporary variables are not cleaned up properly in edge-case situations. Exceptions are also subject to | ||
+ | this problem: https:// | ||
+ | |||
+ | ==== Error conditions ==== | ||
+ | |||
+ | This is a list of generators-related error conditions: | ||
+ | |||
+ | * Using '' | ||
+ | * Using '' | ||
+ | * Manual construction of '' | ||
+ | * Yielding a key that isn't an integer or a key: '' | ||
+ | * Trying to iterate a non-ref generator by-ref: '' | ||
+ | * Trying to traverse an already closed generator: '' | ||
+ | * Trying to rewind a generator after the first yield: '' | ||
+ | * Yielding a temp/const value by-ref: '' | ||
+ | * Yielding a string offset by-ref: '' | ||
+ | * Yielding a by-val function return value by-ref: '' | ||
+ | |||
+ | This list might not be exhaustive. | ||
+ | |||
+ | ===== Performance ===== | ||
+ | |||
+ | You can find a small micro benchmark at https:// | ||
+ | |||
+ | * Using generators ('' | ||
+ | * Using iterators ('' | ||
+ | * Using arrays implemented in userland ('' | ||
+ | * Using arrays implemented internally ('' | ||
+ | |||
+ | For large ranges generators are consistently faster; about four times faster than an iterator implementation and even 40% faster than the native '' | ||
+ | |||
+ | For small ranges (around one hundred elements) the variance of the results is rather high, but from multiple runs it seems that in this case generators are slightly slower | ||
+ | than the native implementation, | ||
+ | |||
+ | The tests were run on a Ubuntu VM, so I'm not exactly sure how representative they are. | ||
+ | |||
+ | ===== Some points from the discussion ===== | ||
+ | |||
+ | ==== Why not just use callback functions? ==== | ||
+ | |||
+ | A question that has come up a few times during discussion: Why not use callback functions, instead of generators? For example the above '' | ||
+ | be rewritten using a callback: | ||
+ | |||
+ | <code php> | ||
+ | function processLinesFromFile($fileName, | ||
+ | if (!$fileHandle = fopen($fileName, | ||
+ | return; | ||
+ | } | ||
+ | | ||
+ | while (false !== $line = fgets($fileHandle)) { | ||
+ | $callback($line); | ||
+ | } | ||
+ | | ||
+ | fclose($fileHandle); | ||
+ | } | ||
+ | |||
+ | processLinesFromFile($fileName, | ||
+ | // do something | ||
+ | }); | ||
+ | </ | ||
+ | |||
+ | This approach has two main disadvantages: | ||
+ | |||
+ | Firstly, callbacks integrate badly into the existing PHP coding paradigms. Having quadruply-nested closures is something very normal in languages like JavaScript, but | ||
+ | rather rare in PHP. Many things in PHP are based on iteration and generators can nicely integrate with this. | ||
+ | |||
+ | A concrete example, which was actually my initial motivation to write the generators patch: | ||
+ | |||
+ | <code php> | ||
+ | protected function getTests($directory, | ||
+ | $it = new RecursiveDirectoryIterator($directory); | ||
+ | $it = new RecursiveIteratorIterator($it, | ||
+ | $it = new RegexIterator($it, | ||
+ | |||
+ | $tests = array(); | ||
+ | foreach ($it as $file) { | ||
+ | // read file | ||
+ | $fileContents = file_get_contents($file); | ||
+ | |||
+ | // parse sections | ||
+ | $parts = array_map(' | ||
+ | |||
+ | // first part is the name | ||
+ | $name = array_shift($parts); | ||
+ | |||
+ | // multiple sections possible with always two forming a pair | ||
+ | foreach (array_chunk($parts, | ||
+ | $tests[] = array($name, | ||
+ | } | ||
+ | } | ||
+ | |||
+ | return $tests; | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | This is a function which I use to provide test vectors to PHPUnit. I point it to a directory containing test files and then split up those test files into individual tests | ||
+ | + expected output. I can then use the result of the function to feed some test function via '' | ||
+ | |||
+ | The problem with the above implementation obviously is that I have to read all tests into memory at once (instead of one-by-one). | ||
+ | |||
+ | How can I solve this problem? By turning it into an iterator obviously! But if you look closer, this isn't actually that easy, because I'm adding new tests in a nested loop. | ||
+ | So I would have to implement some kind of complex push-back mechanism to solve the problem. And - getting back on topic - I can't use callbacks here either, because I need | ||
+ | a traversable for use with '' | ||
+ | is replace '' | ||
+ | |||
+ | The second, more general problem with callbacks is that it's very hard to manage state across calls. The classic example is a lexer + parser system. If you implement the | ||
+ | lexer using a callback (i.e. '' | ||
+ | to build some kind of state machine, which can quickly get really ugly, even for simple problems (just look at the hundreds of states that a typical LALR parser has). Again, | ||
+ | generators solve this problem elegantly, because they maintain state implicitly, in the execution state. | ||
+ | |||
+ | ==== Alternative yield syntax considerations ==== | ||
+ | |||
+ | Andrew proposed to use a function-like syntax for '' | ||
+ | |||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | |||
+ | The main advantage of this syntax is that it would avoid the strange parentheses requirements for the '' | ||
+ | |||
+ | One of the main issues with the pseudo-function syntax is that it makes the semantics of '' | ||
+ | syntax. Both are very similar in a function, so it is desirable to keep them similar in syntax too. | ||
+ | |||
+ | Generally PHP uses the '' | ||
+ | '' | ||
+ | wouldn' | ||
+ | |||
+ | As '' | ||
+ | '' | ||
+ | |||
+ | So the function-like '' | ||
+ | |||
+ | ===== Patch ===== | ||
+ | |||
+ | The current implementation can be found in this branch: https:// | ||
+ | |||
+ | I also created a PR so that the diff can be viewed more easily: https:// | ||
+ | |||
+ | ===== Vote ===== | ||
+ | |||
+ | <doodle title=" | ||
+ | * Yes | ||
+ | * No | ||
+ | </ | ||
+ | |||
+ | ===== Further resources ===== | ||
+ | |||
+ | Implementation in Python: | ||
+ | |||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | |||
+ | Implementation in JavaScript: | ||
+ | |||
+ | * [[http:// | ||
+ | * [[https:// | ||
+ | |||
+ | Implementation in C#: | ||
+ | |||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | |||
+ | Extensive introductions into the topic: | ||
+ | |||
+ | * [[http:// | ||
+ | * [[http:// |
rfc/generators.txt · Last modified: 2017/09/22 13:28 by 127.0.0.1