rfc:cachediterable
Differences
rfc:cachediterable [2021/02/06 20:39] – created tandre | rfc:cachediterable [2021/06/15 13:32] – tandre | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== PHP RFC: CachedIterable | + | ====== PHP RFC: ImmutableIterable |
- | * Version: 0.1 | + | * Version: 0.4 |
* Date: 2021-02-06 | * Date: 2021-02-06 | ||
* Author: Tyson Andre, tandre@php.net | * Author: Tyson Andre, tandre@php.net | ||
- | * Status: | + | * Status: |
* Implementation: | * Implementation: | ||
* First Published at: https:// | * First Published at: https:// | ||
Line 11: | Line 11: | ||
Currently, PHP does not provide a built-in way to store the state of an arbitrary iterable for reuse later (when the iterable has arbitrary keys, or when keys might be repeated). It would be useful to do so for many use cases, such as: | Currently, PHP does not provide a built-in way to store the state of an arbitrary iterable for reuse later (when the iterable has arbitrary keys, or when keys might be repeated). It would be useful to do so for many use cases, such as: | ||
- | - Creating a rewindable copy of a non-rewindable Traversable (e.g. Generator) before passing that copy to a function that consumes an iterable/ | + | - Creating a rewindable copy of a non-rewindable Traversable (e.g. a '' |
- | - Generating an '' | + | - Generating an '' |
- | | + | |
- Providing internal or userland helpers such as '' | - Providing internal or userland helpers such as '' | ||
- | - Providing | + | - Providing |
+ | |||
+ | Having this implemented as an internal class would also allow it to be [[# | ||
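''ImmutableIterable'' is only proposed in this RFC, so the following sketch uses plain PHP 8.0 to illustrate the first use case above. A ''Generator'' can be traversed only once, but an eagerly cached copy of its keys and values can be replayed any number of times, repeated keys included. ''cache_pairs()'' and ''replay()'' are hypothetical helpers for illustration, not part of the proposed API.

<code php>
<?php
// Hypothetical helpers: cache_pairs() and replay() are illustrative only.

function numbers(): Generator {
    yield 'a' => 1;
    yield 'a' => 2; // repeated key: an array could not hold both entries
}

/** Eagerly copy keys and values into two parallel lists. */
function cache_pairs(iterable $it): array {
    $keys = [];
    $values = [];
    foreach ($it as $k => $v) {
        $keys[] = $k;
        $values[] = $v;
    }
    return [$keys, $values];
}

/** Re-yield the cached entries; each call creates a fresh Generator. */
function replay(array $keys, array $values): Generator {
    foreach ($keys as $i => $k) {
        yield $k => $values[$i];
    }
}

$gen = numbers();
[$keys, $values] = cache_pairs($gen); // this consumes $gen completely

// Traversing the finished generator again throws an Exception.
$threw = false;
try {
    foreach ($gen as $unused) {
    }
} catch (Exception $e) {
    $threw = true;
}

// The cached copy can be traversed repeatedly.
$seen = [];
for ($pass = 0; $pass < 2; $pass++) {
    foreach (replay($keys, $values) as $k => $v) {
        $seen[] = "$k=$v";
    }
}
echo implode(', ', $seen), "\n"; // a=1, a=2, a=1, a=2
</code>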
===== Proposal ===== | ===== Proposal ===== | ||
- | Add a class '' | + | Add a class '' |
<code php> | <code php> | ||
- | final class CachedIterable | + | final class ImmutableIterable |
+ | | ||
+ | | ||
+ | JsonSerializable | ||
{ | { | ||
public function __construct(iterable $iterator) {} | public function __construct(iterable $iterator) {} | ||
public function getIterator(): | public function getIterator(): | ||
public function count(): int {} | public function count(): int {} | ||
- | public static function fromPairs(array $pairs): | + | |
+ | | ||
+ | // [[$key1, $value1], [$key2, $value2]] | ||
+ | public function toPairs(): array{} | ||
public function __serialize(): | public function __serialize(): | ||
public function __unserialize(array $data): void {} | public function __unserialize(array $data): void {} | ||
+ | public static function __set_state(array $array): ImmutableIterable {} | ||
// useful for converting iterables back to arrays for further processing | // useful for converting iterables back to arrays for further processing | ||
- | public function keys(): array {} | + | public function keys(): array {} // [$k1, $k2, ...] |
- | public function values(): array {} | + | public function values(): array {} // [$v1, $v2, ...] |
// useful to efficiently get offsets at the middle/end of a long iterable | // useful to efficiently get offsets at the middle/end of a long iterable | ||
public function keyAt(int $offset): mixed {} | public function keyAt(int $offset): mixed {} | ||
public function valueAt(int $offset): mixed {} | public function valueAt(int $offset): mixed {} | ||
+ | |||
+ | // ' | ||
+ | public function jsonSerialize(): | ||
// dynamic properties are forbidden | // dynamic properties are forbidden | ||
} | } | ||
</ | </ | ||
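To make the listed API more concrete, here is a hypothetical userland sketch of a subset of it, assuming entries are stored eagerly in two parallel lists (keys and values aligned by offset). ''PairListSketch'' is an illustrative name, not the RFC's internal implementation; the ''OutOfBoundsException'' for invalid offsets and the ''__set()'' guard against dynamic properties are assumptions chosen for the sketch.

<code php>
<?php
/**
 * Hypothetical sketch of part of the proposed API (not the RFC's C code).
 * Keys may be of any type and may repeat, because they are never used as array keys.
 */
final class PairListSketch implements IteratorAggregate, Countable
{
    /** @var list<mixed> keys in iteration order */
    private array $keys = [];
    /** @var list<mixed> values, aligned with $keys by offset */
    private array $values = [];

    public function __construct(iterable $iterator)
    {
        // Eager: the whole iterable is consumed here.
        foreach ($iterator as $key => $value) {
            $this->keys[] = $key;
            $this->values[] = $value;
        }
    }

    /** Build from [[$key1, $value1], [$key2, $value2], ...]. */
    public static function fromPairs(array $pairs): self
    {
        $result = new self([]);
        foreach ($pairs as [$key, $value]) {
            $result->keys[] = $key;
            $result->values[] = $value;
        }
        return $result;
    }

    /** Return [[$key1, $value1], [$key2, $value2], ...]. */
    public function toPairs(): array
    {
        // array_map(null, ...) zips the two lists into pairs.
        return array_map(null, $this->keys, $this->values);
    }

    public function getIterator(): Generator
    {
        foreach ($this->keys as $offset => $key) {
            yield $key => $this->values[$offset];
        }
    }

    public function count(): int { return count($this->keys); }
    public function keys(): array { return $this->keys; }
    public function values(): array { return $this->values; }

    // Constant-time lookup by position, since both lists are indexed 0..n-1.
    public function keyAt(int $offset): mixed { return $this->keys[$this->checkOffset($offset)]; }
    public function valueAt(int $offset): mixed { return $this->values[$this->checkOffset($offset)]; }

    private function checkOffset(int $offset): int
    {
        // Assumption: out-of-range offsets throw OutOfBoundsException.
        if ($offset < 0 || $offset >= count($this->keys)) {
            throw new OutOfBoundsException("Offset $offset is out of range");
        }
        return $offset;
    }

    // Userland approximation of "dynamic properties are forbidden".
    public function __set(string $name, mixed $value): void
    {
        throw new Error('Cannot create dynamic property ' . self::class . '::$' . $name);
    }
}

$it = new PairListSketch((function () {
    yield 'k' => 1;
    yield 'k' => 2; // the repeated key is preserved
})());
echo count($it), "\n";      // 2
echo $it->valueAt(1), "\n"; // 2
</code>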
- | CachedIterables can easily be created from arrays for use cases where objects(Traversable) | + | ImmutableIterables |
<code php> | <code php> | ||
- | $x = new CachedIterable([0 => 100, ' | + | $x = new ImmutableIterable([0 => 100, ' |
foreach ($x as $key1 => $value1) { | foreach ($x as $key1 => $value1) { | ||
echo "$key1 $value1: | echo "$key1 $value1: | ||
Line 62: | Line 72: | ||
</ | </ | ||
- | ==== CachedIterables | + | ==== ImmutableIterables |
- | CachedIterables | + | ImmutableIterables |
+ | |||
+ | In comparison to PHP's '' | ||
+ | |||
+ | * Arrays can only store integers and strings | ||
+ | * Arrays coerce stringified integers to integers, potentially causing unexpected Errors/ | ||
+ | * Arrays cannot represent repeated keys | ||
<code php> | <code php> | ||
Line 71: | Line 87: | ||
yield from [' | yield from [' | ||
yield ' | yield ' | ||
- | yield ['an array' | + | yield ['an array' |
echo " | echo " | ||
} | } | ||
- | $x = new CachedIterable(my_generator()); | + | $x = new ImmutableIterable(my_generator()); |
foreach ($x as $k => $v) { | foreach ($x as $k => $v) { | ||
printf(" | printf(" | ||
Line 97: | Line 113: | ||
</ | </ | ||
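The array limitations listed above can be observed in current PHP without ''ImmutableIterable''. This sketch (plain PHP 8.0) shows a numeric-string key being coerced to an integer when a generator's entries are copied into an array, and repeated generator keys being either overwritten or discarded by ''iterator_to_array()''.

<code php>
<?php
// 1. Canonical integer strings become integer keys; '01' is not canonical and stays a string.
$array = ['1' => 'one', '01' => 'zero-one'];
$arrayKeys = array_keys($array); // [1, '01']

// 2. A generator may yield the string key '1', but copying into an array turns it into int(1).
function stringKey(): Generator {
    yield '1' => 'a';
}
foreach (stringKey() as $key => $value) {
    $yieldedKey = $key; // string(1) "1"
}
$copiedKey = array_key_first(iterator_to_array(stringKey())); // int(1)

// 3. Repeated keys: preserving keys keeps only the last value,
//    while discarding keys keeps every value but loses the keys.
function repeatedKey(): Generator {
    yield 'x' => 1;
    yield 'x' => 2;
}
$preserved = iterator_to_array(repeatedKey());        // ['x' => 2]
$discarded = iterator_to_array(repeatedKey(), false); // [1, 2]

var_dump($yieldedKey === '1', $copiedKey === 1); // bool(true) bool(true)
</code>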
- | ==== CachedIterables | + | ==== ImmutableIterables |
- | CachedIterable | + | ImmutableIterable |
- | Dynamic properties are forbidden on CachedIterables. | + | Dynamic properties are forbidden on ImmutableIterables. |
- | The keys and values of the CachedIterable | + | The keys and values of the ImmutableIterable |
+ | |||
+ | This makes it useful for wrapping the keys and values that would be returned by a generator or single-use '' | ||
+ | |||
+ | ==== ImmutableIterables can be created from pairs ==== | ||
+ | |||
+ | This can be done imperatively, | ||
+ | |||
+ | <code php> | ||
+ | $it = ImmutableIterable:: | ||
+ | foreach ($it as $key => $value) { | ||
+ | printf(" | ||
+ | } | ||
+ | /* | ||
+ | key=" | ||
+ | key={" | ||
+ | */ | ||
+ | var_dump($it); | ||
+ | /* | ||
+ | object(ImmutableIterable)# | ||
+ | [0]=> | ||
+ | array(2) { | ||
+ | [0]=> | ||
+ | string(5) " | ||
+ | [1]=> | ||
+ | string(1) " | ||
+ | } | ||
+ | } | ||
+ | */ | ||
+ | php > echo json_encode((array)$it), | ||
+ | [[" | ||
+ | </ | ||
+ | |||
+ | ImmutableIterables can also be converted back into pairs for further processing (e.g. using the wide range of array functions PHP provides): | ||
+ | |||
+ | <code php> | ||
+ | php > $reversedIt = ImmutableIterable:: | ||
+ | php > echo json_encode($reversedIt-> | ||
+ | [[{" | ||
+ | </ | ||
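Because pairs are ordinary arrays, PHP's existing array functions apply to them directly. The sketch below uses plain PHP 8.0; ''pairs_to_iterable()'' is a hypothetical stand-in for ''ImmutableIterable::fromPairs()'' that simply re-yields each pair. It reverses, sorts, and filters pairs containing a repeated key, then turns the result back into a key => value iterable.

<code php>
<?php
/** Hypothetical stand-in for ImmutableIterable::fromPairs(): re-yields each pair. */
function pairs_to_iterable(array $pairs): Generator {
    foreach ($pairs as [$key, $value]) {
        yield $key => $value;
    }
}

$pairs = [['b', 2], ['a', 1], ['b', 3]]; // key 'b' appears twice

$reversed = array_reverse($pairs); // [['b', 3], ['a', 1], ['b', 2]]

$sortedByKey = $pairs;
// usort() is stable as of PHP 8.0, so the two 'b' pairs keep their relative order.
usort($sortedByKey, fn (array $x, array $y): int => $x[0] <=> $y[0]);

$onlyB = array_values(array_filter($pairs, fn (array $pair): bool => $pair[0] === 'b'));

$lines = [];
foreach (pairs_to_iterable($reversed) as $key => $value) {
    $lines[] = "$key=$value";
}
echo implode(', ', $lines), "\n"; // b=3, a=1, b=2
</code>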
+ | |||
+ | ===== Benchmarks ===== | ||
+ | |||
+ | ==== ImmutableIterables are memory-efficient ==== | ||
+ | |||
+ | Similarly to how '' | ||
+ | |||
+ | <code php> | ||
+ | <?php | ||
+ | |||
+ | function show_array_memory(int $n) { | ||
+ | gc_collect_cycles(); | ||
+ | $before = memory_get_usage(); | ||
+ | $result = array_flip(range(10, | ||
+ | $after = memory_get_usage(); | ||
+ | printf(" | ||
+ | } | ||
+ | function show_cachediterable_memory(int $n) { | ||
+ | gc_collect_cycles(); | ||
+ | $before = memory_get_usage(); | ||
+ | // create an ImmutableIterable from an **associative** array of size $n | ||
+ | $result = new ImmutableIterable(array_flip(range(10, | ||
+ | $after = memory_get_usage(); | ||
+ | printf(" | ||
+ | } | ||
+ | foreach ([1, 8, 12, 16, 2**16] as $n) { | ||
+ | show_array_memory($n); | ||
+ | show_cachediterable_memory($n); | ||
+ | } | ||
+ | /* | ||
+ | array memory: | ||
+ | ImmutableIterable memory: (n= 1) 88 bytes | ||
+ | array memory: | ||
+ | ImmutableIterable memory: (n= 8) 312 bytes | ||
+ | array memory: | ||
+ | ImmutableIterable memory: (n= | ||
+ | array memory: | ||
+ | ImmutableIterable memory: (n= | ||
+ | array memory: | ||
+ | ImmutableIterable memory: (n=65536) 2097232 bytes | ||
+ | */ | ||
+ | </ | ||
+ | |||
+ | ==== ImmutableIterables are much more efficient than a polyfill object ==== | ||
+ | |||
+ | In a simple example, ImmutableIterable takes much less time to construct than a polyfill. Iterating over it and processing the results is roughly 8 times faster than with the polyfill (0.022s vs. 0.183s), and it uses about half as much additional memory (32002128 vs. 67117328 bytes). | ||
+ | |||
+ | <code php> | ||
+ | <?php | ||
+ | /* | ||
+ | Time to construct PolyfillImmutableIterator: | ||
+ | Time to iterate: 0.183351, memory usage: 67117328 | ||
+ | result: | ||
+ | |||
+ | Time to construct | ||
+ | Time to iterate: 0.021905, memory usage: 32002128 | ||
+ | result: | ||
+ | */ | ||
+ | |||
+ | /** | ||
+ | * THIS IS AN INCOMPLETE POLYFILL THAT ONLY SUPPORTS ITERATION, AND DOES NOT INCLUDE ERROR HANDLING. | ||
+ | * | ||
+ | * Barely any of the functionality in the proposal is implemented. | ||
+ | * This is just here to compare a fast (in terms of time to iterate) userland polyfill | ||
+ | * against ImmutableIterable. | ||
+ | * | ||
+ | * Not an IteratorAggregate for simplicity. | ||
+ | */ | ||
+ | class PolyfillImmutableIterator implements Iterator { | ||
+ | public $i = 0; | ||
+ | public $count = 0; | ||
+ | public $keys; | ||
+ | public $values; | ||
+ | public function __construct(iterable $data) { | ||
+ | $keys = []; | ||
+ | $values = []; | ||
+ | foreach ($data as $key => $value) { | ||
+ | $keys[] = $key; | ||
+ | $values[] = $value; | ||
+ | } | ||
+ | $this-> | ||
+ | $this-> | ||
+ | $this-> | ||
+ | } | ||
+ | public function rewind() { $this->i = 0; } | ||
+ | public function valid(): bool { return $this->i < $this-> | ||
+ | public function key() { return $this-> | ||
+ | public function current() { return $this-> | ||
+ | public function next(): void { $this-> | ||
+ | } | ||
+ | |||
+ | function a_generator() { | ||
+ | for ($i = 0; $i < 1000; $i++) { | ||
+ | for ($j = 0; $j < 1000; $j++) { | ||
+ | yield $j => $i; | ||
+ | } | ||
+ | } | ||
+ | } | ||
+ | |||
+ | function benchmark(string $class) { | ||
+ | gc_collect_cycles(); | ||
+ | $memory_usage_1 = memory_get_usage(); | ||
+ | $t1 = microtime(true); | ||
+ | $it = new $class(a_generator()); | ||
+ | $t2 = microtime(true); | ||
+ | $total = 0; | ||
+ | foreach ($it as $k => $v) { | ||
+ | $total += $k + $v; | ||
+ | } | ||
+ | $t3 = microtime(true); | ||
+ | gc_collect_cycles(); | ||
+ | $memory_usage_2 = memory_get_usage(); | ||
+ | printf(" | ||
+ | $class, $t2 - $t1, $t3 - $t2, $memory_usage_2 - $memory_usage_1, | ||
+ | } | ||
+ | benchmark(PolyfillImmutableIterator:: | ||
+ | benchmark(ImmutableIterable:: | ||
+ | </ | ||
+ | ==== ImmutableIterables support constant-time access to keys and values ==== | ||
+ | |||
+ | '' | ||
+ | For example, it is possible to do a binary search on keys (and/or values) without using any additional time or memory to create a copy of the keys (or values). | ||
+ | |||
+ | <code php> | ||
+ | <?php | ||
+ | /** | ||
+ | * @return int the offset of the first key in $it that is >= $target. | ||
+ | * Returns count($it) if all keys are smaller than $target. | ||
+ | */ | ||
+ | function do_binary_search_on_key(ImmutableIterable $it, int $target) { | ||
+ | $lowOffset = 0; | ||
+ | $highOffset = count($it) - 1; | ||
+ | while ($lowOffset <= $highOffset) { | ||
+ | $mid = $lowOffset + (($highOffset - $lowOffset) >> 1); | ||
+ | $key = $it-> | ||
+ | if ($key < $target) { | ||
+ | echo "at offset $mid: $key <= $target\n"; | ||
+ | $lowOffset = $mid + 1; | ||
+ | } else { | ||
+ | echo "at offset $mid: $key > $target\n"; | ||
+ | $highOffset = $mid - 1; | ||
+ | } | ||
+ | } | ||
+ | echo " | ||
+ | ": | ||
+ | return $lowOffset; | ||
+ | } | ||
+ | |||
+ | mt_srand(123); | ||
+ | $data = []; | ||
+ | $N = 1000; | ||
+ | for ($i = 0; $i < $N; $i++) { | ||
+ | $data[mt_rand()] = " | ||
+ | } | ||
+ | ksort($data); | ||
+ | $it = new ImmutableIterable($data); | ||
+ | |||
+ | do_binary_search_on_key($it, | ||
+ | /* | ||
+ | at offset 499: 1039143806 > 457052171 | ||
+ | at offset 249: 595271545 > 457052171 | ||
+ | at offset 124: 262516026 <= 457052171 | ||
+ | at offset 186: 438739745 <= 457052171 | ||
+ | at offset 217: 511637778 > 457052171 | ||
+ | at offset 201: 468958912 > 457052171 | ||
+ | at offset 193: 442664110 <= 457052171 | ||
+ | at offset 197: 455906707 <= 457052171 | ||
+ | at offset 199: 462794419 > 457052171 | ||
+ | at offset 198: 459587085 > 457052171 | ||
+ | offset 198 has the first key (459587085) >= 457052171 : associated value=value530 | ||
+ | */ | ||
+ | </ | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | None, except that the class name '' | + | None, except that the class name '' |
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
Line 113: | Line 341: | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
- | * This will enable adding internal iterable functions such as '' | + | * This will enable adding internal iterable functions such as '' |
- | * More methods may be useful to add to CachingIterable, e.g. for returning a sorted copy, returning a slice(range of entries), returning a copy sorted by keys/ | + | * More methods may be useful to add to '' |
+ | * This may or may not be useful for future data types, e.g. a '' | ||
+ | * A new '' | ||
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
Line 121: | Line 351: | ||
===== References ===== | ===== References ===== | ||
- | [[https:// | + | * [[https:// |
+ | * [[rfc: | ||
===== Rejected Features ===== | ===== Rejected Features ===== | ||
Line 131: | Line 362: | ||
I think '' | I think '' | ||
</ | </ | ||
+ | |||
+ | ==== Rejected: Lazy Evaluation ==== | ||
+ | |||
+ | '' | ||
+ | |||
+ | * Exceptions will be thrown during construction instead of during iteration or call to count()/ | ||
+ | * This is easier to understand, debug, serialize, and represent | ||
+ | * If the underlying iterable (e.g. a Generator) has side effects, having those side effects take place immediately instead of being interleaved with other parts of the program may be easier to reason about. | ||
+ | * The majority of use cases of '' | ||
+ | * Eagerly evaluating iterables reduces the memory needed by the implementation, since there is no need to store the underlying iterable, potentially the most recent exception(s) thrown by the underlying iterable, etc. | ||
+ | |||
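The first point above (when exceptions surface) can be illustrated with a small sketch in plain PHP 8.0. ''EagerCopy'' and ''LazyWrapper'' are hypothetical classes; the lazy variant omits caching for brevity and only shows when an exception from the underlying ''Generator'' is observed.

<code php>
<?php
// Hypothetical classes for illustration; neither is part of the RFC.

function failing(): Generator {
    yield 'ok' => 1;
    throw new RuntimeException('source failed');
}

final class EagerCopy implements IteratorAggregate {
    private array $pairs = [];
    public function __construct(iterable $source) {
        foreach ($source as $k => $v) { // any exception from $source surfaces here
            $this->pairs[] = [$k, $v];
        }
    }
    public function getIterator(): Generator {
        foreach ($this->pairs as [$k, $v]) {
            yield $k => $v;
        }
    }
}

final class LazyWrapper implements IteratorAggregate {
    public function __construct(private iterable $source) {}
    public function getIterator(): Generator {
        yield from $this->source; // any exception surfaces during iteration instead
    }
}

$events = [];
try {
    $eager = new EagerCopy(failing());
    $events[] = 'eager constructed';
} catch (RuntimeException $e) {
    $events[] = 'eager: thrown in constructor';
}

$lazy = new LazyWrapper(failing());
$events[] = 'lazy constructed';
try {
    foreach ($lazy as $k => $v) {
        $events[] = "lazy yielded $k";
    }
} catch (RuntimeException $e) {
    $events[] = 'lazy: thrown during iteration';
}
echo implode("\n", $events), "\n";
</code>

With the eager approach, the failure is reported where the object is created; with the lazy approach, it is reported by whichever code happens to iterate first, after some values have already been consumed.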
+ | The addition of an iterable library class that evaluates arguments on-demand is mentioned in the " | ||
+ | |||
+ | https:// | ||
+ | |||
+ | < | ||
+ | < | ||
+ | 2) Userland library/ | ||
+ | such as https:// | ||
+ | something that is easy to understand, debug, serialize or represent, etc. | ||
+ | I expect the inner iterable may be hidden entirely in a (lazy) CachedIterable from var_dump as an implementation detail. | ||
+ | |||
+ | 3) It would be harder to understand why SomeFrameworkException is thrown in code unrelated to that framework | ||
+ | when a lazy (instead of eager) iterable is passed to some function that accepts a generic iterable, | ||
+ | and harder to write correct exception handling for it if done in a lazy generation style. | ||
+ | |||
+ | Many RFCs have been rejected due to being perceived as being likely to be misused in userland or | ||
+ | to make code harder to understand. | ||
+ | |||
+ | 4) It is possible to implement a lazy alternative to (ImmutableIterable) that only loads values as needed. | ||
+ | However, I hadn't proposed it due to doubts that 2/3 of voters would consider it widely useful | ||
+ | enough to be included in php rather than as a userland or PECL library. | ||
+ | </ | ||
+ | |||
+ | CachedIterable should load from the underlying | ||
+ | datastore lazily -- there is hardly any visible impact from the user | ||
+ | if this happens, because for the most part it looks and behaves the | ||
+ | same as it does today. The only visible changes are around loading | ||
+ | data from the underlying iterable. | ||
+ | |||
+ | For example, if the user calls the count method on the CachedIterable, | ||
+ | it would then load the remainder of the underlying data-store (and | ||
+ | then drop its reference to it). If the user asks for valueAt($n) and | ||
+ | it's beyond what's already loaded and we haven' | ||
+ | the underlying iterable, then it would load until $n is found or the | ||
+ | end of the store is reached. | ||
+ | |||
+ | I understand your concerns with map, filter, etc. CachedIterable | ||
+ | is different because it holds onto the data, can be iterated over more | ||
+ | than once, including the two nested loop cases, even if it loads data | ||
+ | from the underlying iterable on demand. | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | < | ||
+ | Thanks for explaining 4 months ago about my concern. | ||
+ | I think I understand the main real impact of an eager iterable cache vs a lazy iterable cache from a functional point of view: | ||
+ | |||
+ | * exceptions are thrown during construction vs during the first iteration | ||
+ | * predictable performance also on the first iteration. | ||
+ | |||
+ | How did you gather the information that eager implementation is more valuable than lazy one? I'm mostly curious also how to assess this as technically to me it also looks the other way around. Maybe mention that in the RFC. | ||
+ | I was even thinking that CachedIterable should be lazy and an EagerCachedIterable would be built upon that with more methods. Or have it in the same class with a constructor parameter. | ||
+ | </ | ||
+ | |||
+ | One of the reasons was size/ | ||
+ | point to the original iterable and the functions being applied to that iterable - so an application that creates lots of small/empty cached iterables would have a higher memory usage. | ||
+ | |||
+ | Having a data structure that tries to do everything would do other things poorly | ||
+ | (potentially not support serialization, | ||
+ | have unintuitive behaviors when attempting to var_export/ | ||
+ | surprisingly throw when being iterated over, etc) | ||
+ | </ | ||
+ | |||
+ | ==== Changelog ==== | ||
+ | |||
+ | * 0.2: Use optimized build with opcache enabled for benchmark timings | ||
+ | * 0.3: Rename from '' | ||
+ | * 0.3.1: Add '' | ||
+ | * 0.4.0: Rename from '' | ||
rfc/cachediterable.txt · Last modified: 2021/06/29 14:24 by tandre