rfc:cachediterable
Differences
This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision Next revision Both sides next revision | ||
rfc:cachediterable [2021/02/11 03:00] tandre benchmarks |
rfc:cachediterable [2021/06/15 13:47] tandre |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== PHP RFC: CachedIterable | + | ====== PHP RFC: ImmutableIterable |
- | * Version: 0.1 | + | * Version: 0.4 |
* Date: 2021-02-06 | * Date: 2021-02-06 | ||
* Author: Tyson Andre, tandre@php.net | * Author: Tyson Andre, tandre@php.net | ||
- | * Status: | + | * Status: |
* Implementation: | * Implementation: | ||
* First Published at: https:// | * First Published at: https:// | ||
Line 12: | Line 12: | ||
- Creating a rewindable copy of a non-rewindable Traversable (e.g. a '' | - Creating a rewindable copy of a non-rewindable Traversable (e.g. a '' | ||
- | - Generating an '' | + | - Generating an '' |
- Providing internal or userland helpers such as '' | - Providing internal or userland helpers such as '' | ||
- | - Providing | + | - Providing |
+ | |||
+ | Having this implemented as an internal class would also allow it to be [[# | ||
===== Proposal ===== | ===== Proposal ===== | ||
- | Add a class '' | + | Add a class '' |
<code php> | <code php> | ||
- | final class CachedIterable | + | final class ImmutableIterable |
+ | | ||
+ | | ||
+ | | ||
{ | { | ||
public function __construct(iterable $iterator) {} | public function __construct(iterable $iterator) {} | ||
Line 27: | Line 32: | ||
public function count(): int {} | public function count(): int {} | ||
// [[$key1, $value1], [$key2, $value2]] | // [[$key1, $value1], [$key2, $value2]] | ||
- | public static function fromPairs(array $pairs): | + | public static function fromPairs(array $pairs): |
// [[$key1, $value1], [$key2, $value2]] | // [[$key1, $value1], [$key2, $value2]] | ||
- | public function toPairs(): array{} | + | public function toPairs(): array{} |
public function __serialize(): | public function __serialize(): | ||
public function __unserialize(array $data): void {} | public function __unserialize(array $data): void {} | ||
+ | public static function __set_state(array $array): ImmutableIterable {} | ||
// useful for converting iterables back to arrays for further processing | // useful for converting iterables back to arrays for further processing | ||
Line 39: | Line 45: | ||
public function keyAt(int $offset): mixed {} | public function keyAt(int $offset): mixed {} | ||
public function valueAt(int $offset): mixed {} | public function valueAt(int $offset): mixed {} | ||
- | | + | |
// ' | // ' | ||
public function jsonSerialize(): | public function jsonSerialize(): | ||
Line 46: | Line 52: | ||
</ | </ | ||
- | CachedIterables | + | ImmutableIterables |
<code php> | <code php> | ||
- | $x = new CachedIterable([0 => 100, ' | + | $x = new ImmutableIterable([0 => 100, ' |
foreach ($x as $key1 => $value1) { | foreach ($x as $key1 => $value1) { | ||
echo "$key1 $value1: | echo "$key1 $value1: | ||
Line 66: | Line 72: | ||
</ | </ | ||
- | ==== CachedIterables | + | ==== ImmutableIterables |
- | CachedIterables | + | ImmutableIterables |
+ | |||
+ | In comparison to php's '' | ||
+ | |||
+ | * Arrays can only store integers and strings | ||
+ | * Arrays coerce stringified integers to integers, potentially causing unexpected Errors/ | ||
+ | * Arrays cannot represent repeated keys | ||
<code php> | <code php> | ||
Line 79: | Line 91: | ||
} | } | ||
- | $x = new CachedIterable(my_generator()); | + | $x = new ImmutableIterable(my_generator()); |
foreach ($x as $k => $v) { | foreach ($x as $k => $v) { | ||
printf(" | printf(" | ||
Line 101: | Line 113: | ||
</ | </ | ||
- | ==== CachedIterables | + | ==== ImmutableIterables |
- | CachedIterable | + | ImmutableIterable |
- | Dynamic properties are forbidden on CachedIterables. | + | Dynamic properties are forbidden on ImmutableIterables. |
- | The keys and values of the CachedIterable | + | The keys and values of the ImmutableIterable |
- | ==== CachedIterables | + | This makes it useful for returning to wrap the keys and values that would be returned by a generator or single-use '' |
+ | |||
+ | ==== ImmutableIterables | ||
This can be done imperatively, | This can be done imperatively, | ||
<code php> | <code php> | ||
- | $it = CachedIterable:: | + | $it = ImmutableIterable:: |
foreach ($it as $key => $value) { | foreach ($it as $key => $value) { | ||
printf(" | printf(" | ||
Line 124: | Line 138: | ||
var_dump($it); | var_dump($it); | ||
/* | /* | ||
- | object(CachedIterable)#2 (1) { | + | object(ImmutableIterable)#2 (1) { |
[0]=> | [0]=> | ||
array(2) { | array(2) { | ||
Line 138: | Line 152: | ||
</ | </ | ||
- | CachedIterables | + | ImmutableIterables |
<code php> | <code php> | ||
- | php > $reversedIt = CachedIterable:: | + | php > $reversedIt = ImmutableIterable:: |
php > echo json_encode($reversedIt-> | php > echo json_encode($reversedIt-> | ||
[[{" | [[{" | ||
</ | </ | ||
- | ==== CachedIterables are memory-efficient | + | ===== Benchmarks ===== |
- | Similarly to how '' | + | ==== ImmutableIterables are memory-efficient ==== |
+ | |||
+ | Similarly to how '' | ||
<code php> | <code php> | ||
Line 156: | Line 172: | ||
gc_collect_cycles(); | gc_collect_cycles(); | ||
$before = memory_get_usage(); | $before = memory_get_usage(); | ||
- | $result = array_flip(range(10, | + | $result = array_flip(range(10, |
$after = memory_get_usage(); | $after = memory_get_usage(); | ||
- | printf(" | + | printf(" |
} | } | ||
function show_cachediterable_memory(int $n) { | function show_cachediterable_memory(int $n) { | ||
gc_collect_cycles(); | gc_collect_cycles(); | ||
$before = memory_get_usage(); | $before = memory_get_usage(); | ||
- | $result = new CachedIterable(array_flip(range(10, | + | |
+ | | ||
$after = memory_get_usage(); | $after = memory_get_usage(); | ||
- | printf(" | + | printf(" |
} | } | ||
foreach ([1, 8, 12, 16, 2**16] as $n) { | foreach ([1, 8, 12, 16, 2**16] as $n) { | ||
Line 172: | Line 189: | ||
} | } | ||
/* | /* | ||
- | array memory: | + | array memory: |
- | CachedIterable | + | ImmutableIterable |
- | array memory: | + | array memory: |
- | CachedIterable | + | ImmutableIterable |
- | array memory: | + | array memory: |
- | CachedIterable | + | ImmutableIterable |
- | array memory: | + | array memory: |
- | CachedIterable | + | ImmutableIterable |
- | array memory: | + | array memory: |
- | CachedIterable | + | ImmutableIterable |
*/ | */ | ||
</ | </ | ||
- | ==== CachedIterables | + | ==== ImmutableIterables |
- | For a simple example, this uses much less time to construct. It is also 6 times faster to iterate over and process results than a polyfill, and uses half as much additional memory. | + | For a simple example, this uses much less time to construct. It is almost |
<code php> | <code php> | ||
<?php | <?php | ||
/* | /* | ||
- | Time to construct | + | Time to construct |
- | Time to construct | + | Time to iterate: 0.183351, memory usage: |
+ | result: | ||
+ | |||
+ | Time to construct | ||
+ | Time to iterate: 0.021905, memory usage: | ||
+ | result: | ||
*/ | */ | ||
- | // Not an IteratorAggregate for simplicity | + | /** |
- | class PolyfillIterator | + | * THIS IS AN INCOMPLETE POLYFILL THAT ONLY SUPPORTS ITERATION, AND DOES NOT INCLUDE ERROR HANDLING. |
+ | * | ||
+ | * Barely any of the functionality in the proposal is implemented. | ||
+ | * This is just here to compare a fast (in terms of time to iterate) userland polyfill | ||
+ | * against ImmutableIterable. | ||
+ | * | ||
+ | | ||
+ | */ | ||
+ | class PolyfillImmutableIterator | ||
public $i = 0; | public $i = 0; | ||
public $count = 0; | public $count = 0; | ||
Line 241: | Line 271: | ||
gc_collect_cycles(); | gc_collect_cycles(); | ||
$memory_usage_2 = memory_get_usage(); | $memory_usage_2 = memory_get_usage(); | ||
- | printf(" | + | printf(" |
- | $class, $t2 - $t1, $t3 - $t2, $total, $memory_usage_2 - $memory_usage_1); | + | $class, $t2 - $t1, $t3 - $t2, $memory_usage_2 - $memory_usage_1, $total); |
} | } | ||
- | benchmark(PolyfillIterator::class); | + | benchmark(PolyfillImmutableIterator::class); |
- | benchmark(CachedIterable::class); | + | benchmark(ImmutableIterable::class); |
</ | </ | ||
- | ==== CachedIterables | + | ==== ImmutableIterables |
- | CachedIterables | + | '' |
For example, it is possible to do binary search on keys (and/or values) without using any additional time or memory to create a copy of the keys. | For example, it is possible to do binary search on keys (and/or values) without using any additional time or memory to create a copy of the keys. | ||
(Same for values). | (Same for values). | ||
Line 260: | Line 289: | ||
* Returns count($it) if all keys are smaller than $target. | * Returns count($it) if all keys are smaller than $target. | ||
*/ | */ | ||
- | function do_binary_search_on_key(CachedIterable | + | function do_binary_search_on_key(ImmutableIterable |
$lowOffset = 0; | $lowOffset = 0; | ||
$highOffset = count($it) - 1; | $highOffset = count($it) - 1; | ||
Line 274: | Line 303: | ||
} | } | ||
} | } | ||
- | echo " | + | echo " |
+ | " | ||
return $lowOffset; | return $lowOffset; | ||
} | } | ||
Line 285: | Line 315: | ||
} | } | ||
ksort($data); | ksort($data); | ||
- | $it = new CachedIterable($data); | + | $it = new ImmutableIterable($data); |
do_binary_search_on_key($it, | do_binary_search_on_key($it, | ||
Line 304: | Line 334: | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | None, except that the class name '' | + | None, except that the class name '' |
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
Line 311: | Line 341: | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
- | * This will enable adding internal iterable functions such as '' | + | * This will enable adding internal iterable functions such as '' |
- | * More methods may be useful to add to CachingIterable, e.g. for returning a sorted copy, returning a slice(range of entries), returning a copy sorted by keys/ | + | * More methods may be useful to add to '' |
- | * This may or may not be useful for future data types, e.g. a MapObject (hash map on any key type) type and may potentially be useful for converting some existing internal/ | + | * This may or may not be useful for future data types, e.g. a '' |
+ | * A new '' | ||
- | ===== Proposed Voting Choices | + | ===== Vote ===== |
- | Yes/No, requiring a 2/3 majority. | + | This is a Yes/ |
+ | |||
+ | <doodle title=" | ||
+ | * Yes | ||
+ | * No | ||
+ | </ | ||
+ | |||
+ | ==== Poll: Reason for voting against this RFC ==== | ||
+ | |||
+ | <doodle title=" | ||
+ | * Object to the namespace choice | ||
+ | * Object to the name | ||
+ | * Object to the implementation | ||
+ | * Don't see a use case | ||
+ | * Other | ||
+ | </ | ||
===== References ===== | ===== References ===== | ||
- | [[https:// | + | * [[https:// |
+ | * [[rfc: | ||
+ | * https:// | ||
===== Rejected Features ===== | ===== Rejected Features ===== | ||
+ | |||
+ | ==== Rejected: Alternative namespaces ==== | ||
+ | |||
+ | [[rfc: | ||
+ | |||
==== Rejected: ArrayAccess ==== | ==== Rejected: ArrayAccess ==== | ||
Line 330: | Line 383: | ||
I think '' | I think '' | ||
</ | </ | ||
+ | |||
+ | ==== Rejected: Lazy Evaluation ==== | ||
+ | |||
+ | '' | ||
+ | |||
+ | * If this is generated from a data structure, the behavior may be unintuitive if the underlying data is modified while iterating over the sequence of keys and values if this were to be evaluated lazily instead of eagerly. | ||
+ | * Evaluating the entire iterable in the constructor ensures that exceptions will be thrown during construction instead of during iteration or call to count()/ | ||
+ | * This is easier to understand, debug, serialize, and represent | ||
+ | * If the underlying iterable (e.g. a Generator) has side effects, having those side effects take place immediately instead of being interleaved with other parts of the program may be easier to reason about. | ||
+ | * The majority of use cases of '' | ||
+ | * Eagerly evaluating iterables reduces the memory needed by the implementation. The amount of memory needed to represent this is much lower (without the need to store the underlying iterable, potentially the most recent exception(s) thrown by the undlying iterable, etc). | ||
+ | |||
+ | |||
+ | The addition of an iterable library class that evaluates arguments on-demand is mentioned in the " | ||
+ | |||
+ | https:// | ||
+ | |||
+ | < | ||
+ | < | ||
+ | 2) Userland library/ | ||
+ | such as https:// | ||
+ | something that is easy to understand, debug, serialize or represent, etc. | ||
+ | I expect the inner iterable may be hidden entirely in a (lazy) CachedIterable from var_dump as an implementation detail. | ||
+ | |||
+ | 3) It would be harder to understand why SomeFrameworkException is thrown in code unrelated to that framework | ||
+ | when a lazy (instead of eager) iterable is passed to some function that accepts a generic iterable, | ||
+ | and harder to write correct exception handling for it if done in a lazy generation style. | ||
+ | |||
+ | Many RFCs have been rejected due to being perceived as being likely to be misused in userland or | ||
+ | to make code harder to understand. | ||
+ | |||
+ | 4) It is possible to implement a lazy alternative to (ImmutableIterable) that only loads values as needed. | ||
+ | However, I hadn't proposed it due to doubts that 2/3 of voters would consider it widely useful | ||
+ | enough to be included in php rather than as a userland or PECL library. | ||
+ | </ | ||
+ | |||
+ | CachedIterable should load from the underlying | ||
+ | datastore lazily -- there is hardly any visible impact from the user | ||
+ | if this happens, because for the most part it looks and behaves the | ||
+ | same as it does today. The only visible changes are around loading | ||
+ | data from the underlying iterable. | ||
+ | |||
+ | For example, if the user calls the count method on the CachedIterable, | ||
+ | it would then load the remainder of the underlying data-store (and | ||
+ | then drop its reference to it). If the user asks for valueAt($n) and | ||
+ | it's beyond what's already loaded and we haven' | ||
+ | the underlying iterable, then it would load until $n is found or the | ||
+ | end of the store is reached. | ||
+ | |||
+ | I understand your concerns with map, filter, etc. CachedIterable | ||
+ | is different because it holds onto the data, can be iterated over more | ||
+ | than once, including the two nested loop cases, even if it loads data | ||
+ | from the underlying iterable on demand. | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | < | ||
+ | Thanks for explaining 4 months ago about my concern. | ||
+ | I think I understand the main real impact of an eager iterable cache vs a lazy iterable cache from a functional point of view: | ||
+ | |||
+ | * exceptions are thrown during construction vs during the first iteration | ||
+ | * predictable performance also on the first iteration. | ||
+ | |||
+ | How did you gather the information that eager implementation is more valuable than lazy one? I'm mostly curious also how to assess this as technically to me it also looks the other way around. Maybe mention that in the RFC. | ||
+ | I was even thinking that CachedIterable should be lazy and an EagerCachedIterable would be built upon that with more methods. Or have it in the same class with a constructor parameter. | ||
+ | </ | ||
+ | |||
+ | One of the reasons was size/ | ||
+ | point to the original iterable and the functions being applied to that iterable - so an application that creates lots of small/empty cached iterables would have a higher memory usage. | ||
+ | |||
+ | Having a data structure that tries to do everything would do other things poorly | ||
+ | (potentially not support serialization, | ||
+ | have unintuitive behaviors when attempting to var_export/ | ||
+ | surprisingly throw when being iterated over, etc) | ||
+ | </ | ||
+ | |||
+ | ==== Changelog ==== | ||
+ | |||
+ | * 0.2: Use optimized build with opcache enabled for benchmark timings | ||
+ | * 0.3: Rename from '' | ||
+ | * 0.3.1: Add '' | ||
+ | * 0.4.0: Rename from '' | ||
rfc/cachediterable.txt · Last modified: 2021/06/29 14:24 by tandre