rfc:deque

PHP RFC: final class Collections\Deque

Introduction

While SplDoublyLinkedList and its subclass SplQueue/SplStack exist in the SPL, they have several drawbacks (that are addressed by this RFC to add a Deque class):

  1. SplDoublyLinkedList is internally represented by a doubly linked list, making it use roughly twice as much memory as the proposed Deque (in typical 64-bit php builds)
  2. SplDoublyLinkedList is an Iterator, not an IteratorAggregate, so different parts of the code can't independently iterate over the same SplDoublyLinkedList instance while iteration is in progress.
  3. push/pop/unshift/shift from SplDoublyLinkedList are slower due to needing to allocate or free the linked list nodes.
  4. Reading values in the middle of the SplDoublyLinkedList is proportional to the length of the list, due to needing to traverse the linked list nodes.
  5. foreach iteration behavior cannot be understood without knowing what most recently constructed the SplDoublyLinkedList instance or set the flags.

It would be useful to have an efficient Deque container in the standard library to provide an alternative without those drawbacks, as well as for the following reasons:

  1. To provide a better alternative to SplDoublyLinkedList, SplStack, and SplQueue for use cases that require stacks or queues.
  2. As a more efficient option than array and SplDoublyLinkedList as a queue or Deque, for workflows requiring shift or unshift. Repeatedly shifting/unshifting from the front of an array is slow, because each shift/unshift requires moving(shifting) every element of the array.
  3. To have a memory-efficient collection suitable for applications that run for long periods of time. Notably, PHP's array type will currently never release allocated capacity, and Deque has lower memory usage than SplDoublyLinkedList.

A Deque is more efficient than an array when used as a queue, more readable, and easier to use correctly. While it is possible to efficiently remove elements from the start of an array (in terms of insertion order), it is very inefficient to prepend elements to the start of a large array due to needing to either copy the array or move all elements in the internal array representation, and an array would use much more memory than a Deque when used that way (and be slower) (see benchmarks).

There are also several pitfalls to using an array as a queue for larger queue sizes, some of which are not obvious and were discovered while writing the benchmarks. (Having a better (double-ended) queue datastructure (Collections\Deque) than the SplDoublyLinkedList would save users from needing to write code with these pitfalls):

  1. array_key_first() takes time proportional to the number of elements unset from the start of an array, causing unset($array[array_key_first($array)]); to unexpectedly be extremely slow (linear to remove an element, and quadratic time to remove all elements) after unsetting many elements at the start of the queue. (when the array infrequently runs out of capacity, buckets are moved to the beginning of the array)
  2. reset() also takes time proportional to the number of elements unset from the start of an array, taking longer if many elements are removed from the beginning of the array.
  3. reset() or end() will convert a variable to a reference, adding some performance overhead and making opcache less efficient at optimizing uses of variables using references.
  4. More obviously, array_unshift and array_shift will take time proportional to the number of elements in the array (to reindex and move existing/remaining elements) (linear time), compared to amortized constant time for Collections\Deque.

Proposal

Add the class final class Collections\Deque to PHP. From the Wikipedia article for the Double-Ended Queue:

In computer science, a double-ended queue (abbreviated to deque, pronounced deck, like “cheque”) is an abstract data type that generalizes a queue, for which elements can be added to or removed from either the front (head) or back (tail).

(omitted sections, see article)

The dynamic array approach uses a variant of a dynamic array that can grow from both ends, sometimes called array deques. These array deques have all the properties of a dynamic array, such as constant-time random access, good locality of reference, and inefficient insertion/removal in the middle, with the addition of amortized constant-time insertion/removal at both ends, instead of just one end. Three common implementations include:

  1. Storing deque contents in a circular buffer, and only resizing when the buffer becomes full. This decreases the frequency of resizings.
  2. (omitted, see article)

Similarly to deques in other languages (such as Python collections.deque, C++ std::deque, Rust std::collections::VecDeque, etc.) this is backed by a memory-efficient representation with good cache locality and provides constant amortized-time push/pop/popFront/pushFront operations. (this RFC's implementation is a circular buffer represented as a raw C array of values with a size, offset, and capacity)

This reclaims memory that is no longer used (when less than a quarter of the capacity is used), unlike array

Similarly to ArrayObject and SplDoublyLinkedList, the $x[] = $value shorthand for appending to the Deque is supported in addition to ArrayAccess functionality.

The proposed API is:

namespace Collections;
 
/**
 * A double-ended queue (Typically abbreviated as Deque, pronounced "deck", like "cheque")
 * represented internally as a circular buffer.
 *
 * This has much lower memory usage than SplDoublyLinkedList or its subclasses (SplStack, SplStack),
 * and operations are significantly faster than SplDoublyLinkedList.
 *
 * See https://en.wikipedia.org/wiki/Double-ended_queue
 *
 * This supports amortized constant time pushing and popping onto the start (i.e. front, first)
 * or back (i.e. end, last) of the Deque.
 */
final class Deque implements IteratorAggregate, Countable, JsonSerializable, ArrayAccess
{
    /** Construct the Deque from the values of the Traversable/array, ignoring keys */
    public function __construct(iterable $iterator = []) {}
    /**
     * Returns an iterator that accounts for calls to popFront/pushFront tracking the position of the start of the Deque.
     * Calls to popFront/pushFront will do the following:
     * - Increase/Decrease the value returned by the iterator's key()
     *   by the number of elements added/removed to/from the start of the Deque.
     *   (`$deque[$iteratorKey] === $iteratorValue` at the time the key and value are returned).
     * - Repeated calls to popFront will cause valid() to return false if the iterator's
     *   position ends up before the start of the Deque at the time iteration resumes.
     * - They will not cause the remaining values to be iterated over more than once or skipped.
     * insert()/offsetUnset() will have similar effects when inserting/removing elements before the iterator's position.
     */
    public function getIterator(): \InternalIterator {}
    /** Returns the number of elements in the Deque. */
    public function count(): int {}
    /** Returns true if there are 0 elements in the Deque. */
    public function isEmpty(): bool {}
    /** Removes all elements from the Deque. */
    public function clear(): void {}
 
    public function __serialize(): array {}
    public function __unserialize(array $data): void {}
    /** Construct the Deque from the values of the array, ignoring keys */
    public static function __set_state(array $array): Deque {}
 
    /** Appends value(s) to the end of the Deque. */
    public function push(mixed ...$values): void {}
    /** Prepends value(s) to the start of the Deque. */
    public function pushFront(mixed ...$values): void {}
    /**
     * Pops a value from the end of the Deque.
     * @throws \UnderflowException if the Deque is empty
     */
    public function pop(): mixed {}
    /**
     * Pops a value from the start of the Deque.
     * @throws \UnderflowException if the Deque is empty
     */
    public function popFront(): mixed {}
 
    /**
     * Peeks at the value at the start of the Deque.
     * @throws \UnderflowException if the Deque is empty
     */
    public function first(): mixed {}
    /**
     * Peeks at the value at the end of the Deque.
     * @throws \UnderflowException if the Deque is empty
     */
    public function last(): mixed {}
 
    /**
     * Returns a list of the elements from the start to the end.
     */
    public function toArray(): array {}
 
    // Must be mixed for compatibility with ArrayAccess
    /**
     * Insert 0 or more values at the given offset of the Deque.
     * @throws \OutOfBoundsException if the value of $offset is not within the bounds of this Deque.
     */
    public function insert(int $offset, mixed ...$values): void {}
    /**
     * Returns the value at offset (int)$offset (relative to the start of the Deque)
     * @throws \OutOfBoundsException if the value of (int)$offset is not within the bounds of this Deque
     */
    public function offsetGet(mixed $offset): mixed {}
    /**
     * Returns true if `0 <= (int)$offset && (int)$offset < $this->count().
     */
    public function offsetExists(mixed $offset): bool {}
    /**
     * Sets the value at offset $offset (relative to the start of the Deque) to $value
     * @throws \OutOfBoundsException if the value of (int)$offset is not within the bounds of this Deque
     */
    public function offsetSet(mixed $offset, mixed $value): void {}
    /**
     * Removes the value at (int)$offset from the deque.
     * @throws \OutOfBoundsException if the value of (int)$offset is not within the bounds of this Deque.
     */
    public function offsetUnset(mixed $offset): void {}
 
    /**
     * This is JSON serialized as a JSON array with elements from the start to the end.
     */
    public function jsonSerialize(): array {}
}

Iteration behavior

Collections\Deque is an implementation of the interface IteratorAggregate where getIterator returns an InternalIterator. Different parts of the code can independently iterate through instances of the same Deque (e.g. nested for loops).

  • For a Collections\Deque instance of size n, there are n+1 possible positions of an InternalIterator (at the positions 0..n-1, as well as being an invalid iterator at position n (the end of the Deque). After appending to the end of the deque (or inserting 1 or more values at position n), the iterator becomes valid again.
  • The value is the value found in the Deque at the time InternalIterator::current() is called. (or null if the iterator is invalid). (E.g. if you mutate the deque, you will see the new value when iterations such as foreach reach that value)
  • The key is the offset in the Deque at which the value would be found at the time InternalIterator::key() is called. (or null if the iterator is invalid)
  • popFront or pushFront changing the position of the start of the deque are accounted for - offsets in the deque won't be skipped or processed multiple times.
  • InternalIterator::valid() returns true if the InternalIterator's offset (computed from relative position to the start of the Deque) is within the Deque.
  • InternalIterator::next() advances the InternalIterator's position unless it is already at the end of the Deque.

The Deque WebAssembly demo can be used to preview the behavior of iteration and other functions

<?php
$deque = new Collections\Deque(['first', 'second', 'third']);
foreach ($deque as $offset => $value) {
    printf("value: %6s deque[offset=%s]=%s\n", $value, $offset, $deque[$offset]);
    if ($value === 'second') {
        printf("=> Removing from front while iterating: removed %s\n", $deque->popFront());
    } elseif ($value === 'third') {
        printf("=> Push while iterating\n");
        $deque->push('fourth');
    }
}
echo "\nAfter iteration\n";
var_dump($deque);
/*
value:  first deque[offset=0]=first
value: second deque[offset=1]=second
=> Removing from front while iterating: removed first
value:  third deque[offset=1]=third
=> Push while iterating
value: fourth deque[offset=2]=fourth
 
After iteration
object(Collections\Deque)#1 (3) {
  [0]=>
  string(6) "second"
  [1]=>
  string(5) "third"
  [2]=>
  string(6) "fourth"
}
*/

Arguments for adding this

What applications would benefit from Deque?

As mentioned in https://en.wikipedia.org/wiki/Queue_(abstract_data_type)

Queues provide services in computer science, transport, and operations research where various entities such as data, objects, persons, or events are stored and held to be processed later. In these contexts, the queue performs the function of a buffer. Another usage of queues is in the implementation of breadth-first search.

Some examples of how queues are used in php applications and libraries are below:

  1. Composer: https://github.com/composer/composer/blob/2.4.2/src/Composer/DependencyResolver/RuleSetGenerator.php#L161-L210 uses a SplQueue internally as a work queue in its dependency resolution logic.
  2. Event-Driven frameworks: https://github.com/reactphp/event-loop/blob/v1.3.0/src/Tick/FutureTickQueue.php#L21 uses SplQueue internally to hold callback functions to call later in the order they were added.
  3. Networking code: Predis, a userland client for the Redis database, currently uses SplQueue to build a pipeline of commands and associate pipelined outgoing requests with their incoming responses in the order the requests were sent. https://github.com/predis/predis/blob/v1.1.7/src/Pipeline/Pipeline.php#L121-L150

Once applications and libraries such as those were able to use Collections\Deque (e.g. when the minimum PHP version requirement was raised), they would benefit from the speedup that Collections\Deque has over SplDoublyLinkedList/SplQueue.

Looking at the benchmarks section, SplQueue(and the parent class SplDoublyLinkedList) is typically slower than array when used as a queue (though array has higher reported memory usage). By introducing a data structure (Deque) that's even faster and uses less memory than an array for use as a double-ended queue, even more applications would benefit from it.

  • Additionally, Deque would allow some code to be refactored or written in a more readable way in cases where php developers would previously be avoiding using SplDoublyLinkedList (or SplStack/SplQueue) due to older datastructures having time and memory usage inefficiencies.

https://www.php.net/manual/en/class.splqueue.php#116422 is another example

You can use shift/unshift and push/pop to dequeue/undequeue and queue/unqueue respectively. Really handy for those applications that use sockets where you might not know you can't send data until you attempt to.

for example, this is a function for an application that will un-dequeue the remainder of the data if socket_write indicated it did not write the entire contents of the provided data.

(omitted code snippet from https://www.php.net/manual/en/class.splqueue.php#116422)

Even users that don't write code using Deque may still benefit from its inclusion, due to improved performance and reduced memory usage in the applications and libraries that do adopt it.

Why use this instead of SplDoublyLinkedList?

  1. Pushing/popping/pushFront(unshifting)/popFront(shifting) takes roughly 30% less time.
  2. Deque typically uses less memory than SplDoublyLinkedList.
  3. Memory locality results in much better performance of Collections\Deque when processing large amounts of data (See benchmarks)
  4. The iteration order is predictable, unlike SplDoublyLinkedList, where it can be modified with flags. Collections\Deque is also an IteratorAggregate, allowing multiple iterators to run independently.
  5. Constant-time access for reading or modifying elements at any position in the Deque, unlike SplDoublyLinkedList where that would require walking the linked list.

Why use this instead of array?

  1. Faster than using array for queue-like workloads (e.g. around 13%-37% less time to add to the end and remove from the start depending on Deque size and access pattern in an array implementation that correctly avoids linear-time operations).
  2. It is impossible to prepend to an array (i.e. to be first in insertion order) in constant time. array_unshift takes time proportional to the length of an array. See Benchmarks: Unshifting n values to the start then shifting n values from the start
  3. Deque uses much less memory than an array when used as a queue (by unsetting), especially since that will eventually convert an array to an associative array. See https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html and benchmarks. Note that starting in php 8.2, arrays that are lists (with no/few gaps) are represented in a more memory efficient way than associative arrays.
  4. Modifying an object by value can be faster than modifying an array by reference. Sometime, it is necessary for a library to modify a passed in collection/array in place (e.g. as an internal implementation detail) (e.g. appending to a list of errors and returning a boolean, converting a binary tree to a list of nodes where some predicate applies). The only way for a method to modify a passed in array parameter is by reference, and opcache is not able to optimize reference parameters/variables because their type at runtime cannot be guaranteed, passing in an object by value can be much faster than passing an array by reference.

    (Other approaches may end up copying arrays repeatedly, and take quadratic time instead of linear time, causing performance issues for large inputs).

    (Right now, some libraries do pass arrays by reference, because of how inefficient SplDoublyLinkedList is expected to be to create, modify, and read)

Also see the RFC introduction for pitfalls with trying to use a PHP array like a queue.

Caveats

For stack-like workflows that only require pushing and popping from the end of a data structure, array uses roughly the same amount of memory as Deque and is faster than Deque due to Use more compact representation for packed arrays. being merged into PHP 8.2

Usage

Examples of how each of these methods in Deque can be used can be found at https://www.php.net/spldoublylinkedlist - in most cases this can be used as a more efficient substitute for SplDoublyLinkedList (excluding relatively rare use cases such as removing from the middle of a SplDoublyLinkedList during iteration).

See Implementation for the public API of Deque.

Implementation Choices

"Collections\" namespace

This name was chosen in https://wiki.php.net/rfc/deque_straw_poll#arguments_foragainst_collections_deque - this has a different enough implementation from SplQueue/SplDoublyLinkedList and other data structures in the spl that a different naming pattern or namespace made sense for future additions.

There has also been widespread interest in adopting namespaces for new modules/categories of functionality. https://wiki.php.net/rfc/namespaces_in_bundled_extensions recently passed and allows for using namespaces in new categories of functionality (and provides guidance for using namespaces).

Accepting an iterable

This accepts the values of the iterable in the order of iteration. Keys of the iterable are ignored (Because this is meant to be a double-ended queue, adding placeholders would not make sense if there were gaps in the array.)

Final Class

If this were overridable, this would have the following drawbacks:

  1. There would not be as strong guarantees to readers of PHP code using Deque (or even opcache, if optimizations were added targeting objects) that elements were actually a vector or that certain methods would/wouldn't throw certain exceptions, or that iteration would be possible. (if getIterator, ArrayAccess methods, etc. were overridable)
  2. More memory and runtime checks would be required to check if this was the original class or a subclass when fetching a value
  3. More memory would be required to mitigate any unexpected issues seen in the way subclasses overrid or used the base class.
  4. There would be a larger chance the implementation would have discovered or undiscovered bugs due to userland subclasses of Deque, in serialization/unserialization, reads or writes, resizing, future functionality added to PHP, opcache (if opcache optimizations were added), or future methods added to Deque, or causes that were not even thought of yet.

Making all functionality final turns out to be the same approach that the PECL project https://github.com/php-ds/ext-ds used for its datastructures. The php-ds's maintainers reasons are mentioned in https://medium.com/@rtheunissen/efficient-data-structures-for-php-7-9dda7af674cd and summarized or quoted below

  • “Prefer composition over inheritance”
  • “Inheritance would also introduce unnecessary internal complexity”
  • Avoid multiple methods doing the same thing (e.g. SplDeque has a redundant enqueue methods to add to the end of an array because the base class already had push())
  • “They are designed to be self-contained, much like an array. You can't extend an array, so we design our own APIs around it by using an internal array to store the actual data.”

push/pop/pushFront/popFront (and first()/last())

This is consistent with the name used for array_push()/array_pop()/array_key_first()/array_key_last(), as well as names used for SplDoublyLinkedList.

The name of pushFront/popFront was chosen because the method names:

  • Is self-explanatory and obvious whether it is adding or removing values.
  • Are used for an amortized constant time operation. Unlike array_shift()/array_shift(), it is not actually shifting other elements to make room at the front

The names pushFront/popFront were chosen after feedback from Nikita Popov, in https://externals.io/message/116100#116214

5. The shift/unshift terminology is already used by our array API, but I'm

 not sure it's the most intuitive choice in the context of a deque. SplQueue
 has enqueue() and dequeue(). Another popular option from other languages
 (which I would personally favor) is pushFront() and popFront().

Backward Incompatible Changes

The class name \Collections\Deque is now used by PHP, and it will be a compilation error to declare classlikes of the same name in the namespace Collections, since the class already exists.

Proposed PHP Version(s)

8.3

RFC Impact

To Opcache

None

Unaffected PHP Functionality

PHP's type system remains unchanged (e.g. array) - final class Collections\Deque is a class, and instances of Collections\Deque are ordinary objects.

Benchmarks

To summarize:

  • Collections\Deque is faster than other objects in the spl for these benchmarks. In a benchmark exercising worst-case scenarios where linked list traversal results in lots of cache misses, it can be 3 times faster than SplDoublyLinkedList
  • Collections\Deque outperforms using an array as a queue (pushing/popping from the front) for even moderately sized arrays, and is significantly faster than array_shift()/array_unshift() (which need to move every element of the array)
  • array is heavily optimized in the php interpreter and the jit, so arrays generally outperform objects for reading and appending to the end of an array.

Unshifting n values to the start then shifting n values from the start

This benchmark uses array_unshift/array_shift to test the most natural way of removing from the start of an array, not the most optimized. Subsequent benchmarks use a less obvious but more time-optimized way to remove elements from the start of an array. key() and do not renumber for removing from the start of an array.

This takes quadratic time for an array - at n=1000, array takes 16.078 seconds, and Collections\Deque takes 0.160 seconds for the 2000 iterations (Deque is 100 times faster for n=1024).

Test creating a collection, then two cycles of push+shift (unshifting n values to the start of collections then shifting all of them from front of collection)
Moving the front of an array requires moving all elements to their new position, so the total time for array is quadratic
Results for php 8.3.0-dev debug=false with opcache enabled=true

2x Unshift then shift from start of array:               n=       1 iterations= 8000000
=> max memory=     248 bytes, create+destroy time=1.945 shift time = 1.145 total time = 3.090 result=0
2x Unshift then shift from start of Collections\Deque:   n=       1 iterations= 8000000
=> max memory=     144 bytes, create+destroy time=1.915 shift time = 1.084 total time = 2.999 result=0
2x Unshift then shift from start of Ds\Deque:            n=       1 iterations= 8000000
=> max memory=     216 bytes, create+destroy time=2.791 shift time = 1.105 total time = 3.896 result=0
2x Unshift then shift from start of SplDoublyLinkedList: n=       1 iterations= 8000000
=> max memory=     184 bytes, create+destroy time=2.324 shift time = 1.093 total time = 3.417 result=0

2x Unshift then shift from start of array:               n=       4 iterations= 2000000
=> max memory=     248 bytes, create+destroy time=1.149 shift time = 0.752 total time = 1.900 result=24000000
2x Unshift then shift from start of Collections\Deque:   n=       4 iterations= 2000000
=> max memory=     144 bytes, create+destroy time=0.627 shift time = 0.571 total time = 1.198 result=24000000
2x Unshift then shift from start of Ds\Deque:            n=       4 iterations= 2000000
=> max memory=     216 bytes, create+destroy time=1.288 shift time = 0.621 total time = 1.908 result=24000000
2x Unshift then shift from start of SplDoublyLinkedList: n=       4 iterations= 2000000
=> max memory=     280 bytes, create+destroy time=1.136 shift time = 0.621 total time = 1.757 result=24000000

2x Unshift then shift from start of array:               n=       8 iterations= 1000000
=> max memory=     248 bytes, create+destroy time=1.144 shift time = 0.764 total time = 1.908 result=56000000
2x Unshift then shift from start of Collections\Deque:   n=       8 iterations= 1000000
=> max memory=     208 bytes, create+destroy time=0.460 shift time = 0.558 total time = 1.018 result=56000000
2x Unshift then shift from start of Ds\Deque:            n=       8 iterations= 1000000
=> max memory=     216 bytes, create+destroy time=0.983 shift time = 0.567 total time = 1.550 result=56000000
2x Unshift then shift from start of SplDoublyLinkedList: n=       8 iterations= 1000000
=> max memory=     408 bytes, create+destroy time=0.922 shift time = 0.582 total time = 1.504 result=56000000

2x Unshift then shift from start of array:               n=      64 iterations=  100000
=> max memory=    1368 bytes, create+destroy time=2.594 shift time = 1.520 total time = 4.113 result=403200000
2x Unshift then shift from start of Collections\Deque:   n=      64 iterations=  100000
=> max memory=    1104 bytes, create+destroy time=0.206 shift time = 0.358 total time = 0.564 result=403200000
2x Unshift then shift from start of Ds\Deque:            n=      64 iterations=  100000
=> max memory=    1112 bytes, create+destroy time=0.624 shift time = 0.402 total time = 1.026 result=403200000
2x Unshift then shift from start of SplDoublyLinkedList: n=      64 iterations=  100000
=> max memory=    2200 bytes, create+destroy time=0.564 shift time = 0.371 total time = 0.936 result=403200000

2x Unshift then shift from start of array:               n=     128 iterations=   50000
=> max memory=    2648 bytes, create+destroy time=4.591 shift time = 2.538 total time = 7.129 result=812800000
2x Unshift then shift from start of Collections\Deque:   n=     128 iterations=   50000
=> max memory=    2128 bytes, create+destroy time=0.187 shift time = 0.352 total time = 0.539 result=812800000
2x Unshift then shift from start of Ds\Deque:            n=     128 iterations=   50000
=> max memory=    2136 bytes, create+destroy time=0.597 shift time = 0.383 total time = 0.980 result=812800000
2x Unshift then shift from start of SplDoublyLinkedList: n=     128 iterations=   50000
=> max memory=    4248 bytes, create+destroy time=0.547 shift time = 0.361 total time = 0.907 result=812800000

2x Unshift then shift from start of array:               n=    1024 iterations=    2000
=> max memory=   20568 bytes, create+destroy time=10.635 shift time = 5.443 total time = 16.078 result=2095104000
2x Unshift then shift from start of Collections\Deque:   n=    1024 iterations=    2000
=> max memory=   16464 bytes, create+destroy time=0.052 shift time = 0.107 total time = 0.160 result=2095104000
2x Unshift then shift from start of Ds\Deque:            n=    1024 iterations=    2000
=> max memory=   16472 bytes, create+destroy time=0.182 shift time = 0.116 total time = 0.298 result=2095104000
2x Unshift then shift from start of SplDoublyLinkedList: n=    1024 iterations=    2000
=> max memory=   32920 bytes, create+destroy time=0.180 shift time = 0.120 total time = 0.300 result=2095104000

****** Skip array n=1048576 iterations=20 quadratic time too slow ******
2x Unshift then shift from start of Collections\Deque:   n= 1048576 iterations=      20
=> max memory=16777320 bytes, create+destroy time=1.179 shift time = 1.247 total time = 2.426 result=21990211584000
2x Unshift then shift from start of Ds\Deque:            n= 1048576 iterations=      20
=> max memory=16777328 bytes, create+destroy time=2.548 shift time = 1.235 total time = 3.782 result=21990211584000
2x Unshift then shift from start of SplDoublyLinkedList: n= 1048576 iterations=      20
=> max memory=33554584 bytes, create+destroy time=1.800 shift time = 1.208 total time = 3.008 result=21990211584000

Two cycles of appending n values then removing them from the front

Note that it is possible to have constant time removal from the start of a PHP array efficiently (as long as key stays at the start of the array), but it is not possible to have constant time prepending (pushFront/unshift) to the start of an array. array_unshift is a linear time operation (takes time proportional to the current array size). So unshift is not included in these benchmarks.

Because there's a second cycle in this benchmark, array becomes an associative array and uses more memory than a packed array (https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html). (At the time of writing, a packed array uses double the memory of a Deque, though there is an unrelated change in review proposing reducing the memory usage of packed arrays).

memory_get_usage is not counting the memory overhead of tracking the allocations of a lot of small objects, so the memory usage of SplDoublyLinkedList is under-reported. SplQueue is a subclass of SplDoublyLinkedList and I expect it would have the same performance.

Deque and array both always capacities that are powers of 2. This benchmark tests the best-case memory usage for Deque and array

Deque is faster than SplDoublyLinkedList at all sizes, and faster than array at medium and large sizes. The maximum memory usage is also noticeably lower than both array and SplDoublyLinkedList

Caveats for comparison with [https://github.com/php-ds/ext-ds](Ds\Deque PECL):

  • Ds\Deque was compiled and installed as a shared extension with phpize; ./configure; make install (shared reflects the default way to install PECL modules, e.g. what pecl install or an OS package manager would do or what copying the windows DLL would do), not statically compiled into php
  • Both Deque and Ds\Deque use circular buffers, so the performance is expected to be about the same.
  • This php-src PR has micro-optimizations such as storing the mask (instead of capacity) for faster bitwise ands, as well as the use of the EXPECTED/UNEXPECTED macros to hint the fast path to the compiler.
  • Having the extra step of installing a PECL may discourage application/library authors from writing code using a PECL or discourage some users from using those applications/libraries. While php-ds has a polyfill, the polyfill would be slower than native solutions - https://github.com/php-ds/ext-ds#compatibility
Test creating a collection, then two cycles of push+shift (adding n values to the end of collections then shifting all of them from the start of the collection)
Results for php 8.3.0-dev debug=false with opcache enabled=true
 
2x Push then shift from array:               n=       1 iterations= 8000000
=> max memory=     216 bytes, create+destroy time=1.323 shift time = 1.176 total time = 2.499 result=0
2x Push then shift from Collections\Deque:   n=       1 iterations= 8000000
=> max memory=     144 bytes, create+destroy time=1.894 shift time = 1.037 total time = 2.931 result=0
2x Push then shift from Ds\Deque:            n=       1 iterations= 8000000
=> max memory=     216 bytes, create+destroy time=2.289 shift time = 1.112 total time = 3.401 result=0
2x Push then shift from SplDoublyLinkedList: n=       1 iterations= 8000000
=> max memory=     184 bytes, create+destroy time=2.411 shift time = 1.117 total time = 3.528 result=0
 
2x Push then shift from array:               n=       4 iterations= 2000000
=> max memory=     216 bytes, create+destroy time=0.490 shift time = 0.832 total time = 1.322 result=24000000
2x Push then shift from Collections\Deque:   n=       4 iterations= 2000000
=> max memory=     144 bytes, create+destroy time=0.629 shift time = 0.557 total time = 1.187 result=24000000
2x Push then shift from Ds\Deque:            n=       4 iterations= 2000000
=> max memory=     216 bytes, create+destroy time=0.731 shift time = 0.606 total time = 1.337 result=24000000
2x Push then shift from SplDoublyLinkedList: n=       4 iterations= 2000000
=> max memory=     280 bytes, create+destroy time=1.189 shift time = 0.621 total time = 1.811 result=24000000
 
2x Push then shift from array:               n=       8 iterations= 1000000
=> max memory=     376 bytes, create+destroy time=0.357 shift time = 0.763 total time = 1.120 result=56000000
2x Push then shift from Collections\Deque:   n=       8 iterations= 1000000
=> max memory=     208 bytes, create+destroy time=0.466 shift time = 0.552 total time = 1.018 result=56000000
2x Push then shift from Ds\Deque:            n=       8 iterations= 1000000
=> max memory=     216 bytes, create+destroy time=0.498 shift time = 0.549 total time = 1.048 result=56000000
2x Push then shift from SplDoublyLinkedList: n=       8 iterations= 1000000
=> max memory=     408 bytes, create+destroy time=0.938 shift time = 0.578 total time = 1.516 result=56000000
 
2x Push then shift from array:               n=    1024 iterations=   20000
=> max memory=   41016 bytes, create+destroy time=0.522 shift time = 1.746 total time = 2.268 result=20951040000
2x Push then shift from Collections\Deque:   n=    1024 iterations=   20000
=> max memory=   16464 bytes, create+destroy time=0.524 shift time = 1.057 total time = 1.582 result=20951040000
2x Push then shift from Ds\Deque:            n=    1024 iterations=   20000
=> max memory=   16472 bytes, create+destroy time=0.571 shift time = 1.175 total time = 1.746 result=20951040000
2x Push then shift from SplDoublyLinkedList: n=    1024 iterations=   20000
=> max memory=   32920 bytes, create+destroy time=1.806 shift time = 1.174 total time = 2.980 result=20951040000
 
2x Push then shift from array:               n= 1048576 iterations=      20
=> max memory=41943120 bytes, create+destroy time=1.320 shift time = 1.825 total time = 3.146 result=21990211584000
2x Push then shift from Collections\Deque:   n= 1048576 iterations=      20
=> max memory=16777320 bytes, create+destroy time=1.235 shift time = 1.264 total time = 2.500 result=21990211584000
2x Push then shift from Ds\Deque:            n= 1048576 iterations=      20
=> max memory=16777328 bytes, create+destroy time=1.313 shift time = 1.239 total time = 2.552 result=21990211584000
2x Push then shift from SplDoublyLinkedList: n= 1048576 iterations=      20
=> max memory=33554584 bytes, create+destroy time=1.887 shift time = 1.213 total time = 3.100 result=21990211584000

Only appending to a Deque and reading elements without removal

Note that the proposed Deque as well as the existing SplDoublyLinkedList/SplStack are expected to perform equally well at shifting to (pushFront, i.e. adding to) or unshifting from(popFront, i.e. removing from) the start(first elements) of an array. (Shifting or unshifting (or unsetting) the start is inefficient for arrays)

For this benchmark, Deque can be created and read from faster than the fastest methods of reading SplStack or SplFixedArray.

  • Note that a foreach is used instead of random access for SplStack, since SplStack random access time is proportional to the number of linked list nodes that need to be iterated over to read a value.
  • SplFixedArray would be faster to append to if it had a push() method, but there isn't one.

Major notes:

  • Collections\Deque and SplFixedArray use less memory than other available options.
  • Collections\Deque is faster than object data structures currently available in the SPL.
  • In php 8.2+, array is faster than all object data structures, and memory usage of large packed arrays is half of what would be used in 8.1 due to https://github.com/php/php-src/pull/7491
Results for php 8.3.0-dev debug=false with opcache enabled=true
 
Appending to array:             n=       1 iterations= 8000000 total time=0.951 memory=     216 bytes, create+destroy time=0.636 read time = 0.315 result=0
Appending to Collections\Deque: n=       1 iterations= 8000000 total time=1.514 memory=     144 bytes, create+destroy time=1.132 read time = 0.382 result=0
Appending to Ds\Deque:          n=       1 iterations= 8000000 total time=1.765 memory=     216 bytes, create+destroy time=1.353 read time = 0.412 result=0
Appending to SplStack:          n=       1 iterations= 8000000 total time=2.566 memory=     184 bytes, create+destroy time=1.813 read time = 0.752 result=0
Appending to SplFixedArray:     n=       1 iterations= 8000000 total time=2.381 memory=      80 bytes, create+destroy time=1.928 read time = 0.453 result=0
 
 
Appending to array:             n=       4 iterations= 2000000 total time=0.358 memory=     216 bytes, create+destroy time=0.233 read time = 0.124 result=12000000
Appending to Collections\Deque: n=       4 iterations= 2000000 total time=0.542 memory=     144 bytes, create+destroy time=0.363 read time = 0.179 result=12000000
Appending to Ds\Deque:          n=       4 iterations= 2000000 total time=0.629 memory=     216 bytes, create+destroy time=0.424 read time = 0.205 result=12000000
Appending to SplStack:          n=       4 iterations= 2000000 total time=1.102 memory=     280 bytes, create+destroy time=0.796 read time = 0.306 result=12000000
Appending to SplFixedArray:     n=       4 iterations= 2000000 total time=1.537 memory=     128 bytes, create+destroy time=1.280 read time = 0.257 result=12000000
 
 
Appending to array:             n=       8 iterations= 1000000 total time=0.252 memory=     216 bytes, create+destroy time=0.161 read time = 0.091 result=28000000
Appending to Collections\Deque: n=       8 iterations= 1000000 total time=0.422 memory=     208 bytes, create+destroy time=0.256 read time = 0.166 result=28000000
Appending to Ds\Deque:          n=       8 iterations= 1000000 total time=0.463 memory=     216 bytes, create+destroy time=0.282 read time = 0.181 result=28000000
Appending to SplStack:          n=       8 iterations= 1000000 total time=0.824 memory=     408 bytes, create+destroy time=0.577 read time = 0.246 result=28000000
Appending to SplFixedArray:     n=       8 iterations= 1000000 total time=1.367 memory=     192 bytes, create+destroy time=1.140 read time = 0.227 result=28000000
 
 
Appending to array:             n= 1048576 iterations=      20 total time=0.687 memory=16781392 bytes, create+destroy time=0.526 read time = 0.161 result=10995105792000
Appending to Collections\Deque: n= 1048576 iterations=      20 total time=0.918 memory=16777320 bytes, create+destroy time=0.603 read time = 0.315 result=10995105792000
Appending to Ds\Deque:          n= 1048576 iterations=      20 total time=0.972 memory=16777328 bytes, create+destroy time=0.622 read time = 0.350 result=10995105792000
Appending to SplStack:          n= 1048576 iterations=      20 total time=1.380 memory=33554584 bytes, create+destroy time=0.961 read time = 0.420 result=10995105792000
Appending to SplFixedArray:     n= 1048576 iterations=      20 total time=3.414 memory=16777304 bytes, create+destroy time=2.962 read time = 0.452 result=10995105792000

Comparing read performance to other data structures (for fragmented SplDoublyLinkedList)

Collections\Deque has faster read performance than other objects currently available in the spl.)

This benchmark demonstrates this for a large variety of array sizes.

  • In use cases where the SplDoublyLinkedList is fragmented and there are a lot of processor cache misses (due to assigning to 16 different linked lists), Collections\Deque is over 3 times faster than SplDoublyLinkedList and its subclasses.
  • Collections\Deque is somewhat faster than the other objects at all other cache sizes.

(Ds\Deque is from a PECL (not part of php), and is included for comparison by request (installed as a shared library))

Test repeatedly reading from 16 arrays of variable sizes 'n'.
This shows how the poor memory locality of linked lists(SplStack) can cause noticeably worse performance than continuous memory of arrays/Deques when the linked lists are fragmented.
 
Caveats:
- SplDoublyLinked subclasses were around 3 times slower for very large arrays when the linked list entries (and other data being accessed) aren't found in the processor's caches (large values of 'n')
 
  (This is a worst-case scenario and it usually is less than that - the outer size of 16 was chosen to demonstrate the worst-case performance impact that processor cache misses could have on SplDoublyLinkedList)
- ArrayObject currently doesn't implement optimized array access handlers like SplFixedArray or the 2 Deque implementations. It calls ArrayAccess->offsetGet.
- Spl data structures aren't final and need to check if offsetGet has been overridden.
- PHP has optimized code paths for arrays.
Results for php 8.3.0-dev debug=false with opcache enabled=true
 
Reading from array:             n=       1 outer size=16 iterations=  800000 memory=     3832 bytes, read time=0.278 result=96000000
Reading from Collections\Deque  n=       1 outer size=16 iterations=  800000 memory=     2640 bytes, read time=0.442 result=96000000
Reading from Ds\Deque:          n=       1 outer size=16 iterations=  800000 memory=     3800 bytes, read time=0.477 result=96000000
Reading from ArrayObject:       n=       1 outer size=16 iterations=  800000 memory=     6008 bytes, read time=0.898 result=96000000
Reading from SplStack:          n=       1 outer size=16 iterations=  800000 memory=     3608 bytes, read time=1.020 result=96000000
Reading from SplFixedArray:     n=       1 outer size=16 iterations=  800000 memory=     1600 bytes, read time=0.540 result=96000000
 
Reading from array:             n=       4 outer size=16 iterations=  200000 memory=     3832 bytes, read time=0.149 result=115200000
Reading from Collections\Deque  n=       4 outer size=16 iterations=  200000 memory=     2640 bytes, read time=0.247 result=115200000
Reading from Ds\Deque:          n=       4 outer size=16 iterations=  200000 memory=     3800 bytes, read time=0.272 result=115200000
Reading from ArrayObject:       n=       4 outer size=16 iterations=  200000 memory=     6008 bytes, read time=0.533 result=115200000
Reading from SplStack:          n=       4 outer size=16 iterations=  200000 memory=     5144 bytes, read time=0.437 result=115200000
Reading from SplFixedArray:     n=       4 outer size=16 iterations=  200000 memory=     2368 bytes, read time=0.297 result=115200000
 
Reading from array:             n=       8 outer size=16 iterations=  100000 memory=     3832 bytes, read time=0.137 result=140800000
Reading from Collections\Deque  n=       8 outer size=16 iterations=  100000 memory=     3664 bytes, read time=0.247 result=140800000
Reading from Ds\Deque:          n=       8 outer size=16 iterations=  100000 memory=     3800 bytes, read time=0.265 result=140800000
Reading from ArrayObject:       n=       8 outer size=16 iterations=  100000 memory=     6008 bytes, read time=0.438 result=140800000
Reading from SplStack:          n=       8 outer size=16 iterations=  100000 memory=     7192 bytes, read time=0.364 result=140800000
Reading from SplFixedArray:     n=       8 outer size=16 iterations=  100000 memory=     3392 bytes, read time=0.273 result=140800000
 
Reading from array:             n=    1024 outer size=16 iterations=    1024 memory=   328952 bytes, read time=0.136 result=8707375104
Reading from Collections\Deque  n=    1024 outer size=16 iterations=    1024 memory=   263760 bytes, read time=0.243 result=8707375104
Reading from Ds\Deque:          n=    1024 outer size=16 iterations=    1024 memory=   263896 bytes, read time=0.266 result=8707375104
Reading from ArrayObject:       n=    1024 outer size=16 iterations=    1024 memory=   331128 bytes, read time=0.453 result=8707375104
Reading from SplStack:          n=    1024 outer size=16 iterations=    1024 memory=   527384 bytes, read time=0.342 result=8707375104
Reading from SplFixedArray:     n=    1024 outer size=16 iterations=    1024 memory=   263488 bytes, read time=0.282 result=8707375104
 
Reading from array:             n=   65536 outer size=16 iterations=      20 memory= 16844024 bytes, read time=0.176 result=687341568000
Reading from Collections\Deque  n=   65536 outer size=16 iterations=      20 memory= 16778832 bytes, read time=0.309 result=687341568000
Reading from Ds\Deque:          n=   65536 outer size=16 iterations=      20 memory= 16778968 bytes, read time=0.327 result=687341568000
Reading from ArrayObject:       n=   65536 outer size=16 iterations=      20 memory= 16846200 bytes, read time=0.559 result=687341568000
Reading from SplStack:          n=   65536 outer size=16 iterations=      20 memory= 33557528 bytes, read time=0.823 result=687341568000
Reading from SplFixedArray:     n=   65536 outer size=16 iterations=      20 memory= 16778560 bytes, read time=0.356 result=687341568000
 
Reading from array:             n=  262144 outer size=16 iterations=      10 memory= 67176056 bytes, read time=0.353 result=5497851740160
Reading from Collections\Deque  n=  262144 outer size=16 iterations=      10 memory= 67110864 bytes, read time=0.618 result=5497851740160
Reading from Ds\Deque:          n=  262144 outer size=16 iterations=      10 memory= 67111000 bytes, read time=0.656 result=5497851740160
Reading from ArrayObject:       n=  262144 outer size=16 iterations=      10 memory= 67178232 bytes, read time=1.124 result=5497851740160
Reading from SplStack:          n=  262144 outer size=16 iterations=      10 memory=134220824 bytes, read time=2.115 result=5497851740160
Reading from SplFixedArray:     n=  262144 outer size=16 iterations=      10 memory= 67110592 bytes, read time=0.719 result=5497851740160

Future Scope

If Collections\Deque is added, there would be plenty of time for myself or others to propose and discuss additional methods before PHP 8.3's feature freeze (probably in July 2023)

Future additions to https://github.com/TysonAndre/pecl-teds that are general purpose enough such as hash sets/maps and sorted sets/maps and vectors may be possible as well.

The php-ds maintainers are also open to borrowing from the implementation of php-ds to implement new datastructures.

My position currently is that I think we should start from scratch and borrow whatever is good from php-ds that we want to keep to implement something natively in core. In the 4 or 5 years since I wrote this extension I've been studying persistent data structures in-depth and there are a lot of decisions that I made then that I would do differently now. I would like to be a part of the design and implementation of the data structures themselves, but I do not have the understanding or capacity to be involved in work relating to the engine or its integration. It seems unlikely that a 2.0 of this extension will come about, I'm not convinced that a complete rework would be a good investment of our time.

Proposed Voting Choices

Yes/No vote, requiring a 2/3 majority

References

Rejected Features

Why not use the php-ds/ext-ds PECL instead (as an end user)?

  1. No matter how useful or popular a PECL is, datastructures available in PHP's core will have much, much wider adoption in applications and libraries that are available in PECLs, allowing those applications and libraries to write faster and/or more memory efficient code.
  2. End users can make much stronger assumptions about the backwards compatibility and long-term availability of data structures that are included in core.
  3. The php-ds maintainers do not plan to merge the extension into php-src, and believe php-ds should coexist with new functionality being added in a separate namespace instead (see quote and later clarifications for full context)
  4. Opcache may be able to make stronger optimizations of internal classes found in php-src than any third party PECL. (e.g. because Collections\Deque::push() or Collections\Vector::push() would never throw or emit notices, it may be possible to optimize it to be even faster than appending to an array in the Opcache JIT compiler)

php-ds maintainer response

This has been asked about multiple times in threads on unrelated proposals (https://externals.io/message/112639#112641 and https://externals.io/message/93301#93301 years ago) throughout the years.

The php-ds's maintainer's position is that “we should start from scratch and borrow whatever is good from php-ds that we want to keep to implement something natively in core.”

On September 24, 2021, https://github.com/php-ds/ext-ds/issues/156#issuecomment-926353779 clarified their position

Hi everyone, I am happy to see this discussion and I thank you all for taking part. My reservation to merge ds into core has always been because I wanted to make sure we get it right before we do that and the intention behind the mythical v2 was to achieve that, based on learnings from v1 and feedback from the community. I have no personal attachment to this project, I only want what is best for PHP and the community.

I would love to see a dedicated, super-lean vec data structure in core that has native iteration and all the other same internal benefits as arrays. In my opinion, the API should be very minimal and potentially compatible with all the non-assoc array functions. An OO interface can easily be designed around that. I'm imagining something similar to Golang's slices.

As for the future of ds itself, I think these can co-exist and ds can remain external. I've been researching and designing immutable data structures over the last 4 years and I still hope to develop a v2 that simplifies the interfaces and introduces immutable structures. Attempting to implement a suite of structures in core or an OO vector would take a lot of work and might be difficult to reach consensus on with the API. I don't think we should attempt to merge ds into core at any time.

I am currently traveling and have not followed this discussion in detail on the mailing list. I'd be happy to assist in any way I can and will catch up as soon as I am home again this week. Feel free to quote this response on the mailing list as well.

On December 13, 2021, https://github.com/php-ds/ext-ds/issues/156#issuecomment-992773289

My position currently is that I think we should start from scratch and borrow whatever is good from php-ds that we want to keep to implement something natively in core. In the 4 or 5 years since I wrote this extension I've been studying persistent data structures in-depth and there are a lot of decisions that I made then that I would do differently now. I would like to be a part of the design and implementation of the data structures themselves, but I do not have the understanding or capacity to be involved in work relating to the engine or its integration. It seems unlikely that a 2.0 of this extension will come about, I'm not convinced that a complete rework would be a good investment of our time.

Minor differences in API design goals

Traditionally, PHP has been a very batteries included language. Existing functionality such as strings and arrays have very large standard libraries. This makes it easier to write code without depending on too many third party composer libraries, and knowledge of the standard library can transfer to any codebase a developer works on.

My hopes for ease of use, readability, speed, and static analysis in future data structures such as Vector are similar to those mentioned by Benjamin Morel in the GitHub issue:

Functions like map, filter, reduce can be delegated to other libraries that operate on iterable instead of having these as first-class members of the interface.

Again, I understand the rationale behind this decision, like reducing duplication and keeping only the core functionality in DS. However, sometimes you have to take into consideration ease of use vs purity of the code.

Ease of use / DX / readability: it seems more logical to me to do:

$map->filter(fn(...) => ...);

Rather than:

Some\filter($map, fn(...) => ...);

Speed: as you said, internal iteration is faster. And speed is one of the selling points of DS vs arrays.

Static analysis: I love the fact that Map::filter() can be strictly typed as returning Map<TKey, TValue> in Psalm, for example. If you rely on a generic filter() function, I'm not sure such strict typing will be easy or even possible.

Thank you for your work on DS anyway, I already use the extension in my closed-source project, in particular Map. I would love to use data structures in my open-source projects, one day! 🤞

Additionally, it may be inconvenient for end users (e.g. new contributors to projects) to remember specifics of multiple libraries or utility classes when working on different codebases, to deal with dependency conflicts after major version upgrades, or to deal with libraries dropping support for older php versions, getting abandoned, etc.

Discussion

Request: Returning $this instead of void (in php's internal classes, in general)

The Deque::shift() and Deque::push() methods currently return void. Larry Garfield suggests in https://externals.io/message/116100#116967

Request:

push() and unshift() currently return void. That's not helpful. It would be vastly more useful if they both returned $this. Not as much for chaining, but more so that you can add a value to a queue and pass it as an argument to another call (often recursive, but not necessarily) in a single operation.

Example: I was doing last year's Advent of Code in functional PHP, and had a stack walker that looked like this:

function parse(string $line, $pos = 0, array $stack = []): Result|string
{
    $next = $line[$pos] ?? null;
    $head = $stack[0] ?? null;
 
    return match ($next) {
        // Opening brace, push an "expected" onto the stack.
        '{' => parse($line, $pos + 1, ['}', ...$stack]),
        '<' => parse($line, $pos + 1, ['>', ...$stack]),
        '(' => parse($line, $pos + 1, [')', ...$stack]),
        '[' => parse($line, $pos + 1, [']', ...$stack]),
        '}', '>', ')', ']' => $next === $head ? parse($line, $pos + 1, array_slice($stack, 1)) : $next,
        null => count($stack) ? Result::Incomplete : Result::OK,
    };
}

The interesting part is the ['<', ...$stack], to pass on a modified version of an array-as-stack. That's of course annoyingly slow with arrays right now, and a Deque would be better, but only if it could be “modified and passed” like that. If not, it would be incompatible with single-expression usages (match statements, short lambdas, etc.)

Returning $this would resolve that, I think. (Making it return a new, immutable copy of the Deque would be even nicer, but I realize that's probably not an argument I'm going to win at this point on this RFC.)

My personal preference is against making this fluent and continuing to use void like we're already doing for other methods. I'd rather expose an efficient datastructure that's consistent with the rest of PHP's functionality to the extent possible, which userland can use to write their own fluent/non-fluent classes. There's drawbacks to returning $this, including:

1. Inconsistency with existing APIs making remembering what does what harder. Barely anything in php that I remember returns $this.

https://www.php.net/manual/en/arrayobject.append.php returns void.

https://www.php.net/manual/en/function.array-push returns an int.

2. Inconsistency with possible new datastructures/methods

If a Map/Set function were to be added, then methods for add/remove would return booleans (or the old value), not $this

3. Slight additional performance overhead for functionality I assume will be used relatively infrequently

(php has to increment reference counts and opcache can't eliminate the opcode to decrease reference counts and possibly free the return value of $deque->pushFront()/push() with the return type info being an object)

4. Returning $this makes code easier to write at some cost to readability (for users new to PHP or this class, though it may help readability to developers deeply familiar with parts of standard library using it) - Developers new to php or using Collections\Deque for the first time would not immediately know what the code they're reading is doing. (less of a problem with a good IDE, typechecker, and a typed codebase, but this isn't universal) Having it return void, return $deque->push() would be less common and this would force the meaning to be clear.

Developers might have several guesses/assumptions based on their experience with other methods in php/elsewhere

  • It returns the new count (JavaScript Array.push, array_push)
  • It returns $this (Ruby)
  • It returns a lazy copy, like you'd wanted, not modifying the original
  • It's returning void and the code in question is shorthand for return null.

(Python, C++ https://www.cplusplus.com/reference/vector/vector/push_back/ , offsetSet and spl push()/unshift() methods)

Adding RETURN_OBJ_COPY(Z_OBJ_P(ZEND_THIS)); and a different method with the : Deque return type, fluent(push+return $this) was around 6-8% slower for me for push operations than push (returning void) (aside: method calls in php have a high overhead, the engine is better at optimizing []= if specialized handlers exist. Benchmarks were run twice to give an better idea of how much of the difference was random.):

5. I won't know if anyone is strongly against fluent interfaces belonging in core, or strongly against adopting fluent interfaces (return $this) in an ad-hoc way. Changing it here would make it unclear if the vote was for/against the overall functionality or the new design choice of return $this;.
I'd rather not do this without evidence this is widely considered a better design by the community (e.g. the change to a namespace had a policy RFC https://wiki.php.net/rfc/namespaces_in_bundled_extensions).

6. If we decide we want to change the return type of a method (in general, not for push/pushFront), then it's less of a breaking change to change it from void (where less code would use the return value) than from $this (where there may be existing code such as $obj->method()->otherMethod()).

Also, that feedback isn't a request specifically about Collections\Deque or collections in general, but rather about the general design of classes in php

(example of performance overhead in build returning $this vs push having a type of void)

Results for php 8.2.0-dev debug=false with opcache enabled=true

Appending to Collections\Deque []=   : n=       4 iterations=10000000, create+destroy time=1.204 result=0
Appending to Collections\Deque push  : n=       4 iterations=10000000, create+destroy time=1.551 result=0
Appending to Collections\Deque fluent: n=       4 iterations=10000000, create+destroy time=1.640 result=0
Appending to Collections\Deque []=   : n=       4 iterations=10000000, create+destroy time=1.213 result=0
Appending to Collections\Deque push  : n=       4 iterations=10000000, create+destroy time=1.549 result=0
Appending to Collections\Deque fluent: n=       4 iterations=10000000, create+destroy time=1.636 result=0

Appending to Collections\Deque []=   : n=       8 iterations= 5000000, create+destroy time=0.923 result=0
Appending to Collections\Deque push  : n=       8 iterations= 5000000, create+destroy time=1.263 result=0
Appending to Collections\Deque fluent: n=       8 iterations= 5000000, create+destroy time=1.335 result=0
Appending to Collections\Deque []=   : n=       8 iterations= 5000000, create+destroy time=0.921 result=0
Appending to Collections\Deque push  : n=       8 iterations= 5000000, create+destroy time=1.261 result=0
Appending to Collections\Deque fluent: n=       8 iterations= 5000000, create+destroy time=1.335 result=0

Appending to Collections\Deque []=   : n=    1024 iterations=  100000, create+destroy time=1.246 result=0
Appending to Collections\Deque push  : n=    1024 iterations=  100000, create+destroy time=1.972 result=0
Appending to Collections\Deque fluent: n=    1024 iterations=  100000, create+destroy time=2.150 result=0
Appending to Collections\Deque []=   : n=    1024 iterations=  100000, create+destroy time=1.251 result=0
Appending to Collections\Deque push  : n=    1024 iterations=  100000, create+destroy time=1.969 result=0
Appending to Collections\Deque fluent: n=    1024 iterations=  100000, create+destroy time=2.146 result=0

Appending to Collections\Deque []=   : n= 1048576 iterations=     100, create+destroy time=2.450 result=0
Appending to Collections\Deque push  : n= 1048576 iterations=     100, create+destroy time=3.172 result=0
Appending to Collections\Deque fluent: n= 1048576 iterations=     100, create+destroy time=3.371 result=0
Appending to Collections\Deque []=   : n= 1048576 iterations=     100, create+destroy time=2.455 result=0
Appending to Collections\Deque push  : n= 1048576 iterations=     100, create+destroy time=3.183 result=0
Appending to Collections\Deque fluent: n= 1048576 iterations=     100, create+destroy time=3.376 result=0

Appendix

Benchmark Source Code

Benchmarking repeated push and shift operations

<?php
 
const PUSH_SHIFT_CYCLES = 2;
 
function bench_array(int $n, int $iterations) {
    $totalReadTime = 0.0;
    $startTime = hrtime(true);
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        $startMemory = memory_get_usage();
        $values = [];
        for ($times = 0; $times < PUSH_SHIFT_CYCLES; $times++) {
            for ($i = 0; $i < $n; $i++) {
                $values[] = $i;
            }
            $maxMemory = memory_get_usage();
            $startReadTime = hrtime(true);
            while (count($values) > 0) {
                // Pretend we don't know the actual position of the first array key for this simulated benchmark
                // array_shift is not used because it is linear time (to move all other elements)
                // rather than constant time.
                //
                // This approach is simple at the cost of memory - it converts a packed array to a non-packed array
                // NOTE: Adding a call to reset() is not necessary in this case, and would result in worse performance.
                // NOTE: array_key_first results in quadratic performance for this synthetic benchmark.
                // $key = array_key_first($values);
                $key = key($values);
                $total += $values[$key];
                unset($values[$key]);
            }
            $endReadTime = hrtime(true);
            $totalReadTime += $endReadTime - $startReadTime;
        }
 
        unset($values);
    }
    $endTime = hrtime(true);
 
    $totalTime = ($endTime - $startTime) / 1000000000;
    $totalReadTimeSeconds = $totalReadTime / 1000000000;
    printf("2x Push then shift from array:               n=%8d iterations=%8d\n=> max memory=%8d bytes, create+destroy time=%.3f shift time = %.3f total time = %.3f result=%d\n",
        $n, $iterations, $maxMemory - $startMemory, $totalTime - $totalReadTimeSeconds, $totalReadTimeSeconds, $totalTime, $total);
}
function bench_deque(int $n, int $iterations) {
    $startTime = hrtime(true);
    $totalReadTime = 0.0;
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        $startMemory = memory_get_usage();
        $values = new Collections\Deque();
        for ($times = 0; $times < PUSH_SHIFT_CYCLES; $times++) {
            for ($i = 0; $i < $n; $i++) {
                $values[] = $i;
            }
            $maxMemory = memory_get_usage();
 
            $startReadTime = hrtime(true);
            while (count($values) > 0) {
                $total += $values->shift();
            }
            $endReadTime = hrtime(true);
            $totalReadTime += $endReadTime - $startReadTime;
        }
 
        unset($values);
    }
    $endTime = hrtime(true);
    $totalTime = ($endTime - $startTime) / 1000000000;
    $totalReadTimeSeconds = $totalReadTime / 1000000000;
    printf("2x Push then shift from Collections\Deque:   n=%8d iterations=%8d\n=> max memory=%8d bytes, create+destroy time=%.3f shift time = %.3f total time = %.3f result=%d\n",
        $n, $iterations, $maxMemory - $startMemory, $totalTime - $totalReadTimeSeconds, $totalReadTimeSeconds, $totalTime, $total);
}
function bench_ds_deque(int $n, int $iterations) {
    $startTime = hrtime(true);
    $totalReadTime = 0.0;
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        $startMemory = memory_get_usage();
        $values = new Ds\Deque();
        for ($times = 0; $times < PUSH_SHIFT_CYCLES; $times++) {
            for ($i = 0; $i < $n; $i++) {
                $values[] = $i;
            }
            $maxMemory = memory_get_usage();
 
            $startReadTime = hrtime(true);
            while (count($values) > 0) {
                $total += $values->shift();
            }
            $endReadTime = hrtime(true);
            $totalReadTime += $endReadTime - $startReadTime;
        }
 
        unset($values);
    }
    $endTime = hrtime(true);
    $totalTime = ($endTime - $startTime) / 1000000000;
    $totalReadTimeSeconds = $totalReadTime / 1000000000;
    printf("2x Push then shift from Ds\Deque:            n=%8d iterations=%8d\n=> max memory=%8d bytes, create+destroy time=%.3f shift time = %.3f total time = %.3f result=%d\n",
        $n, $iterations, $maxMemory - $startMemory, $totalTime - $totalReadTimeSeconds, $totalReadTimeSeconds, $totalTime, $total);
}
// SplDoublyLinkedList is a linked list that takes more memory than needed.
// Access to values in the middle of the SplDoublyLinkedList is also less efficient.
function bench_spl_dll(int $n, int $iterations) {
    $startTime = hrtime(true);
    $totalReadTime = 0.0;
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        $startMemory = memory_get_usage();
        $values = new SplDoublyLinkedList();
        for ($times = 0; $times < PUSH_SHIFT_CYCLES; $times++) {
            for ($i = 0; $i < $n; $i++) {
                $values->push($i);
            }
            $maxMemory = memory_get_usage();
            $startReadTime = hrtime(true);
            // Random access is linear time in a linked list, so use foreach instead
            while (count($values) > 0) {
                $total += $values->shift();
            }
            $endReadTime = hrtime(true);
            $totalReadTime += $endReadTime - $startReadTime;
        }
        unset($values);
    }
    $endTime = hrtime(true);
    $totalTime = ($endTime - $startTime) / 1000000000;
    $totalReadTimeSeconds = $totalReadTime / 1000000000;
    printf("2x Push then shift from SplDoublyLinkedList: n=%8d iterations=%8d\n=> max memory=%8d bytes, create+destroy time=%.3f shift time = %.3f total time = %.3f result=%d\n",
        $n, $iterations, $maxMemory - $startMemory, $totalTime - $totalReadTimeSeconds, $totalReadTimeSeconds, $totalTime, $total);
}
$n = 2**20;
$iterations = 10;
$sizes = [
    [1, 8000000],
    [4, 2000000],
    [8, 1000000],
    [1024, 20000],
    [2**20, 20],
];
echo "Test creating a collection, then two cycles of push+shift (adding n values to the end of collections then shifting all of them from the start of the collection)\n";
printf(
    "Results for php %s debug=%s with opcache enabled=%s\n\n",
    PHP_VERSION,
    PHP_DEBUG ? 'true' : 'false',
    json_encode(function_exists('opcache_get_status') && (opcache_get_status(false)['opcache_enabled'] ?? false))
);
 
foreach ($sizes as [$n, $iterations]) {
    bench_array($n, $iterations);
    bench_deque($n, $iterations);
    bench_ds_deque($n, $iterations);
    bench_spl_dll($n, $iterations);
    echo "\n";
}

Benchmarking only appending to a Deque and reading elements without removal

<?php
 
function bench_array(int $n, int $iterations) {
    $totalReadTime = 0.0;
    $startTime = hrtime(true);
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        $startMemory = memory_get_usage();
        $values = [];
        for ($i = 0; $i < $n; $i++) {
            $values[] = $i;
        }
        $startReadTime = hrtime(true);
        for ($i = 0; $i < $n; $i++) {
            $total += $values[$i];
        }
        $endReadTime = hrtime(true);
        $totalReadTime += $endReadTime - $startReadTime;
 
        $endMemory = memory_get_usage();
        unset($values);
    }
    $endTime = hrtime(true);
 
    $totalTime = ($endTime - $startTime) / 1000000000;
    $totalReadTimeSeconds = $totalReadTime / 1000000000;
    printf("Appending to array:             n=%8d iterations=%8d total time=%.3f memory=%8d bytes, create+destroy time=%.3f read time = %.3f result=%d\n",
        $n, $iterations, $totalTime, $endMemory - $startMemory, $totalTime - $totalReadTimeSeconds, $totalReadTimeSeconds, $total);
}
function bench_deque(int $n, int $iterations) {
    $startTime = hrtime(true);
    $totalReadTime = 0.0;
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        $startMemory = memory_get_usage();
        $values = new Collections\Deque();
        for ($i = 0; $i < $n; $i++) {
            $values[] = $i;
        }
 
        $startReadTime = hrtime(true);
        for ($i = 0; $i < $n; $i++) {
            $total += $values[$i];
        }
        $endReadTime = hrtime(true);
        $totalReadTime += $endReadTime - $startReadTime;
 
        $endMemory = memory_get_usage();
        unset($values);
    }
    $endTime = hrtime(true);
    $totalTime = ($endTime - $startTime) / 1000000000;
    $totalReadTimeSeconds = $totalReadTime / 1000000000;
    printf("Appending to Collections\Deque: n=%8d iterations=%8d total time=%.3f memory=%8d bytes, create+destroy time=%.3f read time = %.3f result=%d\n",
        $n, $iterations, $totalTime, $endMemory - $startMemory, $totalTime - $totalReadTimeSeconds, $totalReadTimeSeconds, $total);
}
function bench_ds_deque(int $n, int $iterations) {
    $startTime = hrtime(true);
    $totalReadTime = 0.0;
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        $startMemory = memory_get_usage();
        $values = new Ds\Deque();
        for ($i = 0; $i < $n; $i++) {
            $values[] = $i;
        }
 
        $startReadTime = hrtime(true);
        for ($i = 0; $i < $n; $i++) {
            $total += $values[$i];
        }
        $endReadTime = hrtime(true);
        $totalReadTime += $endReadTime - $startReadTime;
 
        $endMemory = memory_get_usage();
        unset($values);
    }
    $endTime = hrtime(true);
    $totalTime = ($endTime - $startTime) / 1000000000;
    $totalReadTimeSeconds = $totalReadTime / 1000000000;
    printf("Appending to Ds\Deque:          n=%8d iterations=%8d total time=%.3f memory=%8d bytes, create+destroy time=%.3f read time = %.3f result=%d\n",
        $n, $iterations, $totalTime, $endMemory - $startMemory, $totalTime - $totalReadTimeSeconds, $totalReadTimeSeconds, $total);
}
// SplStack is a subclass of SplDoublyLinkedList, so it is a linked list that takes more memory than needed.
// Access to values in the middle of the SplStack is also less efficient.
function bench_spl_stack(int $n, int $iterations) {
    $startTime = hrtime(true);
    $totalReadTime = 0.0;
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        $startMemory = memory_get_usage();
        $values = new SplStack();
        for ($i = 0; $i < $n; $i++) {
            $values->push($i);
        }
        $startReadTime = hrtime(true);
        // Random access is linear time in a linked list, so use foreach instead
        foreach ($values as $value) {
            $total += $value;
        }
        $endReadTime = hrtime(true);
        $totalReadTime += $endReadTime - $startReadTime;
        $endMemory = memory_get_usage();
        unset($values);
    }
    $endTime = hrtime(true);
    $totalTime = ($endTime - $startTime) / 1000000000;
    $totalReadTimeSeconds = $totalReadTime / 1000000000;
    printf("Appending to SplStack:          n=%8d iterations=%8d total time=%.3f memory=%8d bytes, create+destroy time=%.3f read time = %.3f result=%d\n",
        $n, $iterations, $totalTime, $endMemory - $startMemory, $totalTime - $totalReadTimeSeconds, $totalReadTimeSeconds, $total);
}
function bench_spl_fixed_array(int $n, int $iterations) {
    $startTime = hrtime(true);
    $totalReadTime = 0.0;
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        $startMemory = memory_get_usage();
        $values = new SplFixedArray();
        for ($i = 0; $i < $n; $i++) {
            // Imitate how push() would be implemented in a situation
            // where the number of elements wasn't actually known ahead of time.
            // erealloc() tends to extend the existing array when possible.
            $size = $values->getSize();
            $values->setSize($size + 1);
            $values->offsetSet($size, $i);
        }
        $startReadTime = hrtime(true);
        for ($i = 0; $i < $n; $i++) {
            $total += $values[$i];
        }
        $endReadTime = hrtime(true);
        $totalReadTime += $endReadTime - $startReadTime;
        $endMemory = memory_get_usage();
        unset($values);
    }
    $endTime = hrtime(true);
    $totalTime = ($endTime - $startTime) / 1000000000;
    $totalReadTimeSeconds = $totalReadTime / 1000000000;
    printf("Appending to SplFixedArray:     n=%8d iterations=%8d total time=%.3f memory=%8d bytes, create+destroy time=%.3f read time = %.3f result=%d\n\n",
        $n, $iterations, $totalTime, $endMemory - $startMemory, $totalTime - $totalReadTimeSeconds, $totalReadTimeSeconds, $total);
}
$n = 2**20;
$iterations = 10;
$sizes = [
    [1, 8000000],
    [4, 2000000],
    [8, 1000000],
    [2**20, 20],
];
printf(
    "Results for php %s debug=%s with opcache enabled=%s\n\n",
    PHP_VERSION,
    PHP_DEBUG ? 'true' : 'false',
    json_encode(function_exists('opcache_get_status') && (opcache_get_status(false)['opcache_enabled'] ?? false))
);
 
foreach ($sizes as [$n, $iterations]) {
    bench_array($n, $iterations);
    bench_deque($n, $iterations);
    bench_ds_deque($n, $iterations);
    bench_spl_stack($n, $iterations);
    bench_spl_fixed_array($n, $iterations);
    echo "\n";
}

Benchmark adding to the front then removing from the front

<?php
 
const PUSH_SHIFT_CYCLES = 2;
 
function bench_array(int $n, int $iterations) {
    $totalReadTime = 0.0;
    $startTime = hrtime(true);
    $total = 0;
    if ($n > 1024) {
        echo "****** Skip array n=$n iterations=$iterations quadratic time too slow ******\n";
        return;
    }
    for ($j = 0; $j < $iterations; $j++) {
        $startMemory = memory_get_usage();
        $values = [];
        for ($times = 0; $times < PUSH_SHIFT_CYCLES; $times++) {
            for ($i = 0; $i < $n; $i++) {
                array_unshift($values, $i);
            }
            $maxMemory = memory_get_usage();
            $startReadTime = hrtime(true);
            while (count($values) > 0) {
                $total += array_shift($values);
            }
            $endReadTime = hrtime(true);
            $totalReadTime += $endReadTime - $startReadTime;
        }
 
        unset($values);
    }
    $endTime = hrtime(true);
 
    $totalTime = ($endTime - $startTime) / 1000000000;
    $totalReadTimeSeconds = $totalReadTime / 1000000000;
    printf("2x Unshift then shift from start of array:               n=%8d iterations=%8d\n=> max memory=%8d bytes, create+destroy time=%.3f shift time = %.3f total time = %.3f result=%d\n",
        $n, $iterations, $maxMemory - $startMemory, $totalTime - $totalReadTimeSeconds, $totalReadTimeSeconds, $totalTime, $total);
}
function bench_deque(int $n, int $iterations) {
    $startTime = hrtime(true);
    $totalReadTime = 0.0;
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        $startMemory = memory_get_usage();
        $values = new Collections\Deque();
        for ($times = 0; $times < PUSH_SHIFT_CYCLES; $times++) {
            for ($i = 0; $i < $n; $i++) {
                $values[] = $i;
            }
            $maxMemory = memory_get_usage();
 
            $startReadTime = hrtime(true);
            while (count($values) > 0) {
                $total += $values->popFront();
            }
            $endReadTime = hrtime(true);
            $totalReadTime += $endReadTime - $startReadTime;
        }
 
        unset($values);
    }
    $endTime = hrtime(true);
    $totalTime = ($endTime - $startTime) / 1000000000;
    $totalReadTimeSeconds = $totalReadTime / 1000000000;
    printf("2x Unshift then shift from start of Collections\Deque:   n=%8d iterations=%8d\n=> max memory=%8d bytes, create+destroy time=%.3f shift time = %.3f total time = %.3f result=%d\n",
        $n, $iterations, $maxMemory - $startMemory, $totalTime - $totalReadTimeSeconds, $totalReadTimeSeconds, $totalTime, $total);
}
function bench_ds_deque(int $n, int $iterations) {
    $startTime = hrtime(true);
    $totalReadTime = 0.0;
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        $startMemory = memory_get_usage();
        $values = new Ds\Deque();
        for ($times = 0; $times < PUSH_SHIFT_CYCLES; $times++) {
            for ($i = 0; $i < $n; $i++) {
                $values->unshift($i);
            }
            $maxMemory = memory_get_usage();
 
            $startReadTime = hrtime(true);
            while (count($values) > 0) {
                $total += $values->shift();
            }
            $endReadTime = hrtime(true);
            $totalReadTime += $endReadTime - $startReadTime;
        }
 
        unset($values);
    }
    $endTime = hrtime(true);
    $totalTime = ($endTime - $startTime) / 1000000000;
    $totalReadTimeSeconds = $totalReadTime / 1000000000;
    printf("2x Unshift then shift from start of Ds\Deque:            n=%8d iterations=%8d\n=> max memory=%8d bytes, create+destroy time=%.3f shift time = %.3f total time = %.3f result=%d\n",
        $n, $iterations, $maxMemory - $startMemory, $totalTime - $totalReadTimeSeconds, $totalReadTimeSeconds, $totalTime, $total);
}
// SplDoublyLinkedList is a linked list that takes more memory than needed.
// Access to values in the middle of the SplDoublyLinkedList is also less efficient.
function bench_spl_dll(int $n, int $iterations) {
    $startTime = hrtime(true);
    $totalReadTime = 0.0;
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        $startMemory = memory_get_usage();
        $values = new SplDoublyLinkedList();
        for ($times = 0; $times < PUSH_SHIFT_CYCLES; $times++) {
            for ($i = 0; $i < $n; $i++) {
                $values->unshift($i);
            }
            $maxMemory = memory_get_usage();
            $startReadTime = hrtime(true);
            // Random access is linear time in a linked list, so use foreach instead
            while (count($values) > 0) {
                $total += $values->shift();
            }
            $endReadTime = hrtime(true);
            $totalReadTime += $endReadTime - $startReadTime;
        }
        unset($values);
    }
    $endTime = hrtime(true);
    $totalTime = ($endTime - $startTime) / 1000000000;
    $totalReadTimeSeconds = $totalReadTime / 1000000000;
    printf("2x Unshift then shift from start of SplDoublyLinkedList: n=%8d iterations=%8d\n=> max memory=%8d bytes, create+destroy time=%.3f shift time = %.3f total time = %.3f result=%d\n",
        $n, $iterations, $maxMemory - $startMemory, $totalTime - $totalReadTimeSeconds, $totalReadTimeSeconds, $totalTime, $total);
}
$n = 2**20;
$iterations = 10;
$sizes = [
    [1, 8000000],
    [4, 2000000],
    [8, 1000000],
    [64, 100000],
    [128, 50000],
    [1024, 2000],
    [2**20, 20],
];
echo "Test creating a collection, then two cycles of push+shift (unshifting n values to the start of collections then shifting all of them from front of collection)\n";
echo "Moving the front of an array requires moving all elements to their new position, so the total time for array is quadratic\n";
printf(
    "Results for php %s debug=%s with opcache enabled=%s\n\n",
    PHP_VERSION,
    PHP_DEBUG ? 'true' : 'false',
    json_encode(function_exists('opcache_get_status') && (opcache_get_status(false)['opcache_enabled'] ?? false))
);
 
foreach ($sizes as [$n, $iterations]) {
    bench_array($n, $iterations);
    bench_deque($n, $iterations);
    bench_ds_deque($n, $iterations);
    bench_spl_dll($n, $iterations);
    echo "\n";
}

Benchmark read performance

<?php
 
const OUTER_SIZE = 16;
 
function bench_array(int $n, int $iterations) {
    gc_collect_cycles();
    $startMemory = memory_get_usage();
 
    $values = [];
    for ($i = 0; $i < OUTER_SIZE; $i++) {
        $values[] = [];
    }
 
    for ($j = 0; $j < $n; $j++) {
        for ($i = 0; $i < OUTER_SIZE; $i++) {
            $values[$i][] = $i + $j;
        }
    }
 
    $startTime = hrtime(true);
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        foreach ($values as $inner) {
            foreach ($inner as $v) {
                $total += $v;
            }
        }
    }
    $endMemory = memory_get_usage();
    $endTime = hrtime(true);
 
    $totalReadTime = ($endTime - $startTime) / 1000000000;
    printf("Reading from array:             n=%8d outer size=%d iterations=%8d memory=%9d bytes, read time=%.3f result=%d\n",
        $n, OUTER_SIZE, $iterations, $endMemory - $startMemory, $totalReadTime, $total);
}
function bench_deque(int $n, int $iterations) {
    gc_collect_cycles();
    $startMemory = memory_get_usage();
 
    $values = new Collections\Deque();
    for ($i = 0; $i < OUTER_SIZE; $i++) {
        $values[] = new Collections\Deque();
    }
    for ($j = 0; $j < $n; $j++) {
        foreach ($values as $i => $inner) {
            $inner[] = $i + $j;
        }
    }
 
    $startTime = hrtime(true);
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        foreach ($values as $inner) {
            for ($i = 0; $i < $n; $i++) {
                $total += $inner[$i];
            }
        }
    }
    $endMemory = memory_get_usage();
    $endTime = hrtime(true);
 
    $totalReadTime = ($endTime - $startTime) / 1000000000;
    printf("Reading from Collections\Deque  n=%8d outer size=%d iterations=%8d memory=%9d bytes, read time=%.3f result=%d\n",
        $n, OUTER_SIZE, $iterations, $endMemory - $startMemory, $totalReadTime, $total);
}
function bench_ds_deque(int $n, int $iterations) {
    gc_collect_cycles();
    $startMemory = memory_get_usage();
    $values = new Ds\Deque();
    for ($i = 0; $i < OUTER_SIZE; $i++) {
        $values[] = new Ds\Deque();
    }
    for ($j = 0; $j < $n; $j++) {
        foreach ($values as $i => $inner) {
            $inner[] = $i + $j;
        }
    }
 
    $startTime = hrtime(true);
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        foreach ($values as $inner) {
            for ($i = 0; $i < $n; $i++) {
                $total += $inner[$i];
            }
        }
    }
    $endMemory = memory_get_usage();
    $endTime = hrtime(true);
 
    $totalReadTime = ($endTime - $startTime) / 1000000000;
    printf("Reading from Ds\Deque:          n=%8d outer size=%d iterations=%8d memory=%9d bytes, read time=%.3f result=%d\n",
        $n, OUTER_SIZE, $iterations, $endMemory - $startMemory, $totalReadTime, $total);
}
// SplStack is a subclass of SplDoublyLinkedList, so it is a linked list that takes more memory than needed.
// In this case it has worse cache locality
function bench_spl_stack(int $n, int $iterations) {
    gc_collect_cycles();
    $startMemory = memory_get_usage();
    $values = new SplStack();
    for ($i = 0; $i < OUTER_SIZE; $i++) {
        $values[] = new SplStack();
    }
    for ($j = 0; $j < $n; $j++) {
        foreach ($values as $i => $inner) {
            $inner[] = $i + $j;
        }
    }
 
    $startTime = hrtime(true);
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        foreach ($values as $inner) {
            // random access in doubly linked list is slow, use foreach
            foreach ($inner as $v) {
                $total += $v;
            }
        }
    }
    $endMemory = memory_get_usage();
    $endTime = hrtime(true);
 
    $totalReadTime = ($endTime - $startTime) / 1000000000;
    printf("Reading from SplStack:          n=%8d outer size=%d iterations=%8d memory=%9d bytes, read time=%.3f result=%d\n",
        $n, OUTER_SIZE, $iterations, $endMemory - $startMemory, $totalReadTime, $total);
}
function bench_array_object(int $n, int $iterations) {
    gc_collect_cycles();
    $startMemory = memory_get_usage();
    $values = new ArrayObject();
    for ($i = 0; $i < OUTER_SIZE; $i++) {
        $values[] = new ArrayObject();
    }
    for ($j = 0; $j < $n; $j++) {
        foreach ($values as $i => $inner) {
            $inner[] = $i + $j;
        }
    }
 
    $startTime = hrtime(true);
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        foreach ($values as $inner) {
            for ($i = 0; $i < $n; $i++) {
                $total += $inner[$i];
            }
        }
    }
    $endMemory = memory_get_usage();
    $endTime = hrtime(true);
 
    $totalReadTime = ($endTime - $startTime) / 1000000000;
    printf("Reading from ArrayObject:       n=%8d outer size=%d iterations=%8d memory=%9d bytes, read time=%.3f result=%d\n",
        $n, OUTER_SIZE, $iterations, $endMemory - $startMemory, $totalReadTime, $total);
}
function bench_spl_fixed_array(int $n, int $iterations) {
    gc_collect_cycles();
    $startMemory = memory_get_usage();
    $values = new SplFixedArray(OUTER_SIZE);
    for ($i = 0; $i < OUTER_SIZE; $i++) {
        $values[$i] = new SplFixedArray($n);
    }
    for ($j = 0; $j < $n; $j++) {
        foreach ($values as $i => $inner) {
            $inner[$j] = $i + $j;
        }
    }
    $startTime = hrtime(true);
    $total = 0;
    for ($j = 0; $j < $iterations; $j++) {
        foreach ($values as $inner) {
            for ($i = 0; $i < $n; $i++) {
                $total += $inner[$i];
            }
        }
    }
    $endMemory = memory_get_usage();
    $endTime = hrtime(true);
 
    $totalReadTime = ($endTime - $startTime) / 1000000000;
    printf("Reading from SplFixedArray:     n=%8d outer size=%d iterations=%8d memory=%9d bytes, read time=%.3f result=%d\n",
        $n, OUTER_SIZE, $iterations, $endMemory - $startMemory, $totalReadTime, $total);
}
$n = 2**20;
$iterations = 10;
$sizes = [
    [1, 800000],
    [4, 200000],
    [8, 100000],
    [2**10, 2**10],
    [2**16, 20],
    [2**18, 10],
];
ini_set('memory_limit', '1G');
printf(<<<EOT
Test repeatedly reading from %d collections of variable sizes 'n'.
This shows how the poor memory locality of linked lists(SplStack) can cause noticeably worse performance than continuous memory of arrays/Deques when the linked lists are fragmented.
 
Caveats:
- SplDoublyLinked subclasses were around 3 times slower for very large arrays when the linked list entries (and other data being accessed) aren't found in the processor's caches (large values of 'n')
 
  (This is a worst-case scenario and it usually is less than that - the outer size of 16 was chosen to demonstrate the worst-case performance impact that processor cache misses could have on SplDoublyLinkedList)
- ArrayObject currently doesn't implement optimized array access handlers like SplFixedArray or the 2 Deque implementations. It calls ArrayAccess->offsetGet.
- Spl data structures aren't final and need to check if offsetGet has been overridden.
- PHP has optimized code paths for arrays.
 
EOT
, OUTER_SIZE);
printf(
    "Results for php %s debug=%s with opcache enabled=%s\n\n",
    PHP_VERSION,
    PHP_DEBUG ? 'true' : 'false',
    json_encode(function_exists('opcache_get_status') && (opcache_get_status(false)['opcache_enabled'] ?? false))
);
 
foreach ($sizes as [$n, $iterations]) {
    bench_array($n, $iterations);
    bench_deque($n, $iterations);
    bench_ds_deque($n, $iterations);
    bench_array_object($n, $iterations);
    bench_spl_stack($n, $iterations);
    bench_spl_fixed_array($n, $iterations);
    echo "\n";
}

Changelog

  • 0.2: Link to definition of Deque.
  • 0.3: Add php-ds maintainer's response clarifications
  • 0.4: Update benchmarks after https://github.com/php/php-src/pull/7491 memory and time optimizations for packed arrays were merged into PHP 8.2
  • 0.5: Change proposed iteration behavior to iterate over the original Deque rather than a copy, having the Deque account for shift/unshift modifying the position of the start of the array (to avoid iterating over elements too many/few times if the front of the internal representation changes position). Update benchmark result. Update responses from php-ds maintainers.
  • 0.6: Rename to Collections\Deque
  • 0.6.1: Update section for rationale on namespace choice. Link to WebAssembly demo.
  • 0.6.2: Update benchmarks code/results after iteration implementation change in 0.5.
  • 0.7.0: Remove get()/set() helpers due to overlap with offsetGet/offsetSet functionality - introduction of typed get()/set() was suggested as better suited for a separate proposal for all data structures.
  • 0.8.0: Change bottom()/top() to first()/last()
  • 0.9.0: Add benchmark for repeated unshift then repeated shift. Add discussion section for request to return $this. Remove duplicate quotes. Clarify notes on performance.
  • 1.0.0: Support removing element with offsetUnset. Add insert(int $offset, mixed ...$values): void {} Improve iterator behavior and make it consistent with other iterators. Update benchmarks for php 8.3. Add benchmark of impact of memory locality on read performance vs SplDoublyLinkedList. Update proposed php version to 8.3.
  • 1.1.0: Rename shift/unshift methods to popFront/pushFront
rfc/deque.txt · Last modified: 2022/11/12 00:59 by tandre