This is an old revision of the document!
PHP RFC: List\unique() and Assoc\unique()
- Date: 2022-12-01
- Author: Ilija Tovilo ilutov@php.net
- Status: Draft
- Proposed Version: PHP 8.3
- Implementation: https://github.com/php/php-src/pull/9882
Introduction
The <php>array_unique()</php> function allows getting all unique values of a given array. Unfortunately, PHP has multiple definitions of equality and thus uniqueness. The most obvious one (i.e. <php>$a === $b</php>) is not supported by <php>array_unique()</php>.
This RFC proposes adding two new functions, <php>List\unique()</php> and <php>Assoc\unique()</php> as alternatives using strict equality (<php>===</php>) semantics, the former discarding and the latter preserving keys.
<php> List\unique([1, 2, 3, 1, '2', 3.0, new Foo, ['bar']]) > [1, 2, 3, '2', 3.0, Foo, ['bar']] Assoc\unique(['foo' => 'foo', 'bar' => 'bar', 'baz' => 'foo']) > ['foo' => 'foo', 'bar' => 'bar'] </php>
Proposal
Two new functions are added to PHP:
- <php>List\unique(bool $array): array</php>
- <php>Assoc\unique(bool $array): array</php>
Both functions return a new array containing unique values of the <php>$array</php> parameter. <php>List\unique()</php> will return a list, meaning the array will have continuous keys, starting from 0. <php>Assoc\unique()</php> will reuse the original arrays keys instead.
Uniqueness is based on the strict equality operator (<php>===</php>). Any two values that are strictly equal are considered duplicates and thus only once added to the resulting array. References are preserved.
Equality
Removing duplicates from arrays is a common use case provided by many programming languages. PHPs <php>array_unique()</php> has been there for ~23 years. However, PHP has multiple definitions of equality, four in particular supported by <php>array_unique</php>:
- <php>SORT_STRING</php> - Converts values to strings and compares with <php><</php>/<php>==</php>/<php>></php>
- <php>SORT_REGULAR</php> - Compares values directly with <php><</php>/<php>==</php>/<php>></php>
- <php>SORT_NUMERIC</php> - Converts values to doubles
- <php>SORT_LOCALE_STRING</php> - Converts values to strings and compares with
strcoll
Internally, <php>array_unique()</php> sorts the array to avoid comparing each value with every other value which would scale badly. For this reason, the second parameter <php>$flags</php> accepts the same <php>SORT_*</php> options as <php>sort()</php> function and friends.
None of these options support arrays and objects, and other primitive types are subject to subtle coercion issues. Additionally, coercion can lead to warnings that are likely undesirable, as the fact that these values are coerced for comparison is an implementation detail.
Keys
A common issue with many array functions is that they make no distinction between lists and associative arrays. Thus, it is often unclear whether a functions should discard or preserve keys. This is made evident by how the functions are used in user code, only considering them when some issue arises. This RFC proposes adding two separate functions specifically to force users to make a deliberate choice between the two, rather than doing so only after encountering issues with array keys.
Implementation
The new functions use a temporary hashmap internally. The array is iterated and each value is added to the hashmap. If
the value has not been added to the hashmap before, it is added to the resulting array. If it has been added, the value is
skipped. These new functions have a time complexity of O(n)
, whereas <php>array_unique()</php> has O(n log n)
(with the exception of <php>SORT_STRING</php> which also has O(n)
).
Backward Incompatible Changes
There are no backwards-incompatible changes in this RFC.
Alternative approaches
Previously, adding a new parameter to <php>array_unique()</php> was discussed. The discussion has revealed that most people would prefer a new function over fixing the existing function with a new parameter that might be more difficult to discover.
Vote
Voting opened on xxxx-xx-xx and closes on xxxx-xx-xx.