rfc:list_assoc_unique

This is an old revision of the document!


PHP RFC: List\unique() and Assoc\unique()

Introduction

The <php>array_unique()</php> function allows getting all unique values of a given array. Unfortunately, PHP has multiple definitions of equality and thus uniqueness. The most obvious one (i.e. <php>$a === $b</php>) is not supported by <php>array_unique()</php>.

This RFC proposes adding two new functions, <php>List\unique()</php> and <php>Assoc\unique()</php> as alternatives using strict equality (<php>===</php>) semantics, the former discarding and the latter preserving keys.

<php> List\unique([1, 2, 3, 1, '2', 3.0, new Foo, ['bar']]) > [1, 2, 3, '2', 3.0, Foo, ['bar']] Assoc\unique(['foo' => 'foo', 'bar' => 'bar', 'baz' => 'foo']) > ['foo' => 'foo', 'bar' => 'bar'] </php>

Proposal

Two new functions are added to PHP:

  • <php>List\unique(bool $array): array</php>
  • <php>Assoc\unique(bool $array): array</php>

Both functions return a new array containing unique values of the <php>$array</php> parameter. <php>List\unique()</php> will return a list, meaning the array will have continuous keys, starting from 0. <php>Assoc\unique()</php> will reuse the original arrays keys instead.

Uniqueness is based on the strict equality operator (<php>===</php>). Any two values that are strictly equal are considered duplicates and thus only once added to the resulting array. References are preserved.

Equality

Removing duplicates from arrays is a common use case provided by many programming languages. PHPs <php>array_unique()</php> has been there for ~23 years. However, PHP has multiple definitions of equality, four in particular supported by <php>array_unique</php>:

  • <php>SORT_STRING</php> - Converts values to strings and compares with <php><</php>/<php>==</php>/<php>></php>
  • <php>SORT_REGULAR</php> - Compares values directly with <php><</php>/<php>==</php>/<php>></php>
  • <php>SORT_NUMERIC</php> - Converts values to doubles
  • <php>SORT_LOCALE_STRING</php> - Converts values to strings and compares with strcoll

Internally, <php>array_unique()</php> sorts the array to avoid comparing each value with every other value which would scale badly. For this reason, the second parameter <php>$flags</php> accepts the same <php>SORT_*</php> options as <php>sort()</php> function and friends.

None of these options support arrays and objects, and other primitive types are subject to subtle coercion issues. Additionally, coercion can lead to warnings that are likely undesirable, as the fact that these values are coerced for comparison is an implementation detail.

Keys

A common issue with many array functions is that they make no distinction between lists and associative arrays. Thus, it is often unclear whether a functions should discard or preserve keys. This is made evident by how the functions are used in user code, only considering them when some issue arises. This RFC proposes adding two separate functions specifically to force users to make a deliberate choice between the two, rather than doing so only after encountering issues with array keys.

Implementation

The new functions use a temporary hashmap internally. The array is iterated and each value is added to the hashmap. If the value has not been added to the hashmap before, it is added to the resulting array. If it has been added, the value is skipped. These new functions have a time complexity of O(n), whereas <php>array_unique()</php> has O(n log n) (with the exception of <php>SORT_STRING</php> which also has O(n)).

Backward Incompatible Changes

There are no backwards-incompatible changes in this RFC.

Alternative approaches

Previously, adding a new parameter to <php>array_unique()</php> was discussed. The discussion has revealed that most people would prefer a new function over fixing the existing function with a new parameter that might be more difficult to discover.

Vote

Voting opened on xxxx-xx-xx and closes on xxxx-xx-xx.

Add List\unique() and Assoc\unique() functions?
Real name Yes No
Final result: 0 0
This poll has been closed.
rfc/list_assoc_unique.1669917182.txt.gz · Last modified: 2022/12/01 17:53 by ilutov