PHP RFC: Query Parameter Manipulation Support

Introduction

Although PHP is the most prominent web-focused programming language, it still lacks satisfactory built-in functionality to parse, manipulate, and compose query parameters, even after 25 years of existence. Currently, one can only rely on the $_GET superglobal and a handful of functions (e.g. parse_str(), http_build_query()), which leave a lot of functionality uncovered. Moreover, the only format that is truly supported is RFC 1866's application/x-www-form-urlencoded, because the query string of the current request is parsed into the $_GET superglobal according to this specification.

Proposal

The following classes and methods are proposed for addition:

namespace Uri {
    final readonly class QueryParams implements \IteratorAggregate, \Countable
    {
        public static function parseRfc3986(string $queryString): ?\Uri\QueryParams {}
 
        public static function parseRfc1866(string $queryString): \Uri\QueryParams {}
 
        public static function parseWhatWg(string $queryString): \Uri\QueryParams {}
 
        public static function fromArray(array $queryParams): \Uri\QueryParams {}
 
        public function __construct() {}
 
        public function append(string $name, mixed $value): static {}
 
        public function appendArray(string $name, array $value): static {}
 
        public function delete(string $name): static {}
 
        public function deleteValue(string $name, mixed $value): static {}
 
        public function has(string $name): bool {}
 
        public function hasValue(string $name, mixed $value): bool {}
 
        public function getFirst(string $name): ?string {}
 
        public function getLast(string $name): ?string {}
 
        public function getAll(string $name): array {}
 
        public function getArray(string $name): array {}
 
        public function list(): array {}
 
        public function getIterator(): \Traversable {}
 
        public function count(): int {}
 
        public function set(string $name, mixed $value): static {}
 
        public function setArray(string $name, array $value): static {}
 
        public function sort(): static {}
 
        public function toRfc3986String(): string {}
 
        public function toRfc1866String(): string {}
 
        public function toWhatWgString(): string {}
 
        public function __serialize(): array {}
 
        public function __unserialize(array $data): void {}
 
        public function __debugInfo(): array {}
    }
}
 
namespace Uri\Rfc3986 {
    final readonly class Uri
    {
        ...
 
        public function getQueryParams(): ?\Uri\QueryParams {}
 
        ...
    }
}
 
namespace Uri\WhatWg {
    final readonly class Url
    {
        ...
 
        public function getQueryParams(): ?\Uri\QueryParams {}
 
        ...
    }
}

Construction

QueryParams supports the following methods for instantiation:

  • parseRfc1866(): It parses a query string into a list of query parameters according to the processing and percent-decoding rules of the application/x-www-form-urlencoded media type, as defined by RFC 1866. This specification regards query parameters as a list of name-value pairs, where the two parts are separated by a “=” character, and the individual parameters are separated from each other by a “&” character (e.g. name1=value1&name2=value2).
  • parseRfc3986(): It parses a query string into a list of query parameters according to the percent-decoding rules of RFC 3986, with the caveat that this specification in fact does not specify exactly how query parameters are composed. That's why the implementation defines query parameters based on the definition of RFC 1866.
  • parseWhatWg(): It parses a query string into a list of query parameters according to the percent-decoding rules of the application/x-www-form-urlencoded media type, as defined by the WHATWG URL specification.
  • fromArray(): It takes an array of query parameters and directly composes the query parameter list object based on it. Besides scalar values, it can also accept complex types such as arrays according to the rules discussed in the "Supported types" section.
  • __construct(): It accepts an empty parameter list, and results in an empty query parameter list. This method allows building query parameters by starting from scratch.

$params = Uri\QueryParams::parseRfc3986("a=foo&b=bar");  // Successful instantiation
$params = Uri\QueryParams::parseRfc1866("a=foo&b=bar");  // Successful instantiation
$params = Uri\QueryParams::parseWhatWg("a=foo&b=bar");   // Successful instantiation
$params = Uri\QueryParams::fromArray(
    [
        "a" => "foo",
        "b" => "bar",
    ]
);                                                       // Successful instantiation - same result as above
$params = new Uri\QueryParams();                         // Successful instantiation - creates an empty query parameter list

It is also possible to create a QueryParams instance from a Uri\Rfc3986\Uri or a Uri\WhatWg\Url object:

$uri = new Uri\Rfc3986\Uri("https://example.com/?foo=bar");
$params = $uri->getQueryParams();          // First call creates a Uri\QueryParams instance
$params = $uri->getQueryParams();          // Subsequent calls reuse the already existing Uri\QueryParams instance
$uri = $uri->withQuery("foo=baz");         // Modification of the query string invalidates the already existing Uri\QueryParams instance
 
$url = new Uri\WhatWg\Url("https://example.com/?foo=bar");
$params = $url->getQueryParams();          // First call creates a Uri\QueryParams instance
$params = $url->getQueryParams();          // Subsequent calls reuse the already existing Uri\QueryParams instance
$url = $url->withQuery("foo=baz");         // Modification of the query string invalidates the already existing Uri\QueryParams instance

Uri\Rfc3986\Uri::getQueryParams() uses the normalized query string to instantiate QueryParams when possible. If the URI has not been normalized before, then the non-normalized query string is used. In practice, this doesn't make a big difference, because QueryParams itself also normalizes (percent-decodes) the input — you can read more on this topic later.

Uri\Rfc3986\Uri::getQueryParams() and Uri\WhatWg\Url::getQueryParams() return null if the query string is missing (e.g. https://example.com/), and an empty query parameter list is returned if the query string is empty (e.g. https://example.com/?).

$uri = new Uri\Rfc3986\Uri("https://example.com/");
echo $uri->getQueryParams();               // null
 
$uri = new Uri\Rfc3986\Uri("https://example.com/?");
echo $uri->getQueryParams();               // A new Uri\QueryParams containing zero items

The same example for WHATWG URL:

$url = new Uri\WhatWg\Url("https://example.com/");
echo $url->getQueryParams();               // null
 
$url = new Uri\WhatWg\Url("https://example.com/?");
echo $url->getQueryParams();               // A new Uri\QueryParams containing zero items

It's important to note that QueryParams doesn't strictly validate query parameters during construction. This behavior is by design: WHATWG URL's URLSearchParams class is tolerant when reading its input, and QueryParams follows the same principle. Validation still happens when the recomposed query parameters are written to a URI (via Uri\Rfc3986\Uri::withQuery() and Uri\WhatWg\Url::withQuery()). However, invalid characters are automatically percent-encoded during query parameter recomposition, so the withQuery() calls won't fail in practice either.

$params = Uri\QueryParams::parseRfc3986("#foo=bar");     // Parses an invalid parameter name "#foo"
 
$uri = new Uri\Rfc3986\Uri("https://example.com/");
 
$uri = $uri->withQuery($params->toRfc3986String());      // Success: the query is automatically percent-encoded to "%23foo=bar"

The same example for WHATWG URL:

$params = Uri\QueryParams::parseWhatWg("#foo=bar");      // Parses an invalid parameter name "#foo"
 
$url = new Uri\WhatWg\Url("https://example.com/");
 
$url = $url->withQuery($params->toWhatWgString());       // Success: the query is automatically percent-encoded to "%23foo=bar"

Please note that this RFC doesn't propose a Uri\Rfc3986\Uri::withQueryParams() method for updating the query string directly based on a query parameter list, because Uri\QueryParams supports multiple recomposition formats, and the user should choose one of them explicitly. This is why the examples above recompose the query parameters into a string before passing it to withQuery().

Neither the parse*(), nor the fromArray() factory methods can fail in practice: they only have memory-related failure cases which are handled by the PHP engine as a fatal error.

According to the WHATWG URL algorithm, the leading “?” character is removed during parsing. As opposed to this behavior, the leading “?” becomes part of the first query parameter name for RFC 3986 query params.

$params = Uri\QueryParams::parseRfc3986("?abc=foo");
// $params internally contains the ["?abc" => "foo"] key-value pair
 
$params = Uri\QueryParams::parseWhatWg("?abc=foo");
// $params internally contains the ["abc" => "foo"] key-value pair

All parse*() variants percent-decode the input automatically when constructing the QueryParams instances. This is necessary so that the classes can work with the unencoded query parameters.

$params = Uri\QueryParams::parseRfc3986("foo%5B%5D=b%61r");  // Percent-encoded form of "foo[]=bar"
// $params internally contains the ["foo[]" => "bar"] key-value pair
 
$params = Uri\QueryParams::parseRfc1866("foo%5B%5D=b%61r");  // Percent-encoded form of "foo[]=bar"
// $params internally contains the ["foo[]" => "bar"] key-value pair
 
$params = Uri\QueryParams::parseWhatWg("foo%5B%5D=b%61r");   // Percent-encoded form of "foo[]=bar"
// $params internally contains the ["foo[]" => "bar"] key-value pair
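
The parse-and-decode step can be approximated in userland with today's functions. The following is a minimal sketch only, not the proposed implementation: sketchParsePairs() and its $plusAsSpace flag are hypothetical names, and skipping empty segments as well as mapping a missing “=” to an empty value are assumptions made for brevity.

```php
<?php
// Hypothetical userland helper; NOT part of the proposed API.
// Returns a list of [name, value] pairs with percent-decoding applied.
function sketchParsePairs(string $query, bool $plusAsSpace = false): array
{
    // urldecode() also turns "+" into a space (form-style decoding),
    // rawurldecode() only resolves percent-encoded octets.
    $decode = $plusAsSpace ? 'urldecode' : 'rawurldecode';

    $pairs = [];
    foreach (explode('&', $query) as $part) {
        if ($part === '') {
            continue; // assumption: empty segments (e.g. "a=1&&b=2") are skipped
        }
        // Split on the first "=" only; assumption: a missing "=" yields "".
        [$name, $value] = array_pad(explode('=', $part, 2), 2, '');
        $pairs[] = [$decode($name), $decode($value)];
    }

    return $pairs;
}

var_dump(sketchParsePairs("foo%5B%5D=b%61r")); // one pair: "foo[]" and "bar"
```

Keeping the decoded names and values as a list of pairs (rather than as an associative array) is what allows repeated parameter names to survive parsing.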

Parameter Retrieval

The has() and hasValue() methods can be used to find out if a parameter exists:

$params = Uri\QueryParams::parseRfc3986("foo=bar&baz=qux&baz=quux");
 
echo $params->has("baz");                 // true
echo $params->has("non-existent");        // false
 
echo $params->hasValue("foo", "bar");     // true 
echo $params->hasValue("foo", "baz");     // false

The has() method returns true if there is at least one parameter in the parameter list with the given name, false otherwise. On the other hand, hasValue() returns true if the given name and value both match at least one parameter, otherwise it returns false.

The number of query parameters can be retrieved by calling the count() method:

$params = Uri\QueryParams::parseRfc3986("foo=bar&baz=qux&baz=quux");
 
echo $params->count();                    // 3

There are also a number of methods that can return a query parameter or an array of query parameters:

  • getFirst(): Retrieves the first parameter with the given name. This actually implements the get() method from the WHATWG URL specification.
  • getLast(): Retrieves the last parameter with the given name. It's a custom, PHP-specific method which doesn't have a WHATWG URL equivalent.
  • getAll(): Retrieves all parameters with the given name. This actually implements the getAll() method from the WHATWG URL specification.
  • list(): Retrieves all query parameters. It's also a custom, PHP-specific method which doesn't have a WHATWG URL equivalent.

$params = Uri\QueryParams::parseRfc3986("foo=bar&foo=baz&qux=quux");
 
echo $params->getFirst("foo");            // bar
echo $params->getFirst("non-existent");   // null
 
echo $params->getLast("foo");             // baz
echo $params->getLast("non-existent");    // null
 
echo $params->getAll("foo");              // ["bar", "baz"]
echo $params->getAll("non-existent");     // []
 
echo $params->list();                     // [["foo", "bar"], ["foo", "baz"], ["qux", "quux"]]

All these methods work with the natively stored values without applying any transformations. That is, neither the input arguments nor the returned values are percent-encoded or percent-decoded.

$params = Uri\QueryParams::parseRfc3986("foo%5B%5D=b%61r");  // Internally stored as "foo[]=bar"
 
echo $params->getFirst("foo%5B%5D");     // null
echo $params->getFirst("foo[]");         // bar
 
echo $params->getLast("foo%5B%5D");      // null
echo $params->getLast("foo[]");          // bar
 
echo $params->getAll("foo%5B%5D");       // []
echo $params->getAll("foo[]");           // ["bar"]
 
echo $params->list();                     // [["foo[]", "bar"]]
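
The lookup semantics described in this section can be sketched over a plain list of [name, value] pairs. The helper names below (sketchGetAll(), sketchGetFirst(), sketchGetLast()) are hypothetical and only illustrate the expected results under the assumption that names are compared byte by byte, without any percent-decoding.

```php
<?php
// Hypothetical helpers operating on a list of [name, value] pairs.
function sketchGetAll(array $pairs, string $name): array
{
    $values = [];
    foreach ($pairs as [$pairName, $pairValue]) {
        if ($pairName === $name) { // exact match, no decoding of $name
            $values[] = $pairValue;
        }
    }
    return $values;
}

function sketchGetFirst(array $pairs, string $name): ?string
{
    return sketchGetAll($pairs, $name)[0] ?? null;
}

function sketchGetLast(array $pairs, string $name): ?string
{
    $all = sketchGetAll($pairs, $name);
    return $all === [] ? null : $all[count($all) - 1];
}

$pairs = [["foo", "bar"], ["foo", "baz"], ["qux", "quux"]];
var_dump(sketchGetFirst($pairs, "foo")); // string(3) "bar"
var_dump(sketchGetLast($pairs, "foo"));  // string(3) "baz"
```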

Percent-Encoding and Decoding

QueryParams only performs percent-encoding when query parameters are recomposed into a query string (via the to*String() methods), and it only performs percent-decoding when a query string is parsed into a query parameter list (via the parse*() methods). The rest of the functionality doesn't use percent-encoding or decoding.

QueryParams supports percent-encoding and decoding according to three specifications:

  • RFC 1866, which specifies the percent-encoding and decoding rules of the application/x-www-form-urlencoded media type.
  • RFC 3986, which defines the generic query string syntax.
  • The URLSearchParams class specified by WHATWG URL, which yet again builds upon the application/x-www-form-urlencoded media type for historical reasons, albeit slightly differently from how RFC 1866 specifies it.

This section gives an overview of the percent-encoding and decoding details, as well as the differences between the aforementioned specifications.

According to RFC 1866, space characters are replaced by the plus character (+) during percent-encoding, and any characters that fall outside of the unreserved character set are percent-encoded. Percent-decoding inverts these operations.

This behavior clearly deviates from the percent-encoding rules of the query component of RFC 3986 which allows quite a few reserved characters to be present in the query component without percent-encoding (a few examples: “:”, “@”, “?”, “/”), not to mention the difference in how the space character is handled.
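
The difference in space handling can be observed with PHP's existing functions. This is only a rough illustration, not the proposed implementation: rawurlencode() percent-encodes every reserved character, so it is stricter than what RFC 3986 permits in the query component.

```php
<?php
$value = "a b";

// Form-style encoding: the space becomes "+".
$formStyle = urlencode($value);

// RFC 3986-style encoding: the space becomes "%20".
$rfc3986Style = rawurlencode($value);

// http_build_query() exposes the same choice via its encoding type argument.
$formQuery = http_build_query(["foo" => $value]);
$rfc3986Query = http_build_query(["foo" => $value], "", "&", PHP_QUERY_RFC3986);

echo $formStyle, "\n";      // a+b
echo $rfc3986Style, "\n";   // a%20b
echo $formQuery, "\n";      // foo=a+b
echo $rfc3986Query, "\n";   // foo=a%20b
```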

Regarding WHATWG URL's URLSearchParams class, as usual, a dedicated percent-encoding set is defined:

The application/x-www-form-urlencoded percent-encode set contains all code points, except the ASCII alphanumeric, U+002A (*), U+002D (-), U+002E (.), and U+005F (_).

WHATWG URL also defines a dedicated algorithm for “serialization” (in this context, serialization means recomposition - converting the list to a query string): the space code point is percent-encoded as the plus code point (+), and the rest of the code points in the percent-encoding set are encoded how WHATWG URL normally does so.

This behavior deviates from the percent-encoding rules of the query component of WHATWG URL, as the query percent-encode set contains far fewer characters, and the space code point is handled differently again.

It's also important to compare how the percent-encoding rules of RFC 1866's and WHATWG URL's application/x-www-form-urlencoded media type differ. They handle the asterisk (*) and the tilde (~) differently: RFC 1866 percent-encodes the asterisk but leaves the tilde as-is, whereas WHATWG URL leaves the asterisk as-is but percent-encodes the tilde.
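
The WHATWG serialization rules quoted above can be sketched in a few lines of userland code. The function name sketchWhatWgFormEncode() is hypothetical, and the sketch assumes the input is a UTF-8 byte string that is encoded byte by byte; it is an illustration of the percent-encode set, not the proposed implementation.

```php
<?php
// Hypothetical helper illustrating the WHATWG application/x-www-form-urlencoded
// percent-encode set: everything is encoded except ASCII alphanumerics,
// "*", "-", ".", and "_"; the space is serialized as "+".
function sketchWhatWgFormEncode(string $input): string
{
    // Leave the space untouched for now so that a literal "+" in the input
    // is still percent-encoded as "%2B" by this step.
    $encoded = preg_replace_callback(
        '/[^A-Za-z0-9*\-._ ]/',
        static fn (array $m): string => sprintf('%%%02X', ord($m[0])),
        $input
    );

    // Finally, serialize the space as "+".
    return str_replace(' ', '+', (string) $encoded);
}

echo sketchWhatWgFormEncode("a b*~"), "\n"; // a+b*%7E
```

Note how the asterisk is left alone while the tilde is percent-encoded, which is the WHATWG side of the difference described above.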

Even though it follows directly from the percent-encoding definition, it may still be easy to overlook that the application/x-www-form-urlencoded media type (both RFC 1866's and WHATWG URL's definition) also percent-encodes “%” itself, even when it is part of an existing percent-encoded octet. This is counterintuitive (normally, RFC 3986 and WHATWG URL do not percent-encode “%” twice) and quite unsafe, because it can lead to double-encoding vulnerabilities.

$params = Uri\QueryParams::fromArray(
    [
        ["foo" => "b%61r"],
    ]
);
echo $params->toRfc1866String();                          // foo=b%2561r

Surprising as it is, the Uri\QueryParams::toRfc1866String() and Uri\QueryParams::toWhatWgString() methods percent-encode “%” itself (thus “%” becomes “%25” first, and then “61r” is appended), rather than leaving the already percent-encoded octet alone. Consequently, it's very important to pass unencoded input to the QueryParams class so that double-encoding cannot happen (the only exception is the parse*() methods, because they automatically percent-decode their input).
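
The same double-encoding effect can be reproduced today with urlencode(), which also follows form-style encoding. The snippet is only an illustration with an existing function, not the proposed API:

```php
<?php
$alreadyEncoded = "b%61r"; // percent-encoded form of "bar"

// Encoding already-encoded input escapes the "%" itself.
$encodedTwice = urlencode($alreadyEncoded);
echo $encodedTwice, "\n";                          // b%2561r

// A single decoding pass only restores the encoded input, not "bar".
echo urldecode($encodedTwice), "\n";               // b%61r

// Decoding first and encoding afterwards avoids the problem.
echo urlencode(urldecode($alreadyEncoded)), "\n";  // bar
```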

Recomposition

In order to be consistent with the design of the Uri\Rfc3986\Uri and Uri\WhatWg\Url classes, neither UriQueryParams nor UrlQueryParams has a __toString() magic method. Instead, they contain custom to*String() methods to recompose the query string from the parsed query parameters.

$params = Uri\Rfc3986\UriQueryParams::parseRfc3986("foo=bar&foo=baz");
echo $params->toRfc3986String();          // foo=bar&foo=baz
echo $params->toFormDataString();         // foo=bar&foo=baz
 
$params = Uri\WhatWg\UrlQueryParams::parse("foo=bar&foo=baz");
echo $params->toString();                // foo=bar&foo=baz

All to*String() methods (Uri\Rfc3986\UriQueryParams::toRfc3986String(), Uri\Rfc3986\UriQueryParams::toFormDataString(), Uri\WhatWg\UrlQueryParams::toString()) automatically percent-encode their output according to the rules outlined in the previous section; otherwise, they could return invalid output.

$params = Uri\Rfc3986\UriQueryParams::fromArray([["foo[]" => "bar baz"]]);
echo $params->toRfc3986String();         // foo%5B%5D=bar%20baz
echo $params->toFormDataString();        // foo%5B%5D=bar+baz
 
$params = Uri\WhatWg\UrlQueryParams::fromArray([["foo[]" => "bar baz"]]);
echo $params->toString();                // foo%5B%5D=bar+baz

Unlike Uri\Rfc3986\Uri, the Uri\Rfc3986\UriQueryParams class doesn't have a toRawString() method, because its behavior would be misleading: toRawString() cannot really provide a “raw” representation of the query string, since automatic percent-encoding must happen anyway to make the produced query string valid.

Relation to the query component

After learning about the details of the percent-encoding and decoding behavior of UriQueryParams and UrlQueryParams, it should be clarified how the new classes interoperate with the existing Uri\Rfc3986\Uri and Uri\WhatWg\Url classes.

In case of UriQueryParams, full compatibility with Uri\Rfc3986\Uri can be achieved via the parseRfc3986() and toRfc3986String() methods:

$uri = new Uri\Rfc3986\Uri("https://example.com?foo=a%20b");
 
$params = $uri->getQueryParams();
// The above line is effectively the same as the following one:
$params = Uri\Rfc3986\UriQueryParams::parseRfc3986($uri->getQuery());
 
$uri = $uri->withQuery($params->toRfc3986String());
 
echo $uri->getQuery();                     // foo=a%20b

As can be seen in the example above, the behavior round-trips: parsing a query string to a UriQueryParams instance and then replacing the original query string with the recomposed one results in the original query string. Unfortunately, this won't necessarily be the case when using parseFormData() or toFormDataString(), if the query string contains some specific characters (most notably, the space character):

$uri = new Uri\Rfc3986\Uri("https://example.com?foo=a%20b");
 
$params = $uri->getQueryParams();
 
$uri = $uri->withQuery($params->toFormDataString());
 
echo $uri->getQuery();                     // foo=a+b

Uri\WhatWg\UrlQueryParams and Uri\WhatWg\Url have the very same incompatibility due to the different percent-encoding and decoding algorithms, and this is even encoded in the WHATWG URL specification itself, so it's not possible to work around it on PHP's side:

$url = new Uri\WhatWg\Url("https://example.com?foo=a b");
 
$params = $url->getQueryParams();
 
$url = $url->withQuery($params->toString());
 
echo $url->getQuery();                     // foo=a+b

Modification

The append() method can be used to append a parameter to the end of the list. As usual, the same query parameter can be added multiple times:

$params = Uri\Rfc3986\UriQueryParams::parseRfc3986("foo=bar");
$params->append("baz", "qux");
$params->append("baz", "qaz");             // Appends "baz" twice
 
echo $params->toRfc3986String();           // foo=bar&baz=qux&baz=qaz

Updating a parameter is possible via the set() method:

$params = Uri\Rfc3986\UriQueryParams::parseRfc3986("foo=bar&foo=baz");
$params->set("foo", "baz");                // Overwrites the first item "foo", and removes the second one
$params->set("qux", "qaz");                // Appends a new item "qux"
 
echo $params->toRfc3986String();           // foo=baz&qux=qaz

Actually, the set() method has a hybrid behavior: if a parameter is not present in the list, then it adds it just like append() does. Otherwise, it overwrites the first item, and removes the rest of the occurrences.
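
The hybrid behavior of set() can be sketched over a plain list of [name, value] pairs. sketchSet() is a hypothetical helper that only illustrates the described semantics; it is not the proposed implementation.

```php
<?php
// Hypothetical helper: overwrite the first occurrence of $name, drop the
// remaining occurrences, or append a new pair if $name is not present.
function sketchSet(array $pairs, string $name, string $value): array
{
    $result = [];
    $replaced = false;

    foreach ($pairs as [$pairName, $pairValue]) {
        if ($pairName !== $name) {
            $result[] = [$pairName, $pairValue]; // unrelated pairs are kept as-is
            continue;
        }
        if (!$replaced) {
            $result[] = [$name, $value];         // first occurrence is overwritten
            $replaced = true;
        }
        // Any further occurrence of $name is dropped.
    }

    if (!$replaced) {
        $result[] = [$name, $value];             // behaves like append()
    }

    return $result;
}

$pairs = [["foo", "bar"], ["foo", "baz"]];
$pairs = sketchSet($pairs, "foo", "baz"); // [["foo", "baz"]]
$pairs = sketchSet($pairs, "qux", "qaz"); // [["foo", "baz"], ["qux", "qaz"]]
```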

Neither append(), nor set() do any percent-encoding or decoding of their arguments.

$params = new Uri\WhatWg\UrlQueryParams();
$params->append("foo%5B%5D", "ab%63");     // Percent-encoded form of "foo[]=abc"
$params->set("bar%5B%5D", "de%66");        // Percent-encoded form of "bar[]=def"
 
echo $params->getFirst("foo%5B%5D");            // ab%63
echo $params->getFirst("bar%5B%5D");            // de%66

Removing parameters is possible via either the delete() or the deleteValue() method: the former removes all occurrences of the given parameter name, while the latter removes all occurrences of a parameter whose name and value both match the given ones, as demonstrated below:

$params = Uri\Rfc3986\UriQueryParams::parseRfc3986("foo=bar&foo=baz&foo=qux");
$params->deleteValue("foo", "baz");        // Deletes the "foo=baz" parameter
$params->delete("foo");                    // Deletes the rest of the occurrences: "foo=bar" and "foo=qux"
$params->delete("non-existent");           // The parameter is not present: nothing happens

Finally, sort() sorts the query parameter list alphabetically:

$params = Uri\Rfc3986\UriQueryParams::parseRfc3986("foo=bar&baz=qux&baz=quux");
$params->sort();
 
echo $params->toRfc3986String();           // baz=qux&baz=quux&foo=bar

The main purpose of sort() is to provide a consistent order of the key-value pairs (e.g. to increase cache hits), therefore more advanced features such as sorting in descending order, or user-provided comparison methods are not proposed.
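
A sort with these properties can be sketched with usort(). The helper sketchSort() is hypothetical, and two points are assumptions of this sketch rather than statements of the RFC: names are compared byte by byte via strcmp(), and parameters with equal names keep their relative order (usort() is stable as of PHP 8.0).

```php
<?php
// Hypothetical helper: sorts a list of [name, value] pairs by name only.
function sketchSort(array $pairs): array
{
    // Only the names are compared, so pairs sharing a name keep their
    // original relative order thanks to the stable sort in PHP >= 8.0.
    usort($pairs, static fn (array $a, array $b): int => strcmp($a[0], $b[0]));

    return $pairs;
}

$sorted = sketchSort([["foo", "bar"], ["baz", "qux"], ["baz", "quux"]]);
// $sorted is [["baz", "qux"], ["baz", "quux"], ["foo", "bar"]]
```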

Supported types

It's also important to clarify how non-string values are mapped to query parameters, which inherently have a string type. PHP's http_build_query() function (https://www.php.net/manual/en/function.http-build-query.php) can map basically any type to query parameters; however, this is purely PHP-specific behavior, and as such, type mapping rules are out of scope of both RFC 3986 and WHATWG URL: RFC 3986 says nothing about how query parameters should be built, while WHATWG URL's URLSearchParams only accepts and returns string data.

The position of this RFC is that it's important to follow the road that http_build_query() has already paved because of better developer experience and better interoperability with the existing ecosystem. That's why the following type mapping behavior is proposed when a query parameter is added/updated:

  • bool: Becomes string “0” (in case of false) or string “1” (in case of true)
  • int: Becomes a numeric string (123 -> “123”)
  • float: Becomes a decimal string (3.14 -> “3.14”)
  • resource: Invalid mapping, an exception is thrown
  • array:
    • empty array: An empty array has zero items, therefore empty arrays are omitted from the query parameter list.
    • list: An array is a list if its keys are consecutive integers starting from 0. Lists are converted to query parameters by repeating the given query parameter name appended by a bracket pair ([]) along with each value in the list mapped recursively according to the currently described type mapping rules. E.g. adding a query parameter with the array name and the [1, false, “foo”] value will result in an array[]=1&array[]=0&array[]=foo query string.
    • map: An array is a map if it is not a list. Maps are converted to query parameters by appending the array keys contained within brackets ([]) to the given query parameter name along with each value in the map mapped recursively according to the currently described type mapping rules. E.g. adding a query parameter with an array name and the [1 => 1, 2 => true, 3 => “foo”] value will result in an array[1]=1&array[2]=1&array[3]=foo query string.
  • enum:
    • backed enums are converted to their backing value
    • enums without backing type are invalid, and an exception is thrown
  • object: invalid mapping, an exception is thrown
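
The mapping rules listed above can be sketched as a recursive userland function that flattens a value into [name, value] pairs. sketchFlatten() is a hypothetical helper, null handling is deliberately left out, and the choice of InvalidArgumentException is an assumption (the list above only says that an exception is thrown). The sketch requires PHP 8.1 or later because of array_is_list() and enums.

```php
<?php
// Hypothetical helper: maps a PHP value to a list of [name, string] pairs.
function sketchFlatten(string $name, mixed $value): array
{
    if (is_array($value)) {
        $pairs = [];
        $isList = array_is_list($value); // an empty array yields no pairs at all
        foreach ($value as $key => $item) {
            // Lists repeat "name[]", maps use "name[key]".
            $childName = $isList ? $name . '[]' : $name . '[' . $key . ']';
            array_push($pairs, ...sketchFlatten($childName, $item));
        }
        return $pairs;
    }

    if ($value instanceof BackedEnum) {
        return sketchFlatten($name, $value->value); // backed enums use their backing value
    }

    if (is_bool($value)) {
        return [[$name, $value ? '1' : '0']];
    }

    if (is_int($value) || is_float($value) || is_string($value)) {
        return [[$name, (string) $value]];
    }

    // Resources, other objects, and non-backed enums end up here; null is
    // out of scope for this sketch and is rejected as well.
    throw new InvalidArgumentException('Unsupported type: ' . get_debug_type($value));
}

var_dump(sketchFlatten("array", [1, false, "foo"]));
// pairs: ["array[]", "1"], ["array[]", "0"], ["array[]", "foo"]
```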

The above conversion rules work for both UriQueryParams and UrlQueryParams. However, Uri\Rfc3986\UriQueryParams can additionally handle null values properly: a null input is mapped to a query component in which only the parameter name is present, while the “=” and the parameter value are omitted. On the other hand, Uri\WhatWg\UrlQueryParams converts null values to an empty string. For reference, http_build_query() omits parameters with null values.

A few examples demonstrating how UriQueryParams handles scalar types:

$params = new Uri\Rfc3986\UriQueryParams();
 
$params->append("null", null);
$params->append("bool", true);
$params->append("int", 123);
$params->append("float", 3.14);
 
var_dump($params->getFirst("null"));        // NULL
var_dump($params->getFirst("bool"));        // string(1) "1"
var_dump($params->getFirst("int"));         // string(3) "123"
var_dump($params->getFirst("float"));       // string(4) "3.14"
 
echo $params->toRfc3986String();            // null&bool=1&int=123&float=3.14

Let's also see a few examples about how UrlQueryParams handles scalar types. Note how null is represented differently than in case of UriQueryParams:

$params = new Uri\WhatWg\UrlQueryParams();
 
$params->append("null", null);
$params->append("bool", true);
$params->append("int", 123);
$params->append("float", 3.14);
 
var_dump($params->getFirst("null"));        // string(0) ""
var_dump($params->getFirst("bool"));        // string(1) "1"
var_dump($params->getFirst("int"));         // string(3) "123"
var_dump($params->getFirst("float"));       // string(4) "3.14"
 
echo $params->toString();                   // null=&bool=1&int=123&float=3.14

Array API

In order to better support arrays (again, which is a purely PHP-specific feature), the current RFC proposes a dedicated API. This way, the rest of the methods can follow WHATWG URL without any customization, and the array API can have its custom behavior.

In order to instantiate UriQueryParams and UrlQueryParams containing arrays, one can use the fromArray() factory method:

$params = Uri\Rfc3986\UriQueryParams::fromArray(
    [
        "empty" => [],
        "list" => ["a", "b", "c"],
        "map" => ["a" => 0, "b" => 1, "c" => 2],
    ]
);

In order to retrieve an array of query parameters, the getArray() method can be used. This behaves similarly to the getAll() method, but it retrieves all query parameters whose names start with the supplied $name argument and possibly differ from it only by the [...] suffix. Let's see an example:

$params = Uri\Rfc3986\UriQueryParams::fromArray(
    [
        "empty" => [],
        "list" => ["a", "b", "c"],
        "map" => ["a" => 0, "b" => 1, "c" => 2],
    ]
);
 
/*
Internally, this results in the following array:
 
array(2) {
  ["list"]=>
  array(3) {
    [0]=>
    string(1) "a"
    [1]=>
    string(1) "b"
    [2]=>
    string(1) "c"
  }
  ["map"]=>
  array(3) {
    ["a"]=>
    string(1) "0"
    ["b"]=>
    string(1) "1"
    ["c"]=>
    string(1) "2"
  }
}
*/
 
echo $params->getFirst("empty");            // null
echo $params->getAll("empty");              // []
echo $params->getArray("empty");            // []
 
echo $params->getFirst("list");             // "a"
echo $params->getAll("list");               // []
echo $params->getAll("list[]");             // ["a", "b", "c"]
echo $params->getArray("list");             // ["a", "b", "c"]
 
echo $params->getFirst("map");              // 0
echo $params->getAll("map");                // []
echo $params->getArray("map");              // ["a" => "0", "b" => "1", "c" => "2"]

Similarly to the append() and set() methods, there are appendArray() and setArray() methods:

$params = new Uri\Rfc3986\UriQueryParams();
 
$params->appendArray("empty", []);
$params->appendArray("list", ["a", "b", "c"]);
$params->appendArray("map", ["a" => 0, "b" => 1, "c" => 2]);
 
echo $params->getFirst("empty");            // null
echo $params->getAll("empty");              // []
echo $params->getArray("empty");            // []
 
echo $params->getFirst("list");             // "a"
echo $params->getAll("list");               // []
echo $params->getAll("list[]");             // ["a", "b", "c"]
echo $params->getArray("list");             // ["a", "b", "c"]
 
echo $params->getFirst("map");              // 0
echo $params->getAll("map");                // []
echo $params->getArray("map");              // ["a" => "0", "b" => "1", "c" => "2"]
 
echo $params->toRfc3986String();            // list%5B%5D=a&list%5B%5D=b&list%5B%5D=c&map%5Ba%5D=0&map%5Bb%5D=1&map%5Bc%5D=2

And a few examples demonstrating how UrlQueryParams handles complex types:

$params = new Uri\WhatWg\UrlQueryParams();
 
$params->appendArray("empty", []);
$params->appendArray("list", ["a", "b", "c"]);
$params->appendArray("map", ["a" => 0, "b" => 1, "c" => 2]);
 
echo $params->getFirst("empty");            // null
echo $params->getAll("empty");              // []
echo $params->getArray("empty");            // []
 
echo $params->getFirst("list");             // "a"
echo $params->getAll("list");               // []
echo $params->getAll("list[]");             // ["a", "b", "c"]
echo $params->getArray("list");             // ["a", "b", "c"]
 
echo $params->getFirst("map");              // 0
echo $params->getAll("map");                // []
echo $params->getArray("map");              // ["a" => "0", "b" => "1", "c" => "2"]
 
echo $params->toString();                   // list%5B%5D=a&list%5B%5D=b&list%5B%5D=c&map%5Ba%5D=0&map%5Bb%5D=1&map%5Bc%5D=2

Finally, let's see how multi-dimensional arrays are represented:

$params = new Uri\Rfc3986\UriQueryParams();
 
$params->appendArray(
    "array",
    [
        "list" => [1, 2, 3],
        "map" => ["foo" => 1, "bar" => 2, "baz" => 3]
    ]
);
 
var_dump($params->getArray("array"));
 
/*
array(4) {
  ["array[list]"]=>
  array(3) {
    [0]=>
    string(1) "1"
    [1]=>
    string(1) "2"
    [2]=>
    string(1) "3"
  }
  ["array[map][foo]"]=>
  string(1) "1"
  ["array[map][bar]"]=>
  string(1) "2"
  ["array[map][baz]"]=>
  string(1) "3"
}
*/

Class signature

The UriQueryParams and UrlQueryParams classes are final for the same reason as all the other URI classes are final: mainly, in order to make followup changes possible without breaking backward compatibility.

Additionally, UriQueryParams and UrlQueryParams could be readonly classes, but this still has to be decided.

The UriQueryParams and UrlQueryParams classes implement the IteratorAggregate and the Countable interfaces. Implementing IteratorAggregate seems straightforward at first sight (query parameter names could be returned as iterator keys, while query parameter values could be returned as iterator values). Unfortunately, it's trickier than that because of query parameters that share the same name, e.g. param=foo&param=bar&param=baz. In this case, the same key (param) would be repeated 3 times, which is actually not possible to support with iterators.

That's why the iterator returns each query parameter name and value as a list of pairs. Similarly to the get*() methods, the iterator returns the “raw” parameter names and values without percent-encoding. Let's see an example:

$params = Uri\Rfc3986\UriQueryParams::parseRfc3986("param=foo&param=bar&param=baz");
 
foreach ($params as $key => $value) {
    echo "$key => $value[0], $value[1]\n";
}
 
/*
0 => param, foo
1 => param, bar
2 => param, baz
*/
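
One practical symptom of the repeated-key problem can be reproduced in userland today. The sketch below is only an illustration: PairList and byName() are hypothetical names, not part of the proposal. It compares a name-keyed generator with a list of pairs when both are collected by iterator_to_array().

```php
<?php
// Hypothetical userland stand-in for a pair-based iterator (not the proposed class).
final class PairList implements IteratorAggregate, Countable
{
    /** @param list<array{string, string}> $pairs */
    public function __construct(private array $pairs) {}

    public function getIterator(): Iterator
    {
        // Keys are positions (0, 1, 2, ...); values are [name, value] pairs.
        return new ArrayIterator($this->pairs);
    }

    public function count(): int
    {
        return count($this->pairs);
    }
}

// Hypothetical alternative: use the parameter name as the iterator key.
function byName(array $pairs): Generator
{
    foreach ($pairs as [$name, $value]) {
        yield $name => $value;
    }
}

$pairs = [["param", "foo"], ["param", "bar"], ["param", "baz"]];

// Name-keyed iteration collapses to the last value once collected into an array.
$collapsed = iterator_to_array(byName($pairs));

// Pair-based iteration keeps every parameter.
$kept = iterator_to_array(new PairList($pairs));

var_dump(count($collapsed)); // int(1)
var_dump(count($kept));      // int(3)
```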

Cloning

Cloning of UriQueryParams and UrlQueryParams is supported.

$params1 = Uri\Rfc3986\UriQueryParams::parseRfc3986("foo=bar&foo=baz");
$params2 = clone $params1;
$params2->append("foo", "qux");
 
echo $params1->toRfc3986String();        // foo=bar&foo=baz
echo $params2->toRfc3986String();        // foo=bar&foo=baz&foo=qux

UrlQueryParams works the same way:

$params1 = Uri\WhatWg\UrlQueryParams::parse("foo=bar&foo=baz");
$params2 = clone $params1;
$params2->append("foo", "qux");
 
echo $params1->toString();                // foo=bar&foo=baz
echo $params2->toString();                // foo=bar&foo=baz&foo=qux
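
The independence of the two objects can be mimicked in userland, because PHP arrays have value semantics and are therefore copied along with the object on clone. The following is a minimal sketch under that assumption; ClonablePairs and its methods are hypothetical and say nothing about how the proposed classes are implemented internally.

```php
<?php
// Hypothetical mutable stand-in; the class and its methods are assumptions.
final class ClonablePairs
{
    /** @var list<array{string, string}> */
    private array $pairs = [];

    public function append(string $name, string $value): void
    {
        $this->pairs[] = [$name, $value];
    }

    public function toQuery(): string
    {
        $parts = [];
        foreach ($this->pairs as [$name, $value]) {
            $parts[] = rawurlencode($name) . "=" . rawurlencode($value);
        }

        return implode("&", $parts);
    }
}

$first = new ClonablePairs();
$first->append("foo", "bar");
$first->append("foo", "baz");

// The clone receives its own copy of the pair list.
$second = clone $first;
$second->append("foo", "qux");

echo $first->toQuery(), "\n";  // foo=bar&foo=baz
echo $second->toQuery(), "\n"; // foo=bar&foo=baz&foo=qux
```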

Serialization

Both classes support serialization and deserialization via the new serialization API (__serialize() and __unserialize()). The only implementation gotcha is that the serialized format is slightly unexpected: instead of recomposing the query parameters into a query string, the individual query parameter name and value pairs are serialized as an array of key-value pairs, similarly to the output of the list() method. During deserialization, the query parameter list is created directly from this array without any transformation (the same way the fromArray() method works).

The main advantage of this choice is that the query parameters can be serialized and deserialized as-is, without any modifications (remember, the recomposition algorithms must percent-encode their output, and percent-decoding is needed during parsing, and both steps modify the original data). Additionally, this approach is more efficient than serializing a recomposed query string, because it eliminates the overhead of recomposition and parsing, including percent-encoding and decoding.
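
The idea of serializing the pair list as-is can be sketched in userland as follows. SerializablePairs and its methods are hypothetical names; the exact serialized layout of the proposed classes is not specified here, so this only illustrates the principle.

```php
<?php
// Hypothetical stand-in that serializes its pair list without recomposition.
final class SerializablePairs
{
    /** @var list<array{string, string}> */
    private array $pairs = [];

    public function append(string $name, string $value): void
    {
        $this->pairs[] = [$name, $value];
    }

    public function pairs(): array
    {
        return $this->pairs;
    }

    public function __serialize(): array
    {
        // No percent-encoding: the raw pairs are stored directly.
        return $this->pairs;
    }

    public function __unserialize(array $data): void
    {
        // No parsing or percent-decoding either; a real implementation
        // would presumably validate the shape of $data first.
        $this->pairs = $data;
    }
}

$original = new SerializablePairs();
$original->append("q", "50% off & more");
$original->append("q", "a+b");

$restored = unserialize(serialize($original));

var_dump($restored->pairs() === $original->pairs()); // bool(true)
```

Values containing characters such as %, & or + survive the round trip unchanged because they are never percent-encoded or parsed.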

Debugging

Both classes contain a __debugInfo() method that returns all items in the query parameter list in order to make debugging easier. Effectively, this has a similar output to the list() method.

$params = Uri\Rfc3986\UriQueryParams::parseRfc3986("foo=bar&foo=baz&foo=qux");
var_dump($params);
 
/*
object(Uri\Rfc3986\UriQueryParams)#1 (1) {
  ["params"]=>
  array(3) {
    [0]=>
    array(2) {
      [0]=>
      string(3) "foo"
      [1]=>
      string(3) "bar"
    }
    [1]=>
    array(2) {
      [0]=>
      string(3) "foo"
      [1]=>
      string(3) "baz"
    }
    [2]=>
    array(2) {
      [0]=>
      string(3) "foo"
      [1]=>
      string(3) "qux"
    }
  }
}
*/
 
$params = Uri\WhatWg\UrlQueryParams::parse("foo=bar&foo=baz&foo=qux");
var_dump($params);
 
/*
object(Uri\WhatWg\UrlQueryParams)#1 (1) {
  ["params"]=>
  array(3) {
    [0]=>
    array(2) {
      [0]=>
      string(3) "foo"
      [1]=>
      string(3) "bar"
    }
    [1]=>
    array(2) {
      [0]=>
      string(3) "foo"
      [1]=>
      string(3) "baz"
    }
    [2]=>
    array(2) {
      [0]=>
      string(3) "foo"
      [1]=>
      string(3) "qux"
    }
  }
}
*/

Relation to $_GET

The $_GET superglobal stores the query parameters of the current request, percent-decoded according to RFC 1866. That's why the proposed UriQueryParams and UrlQueryParams classes are its direct alternatives when it comes to processing the current request. The current RFC lays the foundations for populating $_GET according to the other relevant specifications besides RFC 1866 (RFC 3986 and WHATWG URL), for example, by adding support for a new php.ini configuration option.

The position of this RFC, though, is that $_GET (and superglobals in general) shouldn't be changed in any way, but rather gradually phased out in the long term by offering better alternatives. In this case, UriQueryParams and UrlQueryParams can be used directly instead of $_GET, so migrating away from the superglobal should be straightforward in most cases.

Given the following piece of code:

$order = isset($_GET["order"]) ? (string) $_GET["order"] : null;
$limit = isset($_GET["limit"]) ? (int) $_GET["limit"] : null;

It becomes possible to migrate to the new API roughly the following way:

$queryParams = Uri\Rfc3986\UriQueryParams::parseRfc3986($_SERVER["QUERY_STRING"] ?? "");
 
$order = $queryParams->getFirst("order");
$limit = $queryParams->has("limit") ? (int) $queryParams->getFirst("limit") : null;

It should also be noted that introducing a php.ini option for controlling the rules by which $_GET is populated is not a safe solution: it could cause security vulnerabilities due to parsing confusion, not to mention the headache for libraries, which would have to prepare for every possible configuration value. That's why the current RFC leaves $_GET out of its scope.
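
The kind of parsing confusion mentioned above can be illustrated with existing functions alone. In the sketch below, parse_str() applies the form-urlencoded rules that $_GET also follows, while rawurldecode() is used as an approximation of plain RFC 3986 percent-decoding; the query string and variable names are illustrative.

```php
<?php
$queryString = "q=a+b%2Bc";

// Form-urlencoded rules (used by parse_str() and $_GET): "+" means a space.
parse_str($queryString, $formDecoded);

// Plain percent-decoding: "+" is an ordinary character and stays as-is.
[$name, $rawValue] = explode("=", $queryString, 2);
$percentDecoded = rawurldecode($rawValue);

echo $formDecoded["q"], "\n"; // a b+c
echo $percentDecoded, "\n";   // a+b+c
```

The same query string yields two different values depending on the rules applied, which is exactly the ambiguity a configurable $_GET would expose to every library.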

Backward Incompatible Changes

All the proposed features are backward compatible with existing code.

Proposed PHP Version(s)

Next minor version (PHP 8.6).

RFC Impact

To the Ecosystem

What effect will the RFC have on IDEs, Language Servers (LSPs), Static Analyzers, Auto-Formatters, Linters and commonly used userland PHP libraries?

To Existing Extensions

Existing extensions that manipulate or access query parameters can continue to use the $_GET superglobal without any changes. However, they are encouraged to migrate to the newly added QueryParams class and its API.

To SAPIs

None. SAPIs should continue to store the raw query string in the sapi_globals.request_info.query_string global variable.

Open Issues

None.

Future Scope

  • Adding support for passing objects to QueryParams: if a class implements a new interface (QueryStringable?), then it could become possible to serialize the object when appending/setting it to QueryParams.

Voting Choices

Add support for query parameter manipulation as outlined in the RFC?
Real name Yes No Abstain
Final result: 0 0 0
This poll has been closed.

Patches and Tests

Links to proof of concept PR.

If there is no patch, make it clear who will create a patch, or whether a volunteer to help with implementation is needed.

Implementation

After the RFC is implemented, this section should contain:

  1. the version(s) it was merged into
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature

References

Links to external references, discussions, or RFCs.

Rejected Features

Keep this updated with features that were discussed on the mail lists.

Changelog

If there are major changes to the initial proposal, please include a short summary with a date or a link to the mailing list announcement here, as not everyone has access to the wikis' version history.

rfc/query_params.txt · Last modified: by kocsismate