====== PHP RFC: Followup Improvements for ext/uri ======
* Version: 0.1
* Date: 2025-10-17
* Author: Máté Kocsis, kocsismate@php.net
* Status: Under Discussion
* Target version: next minor version (PHP 8.6)
* Implementation: https://github.com/kocsismate/php-src/pull/9
===== Introduction =====
This RFC proposes a set of follow-up improvements to the [[rfc:url_parsing_api|URL Parsing API RFC]]. It extends the ''Uri\Rfc3986\Uri'' and ''Uri\WhatWg\Url'' classes with additional capabilities, several of which were already discussed or requested during the original RFC’s review process.
While these features improve the overall usability of the URI extension, they were intentionally left out of the initial proposal. In some cases, they required further discussion; in others, they were considered non-essential at the time. Deferring them allowed the original RFC to remain focused and avoided further increasing its scope.
===== Proposal =====
The following new functionality is introduced in this proposal:
- [[#uri_building|URI Building]]
- [[#uri_type_detection|URI Type Detection]]
- [[#host_type_detection|Host Type Detection]]
- [[#percent-encoding_support|Percent-Encoding Support]]
Originally, the proposal included two more topics:
* Query parameter manipulation support, which was later split into its own RFC at https://wiki.php.net/rfc/query_params
* URI/URL path segment support, which was also split into its own RFC at https://wiki.php.net/rfc/uri_path_segments
Each proposed feature is voted on separately and requires a 2/3 majority.
==== URI Building ====
Currently, only **already existing (and validated)** URIs can be manipulated via [[https://wiki.php.net/rfc/url_parsing_api#component_modification|wither methods]]. These calls always create a new instance so that the immutability of URIs is preserved. Even though this behavior has plenty of advantages, it has at least one disadvantage: instance creation has a performance overhead. This is especially problematic when many URI components have to be modified at the same time, because many objects are "wasted" as intermediate instances.
$uri1 = Uri\Rfc3986\Uri::parse("http://example.com");
$uri2 = $uri1
->withScheme("https")
->withHost("example.net")
->withPath("/foo/bar"); // The three wither calls create 3 new objects, 2 of them only as throwaway intermediates!
Besides its suboptimal performance, another drawback of the current wither-based solution is that URIs cannot be created from scratch: one always has to create a valid URI first. The empty string is a valid RFC 3986 URI, so it may seem a good starting point for URI building, but unfortunately, it's not a valid WHATWG URL. Moreover, the success of some transformations depends on the current state (which is a form of temporal coupling):
$uri1 = Uri\Rfc3986\Uri::parse("");
$uri2 = $uri1
->withScheme("https")
->withUserInfo("user:pass") // throws Uri\InvalidUriException: Cannot set a userinfo without having a host
->withHost("example.com");
$uri2 = $uri1
->withScheme("https")
->withHost("example.com")
->withUserInfo("user:pass"); // No exception is thrown
In order to provide a more ergonomic and efficient solution for URI building, a fluent API is proposed that implements the [[https://refactoring.guru/design-patterns/builder|Builder pattern]].
$uriBuilder = new Uri\Rfc3986\UriBuilder()
->setScheme("https")
->setUserInfo("user:pass")
->setHost("example.com")
->setPort(8080)
->setPath("/foo/bar")
->setQuery("a=1&b=2")
->setFragment("section1");
$uri = $uriBuilder->build(); // URI instance creation is only done at this point
echo $uri->toRawString(); // https://user:pass@example.com:8080/foo/bar?a=1&b=2#section1
The same works for WHATWG URL:
$urlBuilder = new Uri\WhatWg\UrlBuilder()
->setScheme("https")
->setUsername("user")
->setPassword("pass")
->setHost("example.com")
->setPort(8080)
->setPath("/foo/bar")
->setQuery("a=1&b=2")
->setFragment("section1");
$url = $urlBuilder->build(); // URL instance creation is only done at this point
echo $url->toAsciiString(); // https://user:pass@example.com:8080/foo/bar?a=1&b=2#section1
When a Builder instance was not created by ourselves or by a trusted party, we cannot be sure whether any of its components are already set. Therefore, if a completely clean state is needed, it's highly recommended to reset the instance before use:
function buildUri(Uri\Rfc3986\UriBuilder $builder): void
{
// Was there any component set before?
$builder->reset();
// Further usage is safe now...
}
function buildUrl(Uri\WhatWg\UrlBuilder $builder): void
{
// Was there any component set before?
$builder->reset();
// Further usage is safe now...
}
The ''reset()'' method also comes in handy when the same Builder instance is reused to create multiple URIs/URLs in a row.
The complete class signatures to be added are the following:
namespace Uri\Rfc3986 {
final class UriBuilder
{
public function __construct() {}
public function reset(): static {}
/**
* @throws Uri\InvalidUriException
*/
public function setScheme(?string $scheme): static {}
/**
* @throws Uri\InvalidUriException
*/
public function setUserInfo(#[\SensitiveParameter] ?string $userInfo): static {}
/**
* @throws Uri\InvalidUriException
*/
public function setHost(?string $host): static {}
/**
* @throws Uri\InvalidUriException
*/
public function setPath(string $path): static {}
/**
* @throws Uri\InvalidUriException
*/
public function setQuery(?string $query): static {}
/**
* @throws Uri\InvalidUriException
*/
public function setFragment(?string $fragment): static {}
/**
* @throws Uri\InvalidUriException
*/
public function build(?\Uri\Rfc3986\Uri $baseUrl = null): \Uri\Rfc3986\Uri {}
}
}
namespace Uri\WhatWg {
final class UrlBuilder
{
public function __construct() {}
public function reset(): static {}
/**
* @throws Uri\WhatWg\InvalidUrlException
*/
public function setScheme(string $scheme): static {}
/**
* @throws Uri\WhatWg\InvalidUrlException
*/
public function setUsername(?string $username): static {}
/**
* @throws Uri\WhatWg\InvalidUrlException
*/
public function setPassword(#[\SensitiveParameter] ?string $password): static {}
/**
* @throws Uri\WhatWg\InvalidUrlException
*/
public function setHost(?string $host): static {}
/**
* @throws Uri\WhatWg\InvalidUrlException
*/
public function setPath(string $path): static {}
/**
* @throws Uri\WhatWg\InvalidUrlException
*/
public function setQuery(?string $query): static {}
/**
* @throws Uri\WhatWg\InvalidUrlException
*/
public function setFragment(?string $fragment): static {}
/**
* @param array $errors
* @throws Uri\WhatWg\InvalidUrlException
*/
public function build(?\Uri\WhatWg\Url $baseUrl = null, &$errors = null): \Uri\WhatWg\Url {}
}
}
The builder objects perform validation at two distinct levels:
* **Validation of individual components:** Each setter method validates the syntactic correctness of its input as early as possible. For example, the scheme component cannot contain percent-encoded octets, therefore ''setScheme()'' will throw if a "%" character is encountered. These checks are performed on a best-effort basis, as the URI/URL grammar is context-sensitive and not all constraints can be validated in isolation. For instance, the valid form of the path component depends on whether a scheme is present.
* **Validation of the global state:** Certain constraints depend on the overall structure of the URI/URL and can only be validated once all components are known. Such validations are deferred until the ''build()'' method is invoked. For example, RFC 3986 requires the host component to be present when the userinfo is set. Deferring these checks avoids temporal coupling between setter calls and allows the builder to be used in a flexible way.
An example of component syntax validation:
$uriBuilder = new Uri\Rfc3986\UriBuilder()
->setScheme("http%80"); // Throws a Uri\InvalidUriException because the scheme is not well formed
An example of global state validation:
$uriBuilder = new Uri\Rfc3986\UriBuilder()
->setScheme("https")
->setUserInfo("user:pass"); // Doesn't throw an exception yet
$uri = $uriBuilder->build(); // Throws a Uri\InvalidUriException because the userinfo is present, but the host is not
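The two validation levels can be illustrated with a small userland sketch. ''ToyRfc3986Builder'' below is a hypothetical class, not the proposed ''Uri\Rfc3986\UriBuilder'': it throws ''InvalidArgumentException'' instead of ''Uri\InvalidUriException'', only validates the scheme eagerly, and returns a string recomposed as described in RFC 3986, Section 5.3 rather than a ''Uri\Rfc3986\Uri'' instance.

```php
<?php
// Hypothetical userland sketch: ToyRfc3986Builder is NOT the proposed
// Uri\Rfc3986\UriBuilder. It only mimics the two validation levels and
// recomposes the components as in RFC 3986, Section 5.3.
final class ToyRfc3986Builder
{
    private ?string $scheme = null;
    private ?string $userInfo = null;
    private ?string $host = null;
    private ?int $port = null;
    private string $path = "";
    private ?string $query = null;
    private ?string $fragment = null;

    public function reset(): static
    {
        $this->scheme = null;
        $this->userInfo = null;
        $this->host = null;
        $this->port = null;
        $this->path = "";
        $this->query = null;
        $this->fragment = null;
        return $this;
    }

    // Level 1: the scheme can be checked in isolation, as soon as it is set.
    public function setScheme(?string $scheme): static
    {
        if ($scheme !== null && !preg_match('/^[A-Za-z][A-Za-z0-9+.-]*$/D', $scheme)) {
            throw new InvalidArgumentException("Invalid scheme: $scheme");
        }
        $this->scheme = $scheme;
        return $this;
    }

    // The remaining setters only store their input in this toy version.
    public function setUserInfo(?string $userInfo): static { $this->userInfo = $userInfo; return $this; }
    public function setHost(?string $host): static { $this->host = $host; return $this; }
    public function setPort(?int $port): static { $this->port = $port; return $this; }
    public function setPath(string $path): static { $this->path = $path; return $this; }
    public function setQuery(?string $query): static { $this->query = $query; return $this; }
    public function setFragment(?string $fragment): static { $this->fragment = $fragment; return $this; }

    // Level 2: cross-component rules are only checked once everything is known.
    public function build(): string
    {
        if ($this->host === null && ($this->userInfo !== null || $this->port !== null)) {
            throw new InvalidArgumentException("Userinfo and port require a host");
        }
        if ($this->host !== null && $this->path !== "" && $this->path[0] !== "/") {
            throw new InvalidArgumentException("With a host, the path must be empty or start with '/'");
        }

        $uri = $this->scheme !== null ? $this->scheme . ":" : "";
        if ($this->host !== null) {
            $uri .= "//";
            $uri .= $this->userInfo !== null ? $this->userInfo . "@" : "";
            $uri .= $this->host;
            $uri .= $this->port !== null ? ":" . $this->port : "";
        }
        $uri .= $this->path;
        $uri .= $this->query !== null ? "?" . $this->query : "";
        $uri .= $this->fragment !== null ? "#" . $this->fragment : "";
        return $uri;
    }
}

$builder = (new ToyRfc3986Builder())->setScheme("https")->setUserInfo("user:pass"); // no exception yet
$builder->setHost("example.com");
echo $builder->build(), "\n"; // https://user:pass@example.com
```

Deferring the host check to ''build()'' is what allows ''setUserInfo()'' to be called before ''setHost()'' without an exception being thrown.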
=== Design considerations ===
== Builder design pattern ==
Why is a more complex approach based on the Builder pattern proposed instead of a much simpler one based on the [[https://refactoring.guru/design-patterns/factory-method|Factory Method]] pattern? The factory method could be as simple as the following:
namespace Uri\Rfc3986 {
final readonly class Uri
{
...
public static function fromComponents(
?string $scheme = null, ?string $host = null, string $path = "",
?string $userInfo = null, ?string $queryString = null, ?string $fragment = null
) {}
...
}
}
namespace Uri\WhatWg {
final readonly class Url
{
...
public static function fromComponents(
string $scheme, ?string $host = "", string $path = "",
?string $username = null, ?string $password = null,
?string $queryString = null, ?string $fragment = null
) {}
...
}
}
The current RFC proposes the Builder pattern based approach because of its flexibility: it makes it possible to add more convenience methods in the future.
== Dedicated classes ==
This RFC proposes a dedicated Builder class for both RFC 3986 and WHATWG URL, instead of a single, unified implementation with 2 ''build()'' methods (e.g. ''buildUri()'' and ''buildUrl()''). This decision has the following reasons:
* The two specifications don't recognize the same components. RFC 3986 has a userinfo component, while WHATWG URL has separate ''username'' and ''password'' components instead. Even though these incompatibilities could probably be worked around, the position of this RFC is that it's better not to maintain compatibility artificially.
* RFC 3986 only requires the ''path'' component to be present (that's why the empty string is a valid RFC 3986 URI), while WHATWG URL mandates the presence of the ''scheme'' component too. This distinction is visible from the proposed signatures: while the ''Uri\Rfc3986\UriBuilder::setScheme()'' method accepts a ''string'' or ''null'', ''Uri\WhatWg\UrlBuilder::setScheme()'' only accepts a ''string''. The same distinction is already present in the ''Uri\Rfc3986\Uri::withScheme()'' and ''Uri\WhatWg\Url::withScheme()'' methods.
* Setter methods validate their input based on the rules of the specification they implement. For example, RFC 3986 URIs cannot contain Unicode characters, so all setters fail when such a character is passed to them. On the other hand, WHATWG URL can handle Unicode characters, so its setters won't fail when they encounter one. If a single, unified Builder class were proposed, validation couldn't be performed early in the setter calls, only in the ''build*()'' method calls. The position of this RFC is that such a delayed feedback loop would be counterintuitive.
== Mutability ==
The ''UriBuilder'' and ''UrlBuilder'' classes are intentionally designed as mutable objects.
A Builder’s primary purpose is incremental construction and modification. In such workflows, immutability would require creating a new instance after every state change (e.g. setting the scheme, host, etc.).
In contrast to value objects such as ''Uri\Rfc3986\Uri'' or ''Uri\WhatWg\Url'', which represent fully parsed and normalized identifiers and therefore benefit from immutability, a Builder is inherently a transitional construct. It is not meant to represent a stable value but to facilitate step-by-step assembly.
Making the Builder classes immutable would:
* introduce avoidable performance overhead due to repeated allocations
* complicate usage
* provide limited practical safety benefits, since the Builder is not intended for concurrent sharing.
== Setter naming convention ==
Setter methods of the ''UriBuilder'' and ''UrlBuilder'' classes follow the naming convention which is already widespread among internal classes: they use a ''set'' prefix, e.g. ''setScheme()'', ''setHost()''. The current RFC rejects the usage of any other naming convention, most notably the omission of the ''set'' prefix (e.g. ''scheme()'', ''host()''), for the following reasons:
* The ''set'' prefix adds context about the intended behavior: all proposed setters completely overwrite the related component. E.g. ''setQuery()'' neither prepends nor appends its input to the existing query string, but overwrites the whole component. If ''set'' were omitted from the method name, this context would be lost, and users would have even less indication of what happens when they call the method.
* Using the ''set'' prefix for setters makes it more natural to add other convenience methods in the future, e.g. ''appendQueryParams()'', ''appendPathSegments()'', etc.
=== Voting ===
==== URI Type Detection ====
namespace Uri\Rfc3986 {
enum UriType
{
case AbsolutePathReference;
case RelativePathReference;
case NetworkPathReference;
case Uri;
}
final readonly class Uri
{
...
public function getUriType(): \Uri\Rfc3986\UriType {}
...
}
}
This way, it becomes easier to detect URI types:
$uri = new Uri\Rfc3986\Uri("https://example.com");
var_dump($uri->getUriType()); // Uri\Rfc3986\UriType::Uri
$uri = new Uri\Rfc3986\Uri("https:");
var_dump($uri->getUriType()); // Uri\Rfc3986\UriType::Uri
$uri = new Uri\Rfc3986\Uri("/foo");
var_dump($uri->getUriType()); // Uri\Rfc3986\UriType::AbsolutePathReference
$uri = new Uri\Rfc3986\Uri("foo");
var_dump($uri->getUriType()); // Uri\Rfc3986\UriType::RelativePathReference
$uri = new Uri\Rfc3986\Uri("//host.com/foo");
var_dump($uri->getUriType()); // Uri\Rfc3986\UriType::NetworkPathReference
The position of this RFC is that distinguishing between URIs and [[https://datatracker.ietf.org/doc/html/rfc3986#section-4.3|absolute URIs]] (URIs that don't include a fragment component) doesn't need special support, since it only requires checking that the URI type is ''Uri\Rfc3986\UriType::Uri'' and that the fragment is ''null''. Therefore, a dedicated ''Uri\Rfc3986\UriType::AbsoluteUri'' enum case is omitted from the proposal.
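For illustration only, the classification and the absolute URI check can be approximated in userland with the non-validating regular expression from [[https://datatracker.ietf.org/doc/html/rfc3986#appendix-B|Appendix B]] of RFC 3986. The functions ''toy_rfc3986_uri_type()'' and ''toy_rfc3986_is_absolute_uri()'' are hypothetical and return plain strings named after the proposed enum cases; unlike the proposed method, they don't reject malformed input.

```php
<?php
// Hypothetical helpers built on the regular expression of RFC 3986, Appendix B.
// Group 2 = scheme, group 4 = authority, group 5 = path, group 8 = "#fragment".
const TOY_RFC3986_SPLIT = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~s';

function toy_rfc3986_uri_type(string $reference): string
{
    // PREG_UNMATCHED_AS_NULL distinguishes a missing group (null) from an empty one ("").
    preg_match(TOY_RFC3986_SPLIT, $reference, $m, PREG_UNMATCHED_AS_NULL);
    $scheme = $m[2] ?? null;
    $authority = $m[4] ?? null;
    $path = $m[5] ?? "";

    return match (true) {
        $scheme !== null => "Uri",
        $authority !== null => "NetworkPathReference",
        str_starts_with($path, "/") => "AbsolutePathReference",
        default => "RelativePathReference",
    };
}

// An absolute URI (RFC 3986, Section 4.3) has a scheme but no fragment.
function toy_rfc3986_is_absolute_uri(string $reference): bool
{
    preg_match(TOY_RFC3986_SPLIT, $reference, $m, PREG_UNMATCHED_AS_NULL);
    return ($m[2] ?? null) !== null && ($m[8] ?? null) === null;
}

echo toy_rfc3986_uri_type("//host.com/foo"), "\n"; // NetworkPathReference
```

''PREG_UNMATCHED_AS_NULL'' is used so that an empty authority (as in ''%%///foo%%'') can be told apart from a missing one.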
The WHATWG URL specification defines some [[https://url.spec.whatwg.org/#is-special|special schemes]] (''http'', ''https'', ''ftp'', ''file'', ''ws'', ''wss''), which have distinct parsing and serialization rules. In order to make checks for special URLs easier to perform, a new ''Uri\WhatWg\Url::isSpecialScheme()'' method is added:
namespace Uri\WhatWg {
final readonly class Url
{
...
public function isSpecialScheme(): bool {}
...
}
}
This enables low-level control for applications that need to mirror WHATWG behavior in parsing or normalization.
$url = new Uri\WhatWg\Url("https://example.com");
var_dump($url->isSpecialScheme()); // true
$url = new Uri\WhatWg\Url("custom:example");
var_dump($url->isSpecialScheme()); // false
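A hypothetical userland equivalent operating on a bare scheme string could look like the sketch below. ''toy_is_special_scheme()'' and ''toy_default_port()'' are not part of the proposal; the scheme list and default ports come from the WHATWG URL Standard, where ''file'' is special but has no default port.

```php
<?php
// Hypothetical helpers: special schemes and their default ports as listed in
// the WHATWG URL Standard ("file" is special but has no default port).
const TOY_SPECIAL_SCHEMES = [
    "ftp" => 21,
    "file" => null,
    "http" => 80,
    "https" => 443,
    "ws" => 80,
    "wss" => 443,
];

function toy_is_special_scheme(string $scheme): bool
{
    // array_key_exists() instead of isset(), because "file" maps to null.
    // WHATWG parsing lowercases schemes, so compare case-insensitively.
    return array_key_exists(strtolower($scheme), TOY_SPECIAL_SCHEMES);
}

function toy_default_port(string $scheme): ?int
{
    return TOY_SPECIAL_SCHEMES[strtolower($scheme)] ?? null;
}

var_dump(toy_is_special_scheme("custom")); // bool(false)
```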
==== Host Type Detection ====
namespace Uri\Rfc3986 {
enum UriHostType
{
case IpV4;
case IpV6;
case IpVFuture;
case RegisteredName;
}
final readonly class Uri
{
...
public function getHostType(): ?\Uri\Rfc3986\UriHostType {}
...
}
}
namespace Uri\WhatWg {
enum UrlHostType
{
case IpV4;
case IpV6;
case Domain;
case Opaque;
case Empty;
}
final readonly class Url
{
...
public function getHostType(): ?\Uri\WhatWg\UrlHostType {}
...
}
}
The new ''getHostType()'' methods return the type of the host component for both specifications.
Let's see a few examples for RFC 3986:
$uri = new Uri\Rfc3986\Uri("https://192.168.0.1/");
var_dump($uri->getHostType()); // UriHostType::IpV4
$uri = new Uri\Rfc3986\Uri("https://[2001:db8::1]/");
var_dump($uri->getHostType()); // UriHostType::IpV6
$uri = new Uri\Rfc3986\Uri("https://[v1.1.2.3]/");
var_dump($uri->getHostType()); // UriHostType::IpVFuture
$uri = new Uri\Rfc3986\Uri("https://example.com/");
var_dump($uri->getHostType()); // UriHostType::RegisteredName
$uri = new Uri\Rfc3986\Uri("file:///C:/a.txt");
var_dump($uri->getHostType()); // null
$uri = new Uri\Rfc3986\Uri("foo:bar/baz");
var_dump($uri->getHostType()); // null
$uri = new Uri\Rfc3986\Uri("/foo/bar");
var_dump($uri->getHostType()); // null
$uri = new Uri\Rfc3986\Uri("mailto:john.doe@example.com");
var_dump($uri->getHostType()); // null
According to RFC 3986, the host can be either an IPv4 or an IPv6 address, a so-called IPvFuture address (a potential IP address type that might be developed after IPv6), or a registered name (usually but not exclusively a DNS name). RFC 3986 also allows the host to be empty ("%%https://%%"), in which case, ''UriHostType::RegisteredName'' is returned, since the ''reg-name'' syntax of RFC 3986 supports empty strings. When the host is missing, ''Uri\Rfc3986\Uri::getHostType()'' returns ''null''.
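The RFC 3986 rules described above can be sketched in userland as follows. ''toy_rfc3986_host_type()'' is a hypothetical helper that expects an already extracted host (''null'' when there is no authority) and returns the proposed case names as strings; it relies on ''filter_var()'' for IPv6 validation, which approximates the ''IPv6address'' rule rather than transcribing it literally.

```php
<?php
// Hypothetical helper: classifies an already extracted host using the grammar
// of RFC 3986, Section 3.2.2. Returns the proposed UriHostType case names as
// strings, or null when there is no host at all.
function toy_rfc3986_host_type(?string $host): ?string
{
    if ($host === null) {
        return null;
    }

    // IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet (no leading zeros)
    $decOctet = '(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])';
    if (preg_match('/^' . implode('\.', array_fill(0, 4, $decOctet)) . '$/D', $host)) {
        return "IpV4";
    }

    // IP-literal = "[" ( IPv6address / IPvFuture ) "]"
    if (str_starts_with($host, "[") && str_ends_with($host, "]")) {
        $inner = substr($host, 1, -1);
        // IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
        if (preg_match('/^[vV][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&\'()*+,;=:]+$/D', $inner)) {
            return "IpVFuture";
        }
        if (filter_var($inner, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false) {
            return "IpV6";
        }
        throw new InvalidArgumentException("Invalid IP literal: $host");
    }

    // reg-name = *( unreserved / pct-encoded / sub-delims ), so it may be empty
    if (preg_match('/^(?:[A-Za-z0-9\-._~!$&\'()*+,;=]|%[0-9A-Fa-f]{2})*$/D', $host)) {
        return "RegisteredName";
    }
    throw new InvalidArgumentException("Invalid host: $host");
}

var_dump(toy_rfc3986_host_type("[v1.1.2.3]")); // string(9) "IpVFuture"
```

Note that ''127.1'' is classified as a registered name by this sketch, because RFC 3986 only recognizes the strict dotted-decimal IPv4 form.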
Some examples for WHATWG URL:
$url = new Uri\WhatWg\Url("https://192.168.0.1/");
var_dump($url->getHostType()); // UrlHostType::IpV4
$url = new Uri\WhatWg\Url("https://[2001:db8::1]/");
var_dump($url->getHostType()); // UrlHostType::IpV6
$url = new Uri\WhatWg\Url("https://example.com/");
var_dump($url->getHostType()); // UrlHostType::Domain
$url = new Uri\WhatWg\Url("file:///C:/a.txt");
var_dump($url->getHostType()); // UrlHostType::Empty
As can be seen, the behavior of the WHATWG URL specification is straightforward in the case of [[https://wiki.php.net/rfc/uri_followup#uri_type_detection|special URLs]]: the host can be an IPv4 or an IPv6 address, a domain, or empty (when the host is missing).
As a side note, WHATWG URL also accepts many more IPv4 address formats than RFC 3986:
$url = new Uri\WhatWg\Url("https://127.1/");
echo $url->getAsciiHost(); // 127.0.0.1
$url = new Uri\WhatWg\Url("https://0x7f.0x0.0x0.0x1");
echo $url->getAsciiHost(); // 127.0.0.1
$url = new Uri\WhatWg\Url("https://2130706433/");
echo $url->getAsciiHost(); // 127.0.0.1
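These shorthand forms come from the WHATWG IPv4 parser, which accepts one to four dot-separated parts in decimal, octal (leading ''0'') or hexadecimal (leading ''0x'') notation, and lets the last part fill all remaining bytes. The following hypothetical sketch (''toy_whatwg_ipv4_parse()'' is not part of any API) follows that algorithm but ignores validation errors, and it skips the "ends in a number" check that decides whether a host is passed to the IPv4 parser at all.

```php
<?php
// Hypothetical sketch of the WHATWG "IPv4 number parser": returns null on failure.
function toy_whatwg_ipv4_number(string $part): ?int
{
    if ($part === "") {
        return null;
    }
    $radix = 10;
    if (strlen($part) >= 2 && $part[0] === "0" && ($part[1] === "x" || $part[1] === "X")) {
        $part = substr($part, 2);
        $radix = 16;
    } elseif (strlen($part) >= 2 && $part[0] === "0") {
        $part = substr($part, 1);
        $radix = 8;
    }
    if ($part === "") {
        return 0; // e.g. "0x" alone means zero
    }
    $digits = match ($radix) {
        16 => '/^[0-9A-Fa-f]+$/D',
        8 => '/^[0-7]+$/D',
        10 => '/^[0-9]+$/D',
    };
    return preg_match($digits, $part) ? intval($part, $radix) : null;
}

// Hypothetical sketch of the WHATWG "IPv4 parser" plus dotted-decimal serialization.
function toy_whatwg_ipv4_parse(string $input): ?string
{
    $parts = explode(".", $input);
    if (count($parts) > 1 && end($parts) === "") {
        array_pop($parts); // a single trailing dot is tolerated
    }
    if (count($parts) > 4) {
        return null;
    }
    $numbers = [];
    foreach ($parts as $part) {
        $n = toy_whatwg_ipv4_number($part);
        if ($n === null) {
            return null;
        }
        $numbers[] = $n;
    }
    $last = array_pop($numbers);
    foreach ($numbers as $n) {
        if ($n > 255) {
            return null; // only the last part may exceed one byte
        }
    }
    if ($last >= 256 ** (4 - count($numbers))) {
        return null; // the last part must fit into the remaining bytes
    }
    $ipv4 = $last;
    foreach ($numbers as $i => $n) {
        $ipv4 += $n * 256 ** (3 - $i);
    }
    return implode(".", [($ipv4 >> 24) & 0xFF, ($ipv4 >> 16) & 0xFF, ($ipv4 >> 8) & 0xFF, $ipv4 & 0xFF]);
}

echo toy_whatwg_ipv4_parse("0x7f.0x0.0x0.0x1"), "\n"; // 127.0.0.1
```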
Things get more complicated when we look at non-special URLs:
$url = new Uri\WhatWg\Url("git://example.com/whatwg/url.git");
var_dump($url->getHostType()); // UrlHostType::Opaque
$url = new Uri\WhatWg\Url("scheme://127.0.0.1/");
var_dump($url->getHostType()); // UrlHostType::Opaque
$url = new Uri\WhatWg\Url("mailto:john.doe@example.com");
var_dump($url->getHostType()); // null
$url = new Uri\WhatWg\Url("foo:/bar/baz");
var_dump($url->getHostType()); // null
Hosts of non-special URLs can be either opaque (note that even the IPv4-looking ''127.0.0.1'' host is opaque in the example above), or ''null'' (when the host is missing). While treating the host of any non-special URL as opaque may seem unusual at first, this follows directly from the design principles of the WHATWG URL specification: the specification intentionally avoids making assumptions about the syntax of schemes it does not define. For example, it cannot know how a hypothetical ''foo'' scheme structures its host component. Therefore, such hosts are treated as opaque and are not subject to further parsing or validation.
These considerations explain why this RFC defines two separate enums (''UriHostType'' and ''UrlHostType'') for the two specifications, even though they contain similar or partially overlapping cases.
RFC 3986 defines a generic URI syntax. Its host categorization reflects this generality as it uses the "registered name" phrasing, without assuming any particular name resolution mechanism. In particular, a registered name is a syntactic production and does not imply a DNS domain.
In contrast, the WHATWG URL specification defines a host model mostly tailored to web interoperability. Because these two specifications operate at different abstraction levels and assign different semantics to superficially similar host forms, their host type systems are not compatible. For example, an RFC 3986 registered name cannot in general be mapped to a WHATWG domain, and WHATWG’s opaque host concept has no direct equivalent in RFC 3986.
For example, consider the URI ''%%app://my-application/resource%%'' using the ''app'' scheme as specified by [[https://www.w3.org/TR/app-uri/|https://www.w3.org/TR/app-uri/]]. According to RFC 3986, the host ''my-application'' is a valid registered name, even though it does not represent a DNS domain and is not expected to be resolved via DNS.
Therefore, a unified host type enum would either blur these semantic distinctions or incorrectly suggest full compatibility, but only partial compatibility exists. Using separate enums ensures that the host classification faithfully reflects the underlying specification being applied.
==== Percent-Encoding Support ====
''urlencode()'' is actually meant for the ''application/x-www-form-urlencoded'' media type, while ''rawurlencode()'' more closely implements RFC 3986. However, neither function takes the target component into account. For example, the path component assigns special meaning to the ''/'' character, so this character doesn't have to be percent-encoded when it separates path segments. There are some cases though when it makes sense to percent-encode it, as highlighted by the [[https://wiki.php.net/rfc/url_parsing_api#advanced_examples|first example]] within the "Advanced examples" section of the original URI RFC. Unfortunately, ''rawurlencode()'' replaces "/" with "%2F" unconditionally:
echo rawurlencode("/foo/bar/baz"); // %2Ffoo%2Fbar%2Fbaz
In order to correctly handle percent-encoding based on the rules of RFC 3986 and WHATWG URL, the following methods and enums are proposed to be added:
namespace Uri\Rfc3986 {
enum UriPercentEncodingMode
{
case UserInfo;
case RegisteredNameHost;
case Path;
case PathSegment;
case Query;
case FormQuery;
case Fragment;
case AllReservedCharacters;
case AllButUnreservedCharacters;
}
function uri_percent_encode(string $input, \Uri\Rfc3986\UriPercentEncodingMode $mode): string {}
}
namespace Uri\WhatWg {
enum UrlPercentEncodingMode
{
case Username;
case Password;
case OpaqueHost;
case Path;
case OpaquePath;
case PathSegment;
case Query;
case SpecialQuery;
case FormQuery;
case Fragment;
}
function url_percent_encode(string $input, \Uri\WhatWg\UrlPercentEncodingMode $mode): string {}
}
The ''uri_percent_encode()'' and ''url_percent_encode()'' functions percent-encode the ''$input'' parameter according to the ''$mode'' percent-encoding mode. These functions are infallible.
The following modes are supported:
* **Uri\Rfc3986\UriPercentEncodingMode**
* **UserInfo:** Besides [[https://datatracker.ietf.org/doc/html/rfc3986#section-2.3|unreserved characters]] and [[https://datatracker.ietf.org/doc/html/rfc3986#section-2.2|sub-delimiters]], it also allows the following characters to be present: "**:**". Any other characters (including "%" as well) are percent-encoded.
* **RegisteredNameHost:** [[https://datatracker.ietf.org/doc/html/rfc3986#section-2.3|Unreserved characters]], and [[https://datatracker.ietf.org/doc/html/rfc3986#section-2.2|sub-delimiters]] are allowed to be present. Any other characters (including "%" as well) are percent-encoded. Please note that IPv4, IPv6, as well as IPvFuture hosts don't support percent-encoding, and they have a much more restricted syntax, therefore this enum case is only applicable for registered names.
* **Path:** Besides [[https://datatracker.ietf.org/doc/html/rfc3986#section-2.3|unreserved characters]] and [[https://datatracker.ietf.org/doc/html/rfc3986#section-2.2|sub-delimiters]], it also allows the following characters to be present: "**/**", "**:**", "**@**". Any other characters (including "%" as well) are percent-encoded.
* **PathSegment:** Besides [[https://datatracker.ietf.org/doc/html/rfc3986#section-2.3|unreserved characters]] and [[https://datatracker.ietf.org/doc/html/rfc3986#section-2.2|sub-delimiters]], it also allows the following characters to be present: "**:**", "**@**". Any other characters (including "%" as well) are percent-encoded.
* **Query:** Besides [[https://datatracker.ietf.org/doc/html/rfc3986#section-2.3|unreserved characters]] and [[https://datatracker.ietf.org/doc/html/rfc3986#section-2.2|sub-delimiters]], it also allows the following characters to be present: "**:**", "**@**", "**/**", and "**?**". Any other characters (including "%" as well) are percent-encoded.
* **FormQuery:** It is mostly the same as ''Uri\Rfc3986\UriPercentEncodingMode::Query'', but it behaves according to the ''application/x-www-form-urlencoded'' media type rather than RFC 3986. The only difference between the two is that " " is encoded as "**+**".
* **Fragment:** Besides [[https://datatracker.ietf.org/doc/html/rfc3986#section-2.3|unreserved characters]] and [[https://datatracker.ietf.org/doc/html/rfc3986#section-2.2|sub-delimiters]], it also allows the following characters to be present: "**:**", "**@**", "**/**", and "**?**". Any other characters (including "%" as well) are percent-encoded.
* **AllReservedCharacters:** All [[https://datatracker.ietf.org/doc/html/rfc3986#section-2.2|reserved characters]] are percent-encoded. The rest of the characters are left as-is.
* **AllButUnreservedCharacters:** Besides [[https://datatracker.ietf.org/doc/html/rfc3986#section-2.3|unreserved characters]], all other characters (including "%" as well) are percent-encoded.
For the complete ABNF syntax of each component, consult [[https://datatracker.ietf.org/doc/html/rfc3986#appendix-A|Appendix A]] of RFC 3986.
* **Uri\WhatWg\UrlPercentEncodingMode**
* **Username:** Besides the code points percent-encoded by ''Uri\WhatWg\UrlPercentEncodingMode::Path'', the following code points are percent-encoded: U+002F (**/**), U+003A (**:**), U+003B (**;**), U+003D (**=**), U+0040 (**@**), U+005B (**[**) to U+005D (**]**), inclusive, and U+007C (**|**).
* **Password:** Besides the code points percent-encoded by ''Uri\WhatWg\UrlPercentEncodingMode::Path'', the following code points are percent-encoded: U+002F (**/**), U+003A (**:**), U+003B (**;**), U+003D (**=**), U+0040 (**@**), U+005B (**[**) to U+005D (**]**), inclusive, and U+007C (**|**).
* **OpaqueHost:** [[https://infra.spec.whatwg.org/#c0-control|Control characters]], and all [[https://url.spec.whatwg.org/#c0-control-percent-encode-set|code points greater than ~]] are percent-encoded.
* **Path:** Besides the code points percent-encoded by ''Uri\WhatWg\UrlPercentEncodingMode::Query'', the following code points are percent-encoded: U+003F (**?**), U+005E (**^**), U+0060 (**`**), U+007B (**{**), and U+007D (**}**).
* **OpaquePath:** [[https://infra.spec.whatwg.org/#c0-control|Control characters]] and all [[https://url.spec.whatwg.org/#c0-control-percent-encode-set|code points greater than ~]] are percent-encoded.
* **PathSegment:** Besides the code points percent-encoded by ''Uri\WhatWg\UrlPercentEncodingMode::Query'', the following code points are percent-encoded: U+003F (**?**), U+005E (**^**), U+0060 (**`**), U+007B (**{**), U+007D (**}**), and U+002F (**/**).
* **Query:** Besides [[https://infra.spec.whatwg.org/#c0-control|Control characters]], and all [[https://url.spec.whatwg.org/#c0-control-percent-encode-set|code points greater than ~]], the following code points are percent-encoded: U+0020 SPACE, U+0022 (**"**), U+0023 (**#**), U+003C (**<**), and U+003E (**>**).
* **SpecialQuery:** Besides the code points percent-encoded by ''Uri\WhatWg\UrlPercentEncodingMode::Query'', the following code point is percent-encoded: U+0027 (**'**).
* **FormQuery:** Besides the code points percent-encoded by ''Uri\WhatWg\UrlPercentEncodingMode::Username'', the following code points are percent-encoded: U+0024 (**$**) to U+0026 (**&**), inclusive, U+002B (**+**), U+002C (**,**), U+0021 (**!**), U+0027 (**'**) to U+0029 RIGHT PARENTHESIS, inclusive, and U+007E (**~**).
* **Fragment:** Besides [[https://infra.spec.whatwg.org/#c0-control|Control characters]], and all [[https://url.spec.whatwg.org/#c0-control-percent-encode-set|code points greater than ~]], the following code points are percent-encoded: U+0020 SPACE, U+0022 (**"**), U+003C (**<**), U+003E (**>**), and U+0060 (**`**).
Since neither RFC 3986 nor WHATWG URL supports percent-encoded characters inside the scheme component, neither enum contains a ''Scheme'' case. WHATWG URL automatically percent-decodes the host of [[https://wiki.php.net/rfc/uri_followup#determining_if_the_whatwg_url_is_special|special URLs]], so ''Uri\WhatWg\UrlPercentEncodingMode'' doesn't contain a ''Host'' case. For the opaque hosts of non-special URLs, the ''Uri\WhatWg\UrlPercentEncodingMode::OpaqueHost'' case can be used.
By using the proposed percent-encoding capabilities, many use cases that were difficult to achieve before become possible to implement in a specification-compliant way.
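Since the WHATWG modes above are defined as nested percent-encode sets, they can be approximated with a single byte-wise loop. The sketch below is hypothetical: ''toy_whatwg_percent_encode()'' takes the mode as a string named after the proposed enum cases, treats the input as UTF-8 bytes, and leaves out ''FormQuery'', whose handling of spaces isn't specified above.

```php
<?php
// Hypothetical sketch of the WHATWG percent-encode sets. Every byte outside
// 0x20..0x7E belongs to the C0 control percent-encode set, so multibyte UTF-8
// code points are always encoded; each mode only adds extra ASCII characters.
function toy_whatwg_encode_set(string $mode): string
{
    $query = ' "#<>';
    $path = $query . '?^`{}';
    $userinfo = $path . '/:;=@[\\]|';

    return match ($mode) {
        'OpaqueHost', 'OpaquePath' => '',
        'Query' => $query,
        'SpecialQuery' => $query . "'",
        'Path' => $path,
        'PathSegment' => $path . '/',
        'Username', 'Password' => $userinfo,
        'Fragment' => ' "<>`',
    };
}

function toy_whatwg_percent_encode(string $input, string $mode): string
{
    $extra = toy_whatwg_encode_set($mode);
    $out = '';
    for ($i = 0, $n = strlen($input); $i < $n; $i++) {
        $byte = $input[$i];
        $ord = ord($byte);
        if ($ord < 0x20 || $ord > 0x7E || str_contains($extra, $byte)) {
            $out .= sprintf('%%%02X', $ord);
        } else {
            $out .= $byte;
        }
    }
    return $out;
}

echo toy_whatwg_percent_encode("a b#c", "Query"), "\n"; // a%20b%23c
```

Unlike the RFC 3986 modes, none of these sets contains "%", so an existing "%" is left untouched.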
For example, paths can be properly percent-encoded when they contain various special characters:
$uri = new Uri\Rfc3986\Uri("https://example.com");
$uri = $uri->withPath(
Uri\Rfc3986\uri_percent_encode("/foo/bar/[baz]", Uri\Rfc3986\UriPercentEncodingMode::Path)
);
echo $uri->getPath(); // /foo/bar/%5Bbaz%5D
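The RFC 3986 modes can likewise be approximated in userland. ''toy_rfc3986_percent_encode()'' is a hypothetical helper, not the proposed ''Uri\Rfc3986\uri_percent_encode()'': it takes the mode name as a string and works byte by byte, so multibyte UTF-8 characters end up encoded as one ''%XX'' triplet per byte.

```php
<?php
// Hypothetical approximation of the RFC 3986 percent-encoding modes described above.
function toy_rfc3986_percent_encode(string $input, string $mode): string
{
    $unreserved = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~';
    $subDelims = '!$&\'()*+,;=';
    $genDelims = ':/?#[]@';

    if ($mode === 'AllReservedCharacters') {
        // Only reserved characters are encoded; everything else is kept as-is.
        $encode = fn (string $c): bool => str_contains($genDelims . $subDelims, $c);
    } else {
        // Otherwise, everything outside the allowed set (including "%") is encoded.
        $extra = match ($mode) {
            'UserInfo' => ':',
            'RegisteredNameHost' => '',
            'Path' => '/:@',
            'PathSegment' => ':@',
            'Query', 'FormQuery', 'Fragment' => ':@/?',
            'AllButUnreservedCharacters' => null,
        };
        $allowed = $unreserved . ($extra === null ? '' : $subDelims . $extra);
        $encode = fn (string $c): bool => !str_contains($allowed, $c);
    }

    $out = '';
    for ($i = 0, $n = strlen($input); $i < $n; $i++) {
        $c = $input[$i];
        if ($mode === 'FormQuery' && $c === ' ') {
            $out .= '+'; // application/x-www-form-urlencoded style space
        } elseif ($encode($c)) {
            $out .= sprintf('%%%02X', ord($c));
        } else {
            $out .= $c;
        }
    }
    return $out;
}

echo toy_rfc3986_percent_encode("/foo/bar/[baz]", "Path"), "\n"; // /foo/bar/%5Bbaz%5D
echo rawurlencode("/foo/bar/[baz]"), "\n";                        // %2Ffoo%2Fbar%2F%5Bbaz%5D
```

Unlike ''rawurlencode()'', the sketch keeps the "/" separators in ''Path'' mode and only encodes them in ''PathSegment'' mode.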
The current RFC doesn't propose the percent-decoding counterpart, because this functionality may cause confusion. Let's take an example:
$uri = new Uri\Rfc3986\Uri("https://example.com/?a=b%26c"); // The value of "a" is the percent-encoded form of "b&c"
echo Uri\Rfc3986\uri_percent_decode( // Hypothetical function, not part of this proposal
$uri->getQuery(),
Uri\Rfc3986\UriPercentEncodingMode::Query
); // a=b&c
The result is probably not what we expected, because percent-decoding changed the meaning of the component:
* Originally, the query contained a parameter "a" with value "b%26c" ("b&c")
* Now, there is a parameter "a" with value "b", as well as a parameter "c" without value
In order to avoid such situations, the present RFC only includes percent-encoding capabilities.
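The same pitfall can be reproduced with existing PHP functions, since ''parse_str()'' decodes the values only after splitting the query string. This only illustrates the ordering problem, not the proposed API:

```php
<?php
// Decoding before splitting changes the structure of the query string.
$query = "a=b%26c";

parse_str($query, $correct);             // split first, then decode each value
parse_str(rawurldecode($query), $wrong); // decode first, then split

var_dump($correct); // one parameter: "a" => "b&c"
var_dump($wrong);   // two parameters: "a" => "b", "c" => ""
```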
The WHATWG URL specification defines the allowed code points for each component indirectly, by stating which code points should be percent-encoded automatically: the rest of the [[https://url.spec.whatwg.org/#url-code-points|URL code points]] are allowed. This is the opposite of what RFC 3986 does, which specifies the exact syntax with the allowed characters for each component.