PHP RFC: New Curl URL API
- Version: 0.9
- Date: 2022-06-21
- Author: Pierrick Charron pierrick@php.net
- Status: Under discussion
- First Published at: https://wiki.php.net/rfc/curl-url-api
Introduction
Since version 7.62.0 [1], libcurl features a brand new URL API [2] that can be used to parse and generate URLs using libcurl’s own parser. One of the goals of this API is to tighten a problematic, vulnerable area for applications, where the URL parser library would believe one thing and libcurl another. Such mismatches can lead, and sometimes have led, to security problems [3].
Proposal
This RFC proposes the addition of two new classes, CurlUrl and CurlUrlException. Both classes will only exist if the version of libcurl installed on the system is greater than or equal to 7.62.0.
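Because the classes are declared conditionally, calling code would need a feature check before using them. A minimal sketch, assuming the class names proposed in this RFC; `curl_url_api_available()` is a hypothetical helper name, not part of the proposal:

```php
<?php
// Hypothetical helper: reports whether the proposed CurlUrl API can be used.
function curl_url_api_available(): bool
{
    // The RFC states that the classes are only declared with libcurl >= 7.62.0,
    // so testing for the classes themselves is the most direct check.
    return class_exists('CurlUrl') && class_exists('CurlUrlException');
}

if (curl_url_api_available()) {
    echo "CurlUrl is available\n";
} elseif (function_exists('curl_version')) {
    // ext/curl is loaded, but the proposed classes are not declared.
    echo "CurlUrl is not available (libcurl ", curl_version()['version'], ")\n";
} else {
    echo "ext/curl is not loaded\n";
}
```

Checking for the class rather than comparing version strings keeps the check valid even if availability rules change in a later revision of the proposal.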
CurlUrl
/* libcurl >= 7.62.0 */
final class CurlUrl implements Stringable
{
    public const APPEND_QUERY = UNKNOWN;
    public const DEFAULT_PORT = UNKNOWN;
    public const DEFAULT_SCHEME = UNKNOWN;
    public const DISALLOW_USER = UNKNOWN;
    public const GUESS_SCHEME = UNKNOWN;
    public const NO_DEFAULT_PORT = UNKNOWN;
    public const ALLOW_UNSUPPORTED_SCHEME = UNKNOWN;
    public const PATH_AS_IS = UNKNOWN;
    public const URL_DECODE = UNKNOWN;
    public const URL_ENCODE = UNKNOWN;
    /* libcurl >= 7.67.0 */
    public const NO_AUTHORITY = UNKNOWN;
    /* libcurl >= 7.78.0 */
    public const ALLOW_SPACE = UNKNOWN;

    public function __construct(?string $url = null, int $flags = 0) {}
    public function get(int $flags = 0): string {}
    public function set(?string $url, int $flags = 0): CurlUrl {}
    public function getHost(): ?string {}
    public function setHost(?string $host): CurlUrl {}
    public function getScheme(): ?string {}
    public function setScheme(?string $scheme, int $flags = 0): CurlUrl {}
    public function getPort(int $flags = 0): ?int {}
    public function setPort(?int $port): CurlUrl {}
    public function getPath(int $flags = 0): string {}
    public function setPath(?string $path, int $flags = 0): CurlUrl {}
    public function getQuery(int $flags = 0): ?string {}
    public function setQuery(?string $query, int $flags = 0): CurlUrl {}
    public function getFragment(int $flags = 0): ?string {}
    public function setFragment(?string $fragment, int $flags = 0): CurlUrl {}
    public function getUser(int $flags = 0): ?string {}
    public function setUser(?string $user, int $flags = 0): CurlUrl {}
    public function getPassword(int $flags = 0): ?string {}
    public function setPassword(?string $password, int $flags = 0): CurlUrl {}
    public function getOptions(int $flags = 0): ?string {}
    public function setOptions(?string $options, int $flags = 0): CurlUrl {}
    public function __toString(): string {}

    /* libcurl >= 7.65.0 */
    public function getZoneId(int $flags = 0): ?string {}
    public function setZoneId(?string $zoneid, int $flags = 0): CurlUrl {}
}
__construct(?string $url = null, int $flags = 0)
Create a new CurlUrl object. The implementation is similar to:
public function __construct(?string $url = null, int $flags = 0)
{
    $this->set($url, $flags);
}
CurlUrl::get(int $flags = 0): string
Return the full, normalized, and possibly cleaned up URL version of what was previously parsed.
Supported flags | Description |
---|---|
CurlUrl::DEFAULT_SCHEME | If the object has no scheme stored, this option will make the method use the default scheme in the returned URL. |
CurlUrl::DEFAULT_PORT | If the object has no port stored, this option will make the method return the default port for the used scheme. |
CurlUrl::NO_DEFAULT_PORT | Instructs the method to not return a port number if it matches the default port for the scheme. |
CurlUrl::URL_ENCODE | If set, the method will encode the host name part. If not set (default), libcurl returns the URL with the host name “raw” so that IDN names appear as-is. IDN host names typically use non-ASCII bytes that would otherwise be percent-encoded. Note that even when URL encoding is not requested, the '%' (byte 37) will be URL encoded to make sure the host name remains valid. |
CurlUrl::URL_DECODE | If set, the method will decode the host name part. If there are any byte values lower than 32 in the decoded string, the get operation will return an error instead. |
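The following sketch shows how the constructor and get() might be combined with one of the flags above. It is written against the API proposed in this RFC, so it is guarded by class_exists(); `normalize_url()` is a hypothetical helper name, and the assumption that the constructor throws CurlUrlException on invalid input is not stated explicitly by the RFC.

```php
<?php
// Sketch against the proposed API; the classes are not assumed to exist at
// runtime, so everything is guarded by class_exists().
function normalize_url(string $input): ?string
{
    if (!class_exists('CurlUrl')) {
        return null; // proposed class not available
    }
    try {
        $url = new CurlUrl($input);
        // Ask for the full URL, omitting the port when it equals the
        // default port of the scheme.
        return $url->get(CurlUrl::NO_DEFAULT_PORT);
    } catch (CurlUrlException $e) {
        // Assumption: parse errors surface as CurlUrlException.
        return null;
    }
}

var_dump(normalize_url('https://www.example.com:443/a/../b?x=1#top'));
```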
CurlUrl::set(?string $url, int $flags = 0): CurlUrl
Set the full URL of the CurlUrl instance.
If the object is already populated with a URL, the new URL can be relative to the previous one. Setting $url to NULL will set all parts of the object to null.
Supported flags | Description |
---|---|
CurlUrl::ALLOW_UNSUPPORTED_SCHEME | Make this function accept unsupported schemes. |
CurlUrl::DEFAULT_SCHEME | If set, libcurl allows the URL to be set without a scheme and then uses the default scheme, HTTPS. Overrides the CurlUrl::GUESS_SCHEME option if both are set. |
CurlUrl::GUESS_SCHEME | If set, libcurl allows the URL to be set without a scheme and instead “guesses” which scheme was intended based on the host name. If the outermost sub-domain name matches DICT, FTP, IMAP, LDAP, POP3 or SMTP, then that scheme will be used; otherwise it picks HTTP. Conflicts with the CurlUrl::DEFAULT_SCHEME option, which takes precedence if both are set. |
CurlUrl::NO_AUTHORITY | If set, skips authority checks. The RFC allows individual schemes to omit the host part (normally the only mandatory part of the authority), but libcurl cannot know whether this is permitted for custom schemes. Specifying the flag permits empty authority sections, similar to how the file scheme is handled. |
CurlUrl::PATH_AS_IS | If set, makes libcurl skip the normalization of the path. That is the procedure where curl otherwise removes sequences of dot-slash and dot-dot etc. |
CurlUrl::ALLOW_SPACE | If set, the URL parser allows space (ASCII 32) where possible. The URL syntax normally does not allow spaces anywhere; they should be encoded as %20 or '+'. When spaces are allowed, they are still not allowed in the scheme. When a space is used and allowed in a URL, it will be stored as-is unless CurlUrl::URL_ENCODE is also set. |
CurlUrl::URL_ENCODE | Can be set in combination with CurlUrl::ALLOW_SPACE, which makes libcurl URL-encode the space before it is stored. This affects how the URL will be constructed when CurlUrl::get() is subsequently used to extract the full URL or individual parts. |
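The GUESS_SCHEME rule in the table above can be illustrated in userland. This is a sketch of the rule as described, not libcurl's implementation; `guess_scheme()` is a hypothetical name, and reading "outermost sub-domain" as the leftmost host label, compared case-insensitively, is an assumption.

```php
<?php
// Userland illustration of the GUESS_SCHEME rule described in the table.
// Not libcurl's code: only the rule as the text states it.
function guess_scheme(string $host): string
{
    $known = ['dict', 'ftp', 'imap', 'ldap', 'pop3', 'smtp'];
    $parts = explode('.', $host, 2);
    if (count($parts) < 2) {
        return 'http'; // single-label host: no sub-domain to inspect
    }
    // Assumption: the leftmost label is compared case-insensitively.
    $label = strtolower($parts[0]);
    return in_array($label, $known, true) ? $label : 'http';
}

echo guess_scheme('ftp.example.com'), "\n";  // ftp
echo guess_scheme('www.example.com'), "\n";  // http
```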
CurlUrl::getHost(): ?string
Get the host part of the URL.
CurlUrl::setHost(?string $host): CurlUrl
Set the host part of the URL.
If it is an IDN host name, the string must be encoded either in your locale's encoding or in UTF-8.
If it is a bracketed IPv6 numeric address, it may contain a zone id (or you can use CurlUrl::setZoneId()).
CurlUrl::getScheme(): ?string
Get the scheme part of the URL.
CurlUrl::setScheme(?string $scheme, int $flags = 0): CurlUrl
Set the scheme part of the URL. Libcurl only accepts setting schemes up to 40 bytes long.
Supported flags | Description |
---|---|
CurlUrl::ALLOW_UNSUPPORTED_SCHEME | Make this function accept unsupported schemes. |
CurlUrl::getPort(int $flags = 0): ?int
Get the port part of the URL.
Supported flags | Description |
---|---|
CurlUrl::DEFAULT_PORT | If the object has no port stored, this option will make the method return the default port for the used scheme. |
CurlUrl::NO_DEFAULT_PORT | Instructs the method to return null if the port matches the default port for the scheme. |
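How the two flags interact with a stored port can be sketched in userland. This illustrates the rules in the table only; `effective_port()` and its small default-port table are hypothetical and do not reflect libcurl's full scheme list.

```php
<?php
// Userland illustration of the DEFAULT_PORT and NO_DEFAULT_PORT rules.
function effective_port(?int $stored, string $scheme, bool $defaultPort, bool $noDefaultPort): ?int
{
    // Hypothetical, deliberately short table of default ports.
    $defaults = ['http' => 80, 'https' => 443, 'ftp' => 21];
    $default = $defaults[strtolower($scheme)] ?? null;

    if ($stored === null) {
        // DEFAULT_PORT: fall back to the scheme's default when nothing is stored.
        return $defaultPort ? $default : null;
    }
    if ($noDefaultPort && $stored === $default) {
        // NO_DEFAULT_PORT: hide a stored port that equals the default.
        return null;
    }
    return $stored;
}

var_dump(effective_port(null, 'https', true, false));  // int(443)
var_dump(effective_port(443, 'https', false, true));   // NULL
var_dump(effective_port(8443, 'https', false, true));  // int(8443)
```

The two flags apply to disjoint cases (no port stored versus a stored port), so in this sketch they never conflict.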
CurlUrl::setPort(?int $port): CurlUrl
Set the port part of the URL. The given port number must be between 1 and 65535. Anything else will throw an exception.
CurlUrl::getPath(int $flags = 0): string
Get the path part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_DECODE | If set, URL decode the contents before returning it. |
CurlUrl::setPath(?string $path, int $flags = 0): CurlUrl
Set the path part of the URL. If a path is set without a leading slash, a slash will be inserted automatically when the URL is read back from the object.
Supported flags | Description |
---|---|
CurlUrl::URL_ENCODE | If set, URL encodes the part. |
CurlUrl::getQuery(int $flags = 0): ?string
Get the query part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_DECODE | If set, URL decode the contents before returning it. |
CurlUrl::setQuery(?string $query, int $flags = 0): CurlUrl
Set the query part of the URL. The question mark in the URL is not part of the actual query contents.
Supported flags | Description |
---|---|
CurlUrl::URL_ENCODE | If set, URL encodes the part. |
CurlUrl::APPEND_QUERY | If set, the provided part will be appended to the end of the existing query. If the existing query does not end with an ampersand (&), an ampersand will be inserted before the appended part. When CurlUrl::APPEND_QUERY is used together with CurlUrl::URL_ENCODE, the first '=' symbol will not be URL encoded. |
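The ampersand handling of APPEND_QUERY can be sketched in userland as follows. This only illustrates the rule as described (without URL encoding) and is not libcurl's implementation; `append_query()` is a hypothetical name, and returning the new part unchanged when no query exists yet is an assumption.

```php
<?php
// Userland illustration of the APPEND_QUERY separator rule.
function append_query(?string $existing, string $part): string
{
    if ($existing === null || $existing === '') {
        return $part; // assumption: nothing to append to, so no separator
    }
    // Insert '&' only when the existing query does not already end with one.
    $separator = substr($existing, -1) === '&' ? '' : '&';
    return $existing . $separator . $part;
}

echo append_query('a=1', 'b=2'), "\n";   // a=1&b=2
echo append_query('a=1&', 'b=2'), "\n";  // a=1&b=2
```

With the proposed API, the corresponding call would presumably be `$url->setQuery('b=2', CurlUrl::APPEND_QUERY)`.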
CurlUrl::getFragment(int $flags = 0): ?string
Get the fragment part of the URL. The hash sign in the URL is not part of the actual fragment contents.
Supported flags | Description |
---|---|
CurlUrl::URL_DECODE | If set, URL decode the contents before returning it. |
CurlUrl::setFragment(?string $fragment, int $flags = 0): CurlUrl
Set the fragment part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_ENCODE | If set, URL encodes the part. |
CurlUrl::getUser(int $flags = 0): ?string
Get the user part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_DECODE | If set, URL decode the contents before returning it. |
CurlUrl::setUser(?string $user, int $flags = 0): CurlUrl
Set the user part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_ENCODE | If set, URL encodes the part. |
CurlUrl::getPassword(int $flags = 0): ?string
Get the password part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_DECODE | If set, URL decode the contents before returning it. |
CurlUrl::setPassword(?string $password, int $flags = 0): CurlUrl
Set the password part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_ENCODE | If set, URL encodes the part. |
CurlUrl::getOptions(int $flags = 0): ?string
Get the options part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_DECODE | If set, URL decode the contents before returning it. |
CurlUrl::setOptions(?string $options, int $flags = 0): CurlUrl
Set the options part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_ENCODE | If set, URL encodes the part. |
CurlUrl::getZoneId(int $flags = 0): ?string
Get the zone id part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_DECODE | If set, URL decode the contents before returning it. |
CurlUrl::setZoneId(?string $zoneid, int $flags = 0): CurlUrl
Set the zone id part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_ENCODE | If set, URL encodes the part. |
CurlUrl::__toString(): string
Same as calling CurlUrl::get()
CurlUrlException
The CurlUrlException class represents an error raised by libcurl. The constants exposed in this class are all the codes that CurlUrlException::getCode() can return. These codes are internally mapped to the CURLUE_* error codes that libcurl can raise. The set of constants may vary depending on the version of libcurl that ext/curl was compiled with.
If ext/curl was compiled with libcurl > 7.80, CurlUrlException::getMessage() will return a user-friendly message that describes the problem (example: Malformed input to a URL function).
/* libcurl >= 7.62.0 */
final class CurlUrlException extends Exception
{
    public const BAD_PORT_NUMBER = UNKNOWN;
    public const MALFORMED_INPUT = UNKNOWN;
    public const OUT_OF_MEMORY = UNKNOWN;
    public const UNSUPPORTED_SCHEME = UNKNOWN;
    public const URL_DECODING_FAILED = UNKNOWN;
    public const USER_NOT_ALLOWED = UNKNOWN;

    /* libcurl >= 7.81.0 */
    public const BAD_FILE_URL = UNKNOWN;
    public const BAD_FRAGMENT = UNKNOWN;
    public const BAD_HOSTNAME = UNKNOWN;
    public const BAD_IPV6 = UNKNOWN;
    public const BAD_LOGIN = UNKNOWN;
    public const BAD_PASSWORD = UNKNOWN;
    public const BAD_PATH = UNKNOWN;
    public const BAD_QUERY = UNKNOWN;
    public const BAD_SCHEME = UNKNOWN;
    public const BAD_SLASHES = UNKNOWN;
    public const BAD_USER = UNKNOWN;
}
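Error handling with the proposed exception class might look like the sketch below. It is written against the API proposed in this RFC and guarded by class_exists(); `try_set_port()` is a hypothetical helper name, and mapping an out-of-range port to BAD_PORT_NUMBER is an assumption based on the constant's name.

```php
<?php
// Sketch against the proposed API; guarded because the classes are only
// declared when the proposal is implemented and libcurl >= 7.62.0.
function try_set_port(int $port): string
{
    if (!class_exists('CurlUrl')) {
        return 'unavailable';
    }
    try {
        (new CurlUrl('https://www.example.com/'))->setPort($port);
        return 'ok';
    } catch (CurlUrlException $e) {
        // getCode() is expected to match one of the class constants.
        return $e->getCode() === CurlUrlException::BAD_PORT_NUMBER
            ? 'bad port number'
            : 'other error: ' . $e->getMessage();
    }
}

echo try_set_port(70000), "\n";
```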
Backward Incompatible Changes
None, except that the class names CurlUrl and CurlUrlException will be declared by PHP and will conflict with applications declaring the same class names in the global namespace.
Proposed PHP Version(s)
8.2
Future Scope
The current implementation of the CurlUrl class is mutable. We might want to add a new ImmutableCurlUrl class.
Proposed Voting Choices
As per the voting RFC, a yes/no vote with a 2/3 majority is needed for this proposal to be accepted.
Patches and Tests
Implementation
N/A