PHP RFC: New Curl URL API
- Version: 0.9
- Date: 2022-06-21
- Author: Pierrick Charron pierrick@php.net
- Status: Under discussion
- First Published at: https://wiki.php.net/rfc/curl-url-api
Introduction
Since version 7.62.0 [1], libcurl has featured a new URL API [2] that can be used to parse and generate URLs using libcurl's own parser. One of the goals of this API is to tighten a problematic, vulnerable area for applications, where the application's URL parser would believe one thing and libcurl another. Such a mismatch can lead, and has sometimes led, to security problems [3].
Proposal
The current RFC proposes the addition of two new classes, CurlUrl and CurlUrlException. These two classes will only exist if the version of libcurl installed on the system is greater than or equal to 7.62.0. With older versions, these two classes will not exist.
One new cURL option will also be available: CURLOPT_CURLU. This option tells curl to work with the given CurlUrl object and overrides the CURLOPT_URL option.
To avoid confusing behavior where a CurlUrl object could be unintentionally modified after being attached to a CurlHandle, CurlUrl objects will be immutable: every with*() method returns a new CurlUrl object instead of modifying the current one.
$url = new CurlUrl("https://www.php.net");
$ch = curl_init();
curl_setopt($ch, CURLOPT_CURLU, $url);
curl_exec($ch);
curl_close($ch);
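The immutability rule can be illustrated with a short sketch written against the signatures proposed in this RFC. The helper retargetHost() is hypothetical, and the class_exists() guard keeps the script runnable on builds where CurlUrl is not declared (for example, an older libcurl).

```php
<?php
// Sketch against the proposed CurlUrl API; retargetHost() is a hypothetical helper.
function retargetHost(CurlUrl $url, string $host): CurlUrl
{
    // withHost() returns a new object and leaves $url untouched
    return $url->withHost($host);
}

// CurlUrl is only declared when ext/curl is built against libcurl >= 7.62.0
if (class_exists(CurlUrl::class)) {
    $original = new CurlUrl('https://www.php.net/manual/');
    $mirror = retargetHost($original, 'example.org');

    echo $original->getHost(), "\n"; // still www.php.net
    echo $mirror->getHost(), "\n";   // example.org
}
```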
CurlUrl
/* libcurl >= 7.62.0 */
final class CurlUrl implements Stringable
{
    public const APPEND_QUERY = UNKNOWN;
    public const DEFAULT_PORT = UNKNOWN;
    public const DEFAULT_SCHEME = UNKNOWN;
    public const DISALLOW_USER = UNKNOWN;
    public const GUESS_SCHEME = UNKNOWN;
    public const NO_DEFAULT_PORT = UNKNOWN;
    public const ALLOW_UNSUPPORTED_SCHEME = UNKNOWN;
    public const PATH_AS_IS = UNKNOWN;
    public const URL_DECODE = UNKNOWN;
    public const URL_ENCODE = UNKNOWN;

    /* libcurl >= 7.67.0 */
    public const NO_AUTHORITY = UNKNOWN;

    /* libcurl >= 7.78.0 */
    public const ALLOW_SPACE = UNKNOWN;

    public function __construct(?string $url = null, int $flags = 0) {}
    public function get(int $flags = 0): string {}
    public function with(string $url, int $flags = 0): CurlUrl {}
    public function getHost(): ?string {}
    public function withHost(?string $host): CurlUrl {}
    public function getScheme(): ?string {}
    public function withScheme(?string $scheme, int $flags = 0): CurlUrl {}
    public function getPort(int $flags = 0): ?int {}
    public function withPort(?int $port): CurlUrl {}
    public function getPath(int $flags = 0): string {}
    public function withPath(?string $path, int $flags = 0): CurlUrl {}
    public function getQuery(int $flags = 0): ?string {}
    public function withQuery(?string $query, int $flags = 0): CurlUrl {}
    public function getFragment(int $flags = 0): ?string {}
    public function withFragment(?string $fragment, int $flags = 0): CurlUrl {}
    public function getUser(int $flags = 0): ?string {}
    public function withUser(?string $user, int $flags = 0): CurlUrl {}
    public function getPassword(int $flags = 0): ?string {}
    public function withPassword(?string $password, int $flags = 0): CurlUrl {}
    public function getOptions(int $flags = 0): ?string {}
    public function withOptions(?string $options, int $flags = 0): CurlUrl {}
    public function __toString(): string {}

    /* libcurl >= 7.65.0 */
    public function getZoneId(int $flags = 0): ?string {}
    public function withZoneId(?string $zoneid, int $flags = 0): CurlUrl {}
}
CurlUrl::__construct(?string $url = null, int $flags = 0)
Create a new CurlUrl object.
CurlUrl::get(int $flags = 0): string
Return the full, normalized, and possibly cleaned-up version of the URL that was previously parsed.
Supported flags | Description |
---|---|
CurlUrl::DEFAULT_SCHEME | If the object has no scheme stored, this option will make the method return the default scheme instead of null. |
CurlUrl::DEFAULT_PORT | If the object has no port stored, this option will make the method return the default port for the used scheme. |
CurlUrl::NO_DEFAULT_PORT | Instructs the method to not return a port number if it matches the default port for the scheme. |
CurlUrl::URL_ENCODE | If set, the method will encode the host name part. If not set (default), libcurl returns the URL with the host name “raw” so that IDN names appear as-is. IDN host names typically use non-ASCII bytes that would otherwise be percent-encoded. Note that even when URL encoding is not requested, the '%' character (byte 37) will be URL-encoded to make sure the host name remains valid. |
CurlUrl::URL_DECODE | If set, the method will decode the host name part. If there are any byte values lower than 32 in the decoded string, the get operation will return an error instead. |
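A minimal sketch of the get() flags, assuming the behavior described in the table above. The helper describeUrl() is hypothetical; the class_exists() guard only keeps the script runnable where CurlUrl is not declared.

```php
<?php
// Sketch against the proposed CurlUrl::get(); describeUrl() is a hypothetical helper.
function describeUrl(string $address): array
{
    $url = new CurlUrl($address);
    return [
        'normalized' => $url->get(),
        // omit the port when it equals the scheme's default (443 for https)
        'no_default_port' => $url->get(CurlUrl::NO_DEFAULT_PORT),
        // __toString() is specified as equivalent to get()
        'as_string' => (string) $url,
    ];
}

if (class_exists(CurlUrl::class)) {
    print_r(describeUrl('https://www.php.net:443/manual/'));
}
```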
CurlUrl::with(string $url, int $flags = 0): CurlUrl
Return a new CurlUrl object with the new URL. If the source object is already populated with a URL, the new URL can be relative to the previous one.
Supported flags | Description |
---|---|
CurlUrl::ALLOW_UNSUPPORTED_SCHEME | Make this function accept unsupported schemes. |
CurlUrl::DEFAULT_SCHEME | If set, makes libcurl allow the URL to be set without a scheme and then sets it to the default scheme: HTTPS. Overrides the CurlUrl::GUESS_SCHEME option if both are set. |
CurlUrl::GUESS_SCHEME | If set, makes libcurl allow the URL to be set without a scheme; libcurl instead “guesses” which scheme was intended based on the host name. If the outermost subdomain name matches DICT, FTP, IMAP, LDAP, POP3 or SMTP, that scheme is used; otherwise HTTP is picked. Conflicts with the CurlUrl::DEFAULT_SCHEME option, which takes precedence if both are set. |
CurlUrl::NO_AUTHORITY | If set, skips authority checks. The URL RFC (RFC 3986) allows individual schemes to omit the host part (normally the only mandatory part of the authority), but libcurl cannot know whether this is permitted for custom schemes. Specifying the flag permits empty authority sections, similar to how the file scheme is handled. |
CurlUrl::PATH_AS_IS | If set, makes libcurl skip the normalization of the path. This is the procedure by which curl otherwise removes sequences of dot-slash, dot-dot, and so on. |
CurlUrl::ALLOW_SPACE | If set, the URL parser allows spaces (ASCII 32) where possible. The URL syntax normally does not allow spaces anywhere; they should be encoded as %20 or '+'. Even when spaces are allowed, they are still not allowed in the scheme. When a space is used and allowed in a URL, it will be stored as-is unless CurlUrl::URL_ENCODE is also set. |
CurlUrl::URL_ENCODE | Can be set in combination with CurlUrl::ALLOW_SPACE, which makes libcurl URL-encode the space before storing it. This affects how the URL will be constructed when CurlUrl::get() is subsequently used to extract the full URL or individual parts. |
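Relative resolution and scheme guessing through with() can be sketched as follows, assuming the semantics described above. The helper resolveReference() is hypothetical, and the class_exists() guard keeps the script runnable where CurlUrl is not declared.

```php
<?php
// Sketch against the proposed CurlUrl::with(); resolveReference() is a hypothetical helper.
function resolveReference(string $base, string $reference): string
{
    // with() parses $reference relative to the URL already held by the object
    return (new CurlUrl($base))->with($reference)->get();
}

if (class_exists(CurlUrl::class)) {
    echo resolveReference('https://example.com/a/b/c', '../d'), "\n"; // https://example.com/a/d

    // No scheme in the input: let libcurl guess one from the host name
    $ftp = (new CurlUrl())->with('ftp.example.com/pub/', CurlUrl::GUESS_SCHEME);
    echo $ftp->getScheme(), "\n"; // ftp
}
```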
CurlUrl::getHost(): ?string
Get the host part of the URL.
CurlUrl::withHost(?string $host): CurlUrl
Return a new CurlUrl object with the host part set to the given host.
If the host is an IDN name, the string must be encoded according to your locale or as UTF-8.
If it is a bracketed IPv6 numeric address, it may contain a zone ID (alternatively, use CurlUrl::withZoneId()).
CurlUrl::getScheme(): ?string
Get the scheme part of the URL.
CurlUrl::withScheme(?string $scheme, int $flags = 0): CurlUrl
Return a new CurlUrl object with the scheme part set to the given scheme. libcurl only accepts schemes up to 40 bytes long.
Supported flags | Description |
---|---|
CurlUrl::ALLOW_UNSUPPORTED_SCHEME | Make this function accept unsupported schemes. |
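A sketch of switching to an application-specific scheme, assuming withScheme() behaves as described above. The helper toCustomScheme() and the scheme name myapp are hypothetical; the class_exists() guard keeps the script runnable where CurlUrl is not declared.

```php
<?php
// Sketch against the proposed CurlUrl::withScheme(); toCustomScheme() is hypothetical.
function toCustomScheme(CurlUrl $url, string $scheme): CurlUrl
{
    // Without ALLOW_UNSUPPORTED_SCHEME, a scheme libcurl does not know is rejected
    return $url->withScheme($scheme, CurlUrl::ALLOW_UNSUPPORTED_SCHEME);
}

if (class_exists(CurlUrl::class)) {
    $url = toCustomScheme(new CurlUrl('https://example.com/item/42'), 'myapp');
    echo $url->getScheme(), "\n"; // myapp
}
```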
CurlUrl::getPort(int $flags = 0): ?int
Get the port part of the URL.
Supported flags | Description |
---|---|
CurlUrl::DEFAULT_PORT | If the object has no port stored, this option will make the method return the default port for the used scheme. |
CurlUrl::NO_DEFAULT_PORT | Instructs the method to return null if the port matches the default port for the scheme. |
CurlUrl::withPort(?int $port): CurlUrl
Return a new CurlUrl object with the port set to the given one. The given port number must be between 1 and 65535. Anything else will throw an exception.
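Reading and replacing the port can be sketched as below, assuming getPort() returns null when no port is stored (as the nullable return type suggests). The helper portSummary() is hypothetical. Because this RFC only says an out-of-range port throws "an exception", the sketch catches Throwable rather than assuming a specific class.

```php
<?php
// Sketch against the proposed CurlUrl port methods; portSummary() is hypothetical.
function portSummary(CurlUrl $url): array
{
    return [
        // port stored in the URL (assumed null when none was given)
        'stored' => $url->getPort(),
        // falls back to the scheme's default port when none is stored
        'effective' => $url->getPort(CurlUrl::DEFAULT_PORT),
    ];
}

if (class_exists(CurlUrl::class)) {
    $url = new CurlUrl('https://example.com/');
    print_r(portSummary($url));
    print_r(portSummary($url->withPort(8443)));

    try {
        $url->withPort(70000); // outside the 1..65535 range
    } catch (Throwable $e) {
        // The exact exception class is not specified for this case
        echo get_class($e), ': ', $e->getMessage(), "\n";
    }
}
```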
CurlUrl::getPath(int $flags = 0): string
Get the path part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_DECODE | If set, URL decode the contents before returning it. |
CurlUrl::withPath(?string $path, int $flags = 0): CurlUrl
Return a new CurlUrl object with the path set to the given path. If the path is set without a leading slash, a slash will be inserted automatically when the URL is read back.
Supported flags | Description |
---|---|
CurlUrl::URL_ENCODE | If set, URL encodes the part. |
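A sketch of setting a path that contains a space, assuming URL_ENCODE on withPath() maps to libcurl's encoding of the path part (which leaves slashes as-is). The helper withEncodedPath() is hypothetical; the class_exists() guard keeps the script runnable where CurlUrl is not declared.

```php
<?php
// Sketch against the proposed CurlUrl::withPath(); withEncodedPath() is hypothetical.
function withEncodedPath(CurlUrl $url, string $path): CurlUrl
{
    // URL_ENCODE percent-encodes the part; the space becomes %20 in a path
    return $url->withPath($path, CurlUrl::URL_ENCODE);
}

if (class_exists(CurlUrl::class)) {
    $url = withEncodedPath(new CurlUrl('https://example.com/'), '/my docs/index.html');
    echo $url->getPath(), "\n";                    // encoded form
    echo $url->getPath(CurlUrl::URL_DECODE), "\n"; // decoded form
}
```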
CurlUrl::getQuery(int $flags = 0): ?string
Get the query part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_DECODE | If set, URL decode the contents before returning it. |
CurlUrl::withQuery(?string $query, int $flags = 0): CurlUrl
Return a new CurlUrl object with the query set to the given query. The question mark in the URL is not part of the actual query contents.
Supported flags | Description |
---|---|
CurlUrl::URL_ENCODE | If set, URL encodes the part. |
CurlUrl::APPEND_QUERY | If set, the provided part will be appended to the end of the existing query, and if the previous part did not end with an ampersand (&), an ampersand will be inserted before the newly appended part. When CurlUrl::APPEND_QUERY is used together with CurlUrl::URL_ENCODE, the first '=' symbol will not be URL-encoded. |
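Building a query one parameter at a time with APPEND_QUERY and URL_ENCODE can be sketched as follows, assuming the behavior described in the table above (libcurl encodes a space in the query as '+'). The helper addQueryParam() is hypothetical; the class_exists() guard keeps the script runnable where CurlUrl is not declared.

```php
<?php
// Sketch against the proposed CurlUrl::withQuery(); addQueryParam() is hypothetical.
function addQueryParam(CurlUrl $url, string $name, string $value): CurlUrl
{
    // APPEND_QUERY inserts '&' as needed; with URL_ENCODE the first '=' stays literal
    return $url->withQuery("$name=$value", CurlUrl::APPEND_QUERY | CurlUrl::URL_ENCODE);
}

if (class_exists(CurlUrl::class)) {
    $url = new CurlUrl('https://example.com/search?lang=en');
    $url = addQueryParam($url, 'q', 'curl url api');
    echo $url->getQuery(), "\n"; // lang=en&q=curl+url+api
}
```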
CurlUrl::getFragment(int $flags = 0): ?string
Get the fragment part of the URL. The hash sign in the URL is not part of the actual fragment contents.
Supported flags | Description |
---|---|
CurlUrl::URL_DECODE | If set, URL decode the contents before returning it. |
CurlUrl::withFragment(?string $fragment, int $flags = 0): CurlUrl
Return a new CurlUrl object with the fragment set to the given fragment.
Supported flags | Description |
---|---|
CurlUrl::URL_ENCODE | If set, URL encodes the part. |
CurlUrl::getUser(int $flags = 0): ?string
Get the user part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_DECODE | If set, URL decode the contents before returning it. |
CurlUrl::withUser(?string $user, int $flags = 0): CurlUrl
Return a new CurlUrl object with the user set to the given user.
Supported flags | Description |
---|---|
CurlUrl::URL_ENCODE | If set, URL encodes the part. |
CurlUrl::getPassword(int $flags = 0): ?string
Get the password part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_DECODE | If set, URL decode the contents before returning it. |
CurlUrl::withPassword(?string $password, int $flags = 0): CurlUrl
Return a new CurlUrl object with the password set to the given password.
Supported flags | Description |
---|---|
CurlUrl::URL_ENCODE | If set, URL encodes the part. |
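Credentials that contain reserved characters can be sketched as below, assuming URL_ENCODE and URL_DECODE behave as described for withUser()/withPassword() and their getters. The helper withCredentials() and the sample credentials are hypothetical; the class_exists() guard keeps the script runnable where CurlUrl is not declared.

```php
<?php
// Sketch against the proposed user/password methods; withCredentials() is hypothetical.
function withCredentials(CurlUrl $url, string $user, string $password): CurlUrl
{
    // URL_ENCODE escapes characters such as '@' and ':' that would break the authority
    return $url
        ->withUser($user, CurlUrl::URL_ENCODE)
        ->withPassword($password, CurlUrl::URL_ENCODE);
}

if (class_exists(CurlUrl::class)) {
    $url = withCredentials(new CurlUrl('https://example.com/'), 'alice', 'p@ss:word');
    echo $url->getPassword(), "\n";                    // stored, encoded form
    echo $url->getPassword(CurlUrl::URL_DECODE), "\n"; // original password
}
```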
CurlUrl::getOptions(int $flags = 0): ?string
Get the options part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_DECODE | If set, URL decode the contents before returning it. |
CurlUrl::withOptions(?string $options, int $flags = 0): CurlUrl
Return a new CurlUrl object with the options set to the given options.
Supported flags | Description |
---|---|
CurlUrl::URL_ENCODE | If set, URL encodes the part. |
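The options part follows the user name in the userinfo (user;options@host), and libcurl only recognizes it for some schemes, such as IMAP. A minimal sketch, assuming getOptions() exposes that part; the helper loginOptions() and the sample URL are hypothetical.

```php
<?php
// Sketch against the proposed CurlUrl::getOptions(); loginOptions() is hypothetical.
function loginOptions(string $address): ?string
{
    return (new CurlUrl($address))->getOptions();
}

// CurlUrl is only declared when ext/curl is built against libcurl >= 7.62.0
if (class_exists(CurlUrl::class)) {
    var_dump(loginOptions('imap://alice;AUTH=PLAIN@mail.example.com/INBOX'));
}
```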
CurlUrl::getZoneId(int $flags = 0): ?string
Get the zone id part of the URL.
Supported flags | Description |
---|---|
CurlUrl::URL_DECODE | If set, URL decode the contents before returning it. |
CurlUrl::withZoneId(?string $zoneid, int $flags = 0): CurlUrl
Return a new CurlUrl object with the zone ID set to the given zone ID.
Supported flags | Description |
---|---|
CurlUrl::URL_ENCODE | If set, URL encodes the part. |
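A sketch of reading the zone ID of a link-local IPv6 URL, where the zone is written as %25 followed by the interface name inside the brackets. The helper splitZone() and the interface name eth0 are hypothetical; the guard also checks for getZoneId(), since the stub above lists it only for libcurl >= 7.65.0.

```php
<?php
// Sketch against the proposed zone ID methods; splitZone() is hypothetical.
function splitZone(string $address): array
{
    $url = new CurlUrl($address);
    return ['host' => $url->getHost(), 'zone' => $url->getZoneId()];
}

if (class_exists(CurlUrl::class) && method_exists(CurlUrl::class, 'getZoneId')) {
    print_r(splitZone('http://[fe80::1%25eth0]:8080/'));
}
```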
CurlUrl::__toString(): string
Same as calling CurlUrl::get().
CurlUrlException
The CurlUrlException class represents an error raised by libcurl. The constants exposed in this class are all the codes that CurlUrlException::getCode() can return. These codes are internally mapped to the CURLUE_* error codes that libcurl can raise, so the available constants may vary depending on the version of libcurl ext/curl was compiled with.
If ext/curl was compiled with libcurl > 7.80, CurlUrlException::getMessage() will return a user-friendly message describing the problem (for example: "Malformed input to a URL function").
/* libcurl >= 7.62.0 */
final class CurlUrlException extends Exception
{
    public const BAD_PORT_NUMBER = UNKNOWN;
    public const MALFORMED_INPUT = UNKNOWN;
    public const OUT_OF_MEMORY = UNKNOWN;
    public const UNSUPPORTED_SCHEME = UNKNOWN;
    public const URL_DECODING_FAILED = UNKNOWN;
    public const USER_NOT_ALLOWED = UNKNOWN;

    /* libcurl >= 7.81.0 */
    public const BAD_FILE_URL = UNKNOWN;
    public const BAD_FRAGMENT = UNKNOWN;
    public const BAD_HOSTNAME = UNKNOWN;
    public const BAD_IPV6 = UNKNOWN;
    public const BAD_LOGIN = UNKNOWN;
    public const BAD_PASSWORD = UNKNOWN;
    public const BAD_PATH = UNKNOWN;
    public const BAD_QUERY = UNKNOWN;
    public const BAD_SCHEME = UNKNOWN;
    public const BAD_SLASHES = UNKNOWN;
    public const BAD_USER = UNKNOWN;
}
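Mapping error codes to application messages can be sketched as below, assuming the constructor throws CurlUrlException with one of the codes listed above. The helper tryParse() and its messages are hypothetical; only the constants available on every supported libcurl version are matched explicitly.

```php
<?php
// Sketch against the proposed CurlUrlException; tryParse() is a hypothetical helper.
function tryParse(string $address): string
{
    try {
        return (new CurlUrl($address))->get();
    } catch (CurlUrlException $e) {
        return match ($e->getCode()) {
            CurlUrlException::BAD_PORT_NUMBER => 'invalid port',
            CurlUrlException::UNSUPPORTED_SCHEME => 'unsupported scheme',
            default => 'invalid URL: ' . $e->getMessage(),
        };
    }
}

if (class_exists(CurlUrl::class)) {
    echo tryParse('https://example.com:99999/'), "\n"; // port above 65535
    echo tryParse('foo://example.com/'), "\n";         // scheme unknown to libcurl
}
```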
Backward Incompatible Changes
None, except that the class names CurlUrl and CurlUrlException will be declared by PHP and will conflict with applications declaring the same class names in the global namespace.
Proposed PHP Version(s)
8.2
Proposed Voting Choices
As per the voting RFC, a yes/no vote with a 2/3 majority is needed for this proposal to be accepted.
Patches and Tests
Implementation
N/A