This is an old revision of the document!


PHP RFC: New Curl URL API

Introduction

Since version 7.62.0 [1], libcurl has featured a new URL API [2] that can be used to parse and generate URLs using libcurl’s own parser. One of the goals of this API is to tighten a problematic, vulnerable area for applications: cases where the application's URL parser library would believe one thing and libcurl another. Such mismatches can lead, and sometimes have led, to security problems [3].

Proposal

This RFC proposes adding two new classes, CurlUrl and CurlUrlException. Both classes will exist only if the version of libcurl installed on the system is 7.62.0 or later.
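Because the classes are tied to the libcurl version, portable code would need some form of feature detection. The following is a minimal sketch of one way to do that with the existing curl_version() and version_compare() functions; the helper name libcurl_supports_url_api() is hypothetical and not part of this RFC.

```php
<?php
// Hypothetical helper (not part of the RFC): reports whether the linked
// libcurl is new enough (>= 7.62.0) for the proposed classes to exist.
function libcurl_supports_url_api(?string $version = null): bool
{
    if ($version === null) {
        // ext/curl may not be loaded at all.
        if (!function_exists('curl_version')) {
            return false;
        }
        $info = curl_version();
        if ($info === false) {
            return false;
        }
        $version = $info['version'];
    }

    return version_compare($version, '7.62.0', '>=');
}

var_dump(libcurl_supports_url_api());
```

Checking class_exists('CurlUrl') directly would be an equally reasonable guard once the classes are available.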

A new curl option, CURLOPT_CURLU, will also be available. It tells curl to use the URL held by the given CurlUrl object and overrides the CURLOPT_URL option.

To avoid confusing behavior where a CurlUrl object is unintentionally modified after being attached to a CurlHandle, CurlUrl objects will be immutable.

$url = new CurlUrl("https://www.php.net");
$ch  = curl_init();
curl_setopt($ch, CURLOPT_CURLU, $url);
curl_exec($ch);
curl_close($ch);
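To illustrate what immutability means for the with*() methods, here is a small userland sketch of the same pattern. The class ImmutableUrlSketch is hypothetical and exists only for this illustration; it is not the proposed CurlUrl class and does no real URL parsing.

```php
<?php
// Hypothetical stand-in for illustration only; not part of the RFC.
final class ImmutableUrlSketch
{
    public function __construct(
        private string $scheme,
        private string $host,
        private string $path = '/',
    ) {}

    // Returns a modified copy; the original object is left untouched.
    public function withPath(string $path): self
    {
        $copy = clone $this;
        $copy->path = $path;
        return $copy;
    }

    public function __toString(): string
    {
        return "{$this->scheme}://{$this->host}{$this->path}";
    }
}

$base = new ImmutableUrlSketch('https', 'www.php.net');
$docs = $base->withPath('/docs.php');

echo $base, "\n"; // https://www.php.net/
echo $docs, "\n"; // https://www.php.net/docs.php
```

With this design, an object already attached to a handle cannot change underneath it: any later modification produces a separate object.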

CurlUrl

/* libcurl >= 7.62.0 */
final class CurlUrl implements Stringable
{
    public const APPEND_QUERY = UNKNOWN;
    public const DEFAULT_PORT = UNKNOWN;
    public const DEFAULT_SCHEME = UNKNOWN;
    public const DISALLOW_USER = UNKNOWN;
    public const GUESS_SCHEME = UNKNOWN;
    public const NO_DEFAULT_PORT = UNKNOWN;
    public const ALLOW_UNSUPPORTED_SCHEME = UNKNOWN;
    public const PATH_AS_IS = UNKNOWN;
    public const URL_DECODE = UNKNOWN;
    public const URL_ENCODE = UNKNOWN;
 
    /* libcurl >= 7.67.0 */
    public const NO_AUTHORITY = UNKNOWN;
 
    /* libcurl >= 7.78.0 */
    public const ALLOW_SPACE = UNKNOWN;
 
    public function __construct(?string $url = null, int $flags = 0) {}
    public function get(int $flags = 0): string {}
    public function with(string $url, int $flags = 0): CurlUrl {}
    public function getHost(): ?string {}
    public function withHost(?string $host): CurlUrl {}
    public function getScheme(): ?string {}
    public function withScheme(?string $scheme, int $flags = 0): CurlUrl {}
    public function getPort(int $flags = 0): ?int {}
    public function withPort(?int $port): CurlUrl {}
    public function getPath(int $flags = 0): string {}
    public function withPath(?string $path, int $flags = 0): CurlUrl {}
    public function getQuery(int $flags = 0): ?string {}
    public function withQuery(?string $query, int $flags = 0): CurlUrl {}
    public function getFragment(int $flags = 0): ?string {}
    public function withFragment(?string $fragment, int $flags = 0): CurlUrl {}
    public function getUser(int $flags = 0): ?string {}
    public function withUser(?string $user, int $flags = 0): CurlUrl {}
    public function getPassword(int $flags = 0): ?string {}
    public function withPassword(?string $password, int $flags = 0): CurlUrl {}
    public function getOptions(int $flags = 0): ?string {}
    public function withOptions(?string $options, int $flags = 0): CurlUrl {}
    public function __toString(): string {}
 
    /* libcurl >= 7.65.0 */
    public function getZoneId(int $flags = 0): ?string {}
    public function withZoneId(?string $zoneid, int $flags = 0): CurlUrl {}
}

CurlUrl::__construct(?string $url = null, int $flags = 0)

Create a new CurlUrl object.

CurlUrl::get(int $flags = 0): string

Return the full, normalized, and possibly cleaned up URL version of what was previously parsed.

Supported flags Description
CurlUrl::DEFAULT_SCHEME If the object has no scheme stored, this option will make the method return the default scheme instead of null.
CurlUrl::DEFAULT_PORT If the object has no port stored, this option will make the method return the default port for the used scheme.
CurlUrl::NO_DEFAULT_PORT Instructs the method to not return a port number if it matches the default port for the scheme.
CurlUrl::URL_ENCODE If set, the method URL-encodes the host name part. If not set (default), libcurl returns the URL with the host name “raw” so that IDN names appear as-is. IDN host names typically use non-ASCII bytes that would otherwise be percent-encoded. Note that even without URL encoding, the '%' character (byte 37) is URL-encoded so that the host name remains valid.
CurlUrl::URL_DECODE If set, the method decodes the host name part. If the decoded string contains any byte value lower than 32, the get operation fails with an error instead.

CurlUrl::with(string $url, int $flags = 0): CurlUrl

Return a new CurlUrl object with the new URL. If the source object is already populated with a URL, the new URL can be relative to the previous one.

Supported flags Description
CurlUrl::ALLOW_UNSUPPORTED_SCHEME Make this function accept unsupported schemes.
CurlUrl::DEFAULT_SCHEME If set, libcurl accepts a URL without a scheme and sets the scheme to the default, HTTPS. Overrides the CurlUrl::GUESS_SCHEME option if both are set.
CurlUrl::GUESS_SCHEME If set, libcurl accepts a URL without a scheme and instead “guesses” which scheme was intended based on the host name. If the outermost sub-domain name matches DICT, FTP, IMAP, LDAP, POP3 or SMTP, that scheme is used; otherwise HTTP is picked. Conflicts with the CurlUrl::DEFAULT_SCHEME option, which takes precedence if both are set.
CurlUrl::NO_AUTHORITY If set, skips authority checks. The RFC allows individual schemes to omit the host part (normally the only mandatory part of the authority), but libcurl cannot know whether this is permitted for custom schemes. Specifying the flag permits empty authority sections, similar to how the file scheme is handled.
CurlUrl::PATH_AS_IS If set, libcurl skips normalization of the path, the procedure in which curl otherwise removes sequences of dot-slash, dot-dot, etc.
CurlUrl::ALLOW_SPACE If set, the URL parser allows space (ASCII 32) where possible. URL syntax normally does not allow spaces anywhere; they should be encoded as %20 or '+'. Even when spaces are allowed, they are still not allowed in the scheme. When a space is used and allowed in a URL, it is stored as-is unless CurlUrl::URL_ENCODE is also set.
CurlUrl::URL_ENCODE Can be set in combination with CurlUrl::ALLOW_SPACE, which makes libcurl URL-encode the space before it is stored. This affects how the URL is constructed when CurlUrl::get() is subsequently used to extract the full URL or individual parts.
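The interaction between CurlUrl::GUESS_SCHEME and CurlUrl::DEFAULT_SCHEME described in the table above can be approximated in plain PHP. This is only a sketch of the stated rule, not libcurl's implementation: the function name guess_scheme_sketch() is hypothetical, and it assumes that “outermost sub-domain” means the first dot-separated label of the host name.

```php
<?php
// Hypothetical sketch of the scheme-guessing rule; not libcurl's code.
// $defaultScheme stands in for the DEFAULT_SCHEME flag being set.
function guess_scheme_sketch(string $host, bool $defaultScheme = false): string
{
    if ($defaultScheme) {
        // DEFAULT_SCHEME takes precedence over guessing.
        return 'https';
    }

    $known = ['dict', 'ftp', 'imap', 'ldap', 'pop3', 'smtp'];

    // Assumption: only the first label (the text before the first dot) matters.
    $dot = strpos($host, '.');
    $first = $dot === false ? '' : strtolower(substr($host, 0, $dot));

    return in_array($first, $known, true) ? $first : 'http';
}

echo guess_scheme_sketch('ftp.example.com'), "\n";       // ftp
echo guess_scheme_sketch('www.example.com'), "\n";       // http
echo guess_scheme_sketch('ftp.example.com', true), "\n"; // https
```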

CurlUrl::getHost(): ?string

Get the host part of the URL.

CurlUrl::withHost(?string $host): CurlUrl

Return a new CurlUrl object with the host part set to the given host. If the host is an IDNA name, the string must be encoded according to your locale or as UTF-8. If it is a bracketed IPv6 numeric address, it may contain a zone id (alternatively, use CurlUrl::withZoneId()).

CurlUrl::getScheme(): ?string

Get the scheme part of the URL.

CurlUrl::withScheme(?string $scheme, int $flags = 0): CurlUrl

Return a new CurlUrl object with the scheme part set to the given scheme. libcurl only accepts schemes up to 40 bytes long.

Supported flags Description
CurlUrl::ALLOW_UNSUPPORTED_SCHEME Make this function accept unsupported schemes.

CurlUrl::getPort(int $flags = 0): ?int

Get the port part of the URL.

Supported flags Description
CurlUrl::DEFAULT_PORT If the object has no port stored, this option will make the method return the default port for the used scheme.
CurlUrl::NO_DEFAULT_PORT Instructs the method to return null if the port matches the default port for the scheme.
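The effect of the two port flags can be sketched in plain PHP as follows. Everything here is an assumption made for illustration: the constant values, the small scheme-to-port table, and the function name port_sketch() are hypothetical, and the behavior when both flags are combined is not specified in this RFC.

```php
<?php
// Hypothetical flag values and port table, for illustration only.
const SKETCH_DEFAULT_PORT = 1;
const SKETCH_NO_DEFAULT_PORT = 2;
const SKETCH_SCHEME_PORTS = ['http' => 80, 'https' => 443, 'ftp' => 21];

// $stored is the port held by the URL (null when none was given).
function port_sketch(?int $stored, string $scheme, int $flags = 0): ?int
{
    $default = SKETCH_SCHEME_PORTS[strtolower($scheme)] ?? null;

    // DEFAULT_PORT: no port stored, so fall back to the scheme's default.
    if ($stored === null && ($flags & SKETCH_DEFAULT_PORT)) {
        return $default;
    }

    // NO_DEFAULT_PORT: hide a stored port that equals the scheme's default.
    if ($stored !== null && ($flags & SKETCH_NO_DEFAULT_PORT) && $stored === $default) {
        return null;
    }

    return $stored;
}

var_dump(port_sketch(null, 'https', SKETCH_DEFAULT_PORT));   // int(443)
var_dump(port_sketch(443, 'https', SKETCH_NO_DEFAULT_PORT)); // NULL
var_dump(port_sketch(8080, 'https', SKETCH_NO_DEFAULT_PORT)); // int(8080)
```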

CurlUrl::withPort(?int $port): CurlUrl

Return a new CurlUrl object with the port set to the given one. The given port number must be between 1 and 65535. Anything else will throw an exception.

CurlUrl::getPath(int $flags = 0): string

Get the path part of the URL.

Supported flags Description
CurlUrl::URL_DECODE If set, URL decode the contents before returning it.

CurlUrl::withPath(?string $path, int $flags = 0): CurlUrl

Return a new CurlUrl object with the path set to the given path. If the path is set without a leading slash, a slash is inserted automatically when the URL is read back from the object.

Supported flags Description
CurlUrl::URL_ENCODE If set, URL encodes the part.

CurlUrl::getQuery(int $flags = 0): ?string

Get the query part of the URL.

Supported flags Description
CurlUrl::URL_DECODE If set, URL decode the contents before returning it.

CurlUrl::withQuery(?string $query, int $flags = 0): CurlUrl

Return a new CurlUrl object with the query set to the given query. The question mark in the URL is not part of the actual query contents.

Supported flags Description
CurlUrl::URL_ENCODE If set, URL encodes the part.
CurlUrl::APPEND_QUERY If set, the provided part is appended to the end of the existing query; if the existing query does not end with an ampersand (&), one is inserted before the appended part. When CurlUrl::APPEND_QUERY is used together with CurlUrl::URL_ENCODE, the first '=' symbol is not URL-encoded.
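The append rule above can be emulated in plain PHP to make the ampersand and '=' handling concrete. This is a rough sketch under stated assumptions: the function name append_query_sketch() is hypothetical, and rawurlencode() merely stands in for libcurl's own encoder, whose exact output may differ (for example in how spaces are encoded).

```php
<?php
// Hypothetical emulation of APPEND_QUERY (optionally with URL_ENCODE).
function append_query_sketch(?string $existing, string $part, bool $encode = false): string
{
    if ($encode) {
        // Keep the first '=' literal; encode the name and the value around it.
        $pos = strpos($part, '=');
        if ($pos === false) {
            $part = rawurlencode($part);
        } else {
            $part = rawurlencode(substr($part, 0, $pos))
                . '='
                . rawurlencode(substr($part, $pos + 1));
        }
    }

    if ($existing === null || $existing === '') {
        return $part;
    }

    // Insert '&' only when the existing query does not already end with one.
    return str_ends_with($existing, '&')
        ? $existing . $part
        : $existing . '&' . $part;
}

echo append_query_sketch('a=1', 'b=2'), "\n";  // a=1&b=2
echo append_query_sketch('a=1&', 'b=2'), "\n"; // a=1&b=2
```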

CurlUrl::getFragment(int $flags = 0): ?string

Get the fragment part of the URL. The hash sign in the URL is not part of the actual fragment contents.

Supported flags Description
CurlUrl::URL_DECODE If set, URL decode the contents before returning it.

CurlUrl::withFragment(?string $fragment, int $flags = 0): CurlUrl

Return a new CurlUrl object with the fragment set to the given fragment.

Supported flags Description
CurlUrl::URL_ENCODE If set, URL encodes the part.

CurlUrl::getUser(int $flags = 0): ?string

Get the user part of the URL.

Supported flags Description
CurlUrl::URL_DECODE If set, URL decode the contents before returning it.

CurlUrl::withUser(?string $user, int $flags = 0): CurlUrl

Return a new CurlUrl object with the user set to the given user.

Supported flags Description
CurlUrl::URL_ENCODE If set, URL encodes the part.

CurlUrl::getPassword(int $flags = 0): ?string

Get the password part of the URL.

Supported flags Description
CurlUrl::URL_DECODE If set, URL decode the contents before returning it.

CurlUrl::withPassword(?string $password, int $flags = 0): CurlUrl

Return a new CurlUrl object with the password set to the given password.

Supported flags Description
CurlUrl::URL_ENCODE If set, URL encodes the part.

CurlUrl::getOptions(int $flags = 0): ?string

Get the options part of the URL.

Supported flags Description
CurlUrl::URL_DECODE If set, URL decode the contents before returning it.

CurlUrl::withOptions(?string $options, int $flags = 0): CurlUrl

Return a new CurlUrl object with the options set to the given options.

Supported flags Description
CurlUrl::URL_ENCODE If set, URL encodes the part.

CurlUrl::getZoneId(int $flags = 0): ?string

Get the zone id part of the URL.

Supported flags Description
CurlUrl::URL_DECODE If set, URL decode the contents before returning it.

CurlUrl::withZoneId(?string $zoneid, int $flags = 0): CurlUrl

Return a new CurlUrl object with the zone id set to the given zone id.

Supported flags Description
CurlUrl::URL_ENCODE If set, URL encodes the part.

CurlUrl::__toString(): string

Same as calling CurlUrl::get().

CurlUrlException

The CurlUrlException class represents an error raised by libcurl. The constants exposed by this class are all the codes that CurlUrlException::getCode() can return. These codes are mapped internally to the CURLUE_* error codes that libcurl can raise. The available constants may vary depending on the version of libcurl that ext/curl was compiled with.

If ext/curl was compiled with libcurl > 7.80, CurlUrlException::getMessage() will return a user-friendly message that describes the problem (for example: Malformed input to a URL function).

/* libcurl >= 7.62.0 */
final class CurlUrlException extends Exception
{
    public const BAD_PORT_NUMBER = UNKNOWN;
    public const MALFORMED_INPUT = UNKNOWN;
    public const OUT_OF_MEMORY = UNKNOWN;
    public const UNSUPPORTED_SCHEME = UNKNOWN;
    public const URL_DECODING_FAILED = UNKNOWN;
    public const USER_NOT_ALLOWED = UNKNOWN;
 
    /* libcurl >= 7.81.0 */
    public const BAD_FILE_URL = UNKNOWN;
    public const BAD_FRAGMENT = UNKNOWN;
    public const BAD_HOSTNAME = UNKNOWN;
    public const BAD_IPV6 = UNKNOWN;
    public const BAD_LOGIN = UNKNOWN;
    public const BAD_PASSWORD = UNKNOWN;
    public const BAD_PATH = UNKNOWN;
    public const BAD_QUERY = UNKNOWN;
    public const BAD_SCHEME = UNKNOWN;
    public const BAD_SLASHES = UNKNOWN;
    public const BAD_USER = UNKNOWN;
}
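As a rough idea of how calling code might handle these errors, the sketch below wraps URL parsing in a try/catch. It assumes that the CurlUrl constructor throws CurlUrlException on invalid input, which this RFC implies but does not state explicitly; the function name parse_or_report() is hypothetical. The class_exists() guard lets the snippet run on builds where the proposed classes are absent, in which case it simply returns null.

```php
<?php
// Hypothetical helper: returns the normalized URL, or null on failure
// or when the proposed classes are not available.
function parse_or_report(string $input): ?string
{
    if (!class_exists('CurlUrl', false)) {
        return null;
    }

    try {
        return (new CurlUrl($input))->get();
    } catch (CurlUrlException $e) {
        // Compare the code against the class constants to pick a reaction.
        if ($e->getCode() === CurlUrlException::MALFORMED_INPUT) {
            error_log("Malformed URL: {$input}");
        } else {
            error_log("URL error {$e->getCode()}: {$e->getMessage()}");
        }
        return null;
    }
}

var_dump(parse_or_report('https://www.php.net'));
```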

Backward Incompatible Changes

None, except that the class names CurlUrl and CurlUrlException will be declared by PHP and will conflict with applications that declare classes with the same names in the global namespace.

Proposed PHP Version(s)

8.2

Proposed Voting Choices

As per the voting RFC, a yes/no vote with a 2/3 majority is needed for this proposal to be accepted.

Patches and Tests

Implementation

N/A

References

rfc/curl-url-api.1656593097.txt.gz · Last modified: 2022/06/30 12:44 by pierrick