

PHP RFC: New Curl URL API

Introduction

Since version 7.62.0 [1], libcurl has provided a new URL API [2] that can be used to parse and generate URLs using libcurl's own parser. One of the goals of this API is to close a problematic and vulnerable gap for applications in which a separate URL parser interprets a URL one way while libcurl interprets it another. Such mismatches can, and sometimes have, led to security problems [3].

Proposal

This RFC proposes adding two new classes, CurlUrl and CurlUrlException. Both classes will only exist if the libcurl version installed on the system is 7.62.0 or later; with older versions, neither class is declared.

CurlUrl

/* libcurl >= 7.62.0 */
final class CurlUrl implements Stringable
{
    public const APPEND_QUERY = UNKNOWN;
    public const DEFAULT_PORT = UNKNOWN;
    public const DEFAULT_SCHEME = UNKNOWN;
    public const DISALLOW_USER = UNKNOWN;
    public const GUESS_SCHEME = UNKNOWN;
    public const NO_DEFAULT_PORT = UNKNOWN;
    public const ALLOW_UNSUPPORTED_SCHEME = UNKNOWN;
    public const PATH_AS_IS = UNKNOWN;
    public const URL_DECODE = UNKNOWN;
    public const URL_ENCODE = UNKNOWN;
 
    /* libcurl >= 7.67.0 */
    public const NO_AUTHORITY = UNKNOWN;
 
    /* libcurl >= 7.78.0 */
    public const ALLOW_SPACE = UNKNOWN;
 
    public function __construct(?string $url = null, int $flags = 0) {}
    public function get(int $flags = 0): string {}
    public function set(?string $url, int $flags = 0): CurlUrl {}
    public function getHost(): ?string {}
    public function setHost(?string $host): CurlUrl {}
    public function getScheme(): ?string {}
    public function setScheme(?string $scheme, int $flags = 0): CurlUrl {}
    public function getPort(int $flags = 0): ?int {}
    public function setPort(?int $port): CurlUrl {}
    public function getPath(int $flags = 0): string {}
    public function setPath(?string $path, int $flags = 0): CurlUrl {}
    public function getQuery(int $flags = 0): ?string {}
    public function setQuery(?string $query, int $flags = 0): CurlUrl {}
    public function getFragment(int $flags = 0): ?string {}
    public function setFragment(?string $fragment, int $flags = 0): CurlUrl {}
    public function getUser(int $flags = 0): ?string {}
    public function setUser(?string $user, int $flags = 0): CurlUrl {}
    public function getPassword(int $flags = 0): ?string {}
    public function setPassword(?string $password, int $flags = 0): CurlUrl {}
    public function getOptions(int $flags = 0): ?string {}
    public function setOptions(?string $options, int $flags = 0): CurlUrl {}
    public function __toString(): string {}
 
    /* libcurl >= 7.65.0 */
    public function getZoneId(int $flags = 0): ?string {}
    public function setZoneId(?string $zoneid, int $flags = 0): CurlUrl {}
}

CurlUrl::__construct(?string $url = null, int $flags = 0)

Create a new CurlUrl object. Its implementation is similar to:

public function __construct(?string $url = null, int $flags = 0) {
    $this->set($url, $flags);
}

CurlUrl::get(int $flags = 0): string

Return the full, normalized, and possibly cleaned-up version of the URL that was previously parsed.

Supported flags Description
CurlUrl::DEFAULT_SCHEME If the object has no scheme stored, this option makes the method use the default scheme (HTTPS) instead of failing.
CurlUrl::DEFAULT_PORT If the object has no port stored, this option makes the method include the default port for the scheme in use.
CurlUrl::NO_DEFAULT_PORT Instructs the method to omit the port number if it matches the default port for the scheme.
CurlUrl::URL_ENCODE If set, the method URL-encodes the host name part. If not set (the default), libcurl returns the URL with the host name "raw" so that IDN names appear as-is; IDN host names typically contain non-ASCII bytes that would otherwise be percent-encoded. Even when URL encoding is not requested, '%' (byte 37) is URL-encoded so that the host name remains valid.
CurlUrl::URL_DECODE If set, the method URL-decodes the host name part. If the decoded string contains any byte value lower than 32, the operation fails with an error instead.
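The following sketch shows normalization and the DEFAULT_PORT flag. It is written against the API proposed above and assumes a PHP build that ships CurlUrl; the commented results are what libcurl's URL API produces for these inputs.

```php
<?php
// Sketch against the proposed CurlUrl API (requires libcurl >= 7.62.0).
// The path contains "./" and "../" segments, which libcurl normalizes
// when the URL is parsed because CurlUrl::PATH_AS_IS is not set.
$url = new CurlUrl('https://example.com/a/./b/../c');

// Full, normalized URL: the dot segments have been removed.
$plain = $url->get();

// No port is stored, so DEFAULT_PORT inserts the https default (443).
$withPort = $url->get(CurlUrl::DEFAULT_PORT);

echo $plain, PHP_EOL;    // https://example.com/a/c
echo $withPort, PHP_EOL; // https://example.com:443/a/c
```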

CurlUrl::set(?string $url, int $flags = 0): CurlUrl

Set the full URL of the CurlUrl instance. If the object already holds a URL, the new URL may be relative to the previous one. Setting $url to null resets all parts of the object to null.

Supported flags Description
CurlUrl::ALLOW_UNSUPPORTED_SCHEME Make this function accept unsupported schemes.
CurlUrl::DEFAULT_SCHEME If set, libcurl allows the URL to be set without a scheme and uses the default scheme, HTTPS, instead. Overrides CurlUrl::GUESS_SCHEME if both are set.
CurlUrl::GUESS_SCHEME If set, libcurl allows the URL to be set without a scheme and instead guesses the intended scheme from the host name. If the outermost subdomain matches DICT, FTP, IMAP, LDAP, POP3 or SMTP, that scheme is used; otherwise HTTP is chosen. CurlUrl::DEFAULT_SCHEME takes precedence if both are set.
CurlUrl::DISALLOW_USER If set, the URL parser does not accept a URL that contains a user name.
CurlUrl::NO_AUTHORITY If set, skips authority checks. RFC 3986 allows individual schemes to omit the host part (normally the only mandatory part of the authority), but libcurl cannot know whether this is permitted for custom schemes. Specifying the flag permits empty authority sections, similar to how the file scheme is handled.
CurlUrl::PATH_AS_IS If set, libcurl skips normalization of the path, the procedure in which it otherwise removes dot-slash and dot-dot sequences.
CurlUrl::ALLOW_SPACE If set, the URL parser allows spaces (ASCII 32) where possible. URL syntax normally does not allow spaces anywhere; they should be encoded as %20 or '+'. Even when spaces are allowed, they are still not allowed in the scheme. When a space is used and allowed in a URL, it is stored as-is unless CurlUrl::URL_ENCODE is also set.
CurlUrl::URL_ENCODE Can be combined with CurlUrl::ALLOW_SPACE to make libcurl URL-encode spaces before storing them. This affects how the URL is constructed when CurlUrl::get() is subsequently used to extract the full URL, and how individual parts are extracted.
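A sketch of relative resolution and the scheme-related set() flags, written against the API proposed above (it assumes a PHP build that ships CurlUrl). The relative reference is resolved against the stored URL following the usual RFC 3986 rules.

```php
<?php
// 1. A relative reference is resolved against the URL already stored:
//    "intro.html" is dropped, "../" climbs out of "guide/".
$url = new CurlUrl('https://example.com/docs/guide/intro.html');
$url->set('../api/index.html');
$resolved = $url->get(); // https://example.com/docs/api/index.html

// 2. libcurl rejects a URL without a scheme unless one of these flags is given.
//    GUESS_SCHEME looks at the outermost subdomain ("ftp." -> ftp, else http).
$guessedFtp  = (new CurlUrl('ftp.example.com/pub', CurlUrl::GUESS_SCHEME))->getScheme();
$guessedHttp = (new CurlUrl('www.example.com/', CurlUrl::GUESS_SCHEME))->getScheme();

//    DEFAULT_SCHEME always falls back to https.
$defaulted   = (new CurlUrl('www.example.com/', CurlUrl::DEFAULT_SCHEME))->getScheme();
```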

CurlUrl::getHost(): ?string

Get the host part of the URL.

CurlUrl::setHost(?string $host): CurlUrl

Set the host part of the URL. If it is an IDN host name, the string must be encoded according to the current locale or as UTF-8. If it is a bracketed numeric IPv6 address, it may contain a zone id (alternatively, use CurlUrl::setZoneId()).
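A URL can also be assembled part by part. This sketch uses the API proposed above and assumes that each set*() method returns the CurlUrl instance, as the `: CurlUrl` return types suggest, so that calls can be chained.

```php
<?php
// Start from an empty object (the constructor calls set(null)).
$url = (new CurlUrl())
    ->setScheme('https')
    ->setHost('example.com')
    ->setPath('/search')
    ->setQuery('q=libcurl');

$host = $url->getHost(); // "example.com"
$full = (string) $url;   // __toString(), same as get(): "https://example.com/search?q=libcurl"
```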

CurlUrl::getScheme(): ?string

Get the scheme part of the URL.

CurlUrl::setScheme(?string $scheme, int $flags = 0): CurlUrl

Set the scheme part of the URL. libcurl rejects schemes longer than 40 bytes.

Supported flags Description
CurlUrl::ALLOW_UNSUPPORTED_SCHEME Make this function accept unsupported schemes.
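A sketch of how ALLOW_UNSUPPORTED_SCHEME changes setScheme(), assuming (per the CurlUrlException section) that libcurl's unsupported-scheme error surfaces as CurlUrlException::UNSUPPORTED_SCHEME. The scheme name "myapp" is purely illustrative.

```php
<?php
$url = new CurlUrl('https://example.com/');

$rejected = false;
try {
    // "myapp" is not a scheme libcurl knows, so this is expected to throw.
    $url->setScheme('myapp');
} catch (CurlUrlException $e) {
    $rejected = ($e->getCode() === CurlUrlException::UNSUPPORTED_SCHEME);
}

// With the flag, the custom scheme is stored as-is.
$url->setScheme('myapp', CurlUrl::ALLOW_UNSUPPORTED_SCHEME);
$custom = $url->get(); // "myapp://example.com/"
```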

CurlUrl::getPort(int $flags = 0): ?int

Get the port part of the URL.

Supported flags Description
CurlUrl::DEFAULT_PORT If the object has no port stored, this option will make the method return the default port for the used scheme.
CurlUrl::NO_DEFAULT_PORT Instructs the method to return null if the port matches the default port for the scheme.
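A sketch of the getPort() flags. It assumes the proposed implementation maps libcurl's "no port stored" result to null, as the `?int` return type suggests.

```php
<?php
$url = new CurlUrl('https://example.com/');

$stored  = $url->getPort();                      // null: the URL has no explicit port
$default = $url->getPort(CurlUrl::DEFAULT_PORT); // 443, the https default

// 8443 is not the https default, so NO_DEFAULT_PORT leaves it alone.
$explicit = (new CurlUrl('https://example.com:8443/'))
    ->getPort(CurlUrl::NO_DEFAULT_PORT);         // 8443
```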

CurlUrl::setPort(?int $port): CurlUrl

Set the port part of the URL. The given port number must be between 1 and 65535. Anything else will throw an exception.
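A sketch of setPort() against the API proposed above. The RFC only says an out-of-range port throws "an exception", so the example catches Throwable rather than assuming a specific class.

```php
<?php
$url = new CurlUrl('https://example.com/');
$url->setPort(8443);
$withPort = $url->get(); // "https://example.com:8443/"

$threw = false;
try {
    $url->setPort(70000); // outside 1..65535
} catch (Throwable $e) {
    // Exact exception class is not specified by the RFC.
    $threw = true;
}
```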

CurlUrl::getPath(int $flags = 0): string

Get the path part of the URL.

Supported flags Description
CurlUrl::URL_DECODE If set, URL decode the contents before returning it.

CurlUrl::setPath(?string $path, int $flags = 0): CurlUrl

Set the path part of the URL. If the path is set without a leading slash, a slash is inserted automatically when the full URL is retrieved.

Supported flags Description
CurlUrl::URL_ENCODE If set, URL encodes the part.
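A sketch of setPath() against the API proposed above, showing the automatic leading slash and URL_ENCODE. For the path part, libcurl leaves '/' separators unencoded and encodes spaces as %20.

```php
<?php
$url = new CurlUrl('https://example.com/');

// No leading slash: one is added when the full URL is built.
$url->setPath('docs/index.html');
$relative = $url->get(); // "https://example.com/docs/index.html"

// URL_ENCODE percent-encodes the space but keeps the "/" separators.
$url->setPath('/my files/report.txt', CurlUrl::URL_ENCODE);
$encoded = $url->get(); // "https://example.com/my%20files/report.txt"
```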

CurlUrl::getQuery(int $flags = 0): ?string

Get the query part of the URL.

Supported flags Description
CurlUrl::URL_DECODE If set, URL decode the contents before returning it.

CurlUrl::setQuery(?string $query, int $flags = 0): CurlUrl

Set the query part of the URL. The question mark in the URL is not part of the actual query contents.

Supported flags Description
CurlUrl::URL_ENCODE If set, URL encodes the part.
CurlUrl::APPEND_QUERY If set, the provided part is appended to the end of the existing query; if the existing query does not end with an ampersand (&), one is inserted before the appended part. When CurlUrl::APPEND_QUERY is used together with CurlUrl::URL_ENCODE, the first '=' symbol is not URL-encoded.
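A sketch of building a query incrementally with APPEND_QUERY, written against the API proposed above. When URL_ENCODE is applied to the query part, libcurl encodes spaces as '+' and, with APPEND_QUERY, keeps the first '=' literal.

```php
<?php
$url = new CurlUrl('https://example.com/search?q=curl');

// "&" is inserted because the existing query does not end with one.
$url->setQuery('lang=en', CurlUrl::APPEND_QUERY);

// The space becomes "+", while the first "=" is kept as-is.
$url->setQuery('name=John Doe', CurlUrl::APPEND_QUERY | CurlUrl::URL_ENCODE);

$query = $url->getQuery(); // "q=curl&lang=en&name=John+Doe"
$full  = $url->get();      // "https://example.com/search?q=curl&lang=en&name=John+Doe"
```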

CurlUrl::getFragment(int $flags = 0): ?string

Get the fragment part of the URL. The hash sign in the URL is not part of the actual fragment contents.

Supported flags Description
CurlUrl::URL_DECODE If set, URL decode the contents before returning it.
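A sketch of URL_DECODE on the getters, written against the API proposed above. Parts are stored without their delimiters ('#', '?') and are returned encoded unless the flag is given.

```php
<?php
$url = new CurlUrl('https://example.com/page?q=a%26b#section%201');

$rawFragment     = $url->getFragment();                    // "section%201"
$decodedFragment = $url->getFragment(CurlUrl::URL_DECODE); // "section 1"
$decodedQuery    = $url->getQuery(CurlUrl::URL_DECODE);    // "q=a&b"
```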

CurlUrl::setFragment(?string $fragment, int $flags = 0): CurlUrl

Set the fragment part of the URL.

Supported flags Description
CurlUrl::URL_ENCODE If set, URL encodes the part.

CurlUrl::getUser(int $flags = 0): ?string

Get the user part of the URL.

Supported flags Description
CurlUrl::URL_DECODE If set, URL decode the contents before returning it.

CurlUrl::setUser(?string $user, int $flags = 0): CurlUrl

Set the user part of the URL.

Supported flags Description
CurlUrl::URL_ENCODE If set, URL encodes the part.

CurlUrl::getPassword(int $flags = 0): ?string

Get the password part of the URL.

Supported flags Description
CurlUrl::URL_DECODE If set, URL decode the contents before returning it.

CurlUrl::setPassword(?string $password, int $flags = 0): CurlUrl

Set the password part of the URL.

Supported flags Description
CurlUrl::URL_ENCODE If set, URL encodes the part.
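A sketch of setting credentials with URL_ENCODE, written against the API proposed above. Characters such as '@' and ':' would break the userinfo syntax, so they are percent-encoded; the hex-digit case of the encoding may vary between libcurl versions, hence the case-insensitive check in the test.

```php
<?php
$url = new CurlUrl('https://example.com/');
$url->setUser('alice');
$url->setPassword('p@ss:word', CurlUrl::URL_ENCODE);

$full     = $url->get();                            // "https://alice:p%40ss%3aword@example.com/" (hex case may differ)
$password = $url->getPassword(CurlUrl::URL_DECODE); // "p@ss:word"
```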

CurlUrl::getOptions(int $flags = 0): ?string

Get the options part of the URL.

Supported flags Description
CurlUrl::URL_DECODE If set, URL decode the contents before returning it.

CurlUrl::setOptions(?string $options, int $flags = 0): CurlUrl

Set the options part of the URL.

Supported flags Description
CurlUrl::URL_ENCODE If set, URL encodes the part.

CurlUrl::getZoneId(int $flags = 0): ?string

Get the zone id part of the URL.

Supported flags Description
CurlUrl::URL_DECODE If set, URL decode the contents before returning it.

CurlUrl::setZoneId(?string $zoneid, int $flags = 0): CurlUrl

Set the zone id part of the URL.

Supported flags Description
CurlUrl::URL_ENCODE If set, URL encodes the part.

CurlUrl::__toString(): string

Equivalent to calling CurlUrl::get() with no flags.

CurlUrlException

The CurlUrlException class represents an error raised by libcurl. The constants exposed by this class are all the codes that CurlUrlException::getCode() can return; they are mapped internally to the CURLUE_* error codes that libcurl can raise. The available constants may vary depending on the version of libcurl that ext/curl was compiled with.

If ext/curl was compiled with a libcurl version later than 7.80, CurlUrlException::getMessage() returns a user-friendly message describing the problem (for example: "Malformed input to a URL function").

/* libcurl >= 7.62.0 */
final class CurlUrlException extends Exception
{
    public const BAD_PORT_NUMBER = UNKNOWN;
    public const MALFORMED_INPUT = UNKNOWN;
    public const OUT_OF_MEMORY = UNKNOWN;
    public const UNSUPPORTED_SCHEME = UNKNOWN;
    public const URL_DECODING_FAILED = UNKNOWN;
    public const USER_NOT_ALLOWED = UNKNOWN;
 
    /* libcurl >= 7.81.0 */
    public const BAD_FILE_URL = UNKNOWN;
    public const BAD_FRAGMENT = UNKNOWN;
    public const BAD_HOSTNAME = UNKNOWN;
    public const BAD_IPV6 = UNKNOWN;
    public const BAD_LOGIN = UNKNOWN;
    public const BAD_PASSWORD = UNKNOWN;
    public const BAD_PATH = UNKNOWN;
    public const BAD_QUERY = UNKNOWN;
    public const BAD_SCHEME = UNKNOWN;
    public const BAD_SLASHES = UNKNOWN;
    public const BAD_USER = UNKNOWN;
}
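A sketch of inspecting error codes, written against the API proposed above. The helper tryParse() is hypothetical and only illustrates matching getCode() against the class constants.

```php
<?php
/**
 * Hypothetical helper: returns [CurlUrl|null, int|null error code].
 */
function tryParse(string $input, int $flags = 0): array
{
    try {
        return [new CurlUrl($input, $flags), null];
    } catch (CurlUrlException $e) {
        return [null, $e->getCode()];
    }
}

[, $portError] = tryParse('https://example.com:99999/');                       // port > 65535
[, $userError] = tryParse('https://alice@example.com/', CurlUrl::DISALLOW_USER); // user name present
[$ok, $none]   = tryParse('https://example.com/');                               // valid URL
```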

Backward Incompatible Changes

None, except that the class names CurlUrl and CurlUrlException will be declared by PHP and will conflict with any application that declares a class with either name in the global namespace.

Proposed PHP Version(s)

8.2

Future Scope

The proposed CurlUrl class is mutable. An immutable variant, such as an ImmutableCurlUrl class, could be added in the future.

Proposed Voting Choices

As per the voting RFC, a yes/no vote with a 2/3 majority is needed for this proposal to be accepted.

Patches and Tests

Implementation

N/A

References

Last modified: 2022/06/24 19:52 by ramsey