rfc:curl-url-api

Differences

This shows you the differences between two versions of the page.

rfc:curl-url-api [2022/07/04 19:35] pierrick
rfc:curl-url-api [2022/07/04 20:11] – Fix typos and improve wording in several places theodorejb
Line 8: Line 8:
 ===== Introduction =====

-Since its version 7.62.0 [1], libcurl features a brand new URL API [2] that can be used to parse and generate URLs, using libcurl’s own parser. One of the goals of this API is to tighten a problematic vulnerable area for applications where the URL parser library would believe one thing and libcurl another. This could and has sometimes led to security problems [3].
+Since version 7.62.0 of libcurl, [[https://daniel.haxx.se/blog/2018/10/31/curl-7-62-0-moar-stuff/|1]] the library features a brand new URL API [[https://daniel.haxx.se/blog/2018/09/09/libcurl-gets-a-url-api/|2]] that can be used to parse and generate URLs, using libcurl’s own parser. One of the goals of this API is to tighten a problematic vulnerable area for applications where the URL parser library would believe one thing and libcurl another. This could and has sometimes led to security problems [[https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-Languages.pdf|3]].

 ===== Proposal =====

-There is obviously many different ways on how to implement this API in user land and recent discussions shows that people do not agree on how this API should be implemented. This RFC will propose 4 different solutions of exposing this API.
+There are obviously many different ways this API could be implemented in userland, and recent discussions show that there is not yet a consensus on how it should be done. This RFC proposes 4 different solutions of exposing this API.

 ==== Common to all proposed implementations ====

-All the different implementations would add two new classes <php>CurlUrl</php> and <php>CurlUrlException</php>. These two classes will only exist if the version of libcurl installed on the system is greater than or equal to 7.62. If the version is older, these classes will not exist.
+Each of the four different implementations would add two new classes: <php>CurlUrl</php> and <php>CurlUrlException</php>. These two classes will only exist if the version of libcurl installed on the system is greater than or equal to 7.62. If the version is older, these two classes will not exist.
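
As a rough illustration of the version requirement above, a script could check for the classes before relying on them. This is only a sketch: <php>curl_version()</php> and its <php>version_number</php> entry are existing ext/curl features, while the helper names <php>libcurlVersionNumber()</php> and <php>curlUrlApiAvailable()</php> are made up for this example.

```php
<?php
// Pack a libcurl version the way libcurl reports it in "version_number":
// 0xXXYYZZ, i.e. major, minor and patch each occupy one byte.
function libcurlVersionNumber(int $major, int $minor, int $patch): int
{
    return ($major << 16) | ($minor << 8) | $patch;
}

// Hypothetical helper: true only when ext/curl is loaded, libcurl is at
// least 7.62.0, and the proposed CurlUrl class is actually defined.
function curlUrlApiAvailable(): bool
{
    if (!function_exists('curl_version')) {
        return false; // ext/curl is not loaded at all
    }
    $info = curl_version();
    return $info['version_number'] >= libcurlVersionNumber(7, 62, 0)
        && class_exists('CurlUrl');
}

if (!curlUrlApiAvailable()) {
    echo "CurlUrl is not available; fall back to another URL parser\n";
}
```

Checking <php>class_exists('CurlUrl')</php> in addition to the version keeps the check valid on builds where libcurl is new enough but the extension does not ship the proposed classes.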
  
-One new Curl option will also be available <php>CURLOPT_CURLU</php>. Curl will use the given object read-only and will not change its contents.
+One new Curl option will also be available: <php>CURLOPT_CURLU</php>. Curl will use the given read-only object and will not change its contents.

-The <php>CurlUrlException</php> class represents an error raised by libcurl. The constants exposed in this class are all the code that <php>CurlUrlException::getCode()</php> could return. Those codes are internally mapped to CURLUE_* error codes that libcurl could raise. Those constants may vary depending on the version of libcurl ext/curl was compiled with.
+The <php>CurlUrlException</php> class represents an error raised by libcurl. The constants exposed in this class are all the codes that <php>CurlUrlException::getCode()</php> could return. Those codes are internally mapped to CURLUE_* error codes that libcurl could raise. Those constants may vary depending on the version of libcurl ext/curl was compiled with.

-If ext/curl was compiled with libcurl > 7.80 then <php>CurlUrlException::getMessage()</php> will return a user-friendly message that will discribe the problem. (Exemple: Malformed input to a URL function).
+If ext/curl was compiled with libcurl > 7.80 then <php>CurlUrlException::getMessage()</php> will return a user-friendly message that will describe the problem. (Example: Malformed input to a URL function).

 <PHP>
Line 52: Line 52:
 ==== Implementation 1 : Procedural API ====

-This implementation is a simple one to one binding of the libcurl functions. The underlying CURLU handle will be exposed as an opaque CurlUrl object.
+This implementation is a simple one-to-one binding of the libcurl functions. The underlying CURLU handle will be exposed as an opaque CurlUrl object.

 All <php>CURLUPART_*</php> and <php>CURLU_</php> constants will be exposed as global constants with the same name in userland.
Line 100: Line 100:
 == curl_url(?string $url = null) ==

-Create a new CurlUrl object. If <php>$url</php> is set, the URL will be initialised using this url, otherwise, all the parts will be set to null.
+Create a new CurlUrl object. If <php>$url</php> is set, the object will be initialized using this URL, otherwise, all the parts will be set to <php>null</php>.

 == curl_url_set(CurlUrl $url, int $part, string $content, int $flags = 0): void ==

-Update individual pieces of the URL. The <php>$part</php> argument identify the particular URL part to set or change (<php>CURLUPART_*</php>). Setting a part to a <php>null</php> value will effectively remove that part's contents from the <php>CurlUrl</php> object.
+Update individual pieces of the URL. The <php>$part</php> argument identifies the particular URL part to set or change (<php>CURLUPART_*</php>). Setting a part to a <php>null</php> value will effectively remove that part's contents from the <php>CurlUrl</php> object.

 The <php>$flags</php> argument is a bitmask with individual features.
Line 111: Line 111:
 | <php>CURLU_NON_SUPPORT_SCHEME</php> | If set, allows this function to set a non-supported scheme. |
 | <php>CURLU_URLENCODE</php> | If set, URL encodes the part. |
-| <php>CURLU_DEFAULT_SCHEME</php> | If set, allow the URL to be set without a scheme and then sets that to the default scheme: HTTPS. Overrides the <php>CURLU_GUESS_SCHEME</php> option if both are set. |
+| <php>CURLU_DEFAULT_SCHEME</php> | If set, allows the URL to be set without a scheme, in which case the scheme will be set to the default: HTTPS. Overrides the <php>CURLU_GUESS_SCHEME</php> option if both are set. |
-| <php>CURLU_GUESS_SCHEME</php> | If set,allow the URL to be set without a scheme and it instead "guesses" which scheme that was intended based on the host name. If the outermost sub-domain name matches DICT, FTP, IMAP, LDAP, POP3 or SMTP then that scheme will be used, otherwise it picks HTTP. Conflicts with the <php>CURLU_DEFAULT_SCHEME</php> option which takes precedence if both are set. |
+| <php>CURLU_GUESS_SCHEME</php> | If set, allows the URL to be set without a scheme and it instead "guesses" which scheme was intended based on the host name. If the outermost sub-domain name matches DICT, FTP, IMAP, LDAP, POP3 or SMTP then that scheme will be used, otherwise it picks HTTP. Conflicts with the <php>CURLU_DEFAULT_SCHEME</php> option which takes precedence if both are set. |
 | <php>CURLU_NO_AUTHORITY</php> | If set, skips authority checks. The RFC allows individual schemes to omit the host part (normally the only mandatory part of the authority), but libcurl cannot know whether this is permitted for custom schemes. Specifying the flag permits empty authority sections, similar to how file scheme is handled. |
 | <php>CURLU_PATH_AS_IS</php> | If set, makes libcurl skip the normalization of the path. That is the procedure where curl otherwise removes sequences of dot-slash and dot-dot etc. |
-| <php>CURLU_ALLOW_SPACE</php> | If set, the URL parser allows space (ASCII 32) where possible. The URL syntax does normally not allow spaces anywhere, but they should be encoded as %20 or '+'. When spaces are allowed, they are still not allowed in the scheme. When space is used and allowed in a URL, it will be stored as-is unless |
+| <php>CURLU_ALLOW_SPACE</php> | If set, the URL parser allows space (ASCII 32) where possible. The URL syntax normally does not allow spaces anywhere, but they should be encoded as %20 or '+'. When spaces are allowed, they are still not allowed in the scheme. When space is used and allowed in a URL, it will be stored as-is unless |
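
To make the interaction between <php>CURLU_DEFAULT_SCHEME</php> and <php>CURLU_GUESS_SCHEME</php> concrete, here is a plain-PHP model of the rules described in the table. It does not call libcurl at all. The function name <php>pickScheme()</php> and the <php>FLAG_*</php> values are placeholders chosen for this sketch, not the real constants.

```php
<?php
// Placeholder bit values for illustration only; the real CURLU_* constants
// are defined by libcurl and may use different bits.
const FLAG_DEFAULT_SCHEME = 1 << 0;
const FLAG_GUESS_SCHEME   = 1 << 1;

// Model of the documented rules for a URL that is given without a scheme.
// Returns the scheme that would be assumed, or null when neither flag is
// set and therefore no scheme is assumed.
function pickScheme(string $host, int $flags): ?string
{
    if ($flags & FLAG_DEFAULT_SCHEME) {
        return 'https'; // takes precedence over guessing
    }
    if ($flags & FLAG_GUESS_SCHEME) {
        // The outermost sub-domain is the first label of the host name.
        $label = strtolower(explode('.', $host)[0]);
        $known = ['dict', 'ftp', 'imap', 'ldap', 'pop3', 'smtp'];
        return in_array($label, $known, true) ? $label : 'http';
    }
    return null;
}

var_dump(pickScheme('ftp.example.com', FLAG_GUESS_SCHEME));
var_dump(pickScheme('www.example.com', FLAG_GUESS_SCHEME));
var_dump(pickScheme('ftp.example.com', FLAG_GUESS_SCHEME | FLAG_DEFAULT_SCHEME));
```

Because the flags are a bitmask, several of them can be combined with <php>|</php> in a single <php>$flags</php> argument, which is why the precedence rule between the two scheme flags needs to be spelled out.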
  
 == curl_url_get(CurlUrl $url, int $part, int $flags = 0): ?string ==
Line 121: Line 121:
 This function lets the user extract individual pieces from the <php>$url</php> object. If the particular part is not set, this function will return <php>null</php>.

-The <php>$part</php> argument identify the particular URL part to extract.
+The <php>$part</php> argument identifies the particular URL part to extract.
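
A sketch of how the procedural functions above might be combined, assuming Implementation 1 were adopted as described. <php>curl_url()</php>, <php>curl_url_set()</php>, <php>curl_url_get()</php> and the <php>CURLUPART_*</php> constants are the ones proposed in this RFC and are not part of released PHP versions, so the example is guarded with <php>function_exists()</php>. The wrapper name <php>describeUrl()</php> is invented here.

```php
<?php
// Hypothetical wrapper around the proposed procedural API.
// Returns null when the proposed functions are not available.
function describeUrl(string $input): ?array
{
    if (!function_exists('curl_url')) {
        return null; // proposed API not present in this build
    }
    $url = curl_url($input);
    // Replace the path, then read individual parts back.
    curl_url_set($url, CURLUPART_PATH, '/rfc/curl-url-api');
    return [
        'host'     => curl_url_get($url, CURLUPART_HOST),
        'path'     => curl_url_get($url, CURLUPART_PATH),
        // Per the RFC text, a part that is not set is returned as null.
        'fragment' => curl_url_get($url, CURLUPART_FRAGMENT),
    ];
}

var_dump(describeUrl('https://wiki.php.net/'));
```

Under the RFC's description, malformed input would surface as a <php>CurlUrlException</php>, so real code would likely wrap these calls in a <php>try</php> block.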
  
 The <php>$flags</php> argument is a bitmask with individual features.
Line 134: Line 134:
 ==== Implementation 2 : Mutable CurlUrl with single setter/getter method ====

-This implementation is an oopified version of the procedural API offered by libcurl, with few changes :
+This implementation is an OOPified version of the procedural API offered by libcurl, with few changes:

   * Constants related to the Curl URL API are moved to class constants on <php>CurlUrl</php> or <php>CurlException</php>
   * Functions that normally take the <php>CurlUrl</php> handle are replaced by methods on this handle.
-  * The <php>CurlUrl::get</php> method will return <php>null</php> instead of an exception if the retrived part is not set.
+  * The <php>CurlUrl::get</php> method will return <php>null</php> instead of an exception if the retrieved part is not set.
   * All other errors of libcurl will become <php>CurlUrlException</php>
  
Line 195: Line 195:
 == CurlUrl::set(int $part, string $content, int $flags = 0): CurlUrl ==

-Update individual pieces of the URL. The <php>$part</php> argument identify the particular URL part to set or change (<php>CurlUrl::PART_*</php>). Setting a part to a <php>null</php> will effectively remove that part's contents from the <php>CurlUrl</php> object.
+Update individual pieces of the URL. The <php>$part</php> argument identifies the particular URL part to set or change (<php>CurlUrl::PART_*</php>). Setting a part to a <php>null</php> will effectively remove that part's contents from the <php>CurlUrl</php> object.

 The <php>$flags</php> argument is a bitmask with individual features.
Line 203: Line 203:
 | <php>CurlUrl::URLENCODE</php> | If set, URL encodes the part. |
 | <php>CurlUrl::DEFAULT_SCHEME</php> | If set, allow the URL to be set without a scheme and then sets that to the default scheme: HTTPS. Overrides the <php>CurlUrl::GUESS_SCHEME</php> option if both are set. |
-| <php>CurlUrl::GUESS_SCHEME</php> | If set,allow the URL to be set without a scheme and it instead "guesses" which scheme that was intended based on the host name. If the outermost sub-domain name matches DICT, FTP, IMAP, LDAP, POP3 or SMTP then that scheme will be used, otherwise it picks HTTP. Conflicts with the <php>CurlUrl::DEFAULT_SCHEME</php> option which takes precedence if both are set. |
+| <php>CurlUrl::GUESS_SCHEME</php> | If set, allows the URL to be set without a scheme and it instead "guesses" which scheme was intended based on the host name. If the outermost sub-domain name matches DICT, FTP, IMAP, LDAP, POP3 or SMTP then that scheme will be used, otherwise it picks HTTP. Conflicts with the <php>CurlUrl::DEFAULT_SCHEME</php> option which takes precedence if both are set. |
 | <php>CurlUrl::NO_AUTHORITY</php> | If set, skips authority checks. The RFC allows individual schemes to omit the host part (normally the only mandatory part of the authority), but libcurl cannot know whether this is permitted for custom schemes. Specifying the flag permits empty authority sections, similar to how file scheme is handled. |
 | <php>CurlUrl::PATH_AS_IS</php> | If set, makes libcurl skip the normalization of the path. That is the procedure where curl otherwise removes sequences of dot-slash and dot-dot etc. |
-| <php>CurlUrl::ALLOW_SPACE</php> | If set, the URL parser allows space (ASCII 32) where possible. The URL syntax does normally not allow spaces anywhere, but they should be encoded as %20 or '+'. When spaces are allowed, they are still not allowed in the scheme. When space is used and allowed in a URL, it will be stored as-is unless |
+| <php>CurlUrl::ALLOW_SPACE</php> | If set, the URL parser allows space (ASCII 32) where possible. The URL syntax normally does not allow spaces anywhere, but they should be encoded as %20 or '+'. When spaces are allowed, they are still not allowed in the scheme. When space is used and allowed in a URL, it will be stored as-is unless |
  
  
Line 537: Line 537:
 A first vote (⅔rds) to allow the introduction of a new Curl Url API

-An STV vote to choose for the implementation.
+An STV vote to choose the implementation.

 With STV you SHOULD rank **all** the choices in order. Don't pick the same option more than once, as that invalidates your vote.
rfc/curl-url-api.txt · Last modified: 2022/07/19 16:43 by pierrick