rfc:replace_parse_url [2016/10/10 16:24] – bp1222 | rfc:replace_parse_url [2021/03/27 14:57] (current) – Move to inactive ilutov |
---|
====== PHP RFC: Create URLParser Class ====== | ====== PHP RFC: Create RFC Compliant URL Parser ====== |
* Version: 0.2 | * Version: 0.3 |
* Date: 2016-10-04 | * Date: 2016-10-04 |
* Author: David Walker (dave@mudsite.com) | * Author: David Walker (dave@mudsite.com) |
* Proposed version: PHP 7.2+ | * Proposed version: PHP 7.2+ |
* Status: Draft | * Status: Inactive |
* First Published at: http://wiki.php.net/rfc/replace_parse_url | * First Published at: http://wiki.php.net/rfc/replace_parse_url |
| |
This RFC came out of an attempt to resolve [[https://bugs.php.net/bug.php?id=72811|Bug #72811]]. During that attempt, discussion shifted from patching the current implementation of ''parse_url()'' to replacing it outright, and then to the fact that ''parse_url()'' cannot be removed because of BC issues. The idea that emerged was an immutable class that takes a URL, parses it, and exposes the pieces through getters. | This RFC came out of an attempt to resolve [[https://bugs.php.net/bug.php?id=72811|Bug #72811]]. During that attempt, discussion shifted from patching the current implementation of ''parse_url()'' to replacing it outright, and then to the fact that ''parse_url()'' cannot be removed because of BC issues. The idea that emerged was an immutable class that takes a URL, parses it, and exposes the pieces through getters. |
| |
The current implementation of ''parse_url()'' makes a number of exceptions to [[https://tools.ietf.org/html/rfc3986|RFC 3986]]. I do not know whether these are conscious exceptions or whether ''parse_url()'' was simply never based on the RFC. After raising this RFC, I was alerted that RFC 3986 is itself generally superseded by the [[https://url.spec.whatwg.org|WHATWG]] spec on URLs, which is a more practical specification of how URLs exist in the real world. | The current implementation of ''parse_url()'' makes a number of exceptions to [[https://tools.ietf.org/html/rfc3986|RFC 3986]]. I do not know whether these are conscious exceptions or whether ''parse_url()'' was simply never based on the RFC. After raising this RFC, I was alerted that RFC 3986 is complemented by the [[https://url.spec.whatwg.org|WHATWG]] spec on URLs. The aim of WHATWG is to combine RFC 3986 and [[https://tools.ietf.org/html/rfc3987|RFC 3987]]. However, WHATWG is a "Living Standard", which makes it subject to change, however frequently. Although it does a good job of combining the two RFCs, a single PHP parser that needs constant maintenance to keep up with an evolving standard is not practical. |
| |
So, this RFC proposes creating two new classes, URLParser and URLBuilder. The former will be an immutable class that is constructed with a URL to be parsed. There will be methods to access each piece of the URL, as well as a general getter that accepts a string of flags and returns the requested portions in an array. Its complement will be URLBuilder, which will expose methods to set, or add, pieces of a URL, and a method to get the built value. | So, this RFC proposes creating a new parser that adheres to the two RFCs. If PHP is compiled with mbstring support, the parser would also be able to properly support multibyte characters in a URL. |
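For readers unfamiliar with what "adhering to RFC 3986" means at the simplest level, the sketch below applies the component-splitting regular expression given in Appendix B of RFC 3986. It is not the parser proposed here: ''split_uri_reference()'' and ''RFC3986_SPLIT'' are hypothetical names chosen for illustration, the sketch does not validate or decompose the authority, and it does not address RFC 3987 (IRIs) at all.

```php
<?php
// Regular expression from RFC 3986, Appendix B. It splits any URI reference
// into scheme, authority, path, query and fragment without validating them.
const RFC3986_SPLIT = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';

/**
 * Hypothetical helper, not part of the API proposed in this RFC.
 * Absent components are returned as null; the path is always a string
 * (possibly empty), as in the RFC's generic syntax.
 */
function split_uri_reference(string $uri): array
{
    // PREG_UNMATCHED_AS_NULL (PHP 7.2+) reports unmatched groups as null,
    // which distinguishes "no query" from "empty query".
    preg_match(RFC3986_SPLIT, $uri, $m, PREG_UNMATCHED_AS_NULL);

    // "?? null" covers PHP versions that drop trailing unmatched groups.
    return [
        'scheme'    => $m[2] ?? null,
        'authority' => $m[4] ?? null,
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? null,
        'fragment'  => $m[9] ?? null,
    ];
}
```

Calling ''split_uri_reference()'' on the example URI used in Appendix B, ''http://www.ics.uci.edu/pub/ietf/uri/#Related'', yields the scheme ''http'', the authority ''www.ics.uci.edu'', the path ''/pub/ietf/uri/'', no query, and the fragment ''Related''.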
| |
===== Proposal ===== | ===== Proposal ===== |
<?php | <?php |
| |
class URLParser { | class URL { |
public __construct(string $url); | public function __construct(string $url, string|URL $base); |
public getScheme() : ?string; | |
public getUsername() : ?string; | /** |
public getPassword() : ?string; | * $input - The string to be parsed |
public getHostname() : ?string; | * $base - (optional) If $input is relative, this is what it is relative to |
public getPort() : ?int; | * $encoding_override - (optional) $input is assumed to be a UTF-8 encoded string; you may change that here |
public getPath() : ?string; | * $url - (optional) A URL object that should be modified by the parsing of $input. The return value will be this variable as well |
public getQuery() : ?string; | * $state_override - (optional) begin parsing the $input from a specific state. |
public getFragment() : ?string; | */ |
| static public function parse(string $input[, URL $base[, int $encoding_override[, URL $url[, int $state_override]]]]) : URL; |
| |
| public function getScheme() : ?string; |
| public function getUsername() : ?string; |
| public function getPassword() : ?string; |
| public function getHostname() : ?string; |
| public function getPort() : ?int; |
| public function getPath() : ?string; |
| public function getQuery() : ?string; |
| public function getFragment() : ?string; |
| |
| public function getAll() : array; |
} | } |
| |
</file> | </file> |
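To make the shape of the getter-based design concrete, here is a minimal userland sketch of an immutable URL value object. It rests on loud assumptions: ''UrlSketch'' is a hypothetical name, it omits the ''$base'' argument and the static ''parse()'' method entirely, and it delegates to the existing ''parse_url()'', so it inherits exactly the deviations from RFC 3986 that this RFC wants to eliminate.

```php
<?php
declare(strict_types=1);

/**
 * Userland illustration only; UrlSketch is a hypothetical name and is not
 * the class proposed by this RFC. Parsing is delegated to parse_url().
 */
final class UrlSketch
{
    /** @var array parsed components; set once in the constructor, never changed */
    private $parts;

    public function __construct(string $url)
    {
        $parsed = parse_url($url);
        if ($parsed === false) {
            throw new InvalidArgumentException("Malformed URL: $url");
        }
        // Normalise to a fixed key set so each getter returns null when a piece is absent.
        $this->parts = [
            'scheme'   => $parsed['scheme']   ?? null,
            'username' => $parsed['user']     ?? null,
            'password' => $parsed['pass']     ?? null,
            'hostname' => $parsed['host']     ?? null,
            'port'     => $parsed['port']     ?? null,
            'path'     => $parsed['path']     ?? null,
            'query'    => $parsed['query']    ?? null,
            'fragment' => $parsed['fragment'] ?? null,
        ];
    }

    public function getScheme(): ?string   { return $this->parts['scheme']; }
    public function getUsername(): ?string { return $this->parts['username']; }
    public function getPassword(): ?string { return $this->parts['password']; }
    public function getHostname(): ?string { return $this->parts['hostname']; }
    public function getPort(): ?int        { return $this->parts['port']; }
    public function getPath(): ?string     { return $this->parts['path']; }
    public function getQuery(): ?string    { return $this->parts['query']; }
    public function getFragment(): ?string { return $this->parts['fragment']; }

    /** Returns a copy of all components; PHP arrays are copied by value. */
    public function getAll(): array        { return $this->parts; }
}
```

Because the only state is a private array filled in the constructor and there are no setters, an instance cannot change after construction, which is the immutability property the introduction describes.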
| |
==== To Existing Extensions ==== | ==== To Existing Extensions ==== |
standard | standard |
| |
===== Future Scope ===== | |
Discussion raised the other half of this change: a mutable URLBuilder class. It would let users specify each portion of a URL without having to manage correct syntax themselves. | |
| |
===== Open Issues ===== | ===== Open Issues ===== |
| |
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== |
Vote to replace ''parse_url()'' with an re2c parser, and to require standards-compliant URI formats.
Requires 2/3 | Requires 2/3 |
| |
| |
===== References ===== | ===== References ===== |
PR with working Implementation: [[https://github.com/php/php-src/pull/2079]] | |