rfc:replace_parse_url

rfc:replace_parse_url [2016/10/10 20:02] bp1222
rfc:replace_parse_url [2017/09/22 13:28] – external edit 127.0.0.1
Line 1: Line 1:
-====== PHP RFC: Create URL and URLBuilder Classes ======
+====== PHP RFC: Create RFC Compliant URL Parser ======
-  * Version: 0.2
+  * Version: 0.3
   * Date: 2016-10-04
   * Author: David Walker (dave@mudsite.com)
Line 10: Line 10:
 This RFC came about from an attempt to resolve [[https://bugs.php.net/bug.php?id=72811|Bug #72811]].  In that attempt, discussion shifted from patching the current implementation of ''parse_url()'' to replacing it outright.  The discussion then turned to the inability to remove ''parse_url()'' due to BC issues.  Ideas formed on creating an immutable class that takes a URL and parses it, exposing the pieces through getters.
  
-The current implementation of ''parse_url()'' makes a bunch of exceptions to [[https://tools.ietf.org/html/rfc3986|RFC 3986]].  I do not know if these are conscious exceptions, or, if ''parse_url()'' was never based off of following the RFC.  After raising this RFC, I was alerted that the RFC, is itself, generally superseded by [[https://url.spec.whatwg.org|WHATWG]] spec on URLs.  This is a more practical specification to how URLs exist in the real-world.
+The current implementation of ''parse_url()'' makes a number of exceptions to [[https://tools.ietf.org/html/rfc3986|RFC 3986]].  I do not know whether these are conscious exceptions or whether ''parse_url()'' was never based on the RFC.  After raising this RFC, I was alerted that RFC 3986 is complemented by the [[https://url.spec.whatwg.org|WHATWG]] spec on URLs.  The aim of WHATWG is to combine RFC 3986 and [[https://tools.ietf.org/html/rfc3987|RFC 3987]].  However, WHATWG is a "Living Standard", which makes it subject to change at any time.  Although it does a good job of combining the two RFCs, a single PHP parser that needs constant maintenance to track the evolving standard is not exactly practical.
  
-So, this RFC proposes creating two new classes, URLParser and URLBuilder.  The former will be an immutable class, that is constructed with a URL to be parsed.  There will be methods to access each piece of the URL, as well as a general getter, that will accept a string of flags that will return requested portions in an array.  The complement to this will be URLBuilder, which will expose methods to set, or add, pieces to a URL, and a method to get the built value.
+So, this RFC proposes creating a new parser that adheres to the two RFCs.  In doing so, PHP, if compiled with mbstring support, would be able to properly support multibyte characters in a URL.
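
For orientation, the component split that RFC 3986 describes can be sketched with the regular expression given in Appendix B of that RFC. This is only an illustration of the RFC's generic syntax, not the parser this RFC proposes; the function name ''split_uri_reference()'' is hypothetical, and the sketch assumes PHP 7.2 or later for ''PREG_UNMATCHED_AS_NULL''. Because every delimiter is ASCII, UTF-8 multibyte characters pass through the split untouched.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: splits a URI reference into its five generic
// components using the regular expression from RFC 3986, Appendix B.
// It performs no validation and no percent-decoding.
function split_uri_reference(string $uri): array
{
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';
    preg_match($pattern, $uri, $m, PREG_UNMATCHED_AS_NULL);

    return [
        'scheme'    => $m[2] ?? null, // null when no "scheme:" prefix is present
        'authority' => $m[4] ?? null, // null when no "//authority" is present
        'path'      => $m[5] ?? '',   // the path always matches, possibly empty
        'query'     => $m[7] ?? null,
        'fragment'  => $m[9] ?? null,
    ];
}

$parts = split_uri_reference('http://www.ics.uci.edu/pub/ietf/uri/#Related');
```

A missing component comes back as null, while a present-but-empty one comes back as an empty string, which is the distinction the nullable getters in the proposal are meant to preserve.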
  
 ===== Proposal =====
Line 18: Line 18:
 <?php
 
-interface URLInterface {
-    public getScheme() : ?string;
-    public getUsername() : ?string;
-    public getPassword() : ?string;
-    public getHostname() : ?string;
-    public getPort() : ?int;
-    public getPath() : ?string;
-    public getQuery() : ?string;
-    public getFragment() : ?string;
-}
-
-class URL implements URLInterface {
-    public __construct(string $url);
-    public setScheme(?string) : URL;
-    public setUsername(?string) : URL;
-    public setPassword(?string) : URL;
-    public setHostname(?string) : URL;
-    public setPort(?int) : URL;
-    public setPath(?string) : URL;
-    public setQuery(?string) : URL;
-    public setFragment(?string) : URL;
-}
-
-class URLImmutable implements URLInterface {
-    public __construct(string $url);
-    public setScheme(?string) : URLImmutable;
-    public setUsername(?string) : URLImmutable;
-    public setPassword(?string) : URLImmutable;
-    public setHostname(?string) : URLImmutable;
-    public setPort(?int) : URLImmutable;
-    public setPath(?string) : URLImmutable;
-    public setQuery(?string) : URLImmutable;
-    public setFragment(?string) : URLImmutable;
+class URL {
+    public function __construct(string $url, string|URL $base);
+
+    /**
+     * $input - The string to be parsed
+     * $base - (optional) If $url is relative, this is what it is relative to
+     * $encoding_override - (optional) we assume $url is a UTF-8 encoded string, you may change it here
+     * $url - (optional) A URL object that should be modified by the parsing of $input.  The return value will be this variable as well
+     * $state_override - (optional) begin parsing the $input from a specific state.
+     */
+    static public function parse(string $input[, URL $base[, int $encoding_override[, URL $url[, int $state_override]]]]) : URL;
+
+    public function getScheme() : ?string;
+    public function getUsername() : ?string;
+    public function getPassword() : ?string;
+    public function getHostname() : ?string;
+    public function getPort() : ?int;
+    public function getPath() : ?string;
+    public function getQuery() : ?string;
+    public function getFragment() : ?string;
+
+    public function getAll() : array;
 }
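
To make the getter-based shape above concrete, here is a minimal userland sketch. It is an assumption-laden stand-in rather than the proposed implementation: the class name ''SketchUrl'' is hypothetical, it delegates to the existing ''parse_url()'' (so it inherits that function's deviations from RFC 3986), it omits the ''$base'' argument and the static ''parse()'' method, and it assumes PHP 7.4 or later for typed properties.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the proposed URL class. It only demonstrates
// the read-only, getter-based interface; parsing is done by parse_url().
final class SketchUrl
{
    /** @var array<string, int|string> components as returned by parse_url() */
    private array $parts;

    public function __construct(string $url)
    {
        $parts = parse_url($url);
        if ($parts === false) {
            throw new InvalidArgumentException("Unparseable URL: $url");
        }
        $this->parts = $parts;
    }

    // Each getter returns null when the component is absent from the input.
    public function getScheme(): ?string   { return $this->parts['scheme'] ?? null; }
    public function getUsername(): ?string { return $this->parts['user'] ?? null; }
    public function getPassword(): ?string { return $this->parts['pass'] ?? null; }
    public function getHostname(): ?string { return $this->parts['host'] ?? null; }
    public function getPort(): ?int        { return $this->parts['port'] ?? null; }
    public function getPath(): ?string     { return $this->parts['path'] ?? null; }
    public function getQuery(): ?string    { return $this->parts['query'] ?? null; }
    public function getFragment(): ?string { return $this->parts['fragment'] ?? null; }

    // Returns every component, keyed by name, with null for absent ones.
    public function getAll(): array
    {
        return [
            'scheme'   => $this->getScheme(),
            'username' => $this->getUsername(),
            'password' => $this->getPassword(),
            'hostname' => $this->getHostname(),
            'port'     => $this->getPort(),
            'path'     => $this->getPath(),
            'query'    => $this->getQuery(),
            'fragment' => $this->getFragment(),
        ];
    }
}

$url = new SketchUrl('https://user:secret@example.com:8080/a/b?x=1#top');
```

Since the object has no setters and its only property is private, it cannot be changed after construction, which matches the immutability the introduction asks for.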
  
Line 61: Line 50:
 ==== To Existing Extensions ====
 standard
- 
-===== Future Scope ===== 
-Discussion brought forward the other half of this change being a URLBuilder class that is mutable.  It would allow users to specify each portion of a URL without worrying about managing correct syntax. 
  
 ===== Open Issues =====
Line 70: Line 56:
  
 ===== Proposed Voting Choices =====
-Vote to replace ''parse_url()'' with an re2c parser, and require standard compliant URI formats.
 Requires 2/3
  
Line 80: Line 65:
  
 ===== References =====
-PR with working Implementation: [[https://github.com/php/php-src/pull/2079]] 
rfc/replace_parse_url.txt · Last modified: 2021/03/27 14:57 by ilutov