rfc:replace_parse_url

Differences

This shows you the differences between two versions of the page.

rfc:replace_parse_url [2016/10/10 20:02] bp1222 rfc:replace_parse_url [2021/03/27 14:57] (current) – Move to inactive ilutov
Line 1: Line 1:
-====== PHP RFC: Create URL and URLBuilder Classes ====== +====== PHP RFC: Create RFC Compliant URL Parser ====== 
-  * Version: 0.2 +  * Version: 0.3
   * Date: 2016-10-04   * Date: 2016-10-04
   * Author: David Walker (dave@mudsite.com)   * Author: David Walker (dave@mudsite.com)
   * Proposed version: PHP 7.2+   * Proposed version: PHP 7.2+
-  * Status: Draft +  * Status: Inactive
   * First Published at: http://wiki.php.net/rfc/replace_parse_url   * First Published at: http://wiki.php.net/rfc/replace_parse_url
  
Line 10: Line 10:
 This RFC came about from an attempt to resolve [[https://bugs.php.net/bug.php?id=72811|Bug #72811]].  During that attempt, discussion shifted from patching the current implementation of ''parse_url()'' to replacing it outright.  The discussion then turned to the impossibility of removing ''parse_url()'' because of BC issues.  Ideas formed around creating an immutable class that takes a URL and parses it, exposing the pieces through getters. This RFC came about from an attempt to resolve [[https://bugs.php.net/bug.php?id=72811|Bug #72811]].  During that attempt, discussion shifted from patching the current implementation of ''parse_url()'' to replacing it outright.  The discussion then turned to the impossibility of removing ''parse_url()'' because of BC issues.  Ideas formed around creating an immutable class that takes a URL and parses it, exposing the pieces through getters.
  
-The current implementation of ''parse_url()'' makes a bunch of exceptions to [[https://tools.ietf.org/html/rfc3986|RFC 3986]].  I do not know if these are conscious exceptions, or, if ''parse_url()'' was never based off of following the RFC.  After raising this RFC, I was alerted that the RFC, is itself, generally superseded by [[https://url.spec.whatwg.org|WHATWG]] spec on URLs.  This is a more practical specification to how URLs exist in the real-world. +The current implementation of ''parse_url()'' makes a number of exceptions to [[https://tools.ietf.org/html/rfc3986|RFC 3986]].  I do not know whether these are conscious exceptions or whether ''parse_url()'' was never based on the RFC.  After raising this RFC, I was alerted that RFC 3986 is complemented by the [[https://url.spec.whatwg.org|WHATWG]] spec on URLs.  The aim of WHATWG is to combine RFC 3986 and [[https://tools.ietf.org/html/rfc3987|RFC 3987]].  However, the WHATWG spec is a "Living Standard", which means it can change at any time.  Although it does a good job of combining the two RFCs, a single PHP parser that needs constant maintenance to track an evolving standard is not practical.
  
-So, this RFC proposes creating two new classes, URLParser and URLBuilder.  The former will be an immutable class that is constructed with a URL to be parsed.  There will be methods to access each piece of the URL, as well as a general getter, that will accept a string of flags that will return requested portions in an array.  The complement to this will be URLBuilder, which will expose methods to set, or add, pieces to a URL, and a method to get the built value. +So, this RFC proposes creating a new parser that adheres to the two RFCs.  In doing so, if PHP is compiled with mbstring support, the parser would be able to properly support multibyte characters in a URL.
  
 ===== Proposal ===== ===== Proposal =====
Line 18: Line 18:
 <?php <?php
  
-interface URLInterface { +class URL {
-    public getScheme() : ?string; +    public function __construct(string $url, string|URL $base);
-    public getUsername() : ?string; +
-    public getPassword() : ?string; +    /**
-    public getHostname() : ?string; +     * $input - The string to be parsed
-    public getPort() : ?int; +     * $base - (optional) If $input is relative, this is what it is relative to
-    public getPath() : ?string; +     * $encoding_override - (optional) we assume $input is a UTF-8 encoded string, you may change it here
-    public getQuery() : ?string; +     * $url - (optional) A URL object that should be modified by the parsing of $input.  The return value will be this variable as well
-    public getFragment() : ?string; +     * $state_override - (optional) begin parsing the $input from a specific state.
-} +     */
- +    static public function parse(string $input[, URL $base[, int $encoding_override[, URL $url[, int $state_override]]]]) : URL;
-class URL implements URLInterface { +
-    public __construct(string $url); +    public function getScheme() : ?string;
-    public setScheme(?string) : URL; +    public function getUsername() : ?string;
-    public setUsername(?string) : URL; +    public function getPassword() : ?string;
-    public setPassword(?string) : URL; +    public function getHostname() : ?string;
-    public setHostname(?string) : URL; +    public function getPort() : ?int;
-    public setPort(?int) : URL; +    public function getPath() : ?string;
-    public setPath(?string) : URL; +    public function getQuery() : ?string;
-    public setQuery(?string) : URL; +    public function getFragment() : ?string;
-    public setFragment(?string) : URL; +
-} +    public function getAll() : array;
- +
-class URLImmutable implements URLInterface { +
-    public __construct(string $url); +
-    public setScheme(?string) : URLImmutable; +
-    public setUsername(?string) : URLImmutable; +
-    public setPassword(?string) : URLImmutable; +
-    public setHostname(?string) : URLImmutable; +
-    public setPort(?int) : URLImmutable; +
-    public setPath(?string) : URLImmutable; +
-    public setQuery(?string) : URLImmutable; +
-    public setFragment(?string) : URLImmutable; +
 } }
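
To illustrate how the getter-based interface in the new revision might be used, here is a minimal userland sketch.  It is an assumption for illustration only: ''UrlSketch'' is a hypothetical name, not part of the RFC, and it simply delegates to the existing ''parse_url()'', so it inherits that function's deviations from RFC 3986 and does not implement the ''$base'', encoding, or state-override parameters.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the proposed URL class (illustration only).
 * It wraps parse_url(), so it is NOT an RFC 3986/3987 compliant parser.
 */
final class UrlSketch
{
    /** @var array<string, int|string> components as returned by parse_url() */
    private $parts;

    public function __construct(string $url)
    {
        $parts = parse_url($url);
        if ($parts === false) {
            // parse_url() returns false for seriously malformed URLs
            throw new InvalidArgumentException("Unparseable URL: $url");
        }
        $this->parts = $parts;
    }

    // Each getter returns null when the component is absent.
    public function getScheme(): ?string   { return $this->parts['scheme'] ?? null; }
    public function getUsername(): ?string { return $this->parts['user'] ?? null; }
    public function getPassword(): ?string { return $this->parts['pass'] ?? null; }
    public function getHostname(): ?string { return $this->parts['host'] ?? null; }
    public function getPort(): ?int        { return $this->parts['port'] ?? null; }
    public function getPath(): ?string     { return $this->parts['path'] ?? null; }
    public function getQuery(): ?string    { return $this->parts['query'] ?? null; }
    public function getFragment(): ?string { return $this->parts['fragment'] ?? null; }

    /** Returns all eight components, with null for any that are missing. */
    public function getAll(): array
    {
        $keys = ['scheme', 'user', 'pass', 'host', 'port', 'path', 'query', 'fragment'];
        return array_merge(array_fill_keys($keys, null), $this->parts);
    }
}

$url = new UrlSketch('https://user:secret@example.com:8080/docs/index.php?lang=en#intro');
echo $url->getHostname(), PHP_EOL; // example.com
echo $url->getPort(), PHP_EOL;     // 8080
```

Because the object has no setters, it behaves as an immutable value once constructed, which matches the direction the discussion took.  A compliant implementation would replace the ''parse_url()'' call with a parser written against the two RFCs.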
  
Line 61: Line 50:
 ==== To Existing Extensions ==== ==== To Existing Extensions ====
 standard standard
- 
-===== Future Scope ===== 
-Discussion brought forward the other half of this change being a URLBuilder class that is mutable.  It would allow users to specify each portion of a URL without worrying about managing correct syntax. 
  
 ===== Open Issues ===== ===== Open Issues =====
Line 70: Line 56:
  
 ===== Proposed Voting Choices ===== ===== Proposed Voting Choices =====
-Vote to replace ''parse_url()'' with an re2c parser, and require standard compliant URI formats. 
 Requires 2/3 Requires 2/3
  
Line 80: Line 65:
  
 ===== References ===== ===== References =====
-PR with working Implementation: [[https://github.com/php/php-src/pull/2079]] 
rfc/replace_parse_url.1476129747.txt.gz · Last modified: 2017/09/22 13:28 (external edit)