rfc:replace_parse_url

Differences

This shows you the differences between two versions of the page.

rfc:replace_parse_url [2016/10/03 22:48] – created bp1222rfc:replace_parse_url [2021/03/27 14:57] (current) – Move to inactive ilutov
Line 1: Line 1:
-====== PHP RFC: Replace parse_url() ====== +====== PHP RFC: Create RFC Compliant URL Parser ====== 
-  * Version: 0.1 +  * Version: 0.3 
-  * Date: 2016-10-03+  * Date: 2016-10-04
   * Author: David Walker (dave@mudsite.com)   * Author: David Walker (dave@mudsite.com)
-  * Status: Draft+  * Proposed version: PHP 7.2+ 
 +  * Status: Inactive
   * First Published at: http://wiki.php.net/rfc/replace_parse_url   * First Published at: http://wiki.php.net/rfc/replace_parse_url
  
 ===== Introduction ===== ===== Introduction =====
-This RFC came about from an attempt to resolve [[https://bugs.php.net/bug.php?id=72811|Bug #72811]].  In the attempt, discussion shifted from trying to patch the current implementation of ''parse_url()'' to replacing it with an re2c-based parser.  The current implementation of ''parse_url()'' does not respect [[https://tools.ietf.org/html/rfc3986|RFC 3986]] with regard to most of the components of a URL.  The bug in question noted that+This RFC came about from an attempt to resolve [[https://bugs.php.net/bug.php?id=72811|Bug #72811]].  In the attempt, discussion shifted from trying to patch the current implementation of ''parse_url()'' to more generally replacing the current one.  The discussion then shifted to the inability to remove ''parse_url()'' due to BC issues.  Ideas formed on creating an immutable class that will take a URL and parse it, exposing the pieces through getters.
  
-<file php> +The current implementation of ''parse_url()'' makes a number of exceptions to [[https://tools.ietf.org/html/rfc3986|RFC 3986]].  I do not know whether these are conscious exceptions or whether ''parse_url()'' was never based on the RFC.  After raising this RFC, I was alerted that RFC 3986 is complemented by the [[https://url.spec.whatwg.org|WHATWG]] spec on URLs.  The aim of WHATWG is to combine RFC 3986 and [[https://tools.ietf.org/html/rfc3987|RFC 3987]].  However, WHATWG is a "Living Standard", which makes it subject to change, however frequent.  Although it does some good in combining the two RFCs, a single PHP parser that needs constant maintenance to keep up with the evolving standard is not exactly practical.
-<?php +
-var_dump(parse_url("127.0.0.1:80", PHP_URL_HOST));+
  
-/* Outputs: +So, this RFC proposes creating new parser that adheres to the two RFC's.  In doing soif PHP is compiled with mbstring supportwould be able to properly support multibyte characters in a URL.
-string(9) "127.0.0.1" +
-*/ +
-</file> +
- +
-While we all may agree that this is sensible, and totally expected, it is actually a lie.  That is not how the RFC defines how that string should be interpreted.  It should parse as a single PATH element ''string(12) "127.0.0.1:80"''.  Why?  Well, the RFC defines the ''hier-part'', which contains the host portion of the URI, to be after a double-slash, which the example lacks.  This would cause the ''path-noscheme'' portion of the parser to match beginning at the ''1'' and fill the path until ''?'' or ''#'' is found.+
  
 ===== Proposal ===== ===== Proposal =====
-The proposal of this RFC is twofold.  One, replace the current parser used for ''parse_url()'' with one that utilizes re2c.  Two, ensure ''parse_url()'' more closely follows the RFC.  The function signature will not change; however, the return value will be more consistent.+<file php> 
 +<?php
  
-The function can return +class URL { 
-  * An array consisting of each component of the URI found. +    public function __construct(string $url, string|URL $base);
-  * string|int of the component requested by the 2nd argument +     
-  * NULL when we cannot parse the URI, or the component requested contains no value +    /*
 +     * $input - The string to be parsed 
 +     * $base - (optional) If $url is relative, this is what it is relative to 
 +     * $encoding_override - (optional) we assume $url is a UTF-8 encoded string, you may change it here 
 +     * $url - (optional) A URL object that should be modified by the parsing of $input.  The return value will be this variable as well 
 +     * $state_override - (optional) begin parsing the $input from a specific state. 
 +     */ 
 +    static public function parse(string $input[, URL $base[, int $encoding_override[, URL $url[, int $state_override]]]]) : URL; 
 +     
 +    public function getScheme() : ?string; 
 +    public function getUsername() : ?string; 
 +    public function getPassword() : ?string; 
 +    public function getHostname() : ?string; 
 +    public function getPort() : ?int; 
 +    public function getPath() : ?string; 
 +    public function getQuery() : ?string; 
 +    public function getFragment() : ?string; 
 +     
 +    public function getAll() : array; 
 +}
  
-===== Discussion Points ===== +</file>
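The getter-based design above can be illustrated with a minimal sketch. This is not the RFC's implementation: ''SketchUrl'' and ''getAuthority()'' are hypothetical names introduced here, and the splitting relies on the non-validating regular expression given in RFC 3986, Appendix B, rather than on the proposed state-machine parser. It does not split the authority into username, password, hostname, and port, and it ignores the ''$base'' and encoding arguments.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the proposed URL class (assumption, not the RFC's code).
 * Splits a URI reference with the RFC 3986 Appendix B regular expression and
 * exposes the pieces through read-only getters.
 */
final class SketchUrl
{
    /** @var array Component name => value (null when the component is absent) */
    private $parts;

    public function __construct(string $url)
    {
        // Regular expression from RFC 3986, Appendix B (non-validating).
        $re = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';
        preg_match($re, $url, $m, PREG_UNMATCHED_AS_NULL);

        // Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
        $this->parts = [
            'scheme'    => $m[2] ?? null,
            'authority' => $m[4] ?? null,
            'path'      => $m[5] ?? '',
            'query'     => $m[7] ?? null,
            'fragment'  => $m[9] ?? null,
        ];
    }

    public function getScheme(): ?string    { return $this->parts['scheme']; }
    public function getAuthority(): ?string { return $this->parts['authority']; }
    public function getPath(): string       { return $this->parts['path']; }
    public function getQuery(): ?string     { return $this->parts['query']; }
    public function getFragment(): ?string  { return $this->parts['fragment']; }

    /** Returns every component at once, similar in spirit to getAll(). */
    public function getAll(): array
    {
        return $this->parts;
    }
}

// Usage: the object has no setters, so it cannot change after construction.
$url = new SketchUrl('http://example.net/index.php?q={fullname}#top');
var_dump($url->getAll());
```

Because the Appendix B expression only splits and never validates, a real parser along the lines proposed here would still need per-component checks on top of it.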
-==== RFC Break ==== +
-I do make a single exception and break with the RFC in one place.  The RFC does not permit curly-braces within a query component.  For instance ''http://example.net/index.php?q={fullname}'', where the RFC would define the path as being ''q=''.  I don't feel this is accurate, as ''{'' and ''}'' are not special markers within a URI and should otherwise be treated as part of the string.+
  
 ===== Backward Incompatible Changes ===== ===== Backward Incompatible Changes =====
-Many of the tests that were developed for the current implementation of ''parse_url()'' have been changed to reflect more standards-compliant behavior.  This change will break anyone who is using the function with a non-standards-compliant URI format.  This is the most problematic in terms of a BC break.  By this point, many people who use ''parse_url()'' might expect it to work in a, let's say, forgiving manner.  The example provided in the bug report is a perfect example of what I feel is a common use case of this function, one that will no longer work once the function behaves in a standards-compliant manner. +None
- +
-===== Proposed PHP Version(s) ===== +
-PHP 7.2, or later+
  
 ===== RFC Impact ===== ===== RFC Impact =====
 ==== To Existing Extensions ==== ==== To Existing Extensions ====
-standard +standard
  
 ===== Open Issues ===== ===== Open Issues =====
-Make sure there are no open issues when the vote starts! +  * Deprecate ''parse_url()''?  Try and push people into using the new URLParser class
- +  * Should ''parse_url()'' have a sunset date of PHP 8 or PHP 9?
-===== Unaffected PHP Functionality ===== +
-List existing areas/features of PHP that will not be changed by the RFC+
- +
-This helps avoid any ambiguity, shows that you have thought deeply about the RFC's impact, and helps reduce mailing list noise.+
  
 ===== Proposed Voting Choices ===== ===== Proposed Voting Choices =====
-Vote to replace ''parse_url()'' with an re2c parser, and require standard compliant URI formats. 
 Requires 2/3 Requires 2/3
  
Line 61: Line 65:
  
 ===== References ===== ===== References =====
-PR with working Implementation: [[https://github.com/php/php-src/pull/2079]] 
rfc/replace_parse_url.1475534920.txt.gz · Last modified: 2017/09/22 13:28 (external edit)