rfc:replace_parse_url

PHP RFC: Create RFC Compliant URL Parser

Introduction

This RFC came about for an attempt to resolve Bug #72811. In the attempt, discussion shifted from trying to patch the current implementation of parse_url() to more generally replacing the current one. The discussion then shifted to the inability to remove parse_url() due to BC issues. Ideas formed on creating an immutable class that will take a URL and parse it, exposing the pieces by getters.

The current implementation of parse_url() makes a bunch of exceptions to RFC 3986. I do not know if these are conscious exceptions, or, if parse_url() was never based off of the RFC. After raising this RFC, I was alerted that the RFC, is complimented by WHATWG spec on URLs. The aim of WHATWG is to combine RFC 3986 and RFC 3987. However, WHATWG is a “Living Standard” which makes it subject to change, however frequent. Although it does some good combining the two RFC's, the complexities to have a single PHP parser that would require constant maintaining to adhere to the evolving standard is not exactly practical.

So, this RFC proposes creating a new parser that adheres to the two RFC's. In doing so, if PHP is compiled with mbstring support, would be able to properly support multibyte characters in a URL.

Proposal

<?php
 
class URL {
    public function  __construct(string $url, string|URL $base);
 
    /**
     * $input - The string to be parsed
     * $base - (optional) If $url is relative, this is what it is relative to
     * $encoding_override - (optional) we assume $url is a UTF-8 encoded string, you may change it here
     * $url - (optional) A URL object that should be modified by the parsing of $input.  The return value will be this variable as well
     * $state_override - (optional) begin parting the $input from a specific state.
     */
    static public function parse(string $input[, URL $base[, int $encoding_override[, URL $url[, int $state_override]]]]) : URL;
 
    public function getScheme() : ?string;
    public function getUsername() : ?string;
    public function getPassword() : ?string;
    public function getHostname() : ?string;
    public function getPort() : ?int;
    public function getPath() : ?string;
    public function getQuery() : ?string;
    public function getFragment() : ?string;
 
    public function getAll() : array;
}

Backward Incompatible Changes

None

RFC Impact

To Existing Extensions

standard

Open Issues

  • Deprecate parse_url()? Try and push people into using the new URLParser class.
  • Should parse_url() have a sunset date of PHP8, or PHP9?

Proposed Voting Choices

Requires 2/3

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged to
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature

References

rfc/replace_parse_url.txt · Last modified: 2021/03/27 14:57 by ilutov