
PHP RFC: New ext-dom features in PHP 8.4

Introduction

The PHP 8.4 development cycle has already brought two major improvements to the ext-dom extension: HTML 5 support and opt-in spec compliance. This RFC is the final improvement to ext-dom for PHP 8.4: it proposes adding new features to the extension. In particular, we'll focus on CSS selector support, filling in missing features, and adding new properties.

Proposal

The RFC consists of multiple sub-proposals bundled together under one RFC to minimize overhead. In this section, we'll discuss each feature separately.

CSS selectors

Element::$innerHTML

This is a property on the Element class, defined in the HTML specification's additions to the DOM: https://html.spec.whatwg.org/#the-innerhtml-property

<PHP> class Element {

      public string $innerHTML;

} </PHP>

Reading this property returns the serialization of the element's inner content; writing to it parses a string into a subtree and replaces the element's contents with that subtree. If the document is an HTML document, the HTML parser and serializer are used. If the document is an XML document, the XML parser and serializer are used. Yes, that means innerHTML can set XML content, and this is exactly what the spec defines. This naming oopsie is legacy baggage from the spec: the Element class is shared between XML and HTML documents for interoperability.
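A minimal sketch of reading and writing innerHTML, assuming the `Dom\` classes as shipped in PHP 8.4 (`Dom\HTMLDocument::createFromString()` and `Dom\XMLDocument::createFromString()`; the RFC spells the namespace `DOM`, and PHP class names are case-insensitive). It shows the same property going through the HTML parser/serializer for one document and the XML parser/serializer for the other.

```php
<?php
// Requires PHP 8.4 (spec-compliant Dom\ classes).

// HTML document: innerHTML uses the HTML parser and serializer.
$html = Dom\HTMLDocument::createFromString(
    '<!DOCTYPE html><html><body><div><p>Hello</p></div></body></html>'
);
$box = $html->body->firstElementChild; // the <div>
echo $box->innerHTML, "\n";            // <p>Hello</p>

// Writing parses the string and replaces all children of the element.
$box->innerHTML = '<b>New</b> content';
echo $box->innerHTML, "\n";            // <b>New</b> content

// XML document: the very same property uses the XML parser and serializer.
$xml = Dom\XMLDocument::createFromString('<root><item>1</item></root>');
$root = $xml->documentElement;
echo $root->innerHTML, "\n";           // <item>1</item>

$root->innerHTML = '<item>2</item><item>3</item>';
echo $root->innerHTML, "\n";           // <item>2</item><item>3</item>
echo $root->childElementCount, "\n";   // 2
```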

If the serialization is not well-formed XML, a DOMException with code SYNTAX_ERR is thrown, as defined by the spec.

Parsing documents (or fragments) can cause hard and soft errors. Soft errors are reported via warnings or, if the internal error handling mechanism is used, stored in an array, unless LIBXML_NOERROR is provided, in which case they are silenced. However, we have no way to pass a parsing option to the innerHTML property, so we cannot offer a clean way to silence these errors. I asked about this on the mailing list (https://externals.io/message/123224) but got no response. This probably means that people are uncertain, so I chose not to implement the error reporting: it's easier to omit something and add it later than to remove it later.
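The asymmetry can be sketched as follows, assuming PHP 8.4's `Dom\HTMLDocument`: a flag such as LIBXML_NOERROR can be passed when the whole document is parsed, but the innerHTML setter has no parameter to receive one. The HTML parser still recovers from sloppy markup assigned through the property; here an unclosed `<p>` is implicitly closed by the next one.

```php
<?php
// Requires PHP 8.4. Parser options are only accepted at document level.
$doc = Dom\HTMLDocument::createFromString(
    '<!DOCTYPE html><html><body><div></div></body></html>',
    LIBXML_NOERROR // silences soft errors while parsing the document
);
$box = $doc->body->firstElementChild; // the <div>

// The setter takes no options. Per HTML parsing rules, the second <p>
// implicitly closes the first one.
$box->innerHTML = '<p>one<p>two';
echo $box->innerHTML, "\n"; // <p>one</p><p>two</p>
```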

New properties for Document

I propose adding several new properties to the Document class to make development a bit easier:

<PHP> class Document {

      public ?HTMLElement $body;
      /** @readonly */
      public ?HTMLElement $head;
      public string $title;

} </PHP>

These additions are described in the HTML addendum for the DOM specification in https://html.spec.whatwg.org/#document.

The properties should be relatively self-explanatory: <php>$body</php> refers to the body element (if there is one), <php>$head</php> to the head element, and <php>$title</php> to the text inside the title element (which in turn is inside the head element). The details get a bit more complicated when, for example, SVG is involved, so see the link above for the full rules, but you should be familiar with these properties from JavaScript.
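A short sketch of the three properties, assuming PHP 8.4's `Dom\HTMLDocument`. The whitespace handling of <php>$title</php> follows the HTML spec's "strip and collapse ASCII whitespace" rule for the title getter; the markup used here is illustrative.

```php
<?php
// Requires PHP 8.4.
$doc = Dom\HTMLDocument::createFromString(
    '<!DOCTYPE html><html><head><title>  My   page </title></head>'
    . '<body><p>Hi</p></body></html>'
);

// HTML elements in an HTML document report an uppercase tagName.
echo $doc->head->tagName, "\n"; // HEAD
echo $doc->body->tagName, "\n"; // BODY

// Reading $title strips and collapses ASCII whitespace.
$original = $doc->title;
echo $original, "\n"; // My page

// Writing $title replaces the text of the <title> element.
$doc->title = 'Renamed';
echo $doc->title, "\n"; // Renamed
```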

As you can see, this requires adding the HTMLElement class as well, which extends the Element class. In the future we may add properties to it too, but that is left out of this RFC. Elements in the HTML namespace will now be returned as instances of HTMLElement instead of Element. For example, <php>$documentElement</php> is a property on the Document class of type Element; if the document element is an HTML element, we get an HTMLElement instance. This is all as defined in the spec.
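The namespace-based class selection can be checked with a small sketch, assuming PHP 8.4's `Dom\HTMLDocument` and `Dom\XMLDocument`. The HTML parser puts elements in the HTML namespace by default, while an un-namespaced XML element stays a plain `Element`.

```php
<?php
// Requires PHP 8.4.
$html = Dom\HTMLDocument::createFromString('<!DOCTYPE html><html><body></body></html>');
$xml  = Dom\XMLDocument::createFromString('<root/>');

// <html> lives in the HTML namespace, so we get the subclass.
$htmlIsHtmlElement = $html->documentElement instanceof Dom\HTMLElement;
var_dump($htmlIsHtmlElement); // bool(true)

// <root> has no namespace: a plain Dom\Element.
$xmlIsHtmlElement = $xml->documentElement instanceof Dom\HTMLElement;
$xmlIsElement     = $xml->documentElement instanceof Dom\Element;
var_dump($xmlIsHtmlElement); // bool(false)
var_dump($xmlIsElement);     // bool(true)
```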

TokenList

I propose to add the TokenList class from the DOM specification to PHP (https://dom.spec.whatwg.org/#interface-domtokenlist):

<PHP> /**
 * @not-serializable
 * @strict-properties
 */
final class TokenList implements IteratorAggregate, Countable {
  private function __construct() {}

  /** @readonly */
  public int $length;
  public function item(int $index): ?string {}
  public function contains(string $token): bool {}
  public function add(string ...$tokens): void {}
  public function remove(string ...$tokens): void {}
  public function toggle(string $token, ?bool $force = null): bool {}
  public function replace(string $token, string $newToken): bool {}
  public function supports(string $token): bool {}
  public string $value;
  public function count(): int {}
  public function getIterator(): \Iterator {}
} </PHP>

An instance of TokenList can be obtained via the <php>Element::$classList</php> property. For now, its purpose is limited to managing the class names of an element, but the class is built to represent a general set of tokens. At first glance, managing the class names in a document might seem trivial, but it isn't: TokenList treats the classes as a set and handles whitespace collapsing, iteration, and convenient manipulations like toggling, all behind an easy-to-use API.
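A sketch of the set semantics, assuming PHP 8.4's `Dom\TokenList` as reached through `$classList`. Duplicate and padded class names collapse into an ordered set, and each mutation writes the normalized set back to the `class` attribute.

```php
<?php
// Requires PHP 8.4.
$doc = Dom\HTMLDocument::createFromString(
    '<!DOCTYPE html><html><body><div class="  a b   a "></div></body></html>'
);
$div = $doc->body->firstElementChild;
$classes = $div->classList; // Dom\TokenList

// Duplicates and extra whitespace are folded into the set {a, b}.
echo count($classes), "\n";        // 2
var_dump($classes->contains('b')); // bool(true)

$classes->add('c');                // a b c
$classes->remove('a');             // b c
$wasAdded = $classes->toggle('b'); // b was present, so it is removed
var_dump($wasAdded);               // bool(false)
$classes->replace('c', 'd');       // d

// Every mutation is written back to the class attribute.
echo $div->getAttribute('class'), "\n"; // d

foreach ($classes as $name) {
    echo $name, "\n"; // d
}
```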

PHP-specific additions

Allowing PHP-specific developer experience improvements

DOM methods like <php>Element::insertAdjacentElement(string $where, Element $element)</php> and <php>Element::insertAdjacentText(string $where, string $data)</php> take a “where” argument as their first parameter. There are only four valid values for it: “beforebegin”, “afterbegin”, “beforeend”, and “afterend”, so it's really an enum in disguise. I propose making use of the PHP enum feature here. This would prevent programming mistakes and make IDE hints much nicer, contributing to a better developer experience. Strictly speaking, this deviates from the DOM spec, but we already model the DOM classes in a way that fits PHP's OOP model. More generally, I propose allowing enums in new ext-dom APIs where it makes sense. Since the Element class didn't exist before the opt-in spec compliance RFC, and no PHP 8.4 release has been made so far, we can change these signatures without affecting users.

In particular, this will result in the following enum and method signatures:

<PHP> namespace DOM {

enum AdjacentPosition {
  case BeforeBegin;
  case AfterBegin;
  case BeforeEnd;
  case AfterEnd;
}

class Element {
  public function insertAdjacentElement(AdjacentPosition $where, Element $element): ?Element {}
  public function insertAdjacentText(AdjacentPosition $where, string $data): void {}
}

} </PHP>
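A usage sketch of the enum-based signatures, assuming PHP 8.4, where the enum is available as `Dom\AdjacentPosition` (the RFC's `DOM` spelling resolves to the same class). It inserts at all four positions and shows that a plain string is now rejected with a TypeError instead of being accepted and validated at runtime.

```php
<?php
// Requires PHP 8.4.
use Dom\AdjacentPosition;

$doc = Dom\HTMLDocument::createFromString(
    '<!DOCTYPE html><html><body><p>middle</p></body></html>'
);
$p = $doc->body->firstElementChild;

$p->insertAdjacentText(AdjacentPosition::BeforeBegin, 'before');
$p->insertAdjacentText(AdjacentPosition::AfterEnd, 'after');
$p->insertAdjacentElement(AdjacentPosition::AfterBegin, $doc->createElement('b'));
$p->insertAdjacentElement(AdjacentPosition::BeforeEnd, $doc->createElement('i'));

echo $doc->body->innerHTML, "\n";
// before<p><b></b>middle<i></i></p>after

// A string is not coerced to an enum case, so typos cannot slip through.
$rejected = false;
try {
    $p->insertAdjacentText('beforebeginn', 'x');
} catch (TypeError $e) {
    $rejected = true;
}
var_dump($rejected); // bool(true)
```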

API amendments

Initially, the DOM spec-compliance RFC copied the existing APIs from the old DOM classes, without deviations for most of them. Someone reported that <php>(DOM)Document::xinclude()</php> has odd return value behaviour. In particular, quoting from the documentation:

Returns the number of XIncludes in the document, -1 if some processing failed, or false if there were no substitutions.

This seems to be caused by an implementation mistake. More sensible behaviour would be to return false on failure and the number of substitutions on success, returning 0 if there were no substitutions. I propose making this change for the new classes.
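To make the proposed contract concrete, here is a hypothetical userland helper (not part of this RFC or of ext-dom) that maps the legacy `DOMDocument::xinclude()` result onto the proposed semantics, following the documented legacy values quoted above. It runs on older PHP versions because it only uses the legacy class.

```php
<?php
// Hypothetical helper, not part of ext-dom: translate the documented legacy
// DOMDocument::xinclude() return value into the semantics proposed here.
function normalizeXincludeResult(int|false $legacy): int|false
{
    if ($legacy === false) {
        return 0;     // legacy false meant "no substitutions"
    }
    if ($legacy === -1) {
        return false; // legacy -1 meant "some processing failed"
    }
    return $legacy;   // number of substitutions
}

// A document without any xi:include elements.
$doc = new DOMDocument();
$doc->loadXML('<root><child/></root>');
$result = normalizeXincludeResult($doc->xinclude());

// Callers can now detect failure with a strict comparison,
// and 0 is an ordinary "nothing to substitute" result.
if ($result === false) {
    echo "XInclude processing failed\n";
} else {
    echo "Substitutions: $result\n"; // Substitutions: 0
}
```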

Backward Incompatible Changes

None, because this RFC only affects classes added in PHP 8.4.

Proposed PHP Version(s)

PHP 8.4.

RFC Impact

To Existing Extensions

Only ext-dom is affected.

Open Issues

None yet.

Unaffected PHP Functionality

Everything outside ext-dom.

Future Scope

I initially planned to include the outerHTML property too. This is very feasible given all the internal DOM work done during the PHP 8.4 development cycle. However, since I haven't seen demand for it, I think my time is better spent on other features. If someone really wants this in 8.4, feel free to make a PoC implementation; it should be fairly doable using Lexbor and the current ext-dom internal APIs.

Proposed Voting Choices

One primary yes/no vote requiring a 2/3 majority: “Accept this proposal?”.

Patches and Tests

  1. CSS selector implementation: https://github.com/php/php-src/pull/13819
  2. HTMLDocument properties implementation: https://github.com/php/php-src/pull/13791
  3. PHP-specific extensions implementation: https://github.com/nielsdos/php-src/pull/93

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged into
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature
  4. a link to the language specification section (if any)

References

  1. innerHTML error handling: https://externals.io/message/123224
  2. HTML spec that defines DOM addendums: https://html.spec.whatwg.org
rfc/dom_additions_84.txt · Last modified: 2024/05/04 22:38 by nielsdos