
PHP RFC: Implement Current DOM Living Standard API

Introduction

Working with XML (HTML) documents is a necessary task for many web applications, and the dom extension implements a standardized API that was previously specified by a W3C working group in three DOM Levels. Since then the standard has evolved into a Living Standard, similar to HTML 5, that is continuously maintained by the Web Hypertext Application Technology Working Group (WHATWG).

- https://dom.spec.whatwg.org

Because the new API provides much better traversal and manipulation methods than the old one, we propose to add the new methods to the existing ext/dom API.

Specifically, we think this is a better solution than providing them in userland, because

  1. ext/dom + DOMDocument represent the DOM Standard, so we should continue to support its evolving versions.
  2. the added methods are of great value to users and replace more complicated approaches that were previously required. The cost of finding or re-implementing them is high for users.

In addition, this RFC seeks approval to continue adding new methods and properties from the DOM Living Standard without further RFCs, as long as they are implemented in a backwards compatible way.

Proposal

Follow the DOM Living Standard with ext/dom

This RFC proposes to adapt the current DOM standard changes to the PHP language by introducing new interfaces and public properties that simplify traversal and manipulation of DOM elements.

<?php
// Pseudo-code: PHP interfaces cannot declare properties, so the property
// lines only document what the implementing classes expose.
interface DOMParentNode
{
    // Named after the ParentNode members of the DOM Living Standard.
    public readonly ?DOMNode $firstElementChild;
    public readonly ?DOMNode $lastElementChild;
    public readonly int $childElementCount;

    public function append(DOMNode|string|null ...$nodes) : void;
    public function prepend(DOMNode|string|null ...$nodes) : void;
}

class DOMDocument implements DOMParentNode {}
class DOMElement implements DOMParentNode {}
class DOMDocumentFragment implements DOMParentNode {}

interface DOMChildNode
{
    public readonly ?DOMNode $previousElementSibling;
    public readonly ?DOMNode $nextElementSibling;

    public function remove() : void;
    public function before(DOMNode|string|null ...$nodes) : void;
    public function after(DOMNode|string|null ...$nodes) : void;
    public function replaceWith(DOMNode|string|null ...$nodes) : void;
}

// DOMElement implements both interfaces.
class DOMElement implements DOMChildNode {}
class DOMCharacterData implements DOMChildNode {}
class DOMDocumentType implements DOMChildNode {}
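
As a rough illustration of how the proposed manipulation methods might be used, the following sketch assumes a PHP build that already ships them (they are available in PHP 8.0 and later). The XML content and variable names are made up for the example; only append, prepend, replaceWith, remove, childElementCount and nextElementSibling come from the proposal.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<list><item>b</item></list>');

$list = $doc->documentElement;
$b = $list->getElementsByTagName('item')->item(0);

$a = $doc->createElement('item', 'a');
$c = $doc->createElement('item', 'c');

$list->prepend($a);       // children: a, b
$list->append($c, '!');   // strings become text nodes: a, b, c, "!"

// Only element children are counted, the text node "!" is ignored.
$count = $list->childElementCount; // 3

// Replace and remove nodes without going through the parent node.
$b->replaceWith($doc->createElement('item', 'B'));
$c->remove();

$next = $a->nextElementSibling->textContent; // "B"
$xml = $doc->saveXML($list);
// <list><item>a</item><item>B</item>!</list>
```

With the older API the same steps would need insertBefore, appendChild, replaceChild and removeChild calls on the respective parent nodes.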

Implementation choices:

The standard defines these interfaces as mixins (comparable to “traits”) and doesn't provide standalone interfaces for them. This might make more sense for the primary target language (JavaScript), but for PHP it makes more sense to have the functionality available through an interface.

The standard contains an intermediate interface DOMNonDocumentTypeChildNode that contains the previousElementSibling and nextElementSibling properties. It was introduced to provide backwards compatibility with browser/web implementations, which are not our concern. In addition, PHP interfaces cannot have properties, so it wouldn't make sense to add this otherwise empty interface. For this reason the interface is not introduced, and the properties are placed on DOMChildNode directly.

The querySelector and querySelectorAll methods defined on the DOMParentNode interface are omitted. Because of their complexity, we recommend leaving comparable implementations of these methods to userland libraries such as PhpCss or Symfony CSS Selector.
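
To give an idea of what such a userland replacement does, the following sketch uses the existing DOMXPath class with a hand-written XPath expression that is roughly equivalent to the CSS selector "p.note". The document content and the selector are invented for illustration; libraries such as the ones named above generate comparable expressions automatically.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<root><p class="note">one</p><p>two</p><p class="note big">three</p></root>'
);

$xpath = new DOMXPath($doc);

// Match <p> elements whose class attribute contains the token "note".
$query = '//p[contains(concat(" ", normalize-space(@class), " "), " note ")]';

$texts = [];
foreach ($xpath->query($query) as $node) {
    $texts[] = $node->textContent;
}
// $texts is ['one', 'three']
```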

Removing unimplemented classes

The latest DOM standard removes a lot of APIs that are not strictly necessary for working with XML, and especially HTML, in the browser.

https://dom.spec.whatwg.org/#dom-core-changes

A lot of those classes exist in ext/dom, but either don't actually do anything or throw “not implemented” exceptions.

  • DOMConfiguration
  • DOMDomError
  • DOMErrorHandler
  • DOMImplementationList
  • DOMImplementationSource
  • DOMLocator
  • DOMObject
  • DOMUserData
  • DOMNameList
  • DOMTypeInfo
  • DOMUserDataHandler

Not only do they return “TEST” strings in property handlers, they are also mostly undocumented, so there is no harm in removing them. Example:

$ php -r 'var_dump(new DOMNameList());'
object(DOMNameList)#1 (1) {
  ["length"]=>
  string(4) "TEST"
}

Notable exceptions are DOMEntity, DOMNotation and DOMEntityReference, which are usable through DOMDocument#createEntityReference and through the DOMDocumentType#entities and DOMDocumentType#notations properties.

The DOMImplementation class

The DOM Specification Levels 2 and 3 introduced DOMImplementation::hasFeature(), which was supposed to be used by clients to check which version of a specification is supported. The Living Standard has the following to say about this:

hasFeature() originally would report whether the user agent claimed to support a given DOM feature, but experience proved it was not nearly as reliable or granular as simply checking whether the desired objects, attributes, or methods existed. As such, it is no longer to be used, but continues to exist (and simply returns true) so that old pages don’t stop working.

Changing this would constitute only a small BC break, given that the method is probably not used by anyone.

Currently DOMImplementation::hasFeature only returns true for “Core” and “1.0” (and for “XML” versions), since the 2.0 and 3.0 levels were never fully implemented in ext/dom. However, most of the important Level 2 and 3 features have always been implemented. As such, users cannot rely on this API anyway.

We propose to keep this method untouched and recommend that users rely on instanceof, property_exists, or method_exists to test for the existence of features, as proposed by the DOM Standard documentation.
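
The recommended checks could look like the following sketch. It assumes nothing about the running PHP version: when the proposed append method is missing, it falls back to the classic appendChild call. The element names are placeholders.

```php
<?php
$doc = new DOMDocument();
$element = $doc->createElement('div');
$child = $doc->createElement('span');

// True only when the running ext/dom provides the new manipulation method.
$hasAppend = method_exists($element, 'append');

// instanceof is safe even when the interface is not defined; it then yields false.
$isParentNode = $element instanceof DOMParentNode;

// Use the new method when present, otherwise the classic API.
if ($hasAppend) {
    $element->append($child);
} else {
    $element->appendChild($child);
}
```

Either branch leaves the span as the only child of the div, so calling code does not need to consult hasFeature() at all.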

Backward Incompatible Changes

  1. The classes that are undocumented, unimplemented and contain dummy data are removed.
  2. Code using registerNodeClass to override DOM classes can be affected if it already implements the new functionality in a way that doesn't satisfy the behavior or signatures of the proposed changes.
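
The second point can be pictured with the following sketch of a userland subclass registered through registerNodeClass. The class MyElement and its detach() helper are hypothetical. Because the helper uses a name that the proposal does not claim, it keeps working; had it been called remove() with a different signature, it would conflict with the proposed DOMChildNode::remove().

```php
<?php
// Hypothetical userland subclass that predates the proposal.
class MyElement extends DOMElement
{
    // Helper with a name that does not collide with the proposed methods.
    public function detach(): void
    {
        if ($this->parentNode !== null) {
            $this->parentNode->removeChild($this);
        }
    }
}

$doc = new DOMDocument();
// Elements created by this document are now instances of MyElement.
$doc->registerNodeClass('DOMElement', 'MyElement');

$root = $doc->appendChild($doc->createElement('root'));
$child = $root->appendChild($doc->createElement('child'));

$child->detach();
$isEmpty = !$root->hasChildNodes(); // true
```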

Proposed PHP Version(s)

PHP 7.4

RFC Impact

To SAPIs

No effect on SAPIs.

To Existing Extensions

The dom extension's API is changed in a backwards compatible way (only adding new properties/methods), apart from the class removals listed under Backward Incompatible Changes.

The new functionality can be implemented entirely using the already available libxml2 data structures, so no change to the libxml2 dependency is necessary.

To Opcache

No effect on Opcache.

Future Scope

This RFC seeks approval to add more functionality of the DOM Living Standard to ext/dom in the future, as long as no backwards incompatible changes are introduced.

Patches and Tests

https://github.com/beberlei/php-src/pull/1

This pull request is still a work in progress.

Implementation

tbd

References

- DOM Living Standard Document https://dom.spec.whatwg.org

rfc/dom_living_standard_api.txt · Last modified: 2019/04/06 12:48 by beberlei