PHP RFC: Improve callbacks in ext/dom and ext/xsl

Introduction

The DOMXPath class allows developers to run XPath expression queries on an HTML/XML document. The XSLTProcessor class allows developers to perform XSL transformations on HTML/XML documents. Both of these internally use XPath expressions, and they support calling PHP functions inside those XPath expressions. However, there is an unfortunate limitation: only callable strings are supported. This means that closures, first-class callables, and instance methods cannot be used. This proposal aims to improve the callback support such that any callable can be used.

To better understand the proposal, I'll first give some background to demo the usage as it is today. Then I'll introduce the proposal itself.

Background

Important: in all examples in this RFC, I will use the following sample document to work with. I will not repeat this code snippet.

// Set up a sample document with links to cats and dogs
$doc = new DOMDocument;
$doc->loadXML(<<<XML
<animals>
    <a box_preference="small" href="cat/jill">Jill</a>
    <a box_preference="medium" href="cat/kes">Kes</a>
    <a box_preference="medium" href="cat/minou">Minou</a>
    <a box_preference="large" href="cat/tessa">Tessa</a>
    <a href="dog/jack">Jack</a>
</animals>
XML);

Let's take a look at how to use a callback with DOMXPath. We have to use DOMXPath::registerPhpFunctions(string|array|null $restrict=null) to register callbacks. Only after that can we use it in the evaluation.

function my_callback(string $href): bool {
    // preg_match() returns int|false, so compare explicitly to satisfy the bool return type.
    return preg_match("/cat/", $href) === 1;
}
 
$xpath = new DOMXPath($doc);
// This is necessary to resolve the php name in the php:function expression down below.
$xpath->registerNamespace("php", "http://php.net/xpath");
// This registers the function "my_callback" with the XPath evaluator, such that my_callback can be called.
$xpath->registerPhpFunctions("my_callback");
// This selects all <a> tags where my_callback(their href attribute content) returns true.
$results = $xpath->evaluate("//a[php:function('my_callback', string(@href))]");
foreach ($results as $result) {
    echo "Found ", $result->textContent, "\n";
}
 
/* This script outputs:
Found Jill
Found Kes
Found Minou
Found Tessa
*/

Note that this is only the simplest example. You can do much more complex actions with it, even with side-effects. Furthermore, I only showcased a single callback. It's possible to chain multiple callbacks.
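To illustrate the chaining mentioned above, here is a minimal sketch using the current string-only API. The callback names is_cat and prefers_medium_box are made up for this illustration, and the sample document is repeated only so that the snippet runs on its own.

```php
<?php
// Copy of the sample document so the snippet is self-contained.
$doc = new DOMDocument;
$doc->loadXML(<<<XML
<animals>
    <a box_preference="small" href="cat/jill">Jill</a>
    <a box_preference="medium" href="cat/kes">Kes</a>
    <a box_preference="medium" href="cat/minou">Minou</a>
    <a box_preference="large" href="cat/tessa">Tessa</a>
    <a href="dog/jack">Jack</a>
</animals>
XML);

// Hypothetical callbacks, named only for this example.
function is_cat(string $href): bool {
    return str_starts_with($href, "cat/");
}

function prefers_medium_box(string $size): bool {
    return $size === "medium";
}

$xpath = new DOMXPath($doc);
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPhpFunctions(["is_cat", "prefers_medium_box"]);

// Each predicate narrows the node set produced by the previous one.
$results = $xpath->evaluate(
    "//a[php:function('is_cat', string(@href))]"
    . "[php:function('prefers_medium_box', string(@box_preference))]"
);

$names = [];
foreach ($results as $result) {
    $names[] = $result->textContent;
}
echo implode(", ", $names), "\n"; // Kes, Minou
```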

Registering multiple callbacks at once is possible too. In that case you have to pass an array to registerPhpFunctions:

// Now my_first_callback and my_second_callback can be used within XPath
$xpath->registerPhpFunctions(["my_first_callback", "my_second_callback"]);
 
// registerPhpFunctions() is additive, so calling it again will make even more functions callable:
// now strtoupper() will also be callable.
$xpath->registerPhpFunctions("strtoupper");

Finally, if the $restrict argument is null, then all global functions and static class methods will be callable.
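A small sketch of the null case, using a shortened one-element document instead of the full sample: strtoupper() is never registered by name, yet it can be called because no allow-list is in effect.

```php
<?php
$doc = new DOMDocument;
$doc->loadXML('<animals><a href="cat/jill">Jill</a></animals>');

$xpath = new DOMXPath($doc);
$xpath->registerNamespace("php", "http://php.net/xpath");
// No argument: $restrict defaults to null, so no allow-list is applied.
$xpath->registerPhpFunctions();

// The expression evaluates to an XPath string, which evaluate() returns as a PHP string.
$shout = $xpath->evaluate("php:function('strtoupper', string(//a/@href))");
echo $shout, "\n"; // CAT/JILL
```

Since this exposes every function to the expression, an explicit list is the safer choice whenever the XPath expression is not fully under your control.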

I won't repeat this for XSLTProcessor: it also has a registerPhpFunctions method, and that method works exactly the same way.
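For completeness, a minimal sketch of the XSLTProcessor side with the current string-only API. The document and stylesheet are shortened stand-ins made up for this illustration. Note that the php prefix is bound to http://php.net/xsl in a stylesheet, not to http://php.net/xpath.

```php
<?php
$doc = new DOMDocument;
$doc->loadXML('<animals><a href="cat/jill">Jill</a><a href="dog/jack">Jack</a></animals>');

$xsl = new DOMDocument;
$xsl->loadXML(<<<XML
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:php="http://php.net/xsl">
    <xsl:output method="text"/>
    <xsl:template match="a">
        <xsl:value-of select="php:function('strtoupper', string(.))"/>
        <xsl:text> </xsl:text>
    </xsl:template>
</xsl:stylesheet>
XML);

$proc = new XSLTProcessor;
// Same allow-list semantics as DOMXPath::registerPhpFunctions().
$proc->registerPHPFunctions("strtoupper");
$proc->importStylesheet($xsl);

// Each <a> element is emitted as its upper-cased text followed by a space.
$output = $proc->transformToXml($doc);
echo $output, "\n";
```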

Proposal

Right now, it is impossible to call instance methods or closures. Based on feature request https://bugs.php.net/bug.php?id=38595, I propose the following change.

If you pass an array, entries of the form $key => $value will be interpreted as “associate the callable $value with name $key”. The key must be a string. Example:

$xpath->registerPhpFunctions(["function_name" => $callable]);
// Now you can use "php:function('function_name', ...)" in XPath expressions to call $callable.

$callable can be any kind of callable, for example: "MyClass::staticFunc", [$this, "instanceFunc"], $object->foo(...), fn ($argument1, $argument2, etc...) => whatever, etc.

The behaviour of passing only a string value without a key will remain the same as before. And you can mix them with $key => $value entries:

$xpath->registerPhpFunctions([
  "function_name" => $callable,
  "var_dump"
]);
// Now $callable can be called using "php:function('function_name', ...)"
// and var_dump using "php:function('var_dump', ...)".
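The following sketch puts the different kinds of callables side by side. It assumes the proposal is implemented as described here, so it needs a PHP build that contains this change. The Matcher class and the registered names are hypothetical and exist only for this illustration.

```php
<?php
// Sketch of the proposed behaviour; requires a PHP build that implements this RFC.
$doc = new DOMDocument;
$doc->loadXML('<animals><a href="cat/jill">Jill</a><a href="dog/jack">Jack</a></animals>');

// Hypothetical class, used only to have a static and an instance method.
class Matcher {
    public function __construct(private string $prefix) {}

    public static function isDog(string $href): bool {
        return str_starts_with($href, "dog/");
    }

    public function matches(string $href): bool {
        return str_starts_with($href, $this->prefix);
    }
}

$matcher = new Matcher("cat/");
$xpath = new DOMXPath($doc);
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPhpFunctions([
    "is_dog"    => "Matcher::isDog",        // static method given as a string
    "is_cat"    => [$matcher, "matches"],   // array callable
    "is_cat_fc" => $matcher->matches(...),  // first-class callable
    "has_j"     => fn (string $s) => str_contains($s, "j"), // closure
    "strtoupper",                           // plain string entry, as before
]);

// Count the <a> elements for which the callback registered under $name returns true.
$count = fn (string $name): int =>
    (int) $xpath->evaluate("count(//a[php:function('$name', string(@href))])");

echo $count("is_dog"), $count("is_cat"), $count("is_cat_fc"), $count("has_j"), "\n"; // 1112
```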

Whether the value is callable will now be checked during registration instead of during execution.

registerPhpFunctions error conditions

There will be new error conditions added to registerPhpFunctions as part of this RFC.

In case the argument is a string:

  1. If the string is not a callable: argument type error

In case the argument is an array:

  1. If the value is not a callable: argument type error
  2. If there is no string key, and the value cannot be converted to a string: whatever error zval_try_get_string gives. Example: if you do registerPhpFunctions([ function() {...} ]), this will throw an “Object of class Closure could not be converted to string” error.
  3. If there is a string key that's empty: argument value error
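The conditions above can be probed with a sketch like the following. The helper try_register is hypothetical, and the snippet deliberately prints whatever exception class comes back instead of assuming one, because the exact classes and messages depend on the final implementation. On a PHP version without this change, most of these registrations are simply accepted.

```php
<?php
$doc = new DOMDocument;
$doc->loadXML('<root/>');
$xpath = new DOMXPath($doc);

// Hypothetical helper: attempt a registration and report what was thrown, if anything.
function try_register(DOMXPath $xpath, string|array $restrict): string {
    try {
        $xpath->registerPhpFunctions($restrict);
        return "accepted";
    } catch (Error $e) {
        // TypeError and ValueError both extend Error.
        return get_class($e);
    }
}

$outcomes = [
    "unknown function name" => try_register($xpath, "no_such_function_exists"),
    "non-callable value"    => try_register($xpath, ["name" => 42]),
    "closure without key"   => try_register($xpath, [function () {}]),
    "empty string key"      => try_register($xpath, ["" => "strtoupper"]),
    "valid entry"           => try_register($xpath, ["upper" => "strtoupper"]),
];

foreach ($outcomes as $case => $outcome) {
    echo $case, ": ", $outcome, "\n";
}
```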

Exceptions vs warnings during execution

Prior to PHP 8.0, the DOMXPath class emitted warnings when invoking “php:function(...)” under the following error conditions:

  1. The handler name is not a string
  2. The function callback could not be called because it isn't callable
  3. The function callback wasn't registered
  4. When trying to return an object from a callback to an XPath expression that is not a DOM object. You can only return objects that have an XML representation.

In PHP 8.0, these were changed to throw exceptions instead of warnings (https://github.com/php/php-src/pull/5418). XSLTProcessor has the same error conditions, but still uses warnings to this day. As part of this proposal, the implementations will be unified and will therefore use exceptions instead of warnings.

Finally, for both DOMXPath and XSLTProcessor, evaluating a “php:function(...)” call without ever having called registerPhpFunctions emits a warning instead of throwing an exception. This is inconsistent: if you did call registerPhpFunctions but did not register the function you are trying to call, you get an exception. I propose to make this case throw an exception too, so that it is consistent with the other error conditions.
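As a sketch of the exception behaviour that DOMXPath already has since PHP 8.0, the snippet below calls a function that exists but is not on the allow-list. The one-element document is a shortened stand-in, and the message text is printed as-is because it may differ between versions.

```php
<?php
$doc = new DOMDocument;
$doc->loadXML('<animals><a href="cat/jill">Jill</a></animals>');

$xpath = new DOMXPath($doc);
$xpath->registerNamespace("php", "http://php.net/xpath");
// Only strtoupper() is on the allow-list.
$xpath->registerPhpFunctions("strtoupper");

$caught = null;
try {
    // strrev() exists but was not registered, so the call is refused.
    $xpath->evaluate("php:function('strrev', string(//a))");
} catch (Error $e) {
    $caught = $e;
    echo "Refused: ", $e->getMessage(), "\n";
}
```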

Method signature

The method signature of registerPhpFunctions will remain the same, i.e. registerPhpFunctions(string|array|null $restrict=null). You might be wondering why I don't change the string type in that union to callable. There are two reasons:

  1. callable|array is ambiguous: what does ["foo", "bar"] mean? Does this mean: register both “foo” and “bar” as functions? Or does this mean: register foo::bar?
  2. XPath expressions are written as strings, so we have to give a string name to the callable.
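The ambiguity in the first point can be shown in plain PHP, without any DOM involvement. The class and function names below are chosen to collide on purpose and are purely illustrative.

```php
<?php
// PHP keeps classes and functions in separate symbol tables,
// so a class foo and a function foo() can coexist.
class foo {
    public static function bar(): string { return "foo::bar"; }
}
function foo(): string { return "foo()"; }
function bar(): string { return "bar()"; }

$restrict = ["foo", "bar"];

// Reading 1: a list of two function names.
$asList = array_map(fn (string $name) => $name(), $restrict);

// Reading 2: a single array callable for the static method foo::bar.
$asCallable = is_callable($restrict) ? $restrict() : null;

echo implode(", ", $asList), "\n"; // foo(), bar()
echo $asCallable, "\n";            // foo::bar
```

Both readings are valid for the same array, which is why the signature keeps string rather than callable in the union.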

Usage examples of the API improvement

Here are some simple, but somewhat realistic examples of how this API improvement can be used.

Here's an example of DOMXPath with the new API:

class Collector {
    function __construct(private string $regex, private array $available_boxes) {}
 
    function process(DOMDocument $doc) {
        $xpath = new DOMXPath($doc);
        $xpath->registerNamespace("php", "http://php.net/xpath");
 
        // This registers the callbacks
        $xpath->registerPhpFunctions([
            "filter" => $this->filter(...),
            "check_box_preference" => fn (string $box) => in_array($box, $this->available_boxes),
        ]);
 
        $results = $xpath->evaluate(<<<X
        //a
        [php:function('filter', string(@href))]
        [php:function('check_box_preference', string(@box_preference))]
        X);
        foreach ($results as $result) {
            echo "Found ", $result->textContent, "\n";
        }
    }
 
    function filter(string $href): bool {
        // preg_match() returns int|false, so compare explicitly to satisfy the bool return type.
        return preg_match($this->regex, $href) === 1;
    }
}
 
(new Collector("/cat/", ["medium", "large"]))->process($doc);

As you can see, this allows the use of instance methods when you have to carry around state. It also allows the use of closures.

And here's an example of using XSLTProcessor with the improved API:

class BoxCounter {
    function __construct(private array $available_boxes) {}
 
    function process(DOMDocument $doc) {
        $xsl = new DOMDocument;
        $xsl->loadXML(<<<XML
        <?xml version="1.0" encoding="iso-8859-1"?>
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:php="http://php.net/xsl">
            <xsl:template match="//a">
                <xsl:if test="php:function('filter', string(@box_preference))">
                    <xsl:value-of select="."/>
                </xsl:if>
            </xsl:template>
        </xsl:stylesheet>
        XML);
 
        $proc = new XSLTProcessor;
        $proc->registerPHPFunctions(["filter" => $this->assignBox(...)]);
        $proc->importStyleSheet($xsl);
        echo $proc->transformToXML($doc);
    }
 
    function assignBox($size) {
        // No box of this size left (or the size is unknown): reject.
        if (empty($this->available_boxes[$size])) return false;
        $this->available_boxes[$size]--;
        return true;
    }
}
 
(new BoxCounter(["medium" => 1, "large" => 3]))->process($doc);

This is again an example of instance methods in use, but for XSL transformations this time, with a stateful function.

Alternatives

I considered an alternative solution too: https://bugs.php.net/bug.php?id=49567.

In that feature request, the idea is to add a new registerObjectMethods method instead of extending registerPhpFunctions. Or more generally, the alternative is that we add registerCallable(string $name, callable $callable). The downside is that this can break BC if there are user child classes of DOMXPath and XSLTProcessor that already contain a method named registerCallable. It might also be confusing for users to have two methods that have almost the same purpose, especially w.r.t. interactions between the two.

Backward Incompatible Changes

Strictly speaking, as the callable validity is checked earlier (i.e. when calling registerPhpFunctions), this has a subtle break. If the function is not declared yet at the time of calling registerPhpFunctions, then this will throw an error. Previously this was accepted as long as the function was declared by the time the callback was executed. I think however that this situation is sufficiently rare and easily avoidable.
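A sketch of the affected pattern follows. The function late_callback is hypothetical and is declared conditionally, so that it only comes into existence when control flow reaches the declaration, much like a function from a file that is included later. The snippet is written to run under both behaviours: with the current one the registration is accepted and the callback resolves at evaluation time, and with the proposed one the registration itself is expected to fail.

```php
<?php
$doc = new DOMDocument;
$doc->loadXML('<animals><a href="cat/jill">Jill</a></animals>');

$xpath = new DOMXPath($doc);
$xpath->registerNamespace("php", "http://php.net/xpath");

$registered = false;
try {
    // late_callback() does not exist yet at this point.
    $xpath->registerPhpFunctions("late_callback");
    $registered = true;
} catch (TypeError $e) {
    // Assumed outcome under the proposal: rejected at registration time.
    echo "Rejected at registration: ", $e->getMessage(), "\n";
}

// Declared only when this statement is reached.
if (!function_exists("late_callback")) {
    function late_callback(string $href): bool {
        return str_starts_with($href, "cat/");
    }
}

if ($registered) {
    // Current behaviour: the name is resolved when the expression runs.
    echo $xpath->evaluate("count(//a[php:function('late_callback', string(@href))])"), "\n";
}
```

Moving the registerPhpFunctions call below the declaration avoids the problem under both behaviours.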

Proposed PHP Version(s)

Next PHP 8.x, that is 8.4 at the time of writing.

RFC Impact

To Existing Extensions

This affects both the ext/dom and ext/xsl extensions. Implementation-wise, the ext/dom extension will gain the shared code to deal with XPath callables because the result set handling already depends on DOM classes. Furthermore, ext/xsl already depends on ext/dom anyway.

Open Issues

None.

Unaffected PHP Functionality

Everything else. Why does this section exist?

Future Scope

None right now.

Proposed Voting Choices

One primary vote requiring 2/3rd majority: “Accept the proposed changes to ext/dom and ext/xsl callbacks?”

Patches and Tests

TODO

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged into
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature
  4. a link to the language specification section (if any)

References

Rejected Features

Keep this updated with features that were discussed on the mail lists.

rfc/improve_callbacks_dom_and_xsl.1699221516.txt.gz · Last modified: 2023/11/05 21:58 by nielsdos