PHP RFC: Improve callbacks in ext/dom and ext/xsl
- Version: 0.1
- Date: 2023-11-05
- Author: Niels Dossche, nielsdos@php.net
- Status: Draft
- First Published at: https://wiki.php.net/rfc/improve_callbacks_dom_and_xsl
Introduction
The DOMXPath class allows developers to run XPath expression queries on an HTML/XML document.
The XSLTProcessor class allows developers to perform XSL transformations on HTML/XML documents. Both of these internally use XPath expressions, and they support calling PHP functions inside those XPath expressions. However, there is an unfortunate limitation in that only callable strings are supported. This means that closures, first-class callables, and instance methods cannot be used. This proposal aims to improve the callback support such that any callable can be used.
To better understand the proposal, I'll first give some background to demo the usage as it is today. Then I'll introduce the proposal itself.
Background
Important: in all examples in this RFC, I will use the following sample document to work with. I will not repeat this code snippet.
// Set up a sample document with links to cats and dogs
$doc = new DOMDocument;
$doc->loadXML(<<<XML
<animals>
    <a box_preference="small" href="cat/jill">Jill</a>
    <a box_preference="medium" href="cat/kes">Kes</a>
    <a box_preference="medium" href="cat/minou">Minou</a>
    <a box_preference="large" href="cat/tessa">Tessa</a>
    <a href="dog/jack">Jack</a>
</animals>
XML);
Let's take a look at how to use a callback with DOMXPath.
We have to use DOMXPath::registerPhpFunctions(string|array|null $restrict=null) to register callbacks. Only after that can we use them in the evaluation.
function my_callback(string $href): bool {
    return preg_match("/cat/", $href);
}

$xpath = new DOMXPath($doc);
// This is necessary to resolve the php name in the php:function expression down below.
$xpath->registerNamespace("php", "http://php.net/xpath");
// This registers the function "my_callback" with the XPath evaluator, such that my_callback can be called.
$xpath->registerPhpFunctions("my_callback");
// This selects all <a> tags where my_callback(their href attribute content) returns true.
$results = $xpath->evaluate("//a[php:function('my_callback', string(@href))]");
foreach ($results as $result) {
    echo "Found ", $result->textContent, "\n";
}

/* This script outputs:
Found Jill
Found Kes
Found Minou
Found Tessa
*/
Note that this is only the simplest example. You can do much more complex actions with it, even with side-effects. Furthermore, I only showcased a single callback. It's possible to chain multiple callbacks.
Registering multiple callbacks at once is possible too; in that case you have to pass an array to registerPhpFunctions:
// Now my_first_callback and my_second_callback can be used within XPath
$xpath->registerPhpFunctions(["my_first_callback", "my_second_callback"]);
// registerPhpFunctions() is additive, so calling it again will make even more functions callable:
// now strtoupper() will also be callable.
$xpath->registerPhpFunctions("strtoupper");
Finally, if the $restrict argument is null, then all global functions and static class methods will be callable.
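As a minimal sketch of the unrestricted form, the snippet below calls registerPhpFunctions() without arguments and then invokes the built-in strtoupper() from XPath. The two-element document is a shortened stand-in for the sample document, used only to keep the snippet self-contained.

```php
<?php
// Shortened stand-in for the sample document (assumption, for self-containment).
$doc = new DOMDocument;
$doc->loadXML('<animals><a href="cat/jill">Jill</a><a href="dog/jack">Jack</a></animals>');

$xpath = new DOMXPath($doc);
$xpath->registerNamespace("php", "http://php.net/xpath");

// No argument (null): global functions and static methods become callable.
$xpath->registerPhpFunctions();

// strtoupper() was never registered by name, yet it can be called here.
$upper = $xpath->evaluate("php:function('strtoupper', string(//a[1]))");
echo $upper, "\n";
```

Because nothing is restricted, any function name that ends up in an XPath expression can be invoked, so this form is best avoided when expressions are built from untrusted input.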
I won't repeat this for XSLTProcessor, but it works the same way: XSLTProcessor also has a registerPhpFunctions method with the same behaviour.
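For completeness, here is a rough sketch of the same string-based registration with XSLTProcessor as it works today. The stylesheet and the shortened document are illustrative assumptions; note that the php prefix is bound to http://php.net/xsl in stylesheets, as in the XSLT example later in this RFC.

```php
<?php
// Shortened stand-in for the sample document (assumption, for self-containment).
$doc = new DOMDocument;
$doc->loadXML('<animals><a href="cat/jill">Jill</a><a href="dog/jack">Jack</a></animals>');

// Illustrative stylesheet: upper-cases the text of every <a> element.
$xsl = new DOMDocument;
$xsl->loadXML(<<<XML
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:php="http://php.net/xsl">
  <xsl:output method="text"/>
  <xsl:template match="a">
    <xsl:value-of select="php:function('strtoupper', string(.))"/>
    <xsl:text>;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
XML);

$proc = new XSLTProcessor;
// Same idea as with DOMXPath: only the named function may be called.
$proc->registerPHPFunctions("strtoupper");
$proc->importStylesheet($xsl);

$output = $proc->transformToXml($doc);
echo $output, "\n";
```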
Proposal
Right now, it is impossible to call instance functions or closures. Based on feature request https://bugs.php.net/bug.php?id=38595, I propose the following change.
If you pass an array, entries of the form $key => $value will be interpreted as “associate the callable $value with name $key”. The key must be a string.
Example:
$xpath->registerPhpFunctions(["function_name" => $callable]);
// Now you can use "php:function('function_name', ...)" in XPath expressions to call $callable.
$callable can be any kind of callable, for example: "MyClass::staticFunc", [$this, "instanceFunc"], $object->foo(...), fn ($argument1, $argument2, etc...) => whatever, etc.
The behaviour of passing only a string value without a key will remain the same as before.
And you can mix them with $key => $value entries:
$xpath->registerPhpFunctions([
    "function_name" => $callable,
    "var_dump"
]);
// Now $callable can be called using "php:function('function_name', ...)"
// and var_dump using "php:function('var_dump', ...)".
Whether the value is callable will now be checked during registration instead of during execution.
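A small end-to-end sketch of the proposed form follows. It assumes a PHP build that implements this proposal; the version guard reflects the 8.4 target named in this RFC and is an assumption about where the change lands. The name is_wanted and the shortened document are made up for illustration.

```php
<?php
// Shortened stand-in for the sample document (assumption, for self-containment).
$doc = new DOMDocument;
$doc->loadXML('<animals><a href="cat/jill">Jill</a><a href="dog/jack">Jack</a></animals>');

$names = [];

// Assumption: the proposal lands in PHP 8.4, the target version named in this RFC.
// Older versions do not accept callables as array values.
if (PHP_VERSION_ID >= 80400) {
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace("php", "http://php.net/xpath");

    $prefix = "cat/"; // state captured by the arrow function below

    $xpath->registerPhpFunctions([
        // String key: the name visible in XPath; value: any callable.
        "is_wanted" => fn (string $href): bool => str_starts_with($href, $prefix),
        // No key: a plain callable string, registered under its own name.
        "strtoupper",
    ]);

    foreach ($xpath->evaluate("//a[php:function('is_wanted', string(@href))]") as $node) {
        $names[] = $xpath->evaluate("php:function('strtoupper', string(.))", $node);
    }
}

print_r($names);
```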
registerPhpFunctions error conditions
There will be new error conditions added to registerPhpFunctions as part of this RFC.
In case the argument is a string:
- If the string is not a callable: argument type error
In case the argument is an array:
- If the value of an array entry is not a callable: argument type error
- If there is no string key, and the value cannot be converted to a string: whatever error zval_try_get_string gives. Example: if you do registerPhpFunctions([ function() {...} ]), this will throw an “Object of class Closure could not be converted to string” error.
- If there is a string key that's empty: argument value error
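The rules above can be modelled in userland roughly as follows. This is only a sketch: model_register() is a hypothetical helper, its TypeError and ValueError messages are invented, and the real checks would be done inside the extension. The (string) cast stands in for zval_try_get_string.

```php
<?php
/**
 * Hypothetical userland model of the proposed registration checks.
 * Returns a map of XPath-visible name => callable.
 */
function model_register(string|array $restrict): array
{
    $map = [];
    // A lone string behaves like a one-element list.
    foreach ((array) $restrict as $key => $value) {
        if (is_string($key)) {
            if ($key === '') {
                throw new ValueError('Function name must not be empty'); // invented message
            }
            $name = $key;
        } else {
            // No string key: the value itself is the name.
            // Casting a Closure to string throws an Error.
            $name = (string) $value;
        }
        if (!is_callable($value)) {
            throw new TypeError("Entry \"$name\" is not a callable"); // invented message
        }
        $map[$name] = $value;
    }
    return $map;
}

// Mixed registration: one named callable and one plain string entry.
$map = model_register(["shout" => strtoupper(...), "trim"]);
print_r(array_keys($map));

// A closure without a string key cannot be turned into a name.
$message = null;
try {
    model_register([fn () => true]);
} catch (Error $e) {
    $message = $e->getMessage();
}
echo $message, "\n";
```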
Exceptions vs warnings during execution
Prior to PHP 8.0, the DOMXPath class threw warnings when invoking a “php:function(...)” in the following error conditions:
- The handler name is not a string
- The function callback could not be called because it isn't callable
- The function callback wasn't registered
- When trying to return an object from a callback to an XPath expression that is not a DOM object. You can only return objects that have an XML representation.
In PHP 8.0, these were changed to throw exceptions instead of warnings (https://github.com/php/php-src/pull/5418).
XSLTProcessor has the same error conditions, but still uses warnings to this day.
As part of this proposal, the implementations will be unified and will therefore use exceptions instead of warnings.
Finally, for both DOMXPath and XSLTProcessor, if you never called registerPhpFunctions, invoking a callback emits a warning instead of throwing an exception. This is inconsistent, because if you did call registerPhpFunctions but did not register the function that you're trying to call, you get an exception instead. I propose to make this case an exception too, such that it is consistent with the other error conditions.
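To make the exception case concrete, here is a sketch of the existing DOMXPath behaviour on PHP 8.0 and later when a function exists but was not registered. The exact exception message is deliberately not relied upon, and the shortened document is an assumption for self-containment.

```php
<?php
// Shortened stand-in for the sample document (assumption, for self-containment).
$doc = new DOMDocument;
$doc->loadXML('<animals><a href="cat/jill">Jill</a><a href="dog/jack">Jack</a></animals>');

$xpath = new DOMXPath($doc);
$xpath->registerNamespace("php", "http://php.net/xpath");

// Only strtoupper is allowed from now on.
$xpath->registerPhpFunctions("strtoupper");

$caught = null;
try {
    // strtolower exists as a PHP function but was never registered.
    $xpath->evaluate("php:function('strtolower', string(//a[1]))");
} catch (Error $e) {
    $caught = $e;
}

$status = $caught !== null ? "Rejected with " . get_class($caught) : "No exception";
echo $status, "\n";
```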
Method signature
The method signature of registerPhpFunctions will remain the same, i.e. registerPhpFunctions(string|array|null $restrict=null).
You might be wondering why I don't change the string type in that union to callable. There are two reasons:
- callable|array is ambiguous: what does ["foo", "bar"] mean? Does this mean: register both “foo” and “bar” as functions? Or does this mean: register foo::bar?
- XPath expressions are written as strings, so we have to give a string name to the callable.
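The ambiguity in the first point can be reproduced in plain PHP. In the sketch below, the class foo and the functions foo() and bar() are hypothetical names chosen so that both readings of ["foo", "bar"] are valid at the same time.

```php
<?php
// Hypothetical names: a class "foo" with a static method "bar", plus two
// global functions that happen to be called "foo" and "bar".
class foo
{
    public static function bar(): string { return "foo::bar"; }
}
function foo(): string { return "foo"; }
function bar(): string { return "bar"; }

$arg = ["foo", "bar"];

// Reading 1: a list of two function names.
$asList = array_map(fn (string $name) => $name(), $arg);

// Reading 2: a single callable in array form, i.e. foo::bar.
$asCallable = is_callable($arg) ? $arg() : null;

var_dump($asList, $asCallable);
```

Both readings succeed, so a callable|array parameter type could not tell them apart.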
Usage examples of the API improvement
Here are some simple, but somewhat realistic examples of how this API improvement can be used.
Here's an example of DOMXPath with the new API:
class Collector {
    function __construct(private string $regex, private array $available_boxes) {}

    function process(DOMDocument $doc) {
        $xpath = new DOMXPath($doc);
        $xpath->registerNamespace("php", "http://php.net/xpath");
        // This registers the callbacks
        $xpath->registerPhpFunctions([
            "filter" => $this->filter(...),
            "check_box_preference" => fn (string $box) => in_array($box, $this->available_boxes),
        ]);
        $results = $xpath->evaluate(<<<X
            //a
            [php:function('filter', string(@href))]
            [php:function('check_box_preference', string(@box_preference))]
            X);
        foreach ($results as $result) {
            echo "Found ", $result->textContent, "\n";
        }
    }

    function filter(string $href): bool {
        return preg_match($this->regex, $href);
    }
}

(new Collector("/cat/", ["medium", "large"]))->process($doc);
As you can see, this allows the use of instance methods when you have to carry around state. It also allows the use of closures.
And here's an example of using XSLTProcessor with the improved API:
class BoxCounter {
    function __construct(private array $available_boxes) {}

    function process(DOMDocument $doc) {
        $xsl = new DOMDocument;
        $xsl->loadXML(<<<XML
            <?xml version="1.0" encoding="iso-8859-1"?>
            <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:php="http://php.net/xsl">
                <xsl:template match="//a">
                    <xsl:if test="php:function('filter', string(@box_preference))">
                        <xsl:value-of select="."/>
                    </xsl:if>
                </xsl:template>
            </xsl:stylesheet>
            XML);
        $proc = new XSLTProcessor;
        $proc->registerPHPFunctions(["filter" => $this->assignBox(...)]);
        $proc->importStyleSheet($xsl);
        echo $proc->transformToXML($doc);
    }

    function assignBox($size) {
        if (!@$this->available_boxes[$size]) return false;
        $this->available_boxes[$size]--;
        return true;
    }
}

(new BoxCounter(["medium" => 1, "large" => 3]))->process($doc);
This is again an example of instance methods in use, but for XSL transformations this time, with a stateful function.
Alternatives
I considered an alternative solution too: https://bugs.php.net/bug.php?id=49567.
In that feature request, the idea is to add a new registerObjectMethods method instead of extending registerPhpFunctions.
Or more generally, the alternative is that we add registerCallable(string $name, callable $callable). The downside is that this can break BC if there are user child classes of DOMXPath and XSLTProcessor that already contain a method named registerCallable. It might also be confusing for users to have two methods with almost the same purpose, especially with regard to how the two interact.
Backward Incompatible Changes
Strictly speaking, as the callable validity is checked earlier (i.e. when calling registerPhpFunctions), this is a subtle BC break. If the function is not declared yet at the time of calling registerPhpFunctions, then this will throw an error. Previously this was accepted as long as the function was declared by the time the callback was executed. I think however that this situation is sufficiently rare and easily avoidable.
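The affected situation can be sketched in plain PHP. Here is_callable() is used as a userland stand-in for the check that registerPhpFunctions would perform under this proposal, and late_helper is a hypothetical function name.

```php
<?php
// At "registration time" the function does not exist yet.
$callableAtRegistration = is_callable("late_helper");

// A conditional declaration only takes effect once execution reaches it.
if (!function_exists("late_helper")) {
    function late_helper(string $href): bool
    {
        return str_starts_with($href, "cat/");
    }
}

// At "execution time" the function is available.
$callableAtExecution = is_callable("late_helper");

var_dump($callableAtRegistration, $callableAtExecution);
```

Code that registers a name before the function is declared, as in the first check, is what would start failing; moving the registration after the declaration avoids the problem.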
Proposed PHP Version(s)
Next PHP 8.x, that is 8.4 at the time of writing.
RFC Impact
To Existing Extensions
This affects both the ext/dom and ext/xsl extension. Implementation-wise, the ext/dom extension will gain the shared code to deal with XPath callables because the result set handling (and therefore ext/xsl) already depends on DOM classes anyway.
Open Issues
None.
Unaffected PHP Functionality
Everything else.
Future Scope
None right now.
Proposed Voting Choices
One primary vote requiring a 2/3 majority: “Accept the proposed changes to ext/dom and ext/xsl callbacks?”
Patches and Tests
TODO
Implementation
After the project is implemented, this section should contain
- the version(s) it was merged into
- a link to the git commit(s)
- a link to the PHP manual entry for the feature
- a link to the language specification section (if any)
References
- Pre-RFC pitch: https://externals.io/message/121286
- Feature request: https://bugs.php.net/bug.php?id=38595
- Feature request: https://bugs.php.net/bug.php?id=49567
Rejected Features
Keep this updated with features that were discussed on the mail lists.