====== PHP RFC: Add stream open functions to XML{Reader,Writer} ======

  * Version: 0.11.0
  * Date: 2024-04-21
  * Author: Niels Dossche
  * Status: Implemented (https://github.com/php/php-src/pull/14030)
  * First Published at: https://wiki.php.net/rfc/xmlreader_writer_streams

===== Introduction =====

The XMLReader and XMLWriter classes deal with XML in a stream-oriented manner. The former implements an XML "pull parser". This means that instead of keeping the data in memory or building a document tree, the document is streamed and the developer can instruct XMLReader to parse chunks at the current cursor and either process or skip the data. The advantage is that developers can process and filter large documents while requiring few resources. It is most often used as a lower-level building block for more complex handling of large XML documents. Similarly, XMLWriter writes an XML document to a stream or memory by using functions like startElement and writeElement.

There is, however, a strange limitation to these classes: they cannot operate on an already-open stream! This is bizarre as the APIs (both internal and user-facing) are stream-oriented. Streams that are already open are common when working, for example, with HTTP requests, data passed from a framework, or just XML data embedded in an existing stream.

The lack of an API that works with already-open streams causes developers to rely on workarounds, e.g. reading the stream entirely into memory and then using the XMLReader APIs, or writing an XML file using XMLWriter and then having to copy that into an already-open stream. That's just wasteful and needlessly difficult.

This RFC aims to fix that problem and some other inconsistencies as well.

===== Proposal =====

==== Main Proposal ====

I propose to add two new functions, one to XMLReader and one to XMLWriter, to create an instance from a stream.
Here is what they would look like:

  class XMLReader {
      /** @param resource $stream */
      public static function fromStream($stream, ?string $encoding = null, int $flags = 0, ?string $documentUri = null): static {}
  }
  class XMLWriter {
      /** @param resource $stream */
      public static function toStream($stream): static {}
  }

The signatures are heavily inspired by the existing function

  public static XMLReader::open(string $uri, ?string $encoding = null, int $flags = 0): bool|XMLReader

that operates on files. However, a major difference is that XMLReader::fromStream() is static-only, whereas the other open functions of XMLReader can be called either statically or non-statically and change their return-value behaviour depending on that. The disadvantage of the existing static methods is that they can only return an instance of XMLReader, so when XMLReader is extended by a user subclass, the returned object is not of the right type. We solve this by choosing static as the return type and letting the method internally call the constructor of the static type (with no arguments). As we seem to be moving away from overloaded functions, I decided to make only a static method variant available.

The $documentUri parameter is mostly used when libxml outputs error messages, so that you can put an origin name in there.

The signature for XMLWriter::toStream() should be self-explanatory. It is also modeled on the other open functions, but it is considerably simpler. You'll also notice the lack of an encoding argument; that's because the encoding is already handled by the XMLWriter::startDocument() function.

While implementing this, I found some strange behaviour regarding the ?string $encoding parameter of the existing functions XMLReader::open() and XMLReader::XML(). The first oddity is that they emit a warning instead of throwing a ValueError when the encoding contains NUL bytes. This is inconsistent with how other functions handle it.
I propose to promote this warning to a ValueError instead. The second oddity is that invalid encoding names are ignored entirely: no warning is emitted and the encoding is silently left unset. This can hide bugs. I propose to also throw a ValueError in this case, stating "Argument #X ($encoding) must be a valid character encoding".

An earlier version of this RFC proposed adding openStream() methods to both classes, but the naming was not ideal and the behaviour of being an instance method was not liked. Therefore, this was changed to static-only methods that return an instance of the respective class.

==== Consistency ====

We're adding new static named constructors to the XMLWriter and XMLReader classes. However, XMLWriter doesn't have static constructors yet, and XMLReader has the hybrid static/instance methods discussed earlier. Those existing methods also can't be used in subclasses because they return XMLWriter or XMLReader instead of static.

The idea is to add the following static named constructors for consistency with the newly proposed methods, with the same arguments as their existing counterparts:

  - XMLReader::fromUrl(string $url, ?string $encoding = null, int $flags = 0): static as a new version of XMLReader::open(...)
  - XMLReader::fromString(string $source, ?string $encoding = null, int $flags = 0): static as a new version of XMLReader::XML(...)
  - XMLWriter::toMemory(): static as a new version of XMLWriter::openMemory(...)
  - XMLWriter::toUrl(string $url): static as a new version of XMLWriter::openUri(...)

Note that I used Url here instead of Uri because that's the more accurate term: it actually locates the resource instead of just identifying it.

This does not aim to deprecate any existing methods. We will merely update the documentation to point users towards the new constructors.

===== Backward Incompatible Changes =====

There are three minor BC breaks.

The first one is the fact that we're adding new methods.
If a user extends the XMLReader or XMLWriter class, and their class implements a method with the same name but an incompatible signature, a compile error will occur. I analyzed the top 2500 Composer packages, and none used any of the proposed function names in subclasses of the XML classes. This means that the top 2500 packages don't suffer a BC break because of this. That doesn't mean there will be no breakage at all, but it gives a good indication.

The second BC break is caused by throwing a ValueError on invalid encodings instead of silently ignoring them. If we don't signal the invalid encoding to the user in any way, this can subtly hide bugs. For example, it could hide typos or silently pass invalid user input to the respective functions. Forcing developers to handle this error explicitly will result in more robust code in the end.

The third BC break is the promotion of the NUL-byte warning to a ValueError. This makes the XMLReader and XMLWriter classes more consistent with other extensions that throw instead of issuing a warning.

The migration for developers should be quite simple: instead of silencing the warning and/or checking the return value of the function, they should use a try-catch construct to handle the error.
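
As a minimal sketch of that migration, the snippet below wraps an existing XMLReader::XML() call in try-catch while still checking the return value, so the same code copes with both the old warning-and-false behaviour and the proposed ValueError. The helper name openReaderOrNull() is hypothetical and not part of this RFC; how a caller reacts to a rejected encoding is an assumption made for illustration.

```php
<?php
// Hypothetical helper, not part of this RFC: returns a reader for $xml,
// or null when the call fails or the encoding is rejected.
function openReaderOrNull(string $xml, ?string $encoding = null): ?XMLReader
{
    try {
        // Without the proposed change, a failure is reported through a
        // warning and a false return value.
        $reader = XMLReader::XML($xml, $encoding);
    } catch (ValueError $e) {
        // With the proposed change, NUL bytes or unknown encoding names
        // are reported here instead.
        return null;
    }

    // Keep the return-value check so older behaviour is still handled.
    return $reader instanceof XMLReader ? $reader : null;
}

$reader = openReaderOrNull("<root/>", "UTF-8");
var_dump($reader instanceof XMLReader);
```

Returning null is only one option; code that prefers exceptions throughout could rethrow the ValueError instead of swallowing it.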
===== Example usages =====

==== Minimal XMLReader example ====

  // Could be any stream, but this is for simplicity's sake
  $h = fopen("php://memory", "w+");
  // Illustrative document containing elements and a comment
  fwrite($h, "<root><!--my comment--><child/></root>");
  fseek($h, 0);
  $reader = XMLReader::fromStream($h);
  while ($reader->read()) {
      switch ($reader->nodeType) {
          case XMLReader::ELEMENT:
              echo "Element: ", $reader->name, "\n";
              break;
          case XMLReader::COMMENT:
              echo "Comment: ", $reader->value, "\n";
              break;
      }
  }

==== Minimal XMLWriter example ====

  // Could be any stream, but this is for simplicity's sake
  $h = fopen("php://output", "w");
  $writer = XMLWriter::toStream($h);
  $writer->startElement("root");
  $writer->writeAttribute("align", "left");
  $writer->writeComment("hello");
  $writer->endElement();
  $amount = $writer->flush();
  echo "\nAmount of bytes written: ";
  var_dump($amount);

===== Proposed PHP Version(s) =====

Next PHP 8.x, which is PHP 8.4 at the time of writing.

===== RFC Impact =====

==== To Existing Extensions ====

Only ext/xmlreader and ext/xmlwriter are affected.

===== Open Issues =====

None yet.

===== Unaffected PHP Functionality =====

Everything else, why do we have this section?

===== Future Scope =====

None yet.

===== Proposed Voting Choices =====

Two primary votes, each requiring a 2/3 majority: one for the main proposal and one for the consistency proposal.

Voting started on 2024-06-13 and will end on 2024-06-28.

  * Yes
  * No

----

  * Yes
  * No

===== Patches and Tests =====

Implementation PR: https://github.com/php/php-src/pull/14030

===== Implementation =====

Implemented via https://github.com/php/php-src/pull/14030

===== References =====

  - https://bugs.php.net/bug.php?id=63506
  - https://bugs.php.net/bug.php?id=46146

===== Rejected Features =====

None yet.

===== Changelog =====

  - 0.11.0: Incorporate feedback about static methods
  - 0.10.1: Language fixes
  - 0.10.0: Static again
  - 0.9.2: Add example usages of the new APIs.
  - 0.9.1: Made XMLReader::openStream() non-static instead such that it works with overridden classes.
  - 0.9: Initial version under discussion