PHP RFC: Add openStream() to XML{Reader,Writer}

Introduction

XMLReader and XMLWriter are two classes that deal with XML in a stream-oriented manner. XMLReader is an XML “pull parser”: instead of keeping the whole document in memory or building a document tree, the document is streamed and the developer instructs XMLReader to parse the next chunk at the current cursor position. The advantage is that large documents can be processed and filtered without loading them completely, which is why XMLReader is used as a lower-level building block for more complex handling of large XML documents. Similarly, XMLWriter creates an XML document through methods such as startElement and writeElement, and writes the result either to memory or to a stream.
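To make the two styles concrete, here is a minimal sketch that uses only methods already available in the existing extensions. The element names and values are made up for illustration and are not part of this RFC.

```php
<?php
// Build a small document in memory with XMLWriter.
$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('books');
$writer->writeElement('book', 'Dune');
$writer->writeElement('book', 'Neuromancer');
$writer->endElement(); // closes <books>
$writer->endDocument();
$xml = $writer->outputMemory();

// Pull-parse the result with XMLReader: each read() call advances the
// cursor by one node, and we only react to the <book> elements.
$reader = new XMLReader();
$reader->XML($xml);

$titles = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'book') {
        $titles[] = $reader->readString();
    }
}
$reader->close();
```

The reader never builds a tree for the document; it only exposes the node under the cursor, which is what keeps this approach workable for very large inputs.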

There is, however, a strange limitation to these classes: they cannot operate on an already-open stream! This is bizarre because the APIs (both internal and user-facing) are stream-oriented. This RFC aims to fix that problem, along with some other inconsistencies.
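As an illustration of the limitation, the following sketch shows one possible workaround with the current API when all you have is an open stream resource. The `php://memory` stream and the element names are assumptions chosen for the example; the buffering approach is just one way around the gap, not something prescribed by this RFC.

```php
<?php
// An already-open stream, for example one handed over by another library.
$stream = fopen('php://memory', 'w+');

// Writing: XMLWriter cannot target $stream directly, so the output is
// buffered in memory and copied to the stream by hand.
$writer = new XMLWriter();
$writer->openMemory();
$writer->startElement('root');
$writer->writeElement('item', 'a');
$writer->endElement();
fwrite($stream, $writer->flush()); // for memory writers, flush() returns the buffer

// Reading: XMLReader cannot consume $stream either, so the entire content
// has to be loaded into a string first, which defeats the point of streaming.
rewind($stream);
$reader = new XMLReader();
$reader->XML(stream_get_contents($stream));

$names = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT) {
        $names[] = $reader->name;
    }
}
$reader->close();
fclose($stream);
```

Both halves of the workaround hold the complete document in a PHP string at some point, so memory use grows with document size.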

Proposal

Backward Incompatible Changes

There are three minor BC breaks.

The first one is the addition of the openStream() method itself. If a user extends the XMLReader or XMLWriter class, and their subclass declares a method with the same name but an incompatible signature, a compile error will occur. I analyzed the top 2500 Composer packages and found only one package that contains the method name openStream, and it wasn't in a class that extends either class. This means that the top 2500 packages don't suffer a BC break because of this. That doesn't mean no code will be affected, but it gives a good indication.
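For illustration, this is the kind of userland code that could be affected. The class name `FeedReader` and its method signature are hypothetical; whether such a declaration actually conflicts depends on the final signature of the built-in method, which this sketch does not assume.

```php
<?php
// Hypothetical userland subclass that already defines its own openStream().
// It is valid today; it would stop compiling only if the built-in method
// were added with a signature this declaration is incompatible with.
class FeedReader extends XMLReader
{
    /** @param resource $stream an open, readable stream */
    public function openStream($stream): bool
    {
        $contents = stream_get_contents($stream);

        return $contents !== false && $this->XML($contents) === true;
    }
}
```

Code like this can be found ahead of an upgrade by searching a codebase for classes that extend XMLReader or XMLWriter and declare a method named openStream.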

The second BC break is caused by throwing a ValueError on invalid encodings instead of silently ignoring them. If an invalid encoding is not signalled to the user in any way, it can subtly hide bugs: for example, a typo in the encoding name goes unnoticed, or invalid user input is silently passed on to the respective functions. Forcing developers to handle this error explicitly will result in more robust code in the end.

The third BC break is the promotion of the NULL-byte warning to a ValueError. This makes the XMLReader and XMLWriter classes more consistent with other extensions that throw instead of issuing a warning. The migration for developers should be quite simple: instead of silencing the warning and/or checking the return value of the function, they should use a try-catch construct to handle the error.
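A migration along these lines might look like the sketch below. The helper name `openXmlString` is hypothetical, and XMLReader::XML() with an encoding argument is used as the example call on the assumption that it is one of the affected functions. The helper keeps the old return-value check next to the new try-catch so that it behaves the same before and after the change.

```php
<?php
// Hypothetical helper: returns a ready-to-use reader, or null when the
// input (including the encoding name) is rejected.
function openXmlString(string $xml, ?string $encoding): ?XMLReader
{
    $reader = new XMLReader();

    try {
        // The @ operator and the return-value check cover versions that
        // warn and return false; the catch covers versions that throw.
        $ok = @$reader->XML($xml, $encoding);
    } catch (ValueError $e) {
        return null;
    }

    return $ok === false ? null : $reader;
}

$valid = openXmlString('<a/>', 'UTF-8');
$withNullByte = openXmlString('<a/>', "UTF-8\0oops");
```

Once only versions that throw need to be supported, the silencing operator and the return-value check can be dropped in favour of the try-catch alone.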

Proposed PHP Version(s)

Next PHP 8.x, which is PHP 8.4 at the time of writing.

RFC Impact

To Existing Extensions

Only ext/xmlreader and ext/xmlwriter are affected.

Open Issues

None yet.

Unaffected PHP Functionality

Everything else, why do we have this section?

Future Scope

None yet.

Proposed Voting Choices

One primary vote requiring a 2/3 majority to accept the RFC as a whole.

Patches and Tests

Links to any external patches and tests go here.

If there is no patch, make it clear who will create a patch, or whether a volunteer to help with implementation is needed.

Make it clear if the patch is intended to be the final patch, or is just a prototype.

For changes affecting the core language, you should also provide a patch for the language specification.

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged into
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature
  4. a link to the language specification section (if any)

References

Rejected Features

None yet.

rfc/xmlreader_writer_streams.1713719866.txt.gz · Last modified: 2024/04/21 17:17 by nielsdos