PHP RFC: XML_OPTION_PARSE_HUGE

Introduction

ext/xml allows the user to parse XML in an event-driven way (SAX). The user can register callbacks to be called when certain nodes are encountered while parsing. In a sense, this is a streaming parsing model. This RFC attempts to address a feature request on the old bugtracker: https://bugs.php.net/bug.php?id=68325.

One important piece of background information is that ext/xml can work with two XML parsers: libexpat or libxml2. The latter is the more commonly used of the two. Since libxml2 2.7.6, parsing large inputs is opt-in instead of allowed by default, to protect against denial-of-service attacks. However, this broke a valid use case of xml_parse and xml_parse_into_struct: passing a large document to these functions now results in a parse error. For xml_parse there is a workaround, namely parsing in chunks, but this is cumbersome when the data is already in memory because the data must first be split into chunks. Ironically, this increases memory usage rather than limiting it. For xml_parse_into_struct the workaround is not available at all, because that function does not support chunked parsing.
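The chunked workaround for xml_parse described above can be sketched roughly as follows. This is a minimal illustration, not code from the bug report: the helper name parse_in_chunks and the default chunk size of 8192 bytes are assumptions made for the example, and the handler only counts start tags.

```php
<?php
// Sketch of the chunked-parsing workaround for xml_parse().
// parse_in_chunks() and the 8192-byte default are illustrative assumptions.
function parse_in_chunks(string $xml, int $chunkSize = 8192): int
{
    $elements = 0;
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$elements): void {
            $elements++; // count every start tag
        },
        function ($parser, string $name): void {
            // end tags are ignored in this sketch
        }
    );

    $length = strlen($xml);
    for ($offset = 0; $offset < $length; $offset += $chunkSize) {
        // substr() copies part of the input, which is the extra memory
        // cost of this workaround when the data is already in memory.
        $chunk = substr($xml, $offset, $chunkSize);
        $isFinal = $offset + $chunkSize >= $length;
        if (xml_parse($parser, $chunk, $isFinal) !== 1) {
            $message = xml_error_string(xml_get_error_code($parser));
            throw new RuntimeException("XML parse error: $message");
        }
    }

    return $elements;
}
```

Note that nothing comparable can be written for xml_parse_into_struct, since it takes the whole document in a single call.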

This proposal aims to solve these issues by introducing a new option.

Proposal

Parser options can already be set via xml_set_parser_option. The idea is to add a new boolean parser option that takes effect when libxml2 is used. The option will be called XML_OPTION_PARSE_HUGE, which means a new integer constant is added to the global namespace. The default value is false because that matches the current behaviour, so untrusted documents remain protected against denial-of-service attacks unless the user explicitly opts in.

Internally, this option will pass XML_PARSE_HUGE to libxml2, allowing large documents to be parsed without resulting in a parse error. If libexpat is used, this option will do nothing as libexpat does not block loading large documents anyway.
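Assuming the proposal is accepted as described, opting in might look like the sketch below. The helper name create_huge_capable_parser is hypothetical, and the defined() guard is only there so that the snippet also runs on PHP versions where the constant does not exist yet; in that case the parser keeps the default limits.

```php
<?php
// Sketch: opt in to huge-document parsing when the proposed constant exists.
// create_huge_capable_parser() is a hypothetical helper for this example.
function create_huge_capable_parser(): array
{
    $parser = xml_parser_create();
    $enabled = false;
    if (defined('XML_OPTION_PARSE_HUGE')) {
        // The option is set before any data is parsed.
        $enabled = xml_set_parser_option(
            $parser,
            constant('XML_OPTION_PARSE_HUGE'),
            true
        );
    }
    return [$parser, $enabled];
}

[$parser, $hugeEnabled] = create_huge_capable_parser();

// xml_parse_into_struct() has no chunked mode, so this option is the
// only way to lift the size limit for it when libxml2 is in use.
$values = [];
$index = [];
$ok = xml_parse_into_struct(
    $parser,
    '<root><item>1</item></root>',
    $values,
    $index
);
```

The option should only be enabled for documents from a trusted source, since it removes the protection that the default provides.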

Backward Incompatible Changes

No BC breaks unless the user defined a global constant XML_OPTION_PARSE_HUGE themselves.

Proposed PHP Version(s)

Next PHP 8.x.

RFC Impact

To SAPIs

No changes.

To Existing Extensions

It only impacts ext/xml.

To Opcache

No changes.

New Constants

Adds a single integer constant to the global namespace: XML_OPTION_PARSE_HUGE (= 5). It is intended for use only with ext/xml.

php.ini Defaults

No changes.

Open Issues

None yet.

Unaffected PHP Functionality

Everything outside of ext/xml.

Future Scope

None yet.

Proposed Voting Choices

One primary vote (requires 2/3 majority): add XML_OPTION_PARSE_HUGE?

Patches and Tests

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged into
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature
  4. a link to the language specification section (if any)

References

Links to external references, discussions or RFCs

Rejected Features

Keep this updated with features that were discussed on the mail lists.

rfc/xml_option_parse_huge.1695328869.txt.gz · Last modified: 2023/09/21 20:41 by nielsdos