PHP RFC: XML_OPTION_PARSE_HUGE

Introduction

ext/xml allows the user to parse XML in an event-driven way (SAX). The user can register callbacks to be called when certain nodes are encountered while parsing. In a sense, this is a streaming parsing model. This RFC attempts to address a feature request on the old bugtracker: https://bugs.php.net/bug.php?id=68325.
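
As a minimal sketch of this event-driven model, the snippet below registers element and character-data callbacks and parses a small in-memory document. The sample document and the variable names ($seen, $text) are assumptions made for illustration. Case folding is on by default, so element names arrive uppercased.

```php
<?php
// Names of all opening tags seen, in document order.
$seen = [];
// Concatenated text content of the document.
$text = '';

$parser = xml_parser_create();

// The first callback fires on every opening tag, the second on every closing tag.
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attributes) use (&$seen): void {
        $seen[] = $name;
    },
    function ($parser, string $name): void {
        // Nothing to do on closing tags in this sketch.
    }
);

// Fires for text between tags; it may be called more than once per text node.
xml_set_character_data_handler(
    $parser,
    function ($parser, string $data) use (&$text): void {
        $text .= $data;
    }
);

// The third argument marks this as the final (and here, only) piece of input.
$ok = xml_parse($parser, '<root><item>a</item><item>b</item></root>', true);
```

Because the callbacks run as the input is consumed, the parser never has to build a full tree of the document.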

ext/xml can be built against one of two XML parsers: libexpat or libxml2. Of the two, libxml2 is the more commonly used.

Starting with libxml2 version 2.7.0, parsing large input is no longer allowed by default; it must be explicitly enabled. This change was made to prevent potential denial-of-service attacks. However, it also broke a legitimate use case involving xml_parse and xml_parse_into_struct: attempting to parse a large document with these functions now results in a parse error. For xml_parse there is a workaround, namely parsing in chunks, but this is cumbersome when the data is already in memory because the data first has to be split into chunks. Ironically, the splitting increases memory usage, which is the opposite of what the limit is meant to achieve. For xml_parse_into_struct the workaround is not available at all, because that function does not support chunked parsing.
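
The chunked workaround for xml_parse might look roughly like the sketch below. The helper name parseInChunks and the chunk sizes are hypothetical choices, not part of ext/xml. Every substr call copies part of the input, which is where the extra memory usage comes from.

```php
<?php
/**
 * Feed an in-memory XML string to xml_parse() in fixed-size pieces.
 * parseInChunks() and the 8192-byte default are hypothetical, chosen for this sketch.
 */
function parseInChunks($parser, string $xml, int $chunkSize = 8192): bool
{
    $length = strlen($xml);
    for ($offset = 0; $offset < $length; $offset += $chunkSize) {
        // Each chunk is a fresh copy of part of the input.
        $chunk = substr($xml, $offset, $chunkSize);
        // Tell the parser when the last piece has been handed over.
        $isFinal = ($offset + $chunkSize) >= $length;
        if (xml_parse($parser, $chunk, $isFinal) !== 1) {
            return false;
        }
    }
    return true;
}

// Count opening tags to confirm the whole document was seen.
$count = 0;
$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attributes) use (&$count): void {
        $count++;
    },
    function ($parser, string $name): void {
    }
);

$xml = '<root>' . str_repeat('<item>x</item>', 1000) . '</root>';
$ok = parseInChunks($parser, $xml, 64);
```

No equivalent helper can be written for xml_parse_into_struct, since it takes the whole document in a single call.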

This proposal aims to solve this issue by introducing a new option.

Proposal

It's possible to set parser options via xml_set_parser_option. The idea is to add a new boolean parser option, XML_OPTION_PARSE_HUGE, that takes effect when libxml2 is used. This adds a new integer constant to the global namespace. The option defaults to false, which matches the current behaviour and therefore keeps the protection against denial-of-service attacks when parsing untrusted documents.

Internally, this option will pass XML_PARSE_HUGE to libxml2, allowing large documents to be parsed without resulting in a parse error. If libexpat is used, this option will do nothing as libexpat does not block loading large documents anyway.
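
A sketch of how the proposed option could be used with xml_parse_into_struct follows. It assumes the constant is only defined on PHP builds that include this proposal, so it is looked up dynamically; it also assumes, as the safe choice, that the option should be set before parsing starts. The tiny sample document stands in for a large one.

```php
<?php
$parser = xml_parser_create();

// XML_OPTION_PARSE_HUGE only exists on PHP versions that ship this proposal,
// so look it up dynamically to keep the sketch loadable elsewhere.
$hugeEnabled = false;
if (defined('XML_OPTION_PARSE_HUGE')) {
    // Assumption: options are set before the first parse call.
    $hugeEnabled = xml_set_parser_option($parser, constant('XML_OPTION_PARSE_HUGE'), true);
}

// xml_parse_into_struct() takes the whole document at once,
// so chunking is not an option here.
$values = [];
$index = [];
$ok = xml_parse_into_struct($parser, '<root><item>a</item></root>', $values, $index);
```

Only enable the option for documents from a trusted source, since it switches off the safeguard that libxml2 applies by default.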

Backward Incompatible Changes

No BC breaks unless the user defined a global constant XML_OPTION_PARSE_HUGE themselves.

Proposed PHP Version(s)

Next PHP 8.x.

RFC Impact

To SAPIs

No changes.

To Existing Extensions

It only impacts ext/xml.

To Opcache

No changes.

New Constants

Adds a single integer constant to the global namespace: XML_OPTION_PARSE_HUGE (= 5). It is intended to be used only with ext/xml, as an option passed to xml_set_parser_option.

php.ini Defaults

No changes.

Open Issues

None yet.

Unaffected PHP Functionality

Everything outside of ext/xml.

Future Scope

None yet.

Proposed Voting Choices

One primary vote (requires 2/3 majority): add XML_OPTION_PARSE_HUGE?

Patches and Tests

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged into
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature
  4. a link to the language specification section (if any)

References

Links to external references, discussions or RFCs

Rejected Features

Keep this updated with features that were discussed on the mail lists.

rfc/xml_option_parse_huge.1695329261.txt.gz · Last modified: 2023/09/21 20:47 by nielsdos