PHP RFC: XML_OPTION_PARSE_HUGE
- Version: 0.9
- Date: 2023-09-21
- Author: Niels Dossche, nielsdos@php.net
- Status: Draft
- First Published at: https://wiki.php.net/rfc/xml_option_parse_huge
Introduction
ext/xml allows the user to parse XML in an event-driven way (SAX). The user can register callbacks to be called when certain nodes are encountered while parsing. In a sense, this is a streaming parsing model. This RFC attempts to address a feature request on the old bugtracker: https://bugs.php.net/bug.php?id=68325.
Let's break it down more clearly:
First, it's important to note that the ext/xml extension can work with two different XML parsers: either libexpat or libxml2, with libxml2 being the more commonly used of the two.
Now, let's get to the heart of the issue:
Starting with libxml2 version 2.7.6, parsing large input data is no longer allowed by default; it must be explicitly enabled. This change was made to prevent potential denial-of-service attacks. However, it unintentionally broke a legitimate use case involving the xml_parse and xml_parse_into_struct functions: attempting to parse large documents with these functions now results in a parse error.
There is a workaround for xml_parse: parse the document in chunks. This is cumbersome if the data is already in memory, because you have to split it into chunks yourself. Ironically, this increases memory usage rather than limiting it. The workaround does not apply to xml_parse_into_struct, which does not support chunked parsing.
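The chunking workaround for xml_parse can be sketched roughly as follows. This is a minimal illustration, not code from the RFC or its patch: the helper name parse_in_chunks, the 8192-byte chunk size, and the choice to collect element names are all assumptions made for the example.

```php
<?php
// Hypothetical helper (not part of ext/xml): feeds an in-memory XML string
// to the parser in fixed-size chunks and returns the element names seen.
function parse_in_chunks(string $xml, int $chunkSize = 8192): array
{
    $elements = [];
    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        // Start-element callback: record the (case-folded) element name.
        function ($parser, string $name, array $attrs) use (&$elements): void {
            $elements[] = $name;
        },
        // End-element callback: nothing to do in this sketch.
        function ($parser, string $name): void {
        }
    );

    $length = strlen($xml);
    for ($offset = 0; $offset < $length; $offset += $chunkSize) {
        // Each substr() call copies part of data that is already in memory.
        $chunk = substr($xml, $offset, $chunkSize);
        $isFinal = ($offset + $chunkSize) >= $length;

        if (xml_parse($parser, $chunk, $isFinal) !== 1) {
            $message = xml_error_string(xml_get_error_code($parser));
            throw new RuntimeException("XML parse error: $message");
        }
    }

    return $elements;
}
```

The extra substr() copies are the memory overhead mentioned above. No comparable loop can be written around xml_parse_into_struct, since it takes the whole document in a single call.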
This proposal aims to solve these issues by introducing a new option.
Proposal
Parser options can be set via xml_set_parser_option. The idea is to add a new parser option that takes effect when libxml2 is used. The boolean option will be called XML_OPTION_PARSE_HUGE, which will be a new integer constant in the global namespace. The default value is false because that matches the current behaviour, so untrusted documents remain protected against denial-of-service attacks by default.
Internally, this option will pass XML_PARSE_HUGE to libxml2, allowing large documents to be parsed without resulting in a parse error. If libexpat is used, this option will do nothing as libexpat does not block loading large documents anyway.
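As an illustration of how the proposed option might be used, here is a minimal sketch. The function name parse_large_document is hypothetical, and the defined() guard is an assumption added so the snippet also runs on PHP builds that do not provide the proposed constant.

```php
<?php
// Hypothetical helper: parses a trusted, possibly very large document
// that is already held in memory, using xml_parse_into_struct.
function parse_large_document(string $xml): array
{
    $parser = xml_parser_create();

    // XML_OPTION_PARSE_HUGE is the constant proposed by this RFC.
    // The guard keeps the sketch runnable where it does not exist.
    if (defined('XML_OPTION_PARSE_HUGE')) {
        xml_set_parser_option($parser, XML_OPTION_PARSE_HUGE, true);
    }

    $values = [];
    $index = [];
    if (xml_parse_into_struct($parser, $xml, $values, $index) !== 1) {
        $message = xml_error_string(xml_get_error_code($parser));
        throw new RuntimeException("XML parse error: $message");
    }

    return $values;
}
```

Because the option lifts a protection against denial-of-service attacks, a sketch like this should only be applied to documents from a trusted source.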
Backward Incompatible Changes
No BC breaks unless the user defined a global constant XML_OPTION_PARSE_HUGE themselves.
Proposed PHP Version(s)
Next PHP 8.x.
RFC Impact
To SAPIs
No changes.
To Existing Extensions
It only impacts ext/xml.
To Opcache
No changes.
New Constants
Adds a single integer constant to the global namespace: XML_OPTION_PARSE_HUGE ( = 5). Intended to be used only inside ext/xml.
php.ini Defaults
No changes.
Open Issues
None yet.
Unaffected PHP Functionality
Everything outside of ext/xml.
Future Scope
None yet.
Proposed Voting Choices
One primary vote (requires 2/3 majority): add XML_OPTION_PARSE_HUGE?
Patches and Tests
Implementation: https://github.com/php/php-src/pull/12256
Implementation
After the project is implemented, this section should contain
- the version(s) it was merged into
- a link to the git commit(s)
- a link to the PHP manual entry for the feature
- a link to the language specification section (if any)
References
Links to external references, discussions or RFCs
Rejected Features
Keep this updated with features that were discussed on the mail lists.