PHP RFC: XML_OPTION_PARSE_HUGE
- Version: 0.9
- Date: 2023-09-21
- Author: Niels Dossche, nielsdos@php.net
- Status: Draft
- First Published at: https://wiki.php.net/rfc/xml_option_parse_huge
Introduction
ext/xml allows the user to parse XML in an event-driven way (SAX). The user can register callbacks to be called when certain nodes are encountered while parsing. In a sense, this is a streaming parsing model. This RFC attempts to address a feature request on the old bugtracker: https://bugs.php.net/bug.php?id=68325.
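To illustrate the event-driven model described above, here is a minimal sketch that registers element callbacks and records each element name as it is encountered. The helper name collect_element_names is hypothetical and chosen for this example only; it assumes PHP 8.0 or later.

```php
<?php
// Hypothetical helper for illustration: collect element names in document
// order using SAX-style callbacks from ext/xml.
function collect_element_names(string $xml): array
{
    $names = [];
    $parser = xml_parser_create();

    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Called for every opening tag.
        function ($parser, string $name, array $attributes) use (&$names): void {
            $names[] = $name;
        },
        // Called for every closing tag; nothing to record here.
        function ($parser, string $name): void {
        }
    );

    // The third argument marks this as the final (and only) piece of input.
    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException(
            'XML parse error: ' . xml_error_string(xml_get_error_code($parser))
        );
    }

    return $names;
}
```

The callbacks fire while the input is being consumed, which is why the model works for streaming: no document tree is built in memory.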
One important piece of background information is that ext/xml can work with two XML parsers: either libexpat or libxml2. The latter is the more commonly used of the two. Since libxml2 2.7.6, parsing large inputs is opt-in instead of allowed by default, to protect against denial-of-service attacks. However, this broke a valid use case of xml_parse and xml_parse_into_struct: passing large documents to these functions now results in a parse error. For xml_parse there is a workaround, namely parsing in chunks, but this is cumbersome if the data is already in memory because you have to split the data into chunks first. Ironically, this increases memory usage rather than limiting it. For xml_parse_into_struct the workaround is not available, because that function cannot be used with chunked parsing.
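The chunking workaround for xml_parse might look roughly like the sketch below. The function name parse_in_chunks and the default chunk size of 8192 bytes are assumptions made for this example, not part of ext/xml; the XMLParser type requires PHP 8.0 or later.

```php
<?php
// Hypothetical helper for illustration: feed an in-memory document to
// xml_parse() in fixed-size chunks so no single call receives a huge input.
function parse_in_chunks(XMLParser $parser, string $xml, int $chunkSize = 8192): bool
{
    $length = strlen($xml);

    // An empty document still needs one final call to finish parsing.
    if ($length === 0) {
        return xml_parse($parser, '', true) === 1;
    }

    for ($offset = 0; $offset < $length; $offset += $chunkSize) {
        // substr() copies the chunk, which is the extra memory cost
        // mentioned in the text above.
        $chunk = substr($xml, $offset, $chunkSize);
        $isFinal = ($offset + $chunkSize >= $length);

        if (xml_parse($parser, $chunk, $isFinal) !== 1) {
            return false;
        }
    }

    return true;
}
```

Handlers registered on the parser are invoked as each chunk is consumed. The same approach cannot be applied to xml_parse_into_struct, which takes the whole document in a single call.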
This proposal aims to solve these issues by introducing a new option.
Proposal
Parser options can be set via xml_parser_set_option. The idea is to add a new parser option that takes effect when libxml2 is used. The boolean option will be called XML_OPTION_PARSE_HUGE, so a new integer constant with that name will be added to the global namespace. The default value is false because that matches the current behaviour, and it therefore continues to protect against denial-of-service attacks when parsing untrusted documents.
Internally, this option will pass XML_PARSE_HUGE to libxml2, allowing large documents to be parsed without resulting in a parse error. If libexpat is used, this option will do nothing as libexpat does not block loading large documents anyway.
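As a sketch of how the proposed option could be used from userland, assuming it is set through xml_parser_set_option like the existing options: the helper names below are hypothetical, and the defined() guard is there so the code still runs on PHP versions that do not include this change.

```php
<?php
// Hypothetical helper for illustration: create a parser and opt in to
// huge-document parsing when the proposed constant is available.
function create_huge_capable_parser(): XMLParser
{
    $parser = xml_parser_create();

    // XML_OPTION_PARSE_HUGE is the constant proposed by this RFC. It is
    // undefined on PHP versions without the change, hence the guard.
    if (defined('XML_OPTION_PARSE_HUGE')) {
        xml_parser_set_option($parser, constant('XML_OPTION_PARSE_HUGE'), true);
    }

    return $parser;
}

// Hypothetical helper for illustration: parse a trusted document into the
// array structure produced by xml_parse_into_struct().
function parse_trusted_document(string $xml): array
{
    $parser = create_huge_capable_parser();
    $values = [];

    if (xml_parse_into_struct($parser, $xml, $values) !== 1) {
        throw new RuntimeException(
            'XML parse error: ' . xml_error_string(xml_get_error_code($parser))
        );
    }

    return $values;
}
```

Because the option removes a protection against denial of service, it should only be enabled for documents from a trusted source.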
Backward Incompatible Changes
No BC breaks unless the user defined a global constant XML_OPTION_PARSE_HUGE themselves.
Proposed PHP Version(s)
Next PHP 8.x.
RFC Impact
To SAPIs
No changes.
To Existing Extensions
It only impacts ext/xml.
To Opcache
No changes.
New Constants
Adds a single integer constant to the global namespace: XML_OPTION_PARSE_HUGE (= 5). Intended to be used only inside ext/xml.
php.ini Defaults
No changes.
Open Issues
None yet.
Unaffected PHP Functionality
Everything outside of ext/xml.
Future Scope
None yet.
Proposed Voting Choices
One primary vote (requires 2/3 majority): add XML_OPTION_PARSE_HUGE?
Patches and Tests
Implementation: https://github.com/php/php-src/pull/12256
Implementation
After the project is implemented, this section should contain
- the version(s) it was merged into
- a link to the git commit(s)
- a link to the PHP manual entry for the feature
- a link to the language specification section (if any)
References
Links to external references, discussions or RFCs
Rejected Features
Keep this updated with features that were discussed on the mail lists.