rfc:xml_option_parse_huge
Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
rfc:xml_option_parse_huge [2023/09/21 20:41] – created nielsdos | rfc:xml_option_parse_huge [2023/10/22 15:53] (current) – implemented nielsdos | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== PHP RFC: PHP_XML_OPTION_PARSE_HUGE | + | ====== PHP RFC: XML_OPTION_PARSE_HUGE |
- | * Version: 0.9 | + | * Version: 0.9.1 |
* Date: 2023-09-21 | * Date: 2023-09-21 | ||
* Author: Niels Dossche, nielsdos@php.net | * Author: Niels Dossche, nielsdos@php.net | ||
- | * Status: | + | * Status: |
+ | * Implementation: | ||
* First Published at: https:// | * First Published at: https:// | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | ext/xml allows the user to parse XML in an event-driven way (SAX). The user can register callbacks to be called when certain nodes are encountered while parsing. In a sense, this is a streaming parsing model. | + | ext/xml allows the user to parse XML in an event-driven way (SAX). The user can register callbacks to be called when certain nodes are encountered while parsing. In a sense, this is a streaming parsing model: the user's callbacks are invoked while parsing is still happening. |
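To illustrate the event-driven model described above, here is a minimal sketch. The `$events` log and the input chunks are made up for illustration and are not part of the RFC. Handlers are registered first, then the document is fed to the parser piece by piece. Element names arrive upper-cased because case folding is enabled by default.

```php
<?php
// Illustrative only: $events and $chunks are hypothetical names, not part of the RFC.
$events = [];

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    // Called for every opening tag.
    function ($parser, string $name, array $attrs) use (&$events): void {
        $events[] = "start:$name";
    },
    // Called for every closing tag (and for self-closing elements).
    function ($parser, string $name) use (&$events): void {
        $events[] = "end:$name";
    }
);

// The input arrives in pieces; a chunk may even end in the middle of a tag.
$chunks = ['<root><it', 'em/><item/>', '</root>'];
$last = count($chunks) - 1;

foreach ($chunks as $i => $chunk) {
    // The third argument marks the final chunk of the document.
    if (xml_parse($parser, $chunk, $i === $last) !== 1) {
        $error = xml_error_string(xml_get_error_code($parser)) ?? 'unknown XML error';
        throw new RuntimeException($error);
    }
}
```

Because the handlers run as the input is consumed, the script never needs to hold a tree of the whole document in memory.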
This RFC attempts to address a feature request on the old bugtracker: https:// | This RFC attempts to address a feature request on the old bugtracker: https:// | ||
- | One important piece of background information is that the ext/xml extension can work with two XML parsers: either libexpat, or libxml2. The latter is the most commonly used one of the two. Since libxml2 2.7.6 parsing large inputs is opt-in instead of allowed by default. This is to protect against denial of service attacks. However, this broke a valid use-case of < | + | Let's break it down more clearly: |
- | This proposal aims to solve these issues | + | First, it's important to note that the ext/xml extension can work with two different XML parsers: either libexpat or libxml2, with libxml2 being the more commonly used of the two. |
+ | |||
+ | Now, let's get to the actual issue: | ||
+ | |||
+ | Starting with libxml2 version 2.7.0 (https:// | ||
+ | There is a workaround for < | ||
+ | |||
+ | This proposal aims to solve this issue by introducing a new parser | ||
+ | |||
+ | (*) What counts as "large" is defined in [[https:// | ||
===== Proposal ===== | ===== Proposal ===== | ||
- | It's possible to set parser options via <php>xml_set_parser_option</ | + | It's possible to set parser options via <php>xml_parser_set_option</ |
Internally, this option will pass XML_PARSE_HUGE to libxml2, allowing large documents to be parsed without resulting in a parse error. | Internally, this option will pass XML_PARSE_HUGE to libxml2, allowing large documents to be parsed without resulting in a parse error. | ||
If libexpat is used, this option will do nothing as libexpat does not block loading large documents anyway. | If libexpat is used, this option will do nothing as libexpat does not block loading large documents anyway. | ||
+ | |||
+ | It's worth noting that you can run into the same problem with other extensions, such as SimpleXML and DOM. However, there you //do// have the option < | ||
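For comparison, here is a minimal sketch of the DOM and SimpleXML side, assuming the option meant here is the libxml constant `LIBXML_PARSEHUGE`. The tiny input document is only a placeholder; the flag matters for inputs that would otherwise hit libxml2's limits.

```php
<?php
// Placeholder input; a real use case would involve a very large document.
$xml = '<root><item>value</item></root>';

// DOM: libxml options are passed as the second argument of loadXML().
$doc = new DOMDocument();
$loaded = $doc->loadXML($xml, LIBXML_PARSEHUGE);

// SimpleXML: libxml options are passed as the third argument.
$sxe = simplexml_load_string($xml, SimpleXMLElement::class, LIBXML_PARSEHUGE);
```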
+ | |||
+ | ==== Example Usage ==== | ||
+ | |||
+ | <PHP> | ||
+ | function startElement($parser, | ||
+ | // Do something interesting | ||
+ | } | ||
+ | function endElement($parser, | ||
+ | // Do something interesting | ||
+ | } | ||
+ | $parser = xml_parser_create(); | ||
+ | xml_parser_set_option($parser, | ||
+ | xml_set_element_handler($parser, | ||
+ | // Add more handlers | ||
+ | $success = xml_parse($parser, | ||
+ | </ | ||
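A self-contained variant of the usage above could look like the sketch below. The handler bodies, the `$elementCount` counter and the small input string are assumptions made for illustration. The `defined()` check keeps the script runnable on PHP versions where the constant is not available.

```php
<?php
// Hypothetical counter, used only to show that the handlers are invoked.
$elementCount = 0;

$parser = xml_parser_create();

// Opt in to parsing huge documents where the option exists.
if (defined('XML_OPTION_PARSE_HUGE')) {
    xml_parser_set_option($parser, XML_OPTION_PARSE_HUGE, true);
}

xml_set_element_handler(
    $parser,
    // Start-element handler: count every element that is opened.
    function ($parser, string $name, array $attributes) use (&$elementCount): void {
        $elementCount++;
    },
    // End-element handler: nothing to do in this sketch.
    function ($parser, string $name): void {
    }
);

// Placeholder input; the whole document is passed in one final chunk.
$xml = '<root><child/><child/></root>';
$success = xml_parse($parser, $xml, true);
```

The option is set before the first call to `xml_parse()`, in line with the restriction on changing it during parsing.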
+ | |||
+ | If you try to change the huge parsing option while parsing is in progress, e.g. in one of the callback handlers, and < | ||
+ | Example: | ||
+ | |||
+ | <PHP> | ||
+ | <?php | ||
+ | function startElement($parser, | ||
+ | xml_parser_set_option($parser, | ||
+ | } | ||
+ | function endElement($parser, | ||
+ | // Do something interesting | ||
+ | } | ||
+ | $parser = xml_parser_create(); | ||
+ | xml_parser_set_option($parser, | ||
+ | xml_set_element_handler($parser, | ||
+ | // Add more handlers | ||
+ | $success = xml_parse($parser, | ||
+ | </ | ||
+ | |||
+ | Results in: | ||
+ | Fatal error: Uncaught Error: Cannot change option XML_OPTION_PARSE_HUGE while parsing in example.php: | ||
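A script that has to tolerate this situation can catch the thrown `Error` like any other. The following is a minimal sketch under stated assumptions: `$message` and the input string are hypothetical, and the `defined()` guard skips the call on PHP versions where the constant does not exist, in which case `$message` stays `null`.

```php
<?php
// Holds the error message if changing the option during parsing is rejected.
$message = null;

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attrs) use (&$message): void {
        if (!defined('XML_OPTION_PARSE_HUGE')) {
            return; // Option not available on this PHP version.
        }
        try {
            // Attempt to change the option from inside a callback.
            xml_parser_set_option($parser, XML_OPTION_PARSE_HUGE, true);
        } catch (Error $e) {
            $message = $e->getMessage();
        }
    },
    function ($parser, string $name): void {
    }
);

xml_parse($parser, '<root/>', true);
```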
+ | |||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | No BC breaks unless the user defined a global constant XML_OPTION_PARSE_HUGE themselves. | + | No BC breaks unless the user defined a global constant |
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
Line 65: | Line 115: | ||
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
- | One primary vote (requires 2/3 majority): add PHP_XML_OPTION_PARSE_HUGE? | + | One primary vote (requires 2/3 majority): add XML_OPTION_PARSE_HUGE parsing option? |
+ | |||
+ | <doodle title=" | ||
+ | * Yes | ||
+ | * No | ||
+ | </ | ||
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
Line 72: | Line 127: | ||
===== Implementation ===== | ===== Implementation ===== | ||
- | After the project is implemented, | + | |
- | - the version(s) it was merged | + | Merged |
- | | + | |
- | - a link to the PHP manual entry for the feature | + | ===== Changelog ===== |
- | - a link to the language specification section (if any) | + | |
+ | * 0.9.1: Fixed libxml2 version, clarified limit, added code sample, linked | ||
+ | * 0.9.0: First version under discussion | ||
===== References ===== | ===== References ===== |
rfc/xml_option_parse_huge.1695328869.txt.gz · Last modified: 2023/09/21 20:41 by nielsdos