rfc:decode_html
Differences
This shows you the differences between two versions of the page.
rfc:decode_html [2024/08/25 22:04] – Add basic description of proposed implementation. dmsnell | rfc:decode_html [2024/09/06 18:58] (current) – Fix list bullet syntax. dmsnell | ||
---|---|---|---|
Line 120: | Line 120: | ||
* The replaced values for character references are UTF-8. | * The replaced values for character references are UTF-8. | ||
* The list of named character references is non-configurable. | * The list of named character references is non-configurable. | ||
- | * Calling code //must// indicate | + | * Calling code //must// indicate |
* The passed input is assumed to be the entire contents of the attribute value or text node - not a truncation thereof. | * The passed input is assumed to be the entire contents of the attribute value or text node - not a truncation thereof. | ||
- | This RFC does not propose solving | + | A new enum specifies supported HTML contexts. For the most part the enum specifies three internal properties: |
+ | |||
+ | * Are character references decoded? | ||
+ | * Are ambiguous ampersand references interpreted? | ||
+ | * Are NULL bytes replaced or removed? | ||
+ | |||
+ | While these could be handled via three boolean flags, that would require developers to understand the nuances involved in the different situations where they apply. Focusing the API on the kinds of situations developers work in removes the burden of knowing the internal details | ||
+ | |||
+ | <code php> | ||
+ | enum HtmlContext { | ||
+ | // A complete attribute value, single-quoted, double-quoted, or unquoted. | ||
+ | case Attribute; | ||
+ | |||
+ | // DATA content between tags: normal HTML text inside a BODY element. | ||
+ | case BodyText; | ||
+ | |||
+ | // Like BodyText, but found inside an SVG or MathML element where NULL bytes are treated differently. | ||
+ | case ForeignText; | ||
+ | |||
+ | // Text content inside of a SCRIPT element; nothing is escaped other than NULL bytes. | ||
+ | case Script; | ||
+ | |||
+ | // Identical to Script but left as a convenience/ | ||
+ | case Style; | ||
+ | |||
+ | // Identical to Script but left as a convenience/ | ||
+ | case Comment; | ||
+ | } | ||
+ | </ | ||
+ | |||
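The three properties listed above can be pictured as a per-context lookup. The following is only a sketch under stated assumptions: `context_properties()` and its array keys are hypothetical names that do not appear in the RFC, and the values reflect one reading of the HTML parsing rules rather than anything the RFC text confirms.

```php
<?php
declare(strict_types=1);

// Mirrors the enum proposed in the RFC.
enum HtmlContext {
    case Attribute;
    case BodyText;
    case ForeignText;
    case Script;
    case Style;
    case Comment;
}

/**
 * Hypothetical helper (not part of the RFC): maps a context to the three
 * internal properties the enum is said to encode. The values are assumptions.
 *
 * @return array{decode_references: bool, interpret_ambiguous_ampersand: bool, null_bytes: string}
 */
function context_properties(HtmlContext $context): array {
    return match ($context) {
        // Assumed: references are decoded, ambiguous ampersands are left alone,
        // and NULL bytes are replaced.
        HtmlContext::Attribute => [
            'decode_references'             => true,
            'interpret_ambiguous_ampersand' => false,
            'null_bytes'                    => 'replace',
        ],
        // Assumed: references are decoded and NULL bytes are removed.
        HtmlContext::BodyText => [
            'decode_references'             => true,
            'interpret_ambiguous_ampersand' => true,
            'null_bytes'                    => 'remove',
        ],
        // Like BodyText, except for the NULL byte handling.
        HtmlContext::ForeignText => [
            'decode_references'             => true,
            'interpret_ambiguous_ampersand' => true,
            'null_bytes'                    => 'replace',
        ],
        // Script, Style and Comment share one arm: nothing is decoded,
        // only NULL bytes are handled.
        HtmlContext::Script, HtmlContext::Style, HtmlContext::Comment => [
            'decode_references'             => false,
            'interpret_ambiguous_ampersand' => false,
            'null_bytes'                    => 'replace',
        ],
    };
}
```

With a lookup like this, calling code only names the situation it is in, for example `context_properties(HtmlContext::Attribute)`, and never has to set the three flags by hand.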
+ | A few more contexts //could// exist, namely for elements where text content is not allowed ('' | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== |
rfc/decode_html.1724623457.txt.gz · Last modified: 2024/08/25 22:04 by dmsnell