rfc:escaper
Differences
This shows you the differences between two versions of the page.
rfc:escaper [2012/09/18 13:00] – Added XML escaper aliases padraic | rfc:escaper [2013/09/27 04:12] – [SPL Class or Functions?] Split into sections yohgaki | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Escaping RFC for PHP Core ====== | ====== Escaping RFC for PHP Core ====== | ||
- | * Version: 1.0 | + | * Version: 1.0.1 |
* Date: 2012-09-18 | * Date: 2012-09-18 | ||
- | * Author: Pádraic < | + | * Author: Pádraic |
* Status: Under Discussion | * Status: Under Discussion | ||
* First Published at: http:// | * First Published at: http:// | ||
+ | |||
+ | ===== Change Log ===== | ||
+ | * 2013-09-27 Added ext/filter implementation as an option (Yasuo) | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | This RFC proposes the addition of an SPL class (and optionally a set of functions) dedicated to the secure escaping of untrusted values against Cross-Site Scripting (XSS) and related vulnerabilities. It recognises that this involves the partial duplication of certain existing functions but raises the argument that the current division of functionality, | + | This RFC proposes the addition of an SPL class (and optionally a set of functions) dedicated to the secure escaping of untrusted values against Cross-Site Scripting (XSS) and related vulnerabilities. It recognises that this involves the partial duplication of certain existing functions but raises the argument that the current division of functionality, |
+ | |||
+ | The [[https:// | ||
+ | |||
+ | The proposed functionality is intended to largely reflect the recommendations of the OWASP' | ||
- | The proposed | + | The precise method of escaping |
- | A similar approach has already been taken in PHP code by Zend Framework 2.0 (Zend\Escaper) and, just recently, Symfony 2 (via Twig) adopted this functionality. While this can be done in PHP by individual frameworks, | + | A similar approach has already been taken in PHP code by Zend Framework 2.0 (Zend\Escaper) and, just recently, Symfony 2 (via Twig) adopted this functionality. While this can be done in PHP by individual frameworks, |
===== The Problem With Inconsistent Functionality ===== | ===== The Problem With Inconsistent Functionality ===== | ||
Line 24: | Line 31: | ||
* URL/URI: rawurlencode() or urlencode() | * URL/URI: rawurlencode() or urlencode() | ||
- | In practice, these decisions appear to depend more on what PHP offers, and if it can be interpreted as offering sufficient escaping safety, than it does on what is recommended in reality to defend against XSS. While these functions can prevent some forms of XSS, they do not cover all use cases or risks. | + | In practice, these decisions appear to depend more on what PHP offers, and if it can be interpreted as offering sufficient escaping safety, than it does on what is recommended in reality to defend against XSS. While these functions can prevent some forms of XSS, they do not cover all use cases or risks and are therefore insufficient defenses. |
Using htmlspecialchars() in a perfectly valid HTML5 unquoted attribute value, for example, is completely useless since the value can be terminated by a space (among other things) which is never escaped. Thus, in this instance, we have a conflict between a widely used HTML escaper and a modern HTML specification, | Using htmlspecialchars() in a perfectly valid HTML5 unquoted attribute value, for example, is completely useless since the value can be terminated by a space (among other things) which is never escaped. Thus, in this instance, we have a conflict between a widely used HTML escaper and a modern HTML specification, | ||
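The unquoted attribute problem can be reproduced in a few lines. This is a minimal sketch; the input string and the surrounding markup are invented purely for illustration.

```php
<?php
// Hypothetical untrusted input: no quotes, no angle brackets, no ampersand.
$untrusted = 'x onmouseover=alert(1)';

// ENT_QUOTES escapes &, <, >, " and ' but never the space character.
$escaped = htmlspecialchars($untrusted, ENT_QUOTES, 'UTF-8');

// In an unquoted attribute the space ends the value and starts a new attribute.
$html = '<input value=' . $escaped . '>';

echo $html, PHP_EOL;
// prints: <input value=x onmouseover=alert(1)>
```

The escaper returns the input unchanged, so the injected event handler reaches the browser intact.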
- | Inconsistencies with valid HTML, insecure default parameters, lack of character encoding awareness, and misrepresentations of what functions are capable of by some programmers - these all make escaping in PHP an unnecessarily convoluted quest for those who just want an escaping function that works across all HTML contexts. | + | Using addslashes(), |
- | Including more narrowly defined | + | Inconsistencies with valid HTML, insecure default parameters, lack of character encoding awareness, |
- | ===== SPL Class or Functions? ===== | + | Including more narrowly defined and specifically targeted functions |
- | While it may well be feasible | + | ===== Escape filter as ext/filter ===== |
+ | |||
+ | As an alternative implementation option, the escapers could be provided as ext/filter filters: | ||
+ | |||
+ | ^ ID(Constant) ^ Name ^ Options ^ Description ^ | ||
+ | |FILTER_ESCAPE_HTML |" | ||
+ | |FILTER_ESCAPE_HTML_ATTR |" | ||
+ | |FILTER_ESCAPE_JAVASCRIPT |" | ||
+ | |FILTER_ESCAPE_CSS |" | ||
+ | |FILTER_ESCAPE_URI |" | ||
+ | |FILTER_ESCAPE_XML |" | ||
+ | |FILTER_ESCAPE_XML_ATTR |" | ||
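The FILTER_ESCAPE_* constants above are proposals only and do not exist in ext/filter. For comparison, the sketch below uses the closest filter that does exist today, FILTER_SANITIZE_FULL_SPECIAL_CHARS, which the PHP manual describes as equivalent to htmlspecialchars() with ENT_QUOTES set. It only illustrates the calling style a proposed filter would share; the sample input is invented.

```php
<?php
// Hypothetical untrusted input.
$untrusted = '<a href="x">';

// Existing filter, shown as a stand-in for the proposed FILTER_ESCAPE_HTML.
$escaped = filter_var($untrusted, FILTER_SANITIZE_FULL_SPECIAL_CHARS);

echo $escaped, PHP_EOL;
// prints: &lt;a href=&quot;x&quot;&gt;
```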
+ | |||
+ | |||
+ | ===== SPL Class ===== | ||
+ | |||
+ | While it may well be advisable | ||
<code php> | <code php> | ||
- | interface | + | interface |
{ | { | ||
public function __construct($encoding = ' | public function __construct($encoding = ' | ||
Line 58: | Line 81: | ||
| | ||
public function escapeXmlAttr($value); | public function escapeXmlAttr($value); | ||
+ | |||
+ | public function getEncoding(); | ||
} | } | ||
</ | </ | ||
- | Functions may be added along the following lines: | + | The benefit of the class is that a character encoding can be set once, centrally, and the object then passed around an entire application or library, so that escaping is configured from a single location. This could be built in userland PHP around a set of functions, but it seems silly to skip a step that is so obviously beneficial to users. |
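A userland sketch of the "configure once, pass around" idea follows. The class name SketchEscaper is hypothetical and only two of the interface methods are shown; the UTF-8 default and the ENT_QUOTES | ENT_SUBSTITUTE behaviour are taken from the escapeHtml description in this RFC, the rest is an assumption.

```php
<?php
// Hypothetical userland sketch, not the proposed core implementation.
final class SketchEscaper
{
    private $encoding;

    public function __construct($encoding = 'UTF-8')
    {
        // The encoding is fixed once, at construction time.
        $this->encoding = $encoding;
    }

    public function getEncoding()
    {
        return $this->encoding;
    }

    public function escapeHtml($value)
    {
        // Every call reuses the encoding chosen at construction.
        return htmlspecialchars((string) $value, ENT_QUOTES | ENT_SUBSTITUTE, $this->encoding);
    }
}

// One instance is created centrally and handed to templates or libraries.
$escaper = new SketchEscaper('UTF-8');
echo $escaper->escapeHtml('<b>"Tom" & Jerry</b>'), PHP_EOL;
// prints: &lt;b&gt;&quot;Tom&quot; &amp; Jerry&lt;/b&gt;
```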
- | * escape_html($value, | + | ===== Functions ===== |
- | * escape_html_attribute($value, $encoding); | + | Functions may then be added along the following lines (names up for discussion): |
+ | * escape_html($value, | ||
+ | * escape_html_attribute($value, | ||
* escape_javascript($value, | * escape_javascript($value, | ||
- | |||
* escape_css($value, | * escape_css($value, | ||
- | |||
* escape_url($value, | * escape_url($value, | ||
- | |||
* escape_xml($value, | * escape_xml($value, | ||
- | |||
* escape_xml_attribute($value, | * escape_xml_attribute($value, | ||
- | I am strongly opposed to allowing these functions to accept unpredictable character encoding directives via php.ini. That would require additional work to validate, which is precisely what this RFC should seek to avoid. | + | ===== Implementation Notes ===== |
- | I have assumed that the character | + | I am strongly opposed to allowing these functions to accept unpredictable |
- | The functions/ | + | As there is no means of globally configuring a character encoding allowed in this RFC proposal since it promotes unconfigurable-default assumptions (already evidenced by existing |
+ | |||
+ | I have assumed that the character encodings supported are limited to those presently allowed by htmlspecialchars() and that the internals of each method or function validate this fact or throw an Exception (or an error for function calls) to prevent continued insecure execution as is currently allowed by htmlspecialchars(). See links below. | ||
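The validation step described above could look roughly like this in userland. The function name is hypothetical and the whitelist is an illustrative subset, not the full list of encodings that htmlspecialchars() accepts.

```php
<?php
// Hypothetical helper: reject any encoding outside a fixed whitelist.
function sketch_assert_supported_encoding($encoding)
{
    // Illustrative subset only; the real list would mirror htmlspecialchars().
    $supported = array('utf-8', 'iso-8859-1', 'iso-8859-15', 'cp1252', 'koi8-r', 'euc-jp');

    $normalised = strtolower((string) $encoding);
    if (!in_array($normalised, $supported, true)) {
        // Stop here rather than continue with an unknown, possibly unsafe encoding.
        throw new InvalidArgumentException('Unsupported character encoding: ' . $encoding);
    }

    return $normalised;
}

try {
    sketch_assert_supported_encoding('UTF-7');
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
    // prints: Unsupported character encoding: UTF-7
}
```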
The following is a sample implementation in PHP from Zend Framework 2.0: | The following is a sample implementation in PHP from Zend Framework 2.0: | ||
Line 89: | Line 114: | ||
Symfony' | Symfony' | ||
https:// | https:// | ||
- | |||
===== Class Method Dissection ===== | ===== Class Method Dissection ===== | ||
Line 96: | Line 120: | ||
==== escapeHtml ==== | ==== escapeHtml ==== | ||
- | The escapeHtml() function is basically identical to htmlspecialchars() but provides a few additional tweaks (validating the encoding option, ceasing execution where an invalid encoding is detected, etc.). It assumes a default encoding of UTF-8 and behaves as if the ENT_QUOTES and ENT_SUBSTITUTE flags were both set. As it would not accept a Doctype flag, escaping is done to the lowest common denominator. | + | The escapeHtml() function is basically identical to htmlspecialchars() but provides a few additional tweaks (validating the encoding option, ceasing execution where an invalid encoding is detected, etc.). It assumes a default encoding of UTF-8 and behaves as if the ENT_QUOTES and ENT_SUBSTITUTE flags were both set. As it would not accept a Doctype flag, escaping is done to the lowest common denominator |
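The effect of the ENT_SUBSTITUTE flag mentioned above can be seen with existing htmlspecialchars() calls. This is only a sketch of the flag behaviour; the sample bytes are invented.

```php
<?php
// 0xFF can never appear in well-formed UTF-8, so this string is invalid.
$invalid = "a\xFFb";

// Without ENT_SUBSTITUTE the whole string is silently discarded.
$without = htmlspecialchars($invalid, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE only the bad byte is replaced, by U+FFFD.
$with = htmlspecialchars($invalid, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($without === '');
var_dump($with === "a\xEF\xBF\xBDb");
```

Both comparisons are expected to be true on a stock PHP build.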
==== escapeHtmlAttr ==== | ==== escapeHtmlAttr ==== | ||
- | Typical HTML escaping can replace this method, but only if the attribute value can be guaranteed as being properly quoted. Where quoting is not guaranteed, this method performs additional escaping that escapes all space characters and their equivalents. In effect, this means escaping everything except basic alphanumeric characters and the comma, period, hyphen and underscore characters. Anything else will be escaped as a hexadecimal entity unless a valid named entity can be substituted. | + | Typical HTML escaping can replace this method but only if the attribute value can be guaranteed as being properly quoted. Where quoting is not guaranteed, this method performs additional escaping that escapes all space characters and their equivalents |
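The rule described in the earlier revision of this section (leave alphanumerics, comma, period, hyphen and underscore alone; use a named entity where one applies; otherwise emit a hexadecimal entity) can be sketched in userland as follows. Both function names are hypothetical, the input is assumed to be valid UTF-8, and the handling of control characters found in real implementations is left out.

```php
<?php
// Hypothetical helper: decode one valid UTF-8 character to its code point.
function sketch_codepoint($char)
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0];
    }
    // The lead byte carries 5, 4 or 3 payload bits for 2-, 3- or 4-byte sequences.
    $cp = $bytes[0] & (0xFF >> ($count + 1));
    for ($i = 1; $i < $count; $i++) {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return $cp;
}

// Hypothetical sketch of attribute escaping; assumes $value is valid UTF-8.
function sketch_escape_html_attr($value)
{
    // Characters with a named entity that is safe in both HTML and XML.
    $named = array(34 => '&quot;', 38 => '&amp;', 60 => '&lt;', 62 => '&gt;');

    return preg_replace_callback(
        '/[^a-zA-Z0-9,.\-_]/u',
        function (array $match) use ($named) {
            $cp = sketch_codepoint($match[0]);
            return isset($named[$cp]) ? $named[$cp] : sprintf('&#x%02X;', $cp);
        },
        (string) $value
    );
}

echo '<input value=' . sketch_escape_html_attr('x onmouseover=alert(1)') . '>', PHP_EOL;
// prints: <input value=x&#x20;onmouseover&#x3D;alert&#x28;1&#x29;>
```

Because the space is now encoded, the value can no longer break out of an unquoted attribute.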
==== escapeJs ==== | ==== escapeJs ==== | ||
Javascript string literals in HTML are subject to significant restrictions particularly due to the potential for unquoted attributes and any uncertainty as to whether Javascript will be viewed as being CDATA or PCDATA by the browser. To eliminate any possible XSS vulnerabilities, | Javascript string literals in HTML are subject to significant restrictions particularly due to the potential for unquoted attributes and any uncertainty as to whether Javascript will be viewed as being CDATA or PCDATA by the browser. To eliminate any possible XSS vulnerabilities, | ||
+ | |||
+ | Javascript escaping applies to all literal strings and digits. It is not possible to safely escape other Javascript markup. | ||
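A restrictive escaper for Javascript string literals might be sketched as below. The safe character set (alphanumerics, comma, period, underscore) and the \xHH / \uHHHH output forms are assumptions modelled on the approach of the Zend\Escaper implementation, not text from this RFC; the function names are hypothetical and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: decode one valid UTF-8 character to its code point.
function sketch_codepoint($char)
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0];
    }
    // The lead byte carries 5, 4 or 3 payload bits for 2-, 3- or 4-byte sequences.
    $cp = $bytes[0] & (0xFF >> ($count + 1));
    for ($i = 1; $i < $count; $i++) {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return $cp;
}

// Hypothetical sketch: escape a value for use inside a Javascript string literal.
function sketch_escape_js($value)
{
    return preg_replace_callback(
        '/[^a-zA-Z0-9,._]/u',
        function (array $match) {
            $cp = sketch_codepoint($match[0]);
            if ($cp < 0x100) {
                return sprintf('\\x%02X', $cp);
            }
            if ($cp < 0x10000) {
                return sprintf('\\u%04X', $cp);
            }
            // Code points above the BMP need a UTF-16 surrogate pair.
            $cp -= 0x10000;
            return sprintf('\\u%04X\\u%04X', 0xD800 | ($cp >> 10), 0xDC00 | ($cp & 0x3FF));
        },
        (string) $value
    );
}

echo "var name = '" . sketch_escape_js("';alert(1)//") . "';", PHP_EOL;
// prints: var name = '\x27\x3Balert\x281\x29\x2F\x2F';
```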
==== escapeCss ==== | ==== escapeCss ==== | ||
- | CSS is almost identical | + | CSS is similar |
+ | |||
+ | CSS escaping applies to property values, e.g. a colour or font size. Where CSS is being manipulated further by adding new properties or names, it must be separately sanitised. | ||
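For property values, a sketch of CSS escaping follows. It assumes the convention used by the Zend\Escaper implementation: everything except alphanumerics becomes a backslash, the hexadecimal code point, and a terminating space. The function names are hypothetical, the input is assumed to be valid UTF-8, and special handling of the NUL character is omitted.

```php
<?php
// Hypothetical helper: decode one valid UTF-8 character to its code point.
function sketch_codepoint($char)
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0];
    }
    // The lead byte carries 5, 4 or 3 payload bits for 2-, 3- or 4-byte sequences.
    $cp = $bytes[0] & (0xFF >> ($count + 1));
    for ($i = 1; $i < $count; $i++) {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return $cp;
}

// Hypothetical sketch: escape a CSS property value.
function sketch_escape_css($value)
{
    return preg_replace_callback(
        '/[^a-zA-Z0-9]/u',
        function (array $match) {
            // The trailing space terminates the hexadecimal escape sequence.
            return sprintf('\\%X ', sketch_codepoint($match[0]));
        },
        (string) $value
    );
}

echo 'color: ' . sketch_escape_css('red;x:y') . ';', PHP_EOL;
// prints: color: red\3B x\3A y;
```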
==== escapeUrl ==== | ==== escapeUrl ==== | ||
This method is basically an alias for rawurlencode() which has applied RFC 3986 since PHP 5.3. It is included primarily for consistency. | This method is basically an alias for rawurlencode() which has applied RFC 3986 since PHP 5.3. It is included primarily for consistency. | ||
+ | |||
+ | URL escaping applies to data being inserted into a URL and not to the whole URL itself. | ||
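The distinction between escaping a value and escaping a whole URL is easy to demonstrate with rawurlencode(). The host name and query parameter here are placeholders.

```php
<?php
// Hypothetical untrusted search term.
$term = 'fish & chips/50%';

// Correct: only the data inserted into the URL is escaped.
$url = 'https://example.com/search?q=' . rawurlencode($term);
echo $url, PHP_EOL;
// prints: https://example.com/search?q=fish%20%26%20chips%2F50%25

// Incorrect: escaping the entire URL also encodes its delimiters.
echo rawurlencode('https://example.com/a b'), PHP_EOL;
// prints: https%3A%2F%2Fexample.com%2Fa%20b
```

When the finished URL is then written into an HTML attribute, it still needs HTML escaping for that context.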
==== escapeXml/ | ==== escapeXml/ | ||
Line 125: | Line 155: | ||
Similarly, there are frequent lapses of awareness surrounding Javascript escaping. Backslash escaping and JSON encoding usually leave behind literal characters that can be misinterpreted by an HTML parser so the restrictive escaping strategy for Javascript values described earlier becomes necessary. | Similarly, there are frequent lapses of awareness surrounding Javascript escaping. Backslash escaping and JSON encoding usually leave behind literal characters that can be misinterpreted by an HTML parser so the restrictive escaping strategy for Javascript values described earlier becomes necessary. | ||
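The literal characters left behind can be observed with the existing functions. This sketch uses default flags only, and the sample payloads are invented.

```php
<?php
// Backslash escaping leaves markup untouched: there is nothing here for it to escape.
$payload = '</script><script>alert(1)</script>';
var_dump(addslashes($payload) === $payload);

// Default JSON encoding keeps <, >, & and ' as literal characters.
echo json_encode("<!-- & ' -->"), PHP_EOL;
// prints: "<!-- & ' -->"
```

The first comparison is expected to be true, and an HTML parser reading the second output inside a script block still sees a comment opener.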
+ | |||
+ | The point of these two mentions is to make it clear that PHP may currently offer related functions for preventing XSS, but these do not have the coverage or safety required by recommended practices. The RFC is not a case of ignoring existing functions; it simply proposes replacements and additions that are reliable, safe, in line with OWASP recommendations, | ||
===== Implementation for PHP Core? ===== | ===== Implementation for PHP Core? ===== |
rfc/escaper.txt · Last modified: 2018/06/18 10:11 by cmb