rfc:escaper
Differences
This shows you the differences between two versions of the page.
rfc:escaper [2012/09/18 12:49] – padraic | rfc:escaper [2018/06/18 10:11] (current) – This RFC appears to be inactive cmb | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Escaping RFC for PHP Core ====== | ====== Escaping RFC for PHP Core ====== | ||
- | * Version: 1.0 | + | * Version: 1.0.1 |
* Date: 2012-09-18 | * Date: 2012-09-18 | ||
- | * Author: Pádraic < | + | * Author: Pádraic |
- | * Status: | + | * Status: |
* First Published at: http:// | * First Published at: http:// | ||
+ | |||
+ | ===== Change Log ===== | ||
+ | * 2012-09-18 Initial version edited from https:// | ||
+ | * 2013-09-27 Added ext/filter implementation as an option (Yasuo) | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | This RFC proposes the addition of an SPL class (and optionally a set of functions) dedicated to the secure escaping of untrusted values against Cross-Site Scripting (XSS) and related vulnerabilities. It recognises that this involves the partial duplication of certain existing functions but raises the argument that the current division of functionality, | + | This RFC proposes the addition of an SPL class (and optionally a set of functions) dedicated to the secure escaping of untrusted values against Cross-Site Scripting (XSS) and related vulnerabilities. It recognises that this involves the partial duplication of certain existing functions but raises the argument that the current division of functionality, |
+ | |||
+ | The [[https:// | ||
+ | |||
+ | The proposed functionality is intended to largely reflect the recommendations of the OWASP' | ||
- | The proposed | + | The precise method of escaping |
- | A similar approach has already been taken in PHP code by Zend Framework 2.0 (Zend\Escaper) and, just recently, Symfony 2 (via Twig) adopted this functionality. While this can be done in PHP by individual frameworks, | + | A similar approach has already been taken in PHP code by Zend Framework 2.0 (Zend\Escaper) and, just recently, Symfony 2 (via Twig) adopted this functionality. While this can be done in PHP by individual frameworks, |
===== The Problem With Inconsistent Functionality ===== | ===== The Problem With Inconsistent Functionality ===== | ||
Line 24: | Line 32: | ||
* URL/URI: rawurlencode() or urlencode() | * URL/URI: rawurlencode() or urlencode() | ||
- | In practice, these decisions appear to depend more on what PHP offers, and on whether it can be interpreted as offering sufficient escaping safety, than on what is actually recommended to defend against XSS. While these functions can prevent some forms of XSS, they do not cover all use cases or risks. | + | In practice, these decisions appear to depend more on what PHP offers, and on whether it can be interpreted as offering sufficient escaping safety, than on what is actually recommended to defend against XSS. While these functions can prevent some forms of XSS, they do not cover all use cases or risks and are therefore insufficient defenses. |
Using htmlspecialchars() in a perfectly valid HTML5 unquoted attribute value, for example, is completely useless since the value can be terminated by a space (among other things) which is never escaped. Thus, in this instance, we have a conflict between a widely used HTML escaper and a modern HTML specification, | Using htmlspecialchars() in a perfectly valid HTML5 unquoted attribute value, for example, is completely useless since the value can be terminated by a space (among other things) which is never escaped. Thus, in this instance, we have a conflict between a widely used HTML escaper and a modern HTML specification, | ||
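The unquoted-attribute case can be checked directly. The following is a minimal sketch; the payload string and the `div`/`title` markup are illustrative assumptions, not taken from the RFC.

```php
<?php
// Hypothetical untrusted input: it contains no &, <, >, " or ' characters,
// so htmlspecialchars() has nothing to escape.
$untrusted = 'x onmouseover=alert(1)';

$escaped = htmlspecialchars($untrusted, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Unquoted attribute value: the space ends the value and the remainder
// is parsed as a second attribute (here, an event handler).
$html = '<div title=' . $escaped . '>';

echo $html, PHP_EOL;
```

The escaped value is byte-for-byte identical to the input, which is why an escaper that also covers spaces is needed for this context.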
- | Inconsistencies with valid HTML, insecure default parameters, lack of character encoding awareness, and misrepresentations of what functions are capable of by some programmers - these all make escaping in PHP an unnecessarily convoluted quest for those who just want an escaping function that works across all HTML contexts. | + | Using addslashes(), |
- | Including more narrowly defined | + | Inconsistencies with valid HTML, insecure default parameters, lack of character encoding awareness, |
- | ===== SPL Class or Functions? ===== | + | Including more narrowly defined and specifically targeted functions |
- | While it may well be feasible | + | ===== Escape filter for ext/filter ===== |
+ | |||
+ | As an implementation option, the escapers could be provided as ext/filter filters. | ||
+ | |||
+ | ^ ID(Constant) ^ Name ^ Options ^ Description ^ | ||
+ | |FILTER_ESCAPE_HTML |" | ||
+ | |FILTER_ESCAPE_HTML_ATTR |" | ||
+ | |FILTER_ESCAPE_JAVASCRIPT |" | ||
+ | |FILTER_ESCAPE_CSS |" | ||
+ | |FILTER_ESCAPE_URI |" | ||
+ | |FILTER_ESCAPE_XML |" | ||
+ | |FILTER_ESCAPE_XML_ATTR |" | ||
+ | |||
+ | |||
+ | ===== SPL Class ===== | ||
+ | |||
+ | While it may well be advisable | ||
<code php> | <code php> | ||
- | interface | + | interface |
{ | { | ||
public function __construct($encoding = ' | public function __construct($encoding = ' | ||
Line 50: | Line 74: | ||
public function escapeUrl($value); | public function escapeUrl($value); | ||
+ | | ||
+ | /** | ||
+ | * Aliases to HTML functions for semantic value. | ||
+ | * XML escaping is identical to HTML escaping in this RFC. | ||
+ | */ | ||
+ | public function escapeXml($value); | ||
+ | | ||
+ | public function escapeXmlAttr($value); | ||
+ | |||
+ | public function getEncoding(); | ||
} | } | ||
</ | </ | ||
- | Functions may be added along the following lines: | + | The benefit of the class is that a character encoding is set once, centrally, and the object can then be passed around an entire application or library, so escaping is configured from a single location. This could be built in userland PHP around a set of functions, but it seems silly to skip a step that is so obviously beneficial to users. |
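To make the "configure once, pass around" idea concrete, here is a minimal userland sketch. The class name `SketchEscaper` and its three-entry encoding whitelist are assumptions made for illustration; the RFC itself only says that supported encodings would be limited to those htmlspecialchars() allows. Only two of the interface methods are shown.

```php
<?php
// Illustrative userland sketch only; the class name and the encoding
// whitelist are assumptions, not part of the RFC.
final class SketchEscaper
{
    private const SUPPORTED = ['utf-8', 'iso-8859-1', 'windows-1252'];

    private string $encoding;

    public function __construct(string $encoding = 'utf-8')
    {
        $encoding = strtolower($encoding);
        if (!in_array($encoding, self::SUPPORTED, true)) {
            // Refuse to continue rather than escape with an unknown encoding.
            throw new InvalidArgumentException("Unsupported encoding: $encoding");
        }
        $this->encoding = $encoding;
    }

    public function escapeHtml(string $value): string
    {
        return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, $this->encoding);
    }

    public function getEncoding(): string
    {
        return $this->encoding;
    }
}

// Configure once, then hand the same object to any code that renders output.
$escaper = new SketchEscaper('UTF-8');
echo $escaper->escapeHtml('<b>"hi"</b>'), PHP_EOL;
```

Validating the encoding in the constructor means a misconfiguration fails at one known point instead of at every call site.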
+ | |||
+ | ===== Functions ===== | ||
+ | |||
+ | Functions may then be added along the following lines (names up for discussion): | ||
* escape_html($value, | * escape_html($value, | ||
- | |||
* escape_html_attribute($value, | * escape_html_attribute($value, | ||
- | |||
* escape_javascript($value, | * escape_javascript($value, | ||
- | |||
* escape_css($value, | * escape_css($value, | ||
- | |||
* escape_url($value, | * escape_url($value, | ||
+ | * escape_xml($value, | ||
+ | * escape_xml_attribute($value, | ||
- | I am strongly opposed to allowing these functions to accept unpredictable character encoding directives via php.ini. That would require additional work to validate, which is precisely what this RFC should seek to avoid. | + | ===== Implementation Notes ===== |
- | I have assumed that the character | + | IMPORTANT: Since proper escape requires proper |
- | The functions/methods don't drastically depart from htmlspecialchars(). The class API is the real advantage. The second parameter is not optional. | + | I am strongly opposed to allowing these functions |
+ | |||
+ | As there is no means of globally configuring a character encoding allowed in this RFC proposal since it promotes unconfigurable-default assumptions (already evidenced by existing htmlspecialchars() usage - [[https:// | ||
+ | |||
+ | I have assumed that the character encodings supported are limited to those presently allowed by htmlspecialchars() and that the internals of each method or function validate this fact or throw an Exception (or an error for function calls) to prevent continued insecure execution as is currently allowed by htmlspecialchars(). See links below. | ||
The following is a sample implementation in PHP from Zend Framework 2.0: | The following is a sample implementation in PHP from Zend Framework 2.0: | ||
Line 77: | Line 117: | ||
Symfony' | Symfony' | ||
https:// | https:// | ||
- | |||
===== Class Method Dissection ===== | ===== Class Method Dissection ===== | ||
Line 84: | Line 123: | ||
==== escapeHtml ==== | ==== escapeHtml ==== | ||
- | The escapeHtml() function is basically identical to htmlspecialchars() but provides a few additional tweaks (validating the encoding option, ceasing execution where an invalid encoding is detected, etc.). It assumes a default encoding of UTF-8 and behaves as if the ENT_QUOTES and ENT_SUBSTITUTE flags were both set. As it would not accept a Doctype flag, escaping is done to the lowest common denominator. | + | The escapeHtml() function is basically identical to htmlspecialchars() but provides a few additional tweaks (validating the encoding option, ceasing execution where an invalid encoding is detected, etc.). It assumes a default encoding of UTF-8 and behaves as if the ENT_QUOTES and ENT_SUBSTITUTE flags were both set. As it would not accept a Doctype flag, escaping is done to the lowest common denominator |
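The effect of the two flags named above can be seen with plain htmlspecialchars() calls. This is only a demonstration of existing PHP behaviour; the sample strings are made up.

```php
<?php
$flags = ENT_QUOTES | ENT_SUBSTITUTE;

// ENT_QUOTES: single quotes are escaped as well as double quotes.
$quoted = htmlspecialchars("it's \"quoted\"", $flags, 'UTF-8');

// ENT_SUBSTITUTE: an invalid UTF-8 byte is replaced by U+FFFD instead of
// the whole result collapsing to an empty string.
$invalid     = "a\xFFb";
$substituted = htmlspecialchars($invalid, $flags, 'UTF-8');
$withoutFlag = htmlspecialchars($invalid, ENT_QUOTES, 'UTF-8');

var_dump($quoted, bin2hex($substituted), $withoutFlag);
```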
==== escapeHtmlAttr ==== | ==== escapeHtmlAttr ==== | ||
- | Typical HTML escaping can replace this method, but only if the attribute value can be guaranteed as being properly quoted. Where quoting is not guaranteed, this method performs additional escaping that escapes all space characters and their equivalents. In effect, this means escaping everything except basic alphanumeric characters and the comma, period, hyphen and underscore characters. Anything else will be escaped as a hexadecimal entity unless a valid named entity can be substituted. | + | Typical HTML escaping can replace this method but only if the attribute value can be guaranteed as being properly quoted. Where quoting is not guaranteed, this method performs additional escaping that escapes all space characters and their equivalents |
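The rule described for attribute escaping (leave alphanumerics plus comma, period, hyphen and underscore; turn everything else into a named or hexadecimal entity) can be sketched in userland as follows. The function names are hypothetical, UTF-8 input is assumed, and only four named entities are substituted; a real implementation may treat control characters and other edge cases differently.

```php
<?php
// Sketch only: function names are hypothetical, input is assumed to be UTF-8.

/** Decode one already validated UTF-8 character to its code point. */
function sketch_attr_codepoint(string $chr): int
{
    $bytes = array_values(unpack('C*', $chr));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0];
    }
    // Strip the length marker bits from the lead byte, then append the
    // six payload bits of each continuation byte.
    $codepoint = $bytes[0] & (0xFF >> ($count + 1));
    for ($i = 1; $i < $count; $i++) {
        $codepoint = ($codepoint << 6) | ($bytes[$i] & 0x3F);
    }
    return $codepoint;
}

function sketch_escape_html_attr(string $value): string
{
    $named = ['"' => '&quot;', '&' => '&amp;', '<' => '&lt;', '>' => '&gt;'];

    $result = preg_replace_callback(
        '/[^a-zA-Z0-9,.\-_]/u',
        static function (array $match) use ($named): string {
            $chr = $match[0];
            // Prefer a named entity, otherwise fall back to a hex entity.
            return $named[$chr] ?? sprintf('&#x%02X;', sketch_attr_codepoint($chr));
        },
        $value
    );

    if ($result === null) {
        // preg_replace_callback() returns null when the /u subject is invalid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

echo '<div title=' . sketch_escape_html_attr('x onmouseover=alert(1)') . '>', PHP_EOL;
```

Because the space is now encoded, the value can no longer be terminated early, even when the attribute is left unquoted.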
==== escapeJs ==== | ==== escapeJs ==== | ||
Javascript string literals in HTML are subject to significant restrictions particularly due to the potential for unquoted attributes and any uncertainty as to whether Javascript will be viewed as being CDATA or PCDATA by the browser. To eliminate any possible XSS vulnerabilities, | Javascript string literals in HTML are subject to significant restrictions particularly due to the potential for unquoted attributes and any uncertainty as to whether Javascript will be viewed as being CDATA or PCDATA by the browser. To eliminate any possible XSS vulnerabilities, | ||
+ | |||
+ | Javascript escaping applies to all literal strings and digits. It is not possible to safely escape other Javascript markup. | ||
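The exact escaping rule is cut off in the text above, so the following sketch assumes the restrictive approach used by Zend\Escaper: leave alphanumerics and the comma, period and underscore untouched, and write every other character as a `\xHH` or `\uHHHH` escape. The function name is hypothetical and UTF-8 input is assumed.

```php
<?php
// Sketch only: the function name is hypothetical and the allowed character
// set is an assumption; input is assumed to be UTF-8.
function sketch_escape_js(string $value): string
{
    $result = preg_replace_callback(
        '/[^a-zA-Z0-9,._]/u',
        static function (array $match): string {
            $chr = $match[0];
            if (strlen($chr) === 1) {
                // Single-byte (ASCII) character: \xHH
                return sprintf('\\x%02X', ord($chr));
            }
            // Multi-byte character: json_encode() emits \uXXXX escapes
            // (a surrogate pair beyond the BMP); strip its surrounding quotes.
            return substr(json_encode($chr), 1, -1);
        },
        $value
    );

    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

// The escaped value is meant to sit inside a quoted Javascript string literal.
echo "var s = '" . sketch_escape_js('</script>') . "';", PHP_EOL;
```

The output contains no quote, angle bracket, ampersand or slash characters, so neither the HTML parser nor the Javascript parser sees anything it could treat as markup.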
==== escapeCss ==== | ==== escapeCss ==== | ||
- | CSS is almost identical | + | CSS is similar |
+ | |||
+ | CSS escaping applies to property values, e.g. a colour or font size. Where CSS is being manipulated further by adding new properties or names, it must be separately sanitised. | ||
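The details of the CSS rule are truncated above. As a hedged illustration, the sketch below assumes the Zend\Escaper style of CSS escaping: every non-alphanumeric character becomes a backslash, its hexadecimal code point, and a terminating space. Function names are hypothetical and UTF-8 input is assumed.

```php
<?php
// Sketch only: names are hypothetical and the escaping rule is an assumption.

/** Decode one already validated UTF-8 character to its code point. */
function sketch_css_codepoint(string $chr): int
{
    $bytes = array_values(unpack('C*', $chr));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0];
    }
    $codepoint = $bytes[0] & (0xFF >> ($count + 1));
    for ($i = 1; $i < $count; $i++) {
        $codepoint = ($codepoint << 6) | ($bytes[$i] & 0x3F);
    }
    return $codepoint;
}

function sketch_escape_css(string $value): string
{
    $result = preg_replace_callback(
        '/[^a-zA-Z0-9]/u',
        static function (array $match): string {
            // CSS escape: backslash, hex code point, then a space that ends the escape.
            return sprintf('\\%X ', sketch_css_codepoint($match[0]));
        },
        $value
    );

    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

// Only the property value is escaped, never the surrounding declaration.
echo 'color: ' . sketch_escape_css('red;}') . ';', PHP_EOL;
```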
==== escapeUrl ==== | ==== escapeUrl ==== | ||
This method is basically an alias for rawurlencode() which has applied RFC 3986 since PHP 5.3. It is included primarily for consistency. | This method is basically an alias for rawurlencode() which has applied RFC 3986 since PHP 5.3. It is included primarily for consistency. | ||
+ | |||
+ | URL escaping applies to data being inserted into a URL and not to the whole URL itself. | ||
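A short demonstration of the distinction between escaping a piece of data and escaping a whole URL, using rawurlencode() directly. The host name and query parameter are placeholders.

```php
<?php
$term = 'a b&c=d/e'; // hypothetical untrusted value

// Escape only the piece of data being inserted into the URL.
$url = 'https://example.com/search?q=' . rawurlencode($term);

// Escaping a whole URL encodes its delimiters too, so it stops working as a URL.
$broken = rawurlencode('https://example.com/search?q=x');

echo $url, PHP_EOL, $broken, PHP_EOL;
```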
+ | |||
+ | ==== escapeXml/ | ||
+ | |||
+ | Since the escapeHtml method uses a common denominator escaping strategy to cover the XML serialisation of HTML5, escapeXml and escapeXmlAttr are functionally equivalent aliases for the sake of being explicit. | ||
===== Finding Holes For XSS In Existing Functions ===== | ===== Finding Holes For XSS In Existing Functions ===== | ||
Line 109: | Line 158: | ||
Similar in nature, there are frequent lapses of awareness surrounding Javascript escaping. Backslash escaping and JSON encoding usually leave behind literal characters that can be misinterpreted by an HTML parser so the restrictive escaping strategy for Javascript values described earlier becomes necessary. | Similar in nature, there are frequent lapses of awareness surrounding Javascript escaping. Backslash escaping and JSON encoding usually leave behind literal characters that can be misinterpreted by an HTML parser so the restrictive escaping strategy for Javascript values described earlier becomes necessary. | ||
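The leftover literal characters are easy to observe with the existing functions called with their default arguments. The payload below is an arbitrary example.

```php
<?php
$payload = '<!-- & -->'; // hypothetical value destined for an inline script

// Default json_encode(): the angle brackets and ampersand pass through as-is.
$json = json_encode($payload);

// addslashes() only touches quotes, backslashes and NUL bytes.
$slashed = addslashes($payload);

echo $json, PHP_EOL, $slashed, PHP_EOL;
```

Both results still contain `<`, `>` and `&`, which an HTML parser interprets before any Javascript is run.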
+ | |||
+ | The point of these two mentions is to make it clear that currently PHP may offer related functions for preventing XSS but these do not have the coverage or safety required of recommended practices. The RFC is not a case of ignoring existing functions; it simply proposes replacements and additions that are reliable, safe, in line with OWASP recommendations, | ||
===== Implementation for PHP Core? ===== | ===== Implementation for PHP Core? ===== | ||
Line 118: | Line 169: | ||
The essence of this RFC is to propose including basic safe escaping functionality within PHP which addresses the need to apply context-specific escaping in web applications. By offering a simple consistent approach, it affords the opportunity to implement these specifically to target XSS and to omit other functionality that some native functions include, and which can be problematic to programmers or doesn' | The essence of this RFC is to propose including basic safe escaping functionality within PHP which addresses the need to apply context-specific escaping in web applications. By offering a simple consistent approach, it affords the opportunity to implement these specifically to target XSS and to omit other functionality that some native functions include, and which can be problematic to programmers or doesn' | ||
- | ===== Changelog ===== | ||
- | * 2012-09-18 Initial version edited from https:// |
rfc/escaper.1347972572.txt.gz · Last modified: 2017/09/22 13:28 (external edit)