rfc:escaper

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
rfc:escaper [2012/09/18 10:52] padraicrfc:escaper [2018/06/18 10:11] (current) – This RFC appears to be inactive cmb
Line 1: Line 1:
 ====== Escaping RFC for PHP Core ====== ====== Escaping RFC for PHP Core ======
-  * Version: 1.0+  * Version: 1.0.1
   * Date: 2012-09-18   * Date: 2012-09-18
-  * Author: Pádraic <padraic.brady.at.gmail.com> +  * Author: Pádraic Brady <padraic.brady.at.gmail.com>, Yasuo Ohgaki <yohgaki@php.net
-  * Status: Under Discussion+  * Status: Inactive
   * First Published at: http://wiki.php.net/rfc/escaper   * First Published at: http://wiki.php.net/rfc/escaper
 +
 +===== Change Log =====
 +  * 2012-09-18 Initial version edited from https://gist.github.com/gists/3066656
 +  * 2013-09-27 Added ext/filter implementation as an option (Yasuo)
  
 ===== Introduction ===== ===== Introduction =====
  
-This RFC proposes the addition of a set of standard functions and/or an SPL class dedicated to the secure escaping of untrusted values against Cross-Site Scripting (XSS) and related vulnerabilities. It recognises that this involves the partial duplication of certain existing functions but raises the argument that the current division of functionality, the disparate behaviour of that functionality and varied misunderstandings among programmers have served to enable insecure practices in the absence of a unified approach in this area.+This RFC proposes the addition of an SPL class (and optionally a set of functionsdedicated to the secure escaping of untrusted values against Cross-Site Scripting (XSS) and related vulnerabilities. It recognises that this involves the partial duplication of certain existing functions but raises the argument that the current division of functionality, the disparate behaviour of that functionality and varied misunderstandings among programmers have served to enable widespread insecure practices in the absence of a unified approach in this area
 + 
 +The [[https://www.owasp.org/index.php/Top_10_2010-Main|OWASP Top 10 web security risks]] study lists XSS in second place. PHP's sole functionality against XSS is limited to two functions of which one is commonly misapplied so, while taking some steps towards preventing XSS, PHP's anti-XSS measures are clearly insufficient and need to be boosted in line with recommended practice. 
 + 
 +The proposed functionality is intended to largely reflect the recommendations of the OWASP's various [[https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet|XSS Cheat Sheets]] by offering a comprehensive set of simple escaping class methods and functions specific to the most common HTML contexts: HTML Body, HTML Attribute, Javascript, CSS and URL/URI.
  
-The proposed functionality is intended to largely reflect the recommendations of the OWASP's various [https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet XSS Cheat Sheets] by offering comprehensive set of simple escaping functions or class methods specific to the most common HTML contexts: HTML Body, HTML Attribute, Javascript, CSS and URL/URI.+The precise method of escaping proposed by this RFC is defined by OWASP and implemented in its own peer-reviewed ESAPI libraries (one of which is implemented in C)The escaping in each case follows well known and fixed set of encoding rules for each key HTML context which cannot be impacted or negated by browser quirks or edge-case HTML parsing unless the browser suffers a catastrophic bug in a HTML parser or Javascript interpreter - both of these are unlikely.
  
-A similar approach has already been taken in PHP code by Zend Framework 2.0 (Zend\Escaper) and, just recently, Symfony 2 (via Twig) adopted this functionality. While this can be done in PHP by individual frameworks, it would be far more useful to have such escaping mechanisms available to everyone in core PHP, both from a performance perspective and to help standardise the current hodgepodge of practices that have arisen.+A similar approach has already been taken in PHP code by Zend Framework 2.0 (Zend\Escaper) and, just recently, Symfony 2 (via Twig) adopted this functionality. While this can be done in PHP by individual frameworks, this has never historically been the case and programmers remain reliant on PHP to provide this functionality.
  
 ===== The Problem With Inconsistent Functionality ===== ===== The Problem With Inconsistent Functionality =====
Line 24: Line 32:
   * URL/URI: rawurlencode() or urlencode()   * URL/URI: rawurlencode() or urlencode()
  
-In practice, these decisions appear to depend more on what PHP offers, and if it can be interpreted as offering sufficient escaping safety, than it does on what is recommended in reality to defend against XSS. While these functions can prevent XSS, they do not cover all use cases or risks.+In practice, these decisions appear to depend more on what PHP offers, and if it can be interpreted as offering sufficient escaping safety, than it does on what is recommended in reality to defend against XSS. While these functions can prevent some forms of XSS, they do not cover all use cases or risks and are therefore insufficient defenses.
  
 Using htmlspecialchars() in a perfectly valid HTML5 unquoted attribute value, for example, is completely useless since the value can be terminated by a space (among other things) which is never escaped. Thus, in this instance, we have a conflict between a widely used HTML escaper and a modern HTML specification, with no specific function available to cover this use case. While it's tempting to blame users, or the HTML specification authors, escaping just needs to deal with whatever HTML and browsers allow. Using htmlspecialchars() in a perfectly valid HTML5 unquoted attribute value, for example, is completely useless since the value can be terminated by a space (among other things) which is never escaped. Thus, in this instance, we have a conflict between a widely used HTML escaper and a modern HTML specification, with no specific function available to cover this use case. While it's tempting to blame users, or the HTML specification authors, escaping just needs to deal with whatever HTML and browsers allow.
  
-Inconsistencies with valid HTML, insecure default parameters, lack of character encoding awareness, and misrepresentations of what functions are capable of by some programmers - these all make escaping in PHP an unnecessarily convoluted quest for those who just want an escaping function that works across all HTML contexts.+Using addslashes(), custom backslash escaping or json_encode() will typically ignore HTML special characters such as ampersands which may be used to inject entities into Javascript. Under the right circumstancesbrowser will convert these entities into their literal equivalents before interpreting Javascript thus allowing attackers to inject arbitrary code.
  
-Including more narrowly defined and specifically targeted functions or SPL class methods into PHP will simplify the whole situation for users, offer a cohesive approach to escaping, and, by its presence in Core, discourage function misuse and homegrown escaping functions.+Inconsistencies with valid HTML, insecure default parameters, lack of character encoding awareness, and misrepresentations of what functions are capable of by some programmers - these all make escaping in PHP an unnecessarily convoluted quest for those who just want an escaping function that works across all HTML contexts. See links below.
  
-===== SPL Class or Functions? =====+Including more narrowly defined and specifically targeted functions or SPL class methods into PHP will simplify the whole situation for users, offer a cohesive approach to escaping, rectify PHP's situation as only offering a partial XSS defense and, by its presence in Core, displace function misuse and homegrown escaping functions.
  
-While it may well be feasible to do both, I have a strong preference for classes and would suggest a class structure that implements the following interface:+===== Escape filter for ext/filter ===== 
 + 
 +Implementation option as filter. 
 + 
 +^ ID(Constant) ^ Name ^ Options ^ Description ^ 
 +|FILTER_ESCAPE_HTML |"escape_html"|Escape HTML document | | 
 +|FILTER_ESCAPE_HTML_ATTR |"escape_html_attr" |Escape HTML tag attribute | | 
 +|FILTER_ESCAPE_JAVASCRIPT |"escape_javascript" |Escape JavaScript string | | 
 +|FILTER_ESCAPE_CSS |"escape_css" |Escape CSS attribute | | 
 +|FILTER_ESCAPE_URI |"escape_uri" |Escape URI parameters | | 
 +|FILTER_ESCAPE_XML |"escape_xml" |Escape XML document |Alias of FILTER_ESCAPE_HTML | 
 +|FILTER_ESCAPE_XML_ATTR |"escape_xml_attr" |Escape XML tag attribute | | 
 + 
 + 
 +===== SPL Class ===== 
 + 
 +While it may well be advisable to do both, I have a strong preference for classes coming from a framework heavily dependent on them and would suggest a class structure that implements the following interface in addition to any standalone functions:
  
 <code php> <code php>
-    interface Escaper+    interface SPL_Escaper
     {     {
         public function __construct($encoding = 'UTF-8');         public function __construct($encoding = 'UTF-8');
Line 50: Line 74:
  
         public function escapeUrl($value);         public function escapeUrl($value);
 +        
 +        /**
 +         * Aliases to HTML functions for semantic value.
 +         * XML escaping is identical to HTML escaping in this RFC.
 +         */
 +        public function escapeXml($value);
 +        
 +        public function escapeXmlAttr($value);
  
-        public function validateUrl($value);+ public function getEncoding();
  
     }     }
 </code> </code>
  
-Functions may be added along the following lines:+The benefits of the class are to allow the centralised setting of a character encoding once and then being able to pass around the object across an entire application or library allowing it to be configured from a single location. This could be created in userland PHP around a set of functions but it seems silly to skip an obviously beneficial step to users.
  
-  * escape_html($value, $encoding);+===== Functions =====
  
-  * escape_html_attribute($value, $encoding);+Functions may then be added along the following lines (names up for discussion):
  
 +  * escape_html($value, $encoding);
 +  * escape_html_attribute($value, $encoding);
   * escape_javascipt($value, $encoding);   * escape_javascipt($value, $encoding);
- 
   * escape_css($value, $encoding);   * escape_css($value, $encoding);
- 
   * escape_url($value, $encoding);   * escape_url($value, $encoding);
 +  * escape_xml($value, $encoding);
 +  * escape_xml_attribute($value, $encoding);
 +
 +===== Implementation Notes =====
 +
 +IMPORTANT: Since proper escape requires proper character encoding handling, multibyte string feature in core is mandatory for implementation.
  
-I am strongly opposed to allowing these functions accept unpredictable character encoding directives via php.ini. That would require additional work to validate which is precisely what this RFC should seek to avoid.+I am strongly opposed to allowing these functions accept unpredictable character encoding directives via php.ini. That would require additional work to validate which is precisely what this RFC should seek to avoid. By validation, I mean having programmers determine how dependencies implement escaping, what encoding they enforce (usually the default), and then determining if it can be changed by the depending applications or if the library must be forked, re-edited, etc. Those who are concious of security will review dependencies for such issues rather than blindly trust dependencies.
  
-I have assumed that the character encodings supported are limited to those presently allowed by htmlspecialchars() and that the internals of each method or function validate this fact or throw an Exception (or an error for function callsto prevent continued (potentially vulnerable) execution as is currently allowed by htmlspecialchars().+As there is no means of globally configuring a character encoding allowed in this RFC proposal since it promotes unconfigurable-default assumptions (already evidenced by existing htmlspecialchars() usage - [[https://github.com/search?q=htmlspecialchars&repo=&langOverride=&start_value=1&type=Code&language=PHP|search Github]]), the second parameter to these functions is explicitly required and has no default value. This works to undo the common practice in PHP where htmlspecialchars() calls omit all or most of its optional parameters. An application containing anything from thousands to tens of thousands of such function calls is extremely difficult to reconfigure at a later date and abusing the notion that all character encodings are equivalent to UTF-8 for special characters is itself definitely subject to infrequent browser bugs (e.g. IE6 is susceptible to character deletion when UTF-8 strings are escaped to a ISO-8859 encoding).
  
-The functions/methods don't drastically depart from htmlspecialchars(). The class API is the real advantage. The second parameter is not optional.+I have assumed that the character encodings supported are limited to those presently allowed by htmlspecialchars() and that the internals of each method or function validate this fact or throw an Exception (or an error for function calls) to prevent continued insecure execution as is currently allowed by htmlspecialchars(). See links below.
  
 The following is a sample implementation in PHP from Zend Framework 2.0: The following is a sample implementation in PHP from Zend Framework 2.0:
Line 79: Line 117:
 Symfony's Twig also recently added similar escaping options: Symfony's Twig also recently added similar escaping options:
 https://github.com/fabpot/Twig/raw/master/lib/Twig/Extension/Core.php https://github.com/fabpot/Twig/raw/master/lib/Twig/Extension/Core.php
- 
 ===== Class Method Dissection ===== ===== Class Method Dissection =====
  
Line 86: Line 123:
 ==== escapeHtml ==== ==== escapeHtml ====
  
-The escapeHtml() function is basically identical to htmlspecialchars() but provides a few additional tweaks (validating encoding option, ceasing execution where invalid encoding detected, etc.). It assumes a default encoding of UTF-8 and behaves as if the ENT_QUOTES and ENT_SUBTITUTE flags were both set. As it would not accept a Doctype flag, escaping is done to the lowest common denominator.+The escapeHtml() function is basically identical to htmlspecialchars() but provides a few additional tweaks (validating encoding option, ceasing execution where invalid encoding detected, etc.). It assumes a default encoding of UTF-8 and behaves as if the ENT_QUOTES and ENT_SUBTITUTE flags were both set. As it would not accept a Doctype flag, escaping is done to the lowest common denominator which is XML. HTML5 itself has an XML serialisation which does not recognise any of the usual HTML named entities.
  
 ==== escapeHtmlAttr ==== ==== escapeHtmlAttr ====
  
-Typical HTML escaping can replace this methodbut only if the attribute value can be guaranteed as being properly quoted. Where quoting is not guaranteed, this method performs additional escaping that escapes all space characters and their equivalents. In effect, this means escaping everything except basic alphanumeric characters and the comma, period, hyhen and underscore characters. Anything else will be escaped as a hexadecimal entity unless a valid name entity can be substituted.+Typical HTML escaping can replace this method but only if the attribute value can be guaranteed as being properly quoted. Where quoting is not guaranteed, this method performs additional escaping that escapes all space characters and their equivalents that might be used to break out of an attribute context in the absence of quotes. In effect, this means escaping everything except basic alphanumeric characters and the comma, period, hyhen and underscore characters. Anything else will be escaped as a hexadecimal entity unless a valid XML named entity can be substituted.
  
 ==== escapeJs ==== ==== escapeJs ====
  
 Javascript string literals in HTML are subject to significant restrictions particularly due to the potential for unquoted attributes and any uncertainty as to whether Javascript will be viewed as being CDATA or PCDATA by the browser. To eliminate any possible XSS vulnerabilities, Javascript escaping for HTML extends the escaping rules of both ECMAScript and JSON to include any potentially dangerous character. Very similar to HTML attribute value escaping, this means escaping everything except basic alphanumeric characters and the comma, period and underscore characters as hexadecimal or unicode escapes. Javascript string literals in HTML are subject to significant restrictions particularly due to the potential for unquoted attributes and any uncertainty as to whether Javascript will be viewed as being CDATA or PCDATA by the browser. To eliminate any possible XSS vulnerabilities, Javascript escaping for HTML extends the escaping rules of both ECMAScript and JSON to include any potentially dangerous character. Very similar to HTML attribute value escaping, this means escaping everything except basic alphanumeric characters and the comma, period and underscore characters as hexadecimal or unicode escapes.
 +
 +Javascript escaping applies to all literal strings and digits. It is not possible to safely escape other Javascript markup.
  
 ==== escapeCss ==== ==== escapeCss ====
  
-CSS is almost identical to Javascript for the same reasons. CSS escaping excludes only basic alphanumeric characters and escapes all other characters into valid CSS hexadecimal escapes.+CSS is similar to Javascript for the same reasons. CSS escaping excludes only basic alphanumeric characters and escapes all other characters into valid CSS hexadecimal escapes
 + 
 +CSS escaping applies to property values, e.g. a colour or font size. Where CSS is being manipulated further by adding new properies or names, it must be seperately sanitised.
  
 ==== escapeUrl ==== ==== escapeUrl ====
  
 This method is basically an alias for rawurlencode() which has applied RFC 3986 since PHP 5.3. It is included primarily for consistency. This method is basically an alias for rawurlencode() which has applied RFC 3986 since PHP 5.3. It is included primarily for consistency.
 +
 +URL escaping applies to data being inserted into a URL and not to the whole URL itself.
 +
 +==== escapeXml/escapeXmlAttr ====
 +
 +Since the escapeHtml method uses a common denominator escaping strategy to cover the XML serialisation of HTML5, escapeXml and escapeXmlAttr are functionally equivalent aliases for the sake of being explicit.
  
 ===== Finding Holes For XSS In Existing Functions ===== ===== Finding Holes For XSS In Existing Functions =====
Line 108: Line 155:
 In support of the inconsistency argument, I wrote a blog article a while ago about htmlspecialchars() and the circumstances of those use cases where its escaping functionality could be defeated: In support of the inconsistency argument, I wrote a blog article a while ago about htmlspecialchars() and the circumstances of those use cases where its escaping functionality could be defeated:
  
-[A Hitchhiker's Guide To XSS: How Not To Use Htmlspecialchars() For Output Escaping](http://blog.astrumfutura.com/2012/03/a-hitchhikers-guide-to-cross-site-scripting-xss-in-php-part-1-how-not-to-use-htmlspecialchars-for-output-escaping/)+[[http://blog.astrumfutura.com/2012/03/a-hitchhikers-guide-to-cross-site-scripting-xss-in-php-part-1-how-not-to-use-htmlspecialchars-for-output-escaping/|A Hitchhiker's Guide To XSS: How Not To Use Htmlspecialchars(For Output Escaping]]
  
 Similar in nature, there are frequent lapses of awareness surrounding Javascript escaping. Backslash escaping and JSON encoding usually leave behind literal characters that can be misinterpreted by a HTML parser so the restrictive escaping strategy for Javascript values described earlier becomes necessary. Similar in nature, there are frequent lapses of awareness surrounding Javascript escaping. Backslash escaping and JSON encoding usually leave behind literal characters that can be misinterpreted by a HTML parser so the restrictive escaping strategy for Javascript values described earlier becomes necessary.
 +
 +The point of these two mentions is to make it clear that currently PHP may offer related functions for preventing XSS but these do not have the coverage or safety required of recommended practices. The RFC is not a case of ignoring existing functions, it simply proposes replacements and additions that are reliable, safe, in line with OWASP recommendations, and are easy to use properly by programmers.
  
 ===== Implementation for PHP Core? ===== ===== Implementation for PHP Core? =====
rfc/escaper.1347965551.txt.gz · Last modified: 2017/09/22 13:28 (external edit)