rfc:is_literal
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
rfc:is_literal [2020/12/26 11:41] – New examples, with a focus on escaping craigfrancis | rfc:is_literal [2021/04/19 13:44] – Updated examples, and general tweaks craigfrancis | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Is Literal Check ====== | ====== PHP RFC: Is Literal Check ====== | ||
- | * Version: 0.2 | + | * Version: 0.5 |
* Date: 2020-03-21 | * Date: 2020-03-21 | ||
- | * Updated: | + | * Updated: |
* Author: Craig Francis, craig# | * Author: Craig Francis, craig# | ||
* Status: Draft | * Status: Draft | ||
Line 11: | Line 11: | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Add an // | + | This RFC proposes a new function, |
- | As in, at runtime, being able to check if a variable has been created by literals, | + | This addresses some of the same use cases as "taint flags", but is both simpler and stricter. It does not address how user data is transmitted or escaped, only whether it has been passed |
- | This simple check can be used to warn or completely block SQL Injection, Command Line Injection, and many cases of HTML Injection (aka XSS). | + | The clearest example is a database library which supports parametrised queries at the driver level, where the programmer could use either |
- | ===== The Problem ===== | + | <code php> |
+ | $db-> | ||
- | Escaping strings for SQL, HTML, Commands, etc is **very** error prone. | + | $db-> |
+ | </ | ||
- | The vast majority of programmers should never do this (mistakes will be made). | + | By rejecting the SQL that was not written as a literal |
- | Unsafe values (often user supplied) **must** be kept separate (e.g. parameterised SQL), or be processed by something that understands the context (e.g. a HTML Templating Engine). | + | ===== Examples ===== |
- | This is primarily for security reasons, but it can also cause data to be damaged (e.g. ASCII/UTF-8 issues). | + | The [[https:// |
- | Take these mistakes, where the value has come from the user: | + | <code php> |
+ | // INSECURE | ||
+ | $qb-> | ||
+ | | ||
+ | ->where('u.id = ' . $_GET[' | ||
+ | </ | ||
- | echo "< | + | The definition of the //where()// method could check with // |
- | Flawed, and unfortunately very common, a classic XSS vulnerability. | + | <code php> |
+ | $qb-> | ||
+ | | ||
+ | | ||
+ | | ||
+ | </ | ||
- | echo "< | + | Similarly, Twig allows [[https:// |
- | Flawed because the attribute value is not quoted, e.g. //$url = '/ | + | <code php> |
+ | // INSECURE | ||
+ | echo $twig-> | ||
+ | </code> | ||
- | echo "< | + | If // |
- | Flawed because // | + | <code php> |
+ | echo $twig-> | ||
+ | </code> | ||
- | echo '<a href="' | + | ===== Proposal ===== |
- | Flawed because | + | A literal is defined as a value (string) which has been written by the programmer. The value may be passed between functions, as long as it is not modified in any way other than string concatenation. |
- | | + | <code php> |
- | var url = "<? | + | is_literal(' |
- | </script> | + | |
- | Flawed because // | + | $a = 'Example'; |
+ | is_literal($a); // true | ||
- | echo '<a href="/path/? | + | is_literal(4); |
+ | is_literal(0.3); // true | ||
+ | is_literal(' | ||
- | Flawed because // | + | $a = 'A'; |
+ | $b = $a . ' | ||
+ | is_literal($b); | ||
- | < | + | is_literal($_GET[' |
- | Flawed because the encoding is not guaranteed to be UTF-8 (or ISO-8859-1 before PHP 5.4), so the value could be corrupted. | + | is_literal(rand(0, 10)); // false |
- | Also flawed because some browsers | + | is_literal(sprintf(' |
+ | </code> | ||
- | example.html: | + | Note that there is no way to manually mark a string as a literal (i.e. no equivalent to //untaint()//); as soon as the value has been manipulated in any way, it is no longer marked as a literal. |
- | <img src={{ url }} alt='' | + | |
- | + | ||
- | $loader = new \Twig\Loader\FilesystemLoader('./templates/'); | + | |
- | $twig = new \Twig\Environment($loader, [' | + | |
- | + | ||
- | echo $twig-> | + | |
- | Flawed because Twig is not context aware (in this case, an unquoted HTML attribute), e.g. //$url = '/ onerror=alert(1)' | + | See the [[https:// |
- | $sql = ' | + | ===== Comparison to Taint Tracking ===== |
- | Flawed because the value has not been quoted, e.g. //$id = ' | + | Some languages implement a "taint flag" which tracks whether values are considered " |
- | $sql = ' | + | These solutions rely on the assumption that the output of an escaping function is safe for a particular context. This sounds reasonable in theory, but the operation of escaping functions, and the context for which their output is safe, are very hard to define. This leads to a feature that is both complex and unreliable. |
- | Flawed if ' | + | This proposal avoids the complexity by addressing a different part of the problem: separating inputs supplied by the programmer, from inputs supplied by the user. |
- | $sql = ' | + | ===== Previous Work ===== |
- | Flawed if 'SET NAMES latin1' | + | Google uses a [[https:// |
- | $parameters = " | + | As noted by [[https://news-web.php.net/ |
- | + | ||
- | | + | |
- | + | ||
- | mail(' | + | |
- | Flawed because it's not possible to safely escape values in // | + | And there is the [[https://wiki.php.net/rfc/sql_injection_protection|Automatic SQL Injection Protection]] RFC by Matt Tait, where this RFC uses a similar concept of the [[https:// |
- | + | ||
- | ===== Previous Solutions ===== | + | |
- | + | ||
- | [[https://github.com/laruence/taint|Taint extension]] by Xinchen Hui, but this approach explicitly allows escaping, which doesn' | + | |
- | + | ||
- | [[https:// | + | |
* " | * " | ||
Line 102: | Line 108: | ||
* Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | ||
- | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" | + | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" |
- | + | ||
- | ===== Related JavaScript Implementation ===== | + | |
- | + | ||
- | This RFC is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, to support the introduction of Trusted Types. | + | |
- | + | ||
- | https://github.com/ | + | |
- | https:// | + | |
- | + | ||
- | They are looking at " | + | |
- | + | ||
- | ===== Solution ===== | + | |
- | + | ||
- | Literals are safe values, defined within the PHP scripts, for example: | + | |
- | + | ||
- | $a = ' | + | |
- | | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = sprintf(' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $c = count($ids); | + | |
- | $a = 'WHERE id IN (' . implode(',', | + | |
- | is_literal($a); | + | |
- | + | ||
- | $limit = 10; | + | |
- | $a = 'LIMIT ' . ($limit + 1); | + | |
- | is_literal($a); | + | |
- | + | ||
- | This uses a similar definition of [[https:// | + | |
- | + | ||
- | Thanks to [[https:// | + | |
- | + | ||
- | And thanks to [[https:// | + | |
- | + | ||
- | Commands can be checked | + | |
- | + | ||
- | This approach allows all systems/ | + | |
- | + | ||
- | Unlike the Taint extension, there must **not** be an equivalent // | + | |
- | + | ||
- | ==== Solution: SQL Injection ==== | + | |
- | + | ||
- | Database abstractions (e.g. ORMs) will be able to ensure they are provided with strings | + | |
- | + | ||
- | [[https:// | + | |
- | + | ||
- | $users = $queryBuilder | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | + | ||
- | // example.php? | + | |
- | + | ||
- | This mistake can be easily identified by: | + | |
- | + | ||
- | public function where($predicates) | + | |
- | { | + | |
- | if (function_exists(' | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | ... | + | |
- | } | + | |
- | + | ||
- | [[https:// | + | |
- | + | ||
- | $users = R:: | + | |
- | + | ||
- | [[http:// | + | |
- | + | ||
- | $users = UserQuery:: | + | |
- | + | ||
- | The // | + | |
- | + | ||
- | ==== Solution: SQL Injection, Basic ==== | + | |
- | + | ||
- | A simple example: | + | |
- | $sql = ' | + | ===== Usage ===== |
- | + | ||
- | $result | + | |
- | Checked in the framework by: | + | By libraries: |
- | class db { | + | <code php> |
- | + | function | |
- | public | + | if (function_exists(' |
- | | + | |
- | | + | if ($level === 0) { |
- | throw new Exception(' | + | // Programmer aware, and is choosing to bypass |
- | } | + | } else if ($level === 1) { |
- | + | | |
- | | + | } else { |
- | | + | |
- | | + | |
- | | + | |
} | } | ||
- | | ||
} | } | ||
+ | } | ||
- | This also works with string concatenation: | + | function example($input) { |
+ | literal_check($input); | ||
+ | // ... | ||
+ | } | ||
- | define('TABLE', | + | example('hello'); // OK |
- | + | example(strtoupper('hello')); // Exception thrown: the result of strtoupper is a new, non-literal string | |
- | $sql = ' | + | </ |
- | + | ||
- | is_literal($sql); // Returns true | + | |
- | + | ||
- | $sql .= ' AND id = ' . $mysqli-> | + | |
- | + | ||
- | is_literal($sql); // Returns false | + | |
- | ==== Solution: | + | Table and Fields in SQL, which cannot use parameters; for example //ORDER BY//: |
- | To ensure //ORDER BY// can be set via the user, but only use acceptable values: | + | <code php> |
+ | $order_fields = [ | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | ]; | ||
- | $order_fields = [ | + | $order_id = array_search(($_GET[' |
- | ' | + | |
- | ' | + | |
- | ' | + | |
- | ]; | + | |
- | + | ||
- | | + | |
- | + | ||
- | $sql = ' ORDER BY ' . $order_fields[$order_id]; | + | |
- | ==== Solution: SQL Injection, WHERE IN ==== | + | $sql = ' ORDER BY ' . $order_fields[$order_id]; |
+ | </ | ||
- | Most SQL strings can be a simple concatenations | + | Undefined number |
- | There needs to be a special case for // | + | <code php> |
- | + | function where_in_sql($count) | |
- | | + | $sql = '?'; |
- | + | | |
- | $sql = ' | + | $sql .= ',?'; |
- | + | ||
- | ==== Solution: CLI Injection ==== | + | |
- | + | ||
- | Rather than using functions such as: | + | |
- | + | ||
- | * //exec()// | + | |
- | * // | + | |
- | * // | + | |
- | * // | + | |
- | + | ||
- | Frameworks (or PHP) could introduce something similar to // | + | |
- | + | ||
- | Or, take a safe literal | + | |
- | + | ||
- | $output | + | |
- | ' | + | |
- | ]); | + | |
- | + | ||
- | Rough implementation: | + | |
- | + | ||
- | | + | |
- | + | ||
- | if (!is_literal($cmd)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $offset = 0; | + | |
- | | + | |
- | while (($pos = strpos($cmd, | + | |
- | if (!isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | $arg = escapeshellarg($args[$k]); | + | |
- | | + | |
- | $offset = ($pos + strlen($arg)); | + | |
- | | + | |
- | } | + | |
- | if (isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | | + | |
- | + | ||
- | return exec($cmd); | + | |
- | + | ||
- | } | + | |
- | + | ||
- | ==== Solution: HTML Injection ==== | + | |
- | + | ||
- | Template engines should receive variables separately from the raw HTML. | + | |
- | + | ||
- | Often the engine will get the HTML from static files (safe): | + | |
- | + | ||
- | $html = file_get_contents('/ | + | |
- | + | ||
- | But small snippets of HTML are often easier to define as a literal within the PHP script: | + | |
- | + | ||
- | $template_html = ' | + | |
- | < | + | |
- | < | + | |
- | + | ||
- | Where the variables are supplied separately, in this example I'm using XPath: | + | |
- | + | ||
- | $values = [ | + | |
- | '// | + | |
- | NULL => ' | + | |
- | ' | + | |
- | ' | + | |
- | ], | + | |
- | '// | + | |
- | ' | + | |
- | ], | + | |
- | ]; | + | |
- | + | ||
- | echo template_parse($template_html, | + | |
- | + | ||
- | The templating engine can then accept and apply the supplied variables for the relevant context. | + | |
- | + | ||
- | As a simple example, this can be done with: | + | |
- | + | ||
- | function template_parse($html, | + | |
- | + | ||
- | if (!is_literal($html)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $dom = new DomDocument(); | + | |
- | $dom-> | + | |
- | + | ||
- | $xpath = new DOMXPath($dom); | + | |
- | + | ||
- | foreach ($values as $query => $attributes) { | + | |
- | + | ||
- | if (!is_literal($query)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | foreach ($xpath-> | + | |
- | foreach ($attributes as $attribute => $value) { | + | |
- | + | ||
- | if (!is_literal($attribute)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | if ($attribute) { | + | |
- | $safe = false; | + | |
- | if ($attribute == ' | + | |
- | if (preg_match('/ | + | |
- | $safe = true; // Not " | + | |
- | } | + | |
- | } else if ($attribute == ' | + | |
- | if (in_array($value, | + | |
- | $safe = true; // Only allow specific classes? | + | |
- | } | + | |
- | } else if (preg_match('/ | + | |
- | if (preg_match('/ | + | |
- | $safe = true; | + | |
- | } | + | |
- | } | + | |
- | if ($safe) { | + | |
- | $element-> | + | |
- | } | + | |
- | } else { | + | |
- | $element-> | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | + | ||
- | $html = ''; | + | |
- | $body = $dom-> | + | |
- | if ($body-> | + | |
- | foreach ($body-> | + | |
- | $html .= $dom-> | + | |
- | } | + | |
- | } | + | |
- | + | ||
- | return $html; | + | |
- | | + | |
} | } | ||
+ | return $sql; | ||
+ | } | ||
+ | $sql = 'WHERE id IN (' . where_in_sql(count($ids)) . ' | ||
+ | </ | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
Line 417: | Line 190: | ||
On [[https:// | On [[https:// | ||
- | - Would this cause performance issues? Presumably not as bad a type checking. | + | - Name it something else? [[https://news-web.php.net/php.internals/109197|Jakob Givoni]] suggested |
- | - Can //array_fill()//+//implode()// pass though the " | + | - Would this cause performance issues? A [[https://github.com/ |
- | - Should the function be named // | + | - Systems/ |
- | - Systems/ | + | |
- | + | ||
- | ===== Alternatives ===== | + | |
- | + | ||
- | - The current Taint Extension (notes above) | + | |
- | - Using static analysis (not at runtime), for example [[https:// | + | |
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== | ||
Line 433: | Line 200: | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
- | Certain | + | As noted by [[https:// |
- | PHP could also have a mode where output | + | **Phase 2** could introduce |
+ | |||
+ | For example, a project could require the second argument for //pg_query()// | ||
+ | |||
+ | **Phase 3** could set a default of 'only literals' | ||
+ | |||
+ | And, for a bit of silliness (Spaß ist verboten), there could be a //is_figurative()// function, which MarkR seems to [[https:// | ||
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
Line 443: | Line 216: | ||
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
- | A volunteer is needed to help with implementation. | + | N/A |
===== Implementation ===== | ===== Implementation ===== | ||
+ | |||
+ | Joe Watkins has [[https:// | ||
+ | |||
+ | Dan Ackroyd also [[https:// | ||
+ | |||
+ | ===== References ===== | ||
N/A | N/A | ||
Line 452: | Line 231: | ||
N/A | N/A | ||
+ | |||
+ | ===== Thanks ===== | ||
+ | |||
+ | - **Dan Ackroyd**, DanAck, for surprising me with the first implementation, | ||
+ | - **Joe Watkins**, krakjoe, for finding how to set the literal flag, and creating the implementation that supports string concat. | ||
+ | - **Rowan Tommins**, IMSoP, for re-writing this RFC to focus on the key features, and putting it in context of how it can be used by libraries. | ||
+ | - **Nikita Popov**, NikiC, for suggesting where the literal flag could be stored. Initially this was going to be the [[https:// | ||
+ | - **MarkR, **for alternative ideas, and noting that " | ||
+ | - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, | ||
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis