rfc:is_literal
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
rfc:is_literal [2020/12/26 11:41] – New examples, with a focus on escaping craigfrancis | rfc:is_literal [2021/04/18 16:55] – some suggested rewording imsop | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Is Literal Check ====== | ====== PHP RFC: Is Literal Check ====== | ||
- | * Version: 0.2 | + | * Version: 0.4 |
* Date: 2020-03-21 | * Date: 2020-03-21 | ||
- | * Updated: | + | * Updated: |
* Author: Craig Francis, craig# | * Author: Craig Francis, craig# | ||
* Status: Draft | * Status: Draft | ||
Line 11: | Line 11: | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Add an // | + | This RFC proposes a new function, |
- | As in, at runtime, being able to check if a variable has been created by literals, defined within a PHP script, by a trusted developer. | + | The clearest example is a database library which supports parametrised queries |
- | This simple check can be used to warn or completely block SQL Injection, Command Line Injection, and many cases of HTML Injection (aka XSS). | ||
- | ===== The Problem | + | ===== Examples |
- | Escaping | + | The [[https:// |
- | The vast majority of programmers should never do this (mistakes will be made). | + | <code php> |
+ | // INSECURE | ||
+ | $qb-> | ||
+ | | ||
+ | | ||
+ | </ | ||
- | Unsafe values (often user supplied) **must** be kept separate (e.g. parameterised SQL), or be processed by something that understands | + | The definition of the '' |
- | This is primarily for security reasons, but it can also cause data to be damaged | + | <code php> |
+ | $qb-> | ||
+ | | ||
+ | | ||
+ | -> | ||
+ | </ | ||
- | Take these mistakes, where the value has come from the user: | + | Similarly, Twig allows [[https:// |
- | | + | <code php> |
+ | // INSECURE | ||
+ | echo $twig-> | ||
+ | </ | ||
- | Flawed, and unfortunately very common, a classic XSS vulnerability. | + | If '' |
- | echo "<img src=" . htmlentities($url) . " alt='' | + | <code php> |
+ | echo $twig-> | ||
+ | </code> | ||
- | Flawed because the attribute value is not quoted, e.g. // | + | ===== Proposal ===== |
- | echo "< | + | A literal is defined as any value which is entirely under the control of the programmer. The value may be passed between functions, as long as it is not modified in any way other than string concatenation. |
- | Flawed because // | + | <code php> |
+ | is_literal('Example'); // true | ||
- | echo '<a href="' | + | $a = 'Example'; |
+ | is_literal($a); // true | ||
- | Flawed because a link can include JavaScript, e.g. //$url = ' | + | is_literal(4); |
+ | is_literal(0.3); // true | ||
+ | is_literal('a' . ' | ||
- | < | + | $a = ' |
- | var url = "<? | + | $b = $a . ' B ' . 3; |
- | </script> | + | is_literal($b); // true, ideally (more details below) |
- | Flawed because // | + | is_literal($_GET['id']); // false |
- | echo '<a href="/ | + | is_literal(rand(0, 10)); // false |
- | Flawed because //urlencode()// has not been used, e.g. //$name = 'A&B'// | + | is_literal(sprintf('LIMIT %d', 3)); // false |
- | < | + | $c = count($ids); |
+ | $a = 'WHERE id IN (' . implode(',', | ||
+ | is_literal($a); | ||
+ | </code> | ||
- | Flawed because the encoding | + | Note that there is no way to manually mark a string as " |
- | Also flawed because some browsers (e.g. IE 11), if the charset isn't defined (header or meta tag), could guess the output as UTF-7, e.g. //$url = ' | ||
- | example.html: | + | ===== Implementation Notes ===== |
- | <img src={{ url }} alt='' | + | |
- | + | ||
- | $loader | + | |
- | $twig = new \Twig\Environment($loader, | + | |
- | + | ||
- | echo $twig-> | + | |
- | Flawed because Twig is not context aware (in this case, an unquoted HTML attribute), e.g. //$url = '/ onerror=alert(1)'// | + | (Most of what' |
- | $sql = ' | + | Ideally string concatenation would be allowed, but [[https:// |
- | Flawed because the value has not been quoted, e.g. //$id = ' | + | Thanks to [[https:// |
- | $sql = ' | + | As an aside, [[https:// |
- | Flawed if ' | + | ===== Comparison to Taint Tracking ===== |
- | $sql = ' | + | Some languages implement a "taint flag" which tracks whether values are considered "safe" |
- | Flawed if 'SET NAMES latin1' | + | These solutions rely on the assumption that the output of an escaping function is safe for a particular context. This sounds reasonable in theory, but the operation of escaping functions, and the context for which their output is safe, are very hard to define. This leads to a feature that is both complex |
- | $parameters = " | + | The current proposal avoids this complexity by addressing |
- | + | ||
- | // $parameters = ' | + | |
- | + | ||
- | mail('a@example.com', ' | + | |
- | Flawed because it's not possible to safely escape values in // | + | ===== Previous Work ===== |
- | ===== Previous Solutions ===== | + | Google currently uses a [[https:// |
- | [[https:// | + | As noted be [[https:// |
- | [[https:// | + | And there is the [[https:// |
* " | * " | ||
Line 102: | Line 112: | ||
* Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | ||
- | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" | + | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" |
- | + | ||
- | ===== Related JavaScript Implementation ===== | + | |
- | + | ||
- | This RFC is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, to support the introduction of Trusted Types. | + | |
- | + | ||
- | https://github.com/ | + | |
- | https:// | + | |
- | + | ||
- | They are looking at " | + | |
- | + | ||
- | ===== Solution ===== | + | |
- | + | ||
- | Literals are safe values, defined within the PHP scripts, for example: | + | |
- | + | ||
- | $a = ' | + | |
- | | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = sprintf(' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $c = count($ids); | + | |
- | $a = 'WHERE id IN (' . implode(',', | + | |
- | is_literal($a); | + | |
- | + | ||
- | $limit = 10; | + | |
- | $a = 'LIMIT ' . ($limit + 1); | + | |
- | is_literal($a); | + | |
- | + | ||
- | This uses a similar definition of [[https:// | + | |
- | + | ||
- | Thanks to [[https:// | + | |
- | + | ||
- | And thanks to [[https:// | + | |
- | + | ||
- | Commands can be checked to ensure they are a " | + | |
- | + | ||
- | This approach allows all systems/ | + | |
- | + | ||
- | Unlike the Taint extension, there must **not** be an equivalent // | + | |
- | + | ||
- | ==== Solution: SQL Injection ==== | + | |
- | + | ||
- | Database abstractions (e.g. ORMs) will be able to ensure they are provided with strings that are safe. | + | |
- | + | ||
- | [[https:// | + | |
- | + | ||
- | $users = $queryBuilder | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | + | ||
- | // example.php? | + | |
- | + | ||
- | This mistake can be easily identified by: | + | |
- | + | ||
- | public function where($predicates) | + | |
- | { | + | |
- | if (function_exists(' | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | ... | + | |
- | } | + | |
- | + | ||
- | [[https:// | + | |
- | + | ||
- | $users = R:: | + | |
- | + | ||
- | [[http:// | + | |
- | + | ||
- | $users = UserQuery:: | + | |
- | + | ||
- | The // | + | |
- | + | ||
- | ==== Solution: SQL Injection, Basic ==== | + | |
- | + | ||
- | A simple example: | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | $result = $db-> | + | |
- | + | ||
- | Checked in the framework by: | + | |
- | + | ||
- | class db { | + | |
- | + | ||
- | public function exec($sql, $parameters = []) { | + | |
- | + | ||
- | if (!is_literal($sql)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $statement = $this-> | + | |
- | $statement-> | + | |
- | return $statement-> | + | |
- | + | ||
- | } | + | |
- | + | ||
- | } | + | |
- | + | ||
- | This also works with string concatenation: | + | |
- | + | ||
- | define(' | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | is_literal($sql); | + | |
- | + | ||
- | $sql .= ' AND id = ' . $mysqli-> | + | |
- | + | ||
- | is_literal($sql); | + | |
- | + | ||
- | ==== Solution: SQL Injection, ORDER BY ==== | + | |
- | + | ||
- | To ensure //ORDER BY// can be set via the user, but only use acceptable values: | + | |
- | + | ||
- | $order_fields = [ | + | |
- | ' | + | |
- | ' | + | |
- | ' | + | |
- | ]; | + | |
- | + | ||
- | $order_id = array_search(($_GET[' | + | |
- | + | ||
- | $sql = ' ORDER BY ' . $order_fields[$order_id]; | + | |
- | + | ||
- | ==== Solution: SQL Injection, WHERE IN ==== | + | |
- | + | ||
- | Most SQL strings can be a simple concatenations of literal values, but //WHERE x IN (?,?,?)// needs to use a variable number of literal placeholders. | + | |
- | + | ||
- | There needs to be a special case for // | + | |
- | + | ||
- | $in_sql = implode(',', | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | ==== Solution: CLI Injection ==== | + | |
- | + | ||
- | Rather than using functions such as: | + | |
- | + | ||
- | * // | + | |
- | * // | + | |
- | * // | + | |
- | * // | + | |
- | + | ||
- | Frameworks (or PHP) could introduce something similar to // | + | |
- | + | ||
- | Or, take a safe literal for the command, and use parameters for the arguments (like SQL does): | + | |
- | + | ||
- | $output = parameterised_exec(' | + | |
- | ' | + | |
- | ]); | + | |
- | + | ||
- | Rough implementation: | + | |
- | + | ||
- | function parameterised_exec($cmd, | + | |
- | + | ||
- | if (!is_literal($cmd)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $offset = 0; | + | |
- | $k = 0; | + | |
- | while (($pos = strpos($cmd, | + | |
- | if (!isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | $arg = escapeshellarg($args[$k]); | + | |
- | $cmd = substr($cmd, | + | |
- | $offset = ($pos + strlen($arg)); | + | |
- | $k++; | + | |
- | } | + | |
- | if (isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | + | ||
- | return exec($cmd); | + | |
- | + | ||
- | } | + | |
- | + | ||
- | ==== Solution: HTML Injection ==== | + | |
- | + | ||
- | Template engines should receive variables separately from the raw HTML. | + | |
- | + | ||
- | Often the engine will get the HTML from static files (safe): | + | |
- | + | ||
- | $html = file_get_contents('/ | + | |
- | + | ||
- | But small snippets of HTML are often easier to define as a literal within the PHP script: | + | |
- | + | ||
- | $template_html = ' | + | |
- | < | + | |
- | < | + | |
- | + | ||
- | Where the variables are supplied separately, in this example I'm using XPath: | + | |
- | + | ||
- | $values = [ | + | |
- | '// | + | |
- | NULL => ' | + | |
- | ' | + | |
- | ' | + | |
- | ], | + | |
- | '// | + | |
- | ' | + | |
- | ], | + | |
- | ]; | + | |
- | + | ||
- | echo template_parse($template_html, | + | |
- | + | ||
- | The templating engine can then accept and apply the supplied variables for the relevant context. | + | |
- | + | ||
- | As a simple example, this can be done with: | + | |
- | + | ||
- | function template_parse($html, | + | |
- | + | ||
- | if (!is_literal($html)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $dom = new DomDocument(); | + | |
- | $dom-> | + | |
- | + | ||
- | $xpath = new DOMXPath($dom); | + | |
- | + | ||
- | foreach ($values as $query => $attributes) { | + | |
- | + | ||
- | if (!is_literal($query)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | foreach ($xpath-> | + | |
- | foreach ($attributes as $attribute => $value) { | + | |
- | + | ||
- | if (!is_literal($attribute)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | if ($attribute) { | + | |
- | $safe = false; | + | |
- | if ($attribute == ' | + | |
- | if (preg_match('/ | + | |
- | $safe = true; // Not " | + | |
- | } | + | |
- | } else if ($attribute == ' | + | |
- | if (in_array($value, | + | |
- | $safe = true; // Only allow specific classes? | + | |
- | } | + | |
- | } else if (preg_match('/ | + | |
- | if (preg_match('/ | + | |
- | $safe = true; | + | |
- | } | + | |
- | } | + | |
- | if ($safe) { | + | |
- | $element-> | + | |
- | } | + | |
- | } else { | + | |
- | $element-> | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | + | ||
- | $html = ''; | + | |
- | $body = $dom-> | + | |
- | if ($body-> | + | |
- | foreach ($body-> | + | |
- | $html .= $dom-> | + | |
- | } | + | |
- | } | + | |
- | + | ||
- | return $html; | + | |
- | + | ||
- | } | + | |
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
Line 417: | Line 140: | ||
On [[https:// | On [[https:// | ||
- | - Would this cause performance issues? | + | |
+ | | ||
- Can // | - Can // | ||
- | | + | - Systems/ |
- | | + | |
- | + | ||
- | ===== Alternatives ===== | + | |
- | + | ||
- | - The current Taint Extension (notes above) | + | |
- | - Using static analysis (not at runtime), for example [[https:// | + | |
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== | ||
Line 433: | Line 151: | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
- | Certain | + | As noted by [[https:// |
- | PHP could also have a mode where output | + | **Phase 2** could introduce |
+ | |||
+ | For example, a project could require the second argument for //pg_query()// | ||
+ | |||
+ | **Phase 3** could set a default of 'only literals' | ||
+ | |||
+ | And, for a bit of silliness (Spaß ist verboten), there could be a //is_figurative()// function, which MarkR seems to [[https:// | ||
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
Line 443: | Line 167: | ||
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
- | A volunteer is needed to help with implementation. | + | N/A |
===== Implementation ===== | ===== Implementation ===== | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | ===== References ===== | ||
N/A | N/A |
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis