rfc:is_literal
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
rfc:is_literal [2021/01/14 18:28] – Add note about the Go implementation craigfrancis | rfc:is_literal [2021/04/19 13:44] – Updated examples, and general tweaks craigfrancis | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Is Literal Check ====== | ====== PHP RFC: Is Literal Check ====== | ||
- | * Version: 0.2 | + | * Version: 0.5 |
* Date: 2020-03-21 | * Date: 2020-03-21 | ||
- | * Updated: | + | * Updated: |
* Author: Craig Francis, craig# | * Author: Craig Francis, craig# | ||
* Status: Draft | * Status: Draft | ||
Line 11: | Line 11: | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Add an // | + | This RFC proposes a new function, |
- | As in, at runtime, being able to check if a variable has been created by literals, | + | This addresses some of the same use cases as "taint flags", but is both simpler and stricter. It does not address how user data is transmitted or escaped, only whether it has been passed |
- | This simple check can be used to warn or completely block SQL Injection, Command Line Injection, and many cases of HTML Injection (aka XSS). | + | The clearest example is a database library which supports parametrised queries at the driver level, where the programmer could use either |
- | ===== The Problem ===== | + | <code php> |
+ | $db-> | ||
- | Escaping strings for SQL, HTML, Commands, etc is **very** error prone. | + | $db-> |
- | + | </ | |
- | The vast majority of programmers should never do this (mistakes will be made). | + | |
- | Unsafe values | + | By rejecting the SQL that was not written as a literal |
- | This is primarily for security reasons, but it can also cause data to be damaged (e.g. ASCII/UTF-8 issues). | + | ===== Examples ===== |
- | Take these mistakes, where the value has come from the user: | + | The [[https:// |
<code php> | <code php> | ||
- | echo "< | + | // INSECURE |
+ | $qb-> | ||
+ | | ||
+ | | ||
</ | </ | ||
- | Flawed, | + | The definition of the //where()// method could check with // |
<code php> | <code php> | ||
- | echo "< | + | $qb-> |
+ | | ||
+ | | ||
+ | ->setParameter(' | ||
</ | </ | ||
- | Flawed because the attribute value is not quoted, e.g. //$url = '/ onerror=alert(1)'/ | + | Similarly, Twig allows [[https:// |
<code php> | <code php> | ||
- | echo "<img src='" | + | // INSECURE |
+ | echo $twig-> | ||
</ | </ | ||
- | Flawed because | + | If //createTemplate()// checked with //is_literal()//, the programmer could be advised to write this instead: |
<code php> | <code php> | ||
- | echo '<a href="' | + | echo $twig-> |
</ | </ | ||
- | Flawed because a link can include JavaScript, e.g. // | + | ===== Proposal ===== |
- | <code html> | + | A literal is defined as a value (string) which has been written by the programmer. The value may be passed between functions, as long as it is not modified in any way other than string concatenation. |
- | < | + | |
- | var url = "<? | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | Flawed because // | + | |
<code php> | <code php> | ||
- | echo '<a href="/ | + | is_literal('Example'); // true |
- | </code> | + | |
- | Flawed because // | + | $a = 'Example'; |
+ | is_literal($a); | ||
- | <code html> | + | is_literal(4); |
- | < | + | is_literal(0.3); // true |
- | </code> | + | is_literal(' |
- | Flawed because the encoding is not guaranteed to be UTF-8 (or ISO-8859-1 before PHP 5.4), so the value could be corrupted. | + | $a = ' |
+ | $b = $a . ' B ' . 3; | ||
+ | is_literal($b); // true, ideally (more details below) | ||
- | Also flawed because some browsers | + | is_literal($_GET['id']); // false |
- | + | ||
- | <code php> | + | |
- | example.html: | + | |
- | <img src={{ url }} alt='' | + | |
- | $loader = new \Twig\Loader\FilesystemLoader(' | + | is_literal(rand(0, 10)); // false |
- | $twig = new \Twig\Environment($loader, [' | + | |
- | echo $twig-> | + | is_literal(sprintf('LIMIT %d', |
</ | </ | ||
- | Flawed because Twig is not context aware (in this case, an unquoted HTML attribute), | + | Note that there is no way to manually mark a string as a literal |
- | <code php> | + | See the [[https:// |
- | $sql = ' | + | |
- | </code> | + | |
- | Flawed because the value has not been quoted, e.g. //$id = ' | + | ===== Comparison to Taint Tracking ===== |
- | <code php> | + | Some languages implement a "taint flag" which tracks whether values are considered "safe" |
- | $sql = ' | + | |
- | </code> | + | |
- | Flawed if ' | + | These solutions rely on the assumption that the output of an escaping function is safe for a particular context. This sounds reasonable in theory, but the operation of escaping functions, and the context for which their output is safe, are very hard to define. This leads to a feature that is both complex and unreliable. |
- | <code php> | + | This proposal avoids the complexity by addressing a different part of the problem: separating inputs supplied by the programmer, from inputs supplied by the user. |
- | $sql = ' | + | |
- | </ | + | |
- | + | ||
- | Flawed if 'SET NAMES latin1' | + | |
- | + | ||
- | <code php> | + | |
- | $parameters = " | + | |
- | + | ||
- | // $parameters = ' | + | |
- | + | ||
- | mail(' | + | |
- | </ | + | |
- | Flawed because it's not possible to safely escape values in // | + | ===== Previous Work ===== |
- | ===== Previous Solutions ===== | + | Google uses a [[https:// |
- | [[https:// | + | As noted by [[https:// |
- | [[https:// | + | And there is the [[https:// |
* " | * " | ||
Line 126: | Line 108: | ||
* Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | ||
- | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" | + | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" |
- | ===== Related Go Implementation | + | ===== Usage ===== |
- | As discussed by [[https:// | + | By libraries: |
- | + | ||
- | The Go language can do this by checking for " | + | |
- | + | ||
- | Google currently avoids most issues by using [[https:// | + | |
- | + | ||
- | ===== Related JavaScript Implementation ===== | + | |
- | + | ||
- | This RFC is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, to support the introduction of Trusted Types. | + | |
- | + | ||
- | https:// | + | |
- | https:// | + | |
- | + | ||
- | They are looking at " | + | |
- | + | ||
- | ===== Solution ===== | + | |
- | + | ||
- | Literals are safe values, defined within the PHP scripts, for example: | + | |
<code php> | <code php> | ||
- | $a = ' | + | function literal_check($var) { |
- | is_literal($a); // true | + | |
- | + | $level = 2; // Get from config, defaults to 1. | |
- | $a = ' | + | if ($level === 0) { |
- | is_literal($a); // true | + | // Programmer aware, and is choosing |
- | + | } else if ($level | |
- | $a = ' | + | |
- | is_literal($a); // false | + | |
- | + | throw new Exception(' | |
- | $a = 'Example ' . time(); | + | |
- | is_literal($a); // false | + | |
- | + | ||
- | $a = sprintf('LIMIT %d', 3); | + | |
- | is_literal($a); // false | + | |
- | + | ||
- | $c = count($ids); | + | |
- | $a = 'WHERE id IN (' . implode(',', | + | |
- | is_literal($a); // true, the odd one that involves functions. | + | |
- | + | ||
- | $limit = 10; | + | |
- | $a = 'LIMIT ' . ($limit + 1); | + | |
- | is_literal($a); | + | |
- | </ | + | |
- | + | ||
- | This uses a similar definition of [[https:// | + | |
- | + | ||
- | Thanks to [[https:// | + | |
- | + | ||
- | And thanks to [[https:// | + | |
- | + | ||
- | Commands can be checked | + | |
- | + | ||
- | This approach allows all systems/ | + | |
- | + | ||
- | Unlike the Taint extension, there must **not** be an equivalent // | + | |
- | + | ||
- | ==== Solution: SQL Injection ==== | + | |
- | + | ||
- | Database abstractions (e.g. ORMs) will be able to ensure they are provided with strings that are safe. | + | |
- | + | ||
- | [[https:// | + | |
- | + | ||
- | <code php> | + | |
- | $users = $queryBuilder | + | |
- | ->select('u') | + | |
- | | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | + | ||
- | // example.php? | + | |
- | </ | + | |
- | + | ||
- | This mistake can be easily identified by: | + | |
- | + | ||
- | <code php> | + | |
- | public function where($predicates) | + | |
- | { | + | |
- | | + | |
- | throw new Exception(' | + | |
} | } | ||
- | ... | + | } |
} | } | ||
- | </ | ||
- | |||
- | [[https:// | ||
- | |||
- | <code php> | ||
- | $users = R:: | ||
- | </ | ||
- | |||
- | [[http:// | ||
- | |||
- | <code php> | ||
- | $users = UserQuery:: | ||
- | </ | ||
- | |||
- | The // | ||
- | |||
- | ==== Solution: SQL Injection, Basic ==== | ||
- | |||
- | A simple example: | ||
- | |||
- | <code php> | ||
- | $sql = ' | ||
- | |||
- | $result = $db-> | ||
- | </ | ||
- | |||
- | Checked in the framework by: | ||
- | |||
- | <code php> | ||
- | class db { | ||
- | |||
- | public function exec($sql, $parameters = []) { | ||
- | |||
- | if (!is_literal($sql)) { | ||
- | throw new Exception(' | ||
- | } | ||
- | |||
- | $statement = $this-> | ||
- | $statement-> | ||
- | return $statement-> | ||
- | |||
- | } | ||
+ | function example($input) { | ||
+ | literal_check($input); | ||
+ | // ... | ||
} | } | ||
- | </ | ||
- | This also works with string concatenation: | + | example('hello'); // OK |
- | + | example(strtoupper('hello')); // Exception thrown: the result of strtoupper is a new, non-literal string | |
- | <code php> | + | |
- | define('TABLE', | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | is_literal($sql); // Returns true | + | |
- | + | ||
- | $sql .= ' AND id = ' . $mysqli-> | + | |
- | + | ||
- | is_literal($sql); // Returns false | + | |
</ | </ | ||
- | ==== Solution: | + | Table and Fields in SQL, which cannot use parameters; for example |
- | + | ||
- | To ensure | + | |
<code php> | <code php> | ||
Line 289: | Line 151: | ||
</ | </ | ||
- | ==== Solution: SQL Injection, WHERE IN ==== | + | Undefined number |
- | + | ||
- | Most SQL strings can be a simple concatenations | + | |
- | + | ||
- | There needs to be a special case for // | + | |
<code php> | <code php> | ||
- | $in_sql = implode(',', | + | function where_in_sql($count) |
- | + | $sql = '?'; | |
- | $sql = 'SELECT * FROM table WHERE id IN (' . $in_sql . ')'; | + | for ($k = 1; $k < $count; |
- | </ | + | $sql .= ',?'; |
- | + | ||
- | ==== Solution: CLI Injection ==== | + | |
- | + | ||
- | Rather than using functions such as: | + | |
- | + | ||
- | | + | |
- | * // | + | |
- | * // | + | |
- | * // | + | |
- | + | ||
- | Frameworks (or PHP) could introduce something similar to // | + | |
- | + | ||
- | Or, take a safe literal | + | |
- | + | ||
- | <code php> | + | |
- | $output | + | |
- | ' | + | |
- | ]); | + | |
- | </code> | + | |
- | + | ||
- | Rough implementation: | + | |
- | + | ||
- | <code php> | + | |
- | function parameterised_exec($cmd, $args = []) { | + | |
- | + | ||
- | if (!is_literal($cmd)) { | + | |
- | throw new Exception('The first argument must be a literal'); | + | |
} | } | ||
- | + | return $sql; | |
- | $offset = 0; | + | |
- | $k = 0; | + | |
- | while (($pos = strpos($cmd, | + | |
- | if (!isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | $arg = escapeshellarg($args[$k]); | + | |
- | $cmd = substr($cmd, | + | |
- | $offset = ($pos + strlen($arg)); | + | |
- | $k++; | + | |
- | } | + | |
- | if (isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | + | ||
- | return exec($cmd); | + | |
- | + | ||
- | } | + | |
- | </ | + | |
- | + | ||
- | ==== Solution: HTML Injection ==== | + | |
- | + | ||
- | Template engines should receive variables separately from the raw HTML. | + | |
- | + | ||
- | Often the engine will get the HTML from static files (safe): | + | |
- | + | ||
- | <code php> | + | |
- | $html = file_get_contents('/ | + | |
- | </ | + | |
- | + | ||
- | But small snippets of HTML are often easier to define as a literal within the PHP script: | + | |
- | + | ||
- | <code php> | + | |
- | $template_html = ' | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | + | ||
- | Where the variables are supplied separately, in this example I'm using XPath: | + | |
- | + | ||
- | <code php> | + | |
- | $values = [ | + | |
- | '// | + | |
- | NULL => ' | + | |
- | ' | + | |
- | ' | + | |
- | ], | + | |
- | '// | + | |
- | ' | + | |
- | ], | + | |
- | ]; | + | |
- | + | ||
- | echo template_parse($template_html, | + | |
- | </ | + | |
- | + | ||
- | The templating engine can then accept and apply the supplied variables for the relevant context. | + | |
- | + | ||
- | As a simple example, this can be done with: | + | |
- | + | ||
- | <code php> | + | |
- | function template_parse($html, | + | |
- | + | ||
- | if (!is_literal($html)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $dom = new DomDocument(); | + | |
- | $dom-> | + | |
- | + | ||
- | $xpath = new DOMXPath($dom); | + | |
- | + | ||
- | foreach ($values as $query => $attributes) { | + | |
- | + | ||
- | if (!is_literal($query)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | foreach ($xpath-> | + | |
- | foreach ($attributes as $attribute => $value) { | + | |
- | + | ||
- | if (!is_literal($attribute)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | if ($attribute) { | + | |
- | $safe = false; | + | |
- | if ($attribute == ' | + | |
- | if (preg_match('/ | + | |
- | $safe = true; // Not " | + | |
- | } | + | |
- | } else if ($attribute == ' | + | |
- | if (in_array($value, | + | |
- | $safe = true; // Only allow specific classes? | + | |
- | } | + | |
- | } else if (preg_match('/ | + | |
- | if (preg_match('/ | + | |
- | $safe = true; | + | |
- | } | + | |
- | } | + | |
- | if ($safe) { | + | |
- | $element-> | + | |
- | } | + | |
- | } else { | + | |
- | $element-> | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | + | ||
- | $html = ''; | + | |
- | $body = $dom-> | + | |
- | if ($body-> | + | |
- | foreach ($body-> | + | |
- | $html .= $dom-> | + | |
- | } | + | |
- | } | + | |
- | + | ||
- | | + | |
} | } | ||
+ | $sql = 'WHERE id IN (' . where_in_sql(count($ids)) . ' | ||
</ | </ | ||
Line 481: | Line 190: | ||
On [[https:// | On [[https:// | ||
- | - Would this cause performance issues? Presumably not as bad a type checking. | + | - Name it something else? [[https://news-web.php.net/php.internals/109197|Jakob Givoni]] suggested |
- | - Can //array_fill()//+//implode()// pass though the " | + | - Would this cause performance issues? A [[https://github.com/ |
- | - Should the function be named // | + | - Systems/ |
- | - Systems/ | + | |
- | + | ||
- | ===== Alternatives ===== | + | |
- | + | ||
- | - The current Taint Extension (notes above) | + | |
- | - Using static analysis (not at runtime), for example [[https:// | + | |
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== | ||
Line 497: | Line 200: | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
- | Certain | + | As noted by [[https:// |
+ | |||
+ | **Phase 2** could introduce a way for programmers | ||
+ | |||
+ | For example, | ||
+ | |||
+ | **Phase 3** could set a default of 'only literals' | ||
- | PHP could also have a mode where output | + | And, for a bit of silliness (Spaß ist verboten), there could be a // |
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
Line 507: | Line 216: | ||
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
- | A volunteer is needed to help with implementation. | + | N/A |
===== Implementation ===== | ===== Implementation ===== | ||
+ | |||
+ | Joe Watkins has [[https:// | ||
+ | |||
+ | Dan Ackroyd also [[https:// | ||
+ | |||
+ | ===== References ===== | ||
N/A | N/A | ||
Line 516: | Line 231: | ||
N/A | N/A | ||
+ | |||
+ | ===== Thanks ===== | ||
+ | |||
+ | - **Dan Ackroyd**, DanAck, for surprising me with the first implementation, | ||
+ | - **Joe Watkins**, krakjoe, for finding how to set the literal flag, and creating the implementation that supports string concat. | ||
+ | - **Rowan Tommins**, IMSoP, for re-writing this RFC to focus on the key features, and putting it in context of how it can be used by libraries. | ||
+ | - **Nikita Popov**, NikiC, for suggesting where the literal flag could be stored. Initially this was going to be the [[https:// | ||
+ | - **MarkR, **for alternative ideas, and noting that " | ||
+ | - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, | ||
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis