rfc:is_literal
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
rfc:is_literal [2020/12/23 19:53] – New examples, with a focus on escaping craigfrancis | rfc:is_literal [2021/04/11 11:37] – Example updates craigfrancis | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Is Literal Check ====== | ====== PHP RFC: Is Literal Check ====== | ||
- | * Version: 0.2 | + | * Version: 0.3 |
* Date: 2020-03-21 | * Date: 2020-03-21 | ||
- | * Updated: | + | * Updated: |
* Author: Craig Francis, craig# | * Author: Craig Francis, craig# | ||
* Status: Draft | * Status: Draft | ||
Line 15: | Line 15: | ||
As in, at runtime, being able to check if a variable has been created by literals, defined within a PHP script, by a trusted developer. | As in, at runtime, being able to check if a variable has been created by literals, defined within a PHP script, by a trusted developer. | ||
- | This simple | + | This check can be used to warn or completely block (by default) many, if not all, injection based vulnerabilities. |
- | ===== The Problem ===== | + | See the [[https:// |
- | Escaping strings for SQL, HTML, Commands, etc is **very** error prone. | + | In short, abstractions like Doctrine could identify [[https:// |
- | The vast majority of programmers should never do this (mistakes will be made). | + | <code php> |
+ | $users = $queryBuilder | ||
+ | ->select(' | ||
+ | -> | ||
+ | -> | ||
+ | -> | ||
+ | -> | ||
+ | </ | ||
- | Unsafe values | + | Or Twig could project against HTML Injection vulnerabilities |
- | This is primarily for security reasons, but it also causes data to be damaged | + | <code php> |
+ | echo $twig-> | ||
+ | </ | ||
- | Take these mistakes: | + | ===== Proposal ===== |
- | echo "< | + | Literals are safe values, defined within the PHP script, for example: |
- | Flawed because the attribute value is not quoted, e.g. //$url = '/ onerror=alert(1)'// | + | <code php> |
+ | is_literal(' | ||
- | echo "< | + | $a = 'Example'; |
+ | is_literal($a); // true | ||
- | Flawed because // | + | is_literal(4); // true |
+ | is_literal(0.3); // true | ||
+ | is_literal('a' . 'b'); // true, compiler can concatenate | ||
- | echo '<a href="' | + | $a = 'A'; |
+ | $b = $a . ' | ||
+ | is_literal($b); | ||
- | Flawed because a link can include JavaScript, e.g. //$url = 'javascript: | + | is_literal($_GET['id']); // false |
- | < | + | is_literal(rand(0, 10)); // false |
- | var url = "<? | + | |
- | </script> | + | |
- | Flawed because // | + | is_literal(sprintf(' |
- | echo '<a href="/ | + | $c = count($ids); |
+ | $a = 'WHERE id IN (' . implode(',', | ||
+ | is_literal($a); | ||
+ | </code> | ||
- | Flawed because | + | Ideally string concatenation would be allowed, but [[https://github.com/Danack/RfcLiteralString/issues/5|Danack]] suggested this might raise performance concerns, and an array implode like function could be used instead (or a query builder). |
- | < | + | Thanks to [[https:// |
- | Flawed because the encoding is not guaranteed to be UTF-8 (or ISO-8859-1 before PHP 5.4), so the value could be corrupted. | + | As an aside, [[https:// |
- | Also flawed because some browsers (e.g. IE 11), if the charset isn't defined (header or meta tag), could guess the output as UTF-7, e.g. //$url = ' | + | Unlike |
- | + | ||
- | example.html: | + | |
- | <img src={{ url }} alt='' | + | |
- | + | ||
- | $loader = new \Twig\Loader\FilesystemLoader(' | + | |
- | $twig = new \Twig\Environment($loader, | + | |
- | + | ||
- | echo $twig-> | + | |
- | + | ||
- | Flawed because Twig is not context aware (in this case, an unquoted HTML attribute), e.g. //$url = '/ onerror=alert(1)'// | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | Flawed because the value has not been quoted, e.g. //$id = ' | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | Flawed if ' | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | Flawed if 'SET NAMES latin1' | + | |
- | $parameters | + | ===== Previous Work ===== |
- | + | ||
- | // $parameters | + | |
- | + | ||
- | mail(' | + | |
- | Flawed because it's not possible to safely escape values in //$additional_parameters// for //mail()//, e.g. //$email = ' | + | There is the [[https://github.com/laruence/taint|Taint extension]] by Xinchen Hui, but in contrast to this, there must **not** be an equivalent |
- | ===== Previous Solutions ===== | + | Google currently uses a [[https:// |
- | [[https:// | + | As noted be [[https:// |
- | [[https:// | + | And there is the [[https:// |
* " | * " | ||
Line 98: | Line 88: | ||
* Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | ||
- | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" | + | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" |
- | + | ||
- | ===== Related JavaScript Implementation ===== | + | |
- | + | ||
- | This RFC is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, to support the introduction of Trusted Types. | + | |
- | + | ||
- | https://github.com/ | + | |
- | https:// | + | |
- | + | ||
- | They are looking at " | + | |
- | + | ||
- | ===== Solution ===== | + | |
- | + | ||
- | Literals are safe values, defined within the PHP scripts, for example: | + | |
- | + | ||
- | $a = ' | + | |
- | | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = sprintf(' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $c = count($ids); | + | |
- | $a = 'WHERE id IN (' . implode(',', | + | |
- | is_literal($a); | + | |
- | + | ||
- | $limit = 10; | + | |
- | $a = 'LIMIT ' . ($limit + 1); | + | |
- | is_literal($a); | + | |
- | + | ||
- | This uses a similar definition of [[https:// | + | |
- | + | ||
- | Thanks to [[https:// | + | |
- | + | ||
- | And thanks to [[https:// | + | |
- | + | ||
- | Commands can be checked to ensure they are a " | + | |
- | + | ||
- | This approach allows all systems/ | + | |
- | + | ||
- | Unlike the Taint extension, there must **not** be an equivalent // | + | |
- | + | ||
- | ==== Solution: SQL Injection ==== | + | |
- | + | ||
- | Database abstractions (e.g. ORMs) will be able to ensure they are provided with strings that are safe. | + | |
- | + | ||
- | [[https:// | + | |
- | + | ||
- | $users = $queryBuilder | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | + | ||
- | // example.php? | + | |
- | + | ||
- | This mistake can be easily identified by: | + | |
- | + | ||
- | public function where($predicates) | + | |
- | { | + | |
- | if (function_exists(' | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | ... | + | |
- | } | + | |
- | + | ||
- | [[https:// | + | |
- | + | ||
- | $users = R:: | + | |
- | + | ||
- | [[http:// | + | |
- | + | ||
- | $users = UserQuery:: | + | |
- | + | ||
- | The // | + | |
- | + | ||
- | ==== Solution: SQL Injection, Basic ==== | + | |
- | + | ||
- | A simple example: | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | $result = $db-> | + | |
- | + | ||
- | Checked in the framework by: | + | |
- | + | ||
- | class db { | + | |
- | + | ||
- | public function exec($sql, $parameters = []) { | + | |
- | + | ||
- | if (!is_literal($sql)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $statement = $this-> | + | |
- | $statement-> | + | |
- | return $statement-> | + | |
- | + | ||
- | } | + | |
- | + | ||
- | } | + | |
- | + | ||
- | This also works with string concatenation: | + | |
- | + | ||
- | define(' | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | is_literal($sql); | + | |
- | + | ||
- | $sql .= ' AND id = ' . $mysqli-> | + | |
- | + | ||
- | is_literal($sql); | + | |
- | + | ||
- | ==== Solution: SQL Injection, ORDER BY ==== | + | |
- | + | ||
- | To ensure //ORDER BY// can be set via the user, but only use acceptable values: | + | |
- | + | ||
- | $order_fields = [ | + | |
- | ' | + | |
- | ' | + | |
- | ' | + | |
- | ]; | + | |
- | + | ||
- | $order_id = array_search(($_GET[' | + | |
- | + | ||
- | $sql = ' ORDER BY ' . $order_fields[$order_id]; | + | |
- | + | ||
- | ==== Solution: SQL Injection, WHERE IN ==== | + | |
- | + | ||
- | Most SQL strings can be a simple concatenations of literal values, but //WHERE x IN (?,?,?)// needs to use a variable number of literal placeholders. | + | |
- | + | ||
- | There needs to be a special case for // | + | |
- | + | ||
- | $in_sql = implode(',', | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | ==== Solution: CLI Injection ==== | + | |
- | + | ||
- | Rather than using functions such as: | + | |
- | + | ||
- | * // | + | |
- | * // | + | |
- | * // | + | |
- | * // | + | |
- | + | ||
- | Frameworks (or PHP) could introduce something similar to // | + | |
- | + | ||
- | Or, take a safe literal for the command, and use parameters for the arguments (like SQL does): | + | |
- | + | ||
- | $output = parameterised_exec(' | + | |
- | ' | + | |
- | ]); | + | |
- | + | ||
- | Rough implementation: | + | |
- | + | ||
- | function parameterised_exec($cmd, | + | |
- | + | ||
- | if (!is_literal($cmd)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $offset = 0; | + | |
- | $k = 0; | + | |
- | while (($pos = strpos($cmd, | + | |
- | if (!isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | $arg = escapeshellarg($args[$k]); | + | |
- | $cmd = substr($cmd, | + | |
- | $offset = ($pos + strlen($arg)); | + | |
- | $k++; | + | |
- | } | + | |
- | if (isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | + | ||
- | return exec($cmd); | + | |
- | + | ||
- | } | + | |
- | + | ||
- | ==== Solution: HTML Injection ==== | + | |
- | + | ||
- | Template engines should receive variables separately from the raw HTML. | + | |
- | + | ||
- | Often the engine will get the HTML from static files (safe): | + | |
- | + | ||
- | $html = file_get_contents('/ | + | |
- | + | ||
- | But small snippets of HTML are often easier to define as a literal within the PHP script: | + | |
- | + | ||
- | $template_html = ' | + | |
- | < | + | |
- | < | + | |
- | + | ||
- | Where the variables are supplied separately, in this example I'm using XPath: | + | |
- | + | ||
- | $values = [ | + | |
- | '// | + | |
- | NULL => ' | + | |
- | ' | + | |
- | ' | + | |
- | ], | + | |
- | '// | + | |
- | ' | + | |
- | ], | + | |
- | ]; | + | |
- | + | ||
- | echo template_parse($template_html, | + | |
- | + | ||
- | The templating engine can then accept and apply the supplied variables for the relevant context. | + | |
- | + | ||
- | As a simple example, this can be done with: | + | |
- | + | ||
- | function template_parse($html, | + | |
- | + | ||
- | if (!is_literal($html)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $dom = new DomDocument(); | + | |
- | $dom-> | + | |
- | + | ||
- | $xpath = new DOMXPath($dom); | + | |
- | + | ||
- | foreach ($values as $query => $attributes) { | + | |
- | + | ||
- | if (!is_literal($query)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | foreach ($xpath-> | + | |
- | foreach ($attributes as $attribute => $value) { | + | |
- | + | ||
- | if (!is_literal($attribute)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | if ($attribute) { | + | |
- | $safe = false; | + | |
- | if ($attribute == ' | + | |
- | if (preg_match('/ | + | |
- | $safe = true; // Not " | + | |
- | } | + | |
- | } else if ($attribute == ' | + | |
- | if (in_array($value, | + | |
- | $safe = true; // Only allow specific classes? | + | |
- | } | + | |
- | } else if (preg_match('/ | + | |
- | if (preg_match('/ | + | |
- | $safe = true; | + | |
- | } | + | |
- | } | + | |
- | if ($safe) { | + | |
- | $element-> | + | |
- | } | + | |
- | } else { | + | |
- | $element-> | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | + | ||
- | $html = ''; | + | |
- | $body = $dom-> | + | |
- | if ($body-> | + | |
- | foreach ($body-> | + | |
- | $html .= $dom-> | + | |
- | } | + | |
- | } | + | |
- | + | ||
- | return $html; | + | |
- | + | ||
- | } | + | |
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
Line 413: | Line 116: | ||
On [[https:// | On [[https:// | ||
- | - Would this cause performance issues? | + | |
+ | | ||
- Can // | - Can // | ||
- | | + | - Systems/ |
- | | + | |
- | + | ||
- | ===== Alternatives ===== | + | |
- | + | ||
- | - The current Taint Extension (notes above) | + | |
- | - Using static analysis (not at runtime), for example [[https:// | + | |
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== | ||
Line 429: | Line 127: | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
- | Certain | + | As noted by [[https:// |
- | PHP could also have a mode where output | + | **Phase 2** could introduce |
+ | |||
+ | For example, a project could require the second argument for //pg_query()// | ||
+ | |||
+ | **Phase 3** could set a default of 'only literals' | ||
+ | |||
+ | And, for a bit of silliness (Spaß ist verboten), there could be a //is_figurative()// function, which MarkR seems to [[https:// | ||
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
Line 439: | Line 143: | ||
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
- | A volunteer is needed to help with implementation. | + | N/A |
===== Implementation ===== | ===== Implementation ===== | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | ===== References ===== | ||
N/A | N/A |
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis