rfc:is_literal
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
rfc:is_literal [2021/01/14 14:39] – Add syntax highlighting craigfrancis | rfc:is_literal [2021/03/21 11:36] – Add note about Perl "Taint Mode" craigfrancis | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Is Literal Check ====== | ====== PHP RFC: Is Literal Check ====== | ||
- | * Version: 0.2 | + | * Version: 0.3 |
* Date: 2020-03-21 | * Date: 2020-03-21 | ||
- | * Updated: | + | * Updated: |
* Author: Craig Francis, craig# | * Author: Craig Francis, craig# | ||
* Status: Draft | * Status: Draft | ||
Line 17: | Line 17: | ||
This simple check can be used to warn or completely block SQL Injection, Command Line Injection, and many cases of HTML Injection (aka XSS). | This simple check can be used to warn or completely block SQL Injection, Command Line Injection, and many cases of HTML Injection (aka XSS). | ||
- | ===== The Problem ===== | + | See the [[https:// |
- | Escaping strings for SQL, HTML, Commands, etc is **very** error prone. | + | In short, abstractions like Doctrine |
- | + | ||
- | The vast majority of programmers should never do this (mistakes will be made). | + | |
- | + | ||
- | Unsafe values (often user supplied) **must** be kept separate (e.g. parameterised SQL), or be processed by something that understands the context (e.g. a HTML Templating Engine). | + | |
- | + | ||
- | This is primarily for security reasons, but it can also cause data to be damaged (e.g. ASCII/UTF-8 issues). | + | |
- | + | ||
- | Take these mistakes, where the value has come from the user: | + | |
- | + | ||
- | <code php> | + | |
- | echo "< | + | |
- | </ | + | |
- | + | ||
- | Flawed, and unfortunately very common, a classic XSS vulnerability. | + | |
- | + | ||
- | <code php> | + | |
- | echo "< | + | |
- | </ | + | |
- | + | ||
- | Flawed because the attribute value is not quoted, e.g. //$url = '/ onerror=alert(1)'// | + | |
- | + | ||
- | <code php> | + | |
- | echo "< | + | |
- | </ | + | |
- | + | ||
- | Flawed because // | + | |
- | + | ||
- | <code php> | + | |
- | echo '<a href="' | + | |
- | </ | + | |
- | + | ||
- | Flawed because a link can include JavaScript, e.g. //$url = ' | + | |
- | + | ||
- | <code html> | + | |
- | < | + | |
- | var url = "<? | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | Flawed because // | + | |
- | + | ||
- | <code php> | + | |
- | echo '<a href="/ | + | |
- | </ | + | |
- | + | ||
- | Flawed because // | + | |
- | + | ||
- | <code html> | + | |
- | < | + | |
- | </ | + | |
- | + | ||
- | Flawed because the encoding is not guaranteed to be UTF-8 (or ISO-8859-1 before PHP 5.4), so the value could be corrupted. | + | |
- | + | ||
- | Also flawed because some browsers (e.g. IE 11), if the charset isn't defined (header or meta tag), could guess the output as UTF-7, e.g. //$url = ' | + | |
- | + | ||
- | <code php> | + | |
- | example.html: | + | |
- | <img src={{ url }} alt='' | + | |
- | + | ||
- | $loader = new \Twig\Loader\FilesystemLoader(' | + | |
- | $twig = new \Twig\Environment($loader, | + | |
- | + | ||
- | echo $twig-> | + | |
- | </ | + | |
- | + | ||
- | Flawed because Twig is not context aware (in this case, an unquoted HTML attribute), e.g. //$url = '/ onerror=alert(1)'// | + | |
- | + | ||
- | <code php> | + | |
- | $sql = ' | + | |
- | </ | + | |
- | + | ||
- | Flawed because the value has not been quoted, e.g. //$id = ' | + | |
- | + | ||
- | <code php> | + | |
- | $sql = ' | + | |
- | </ | + | |
- | + | ||
- | Flawed if ' | + | |
- | + | ||
- | <code php> | + | |
- | $sql = ' | + | |
- | </ | + | |
- | + | ||
- | Flawed if 'SET NAMES latin1' | + | |
- | + | ||
- | <code php> | + | |
- | $parameters = " | + | |
- | + | ||
- | // $parameters = ' | + | |
- | + | ||
- | mail(' | + | |
- | </ | + | |
- | + | ||
- | Flawed because it's not possible to safely escape values in // | + | |
- | + | ||
- | ===== Previous Solutions ===== | + | |
- | + | ||
- | [[https://github.com/ | + | |
- | + | ||
- | [[https:// | + | |
- | + | ||
- | * " | + | |
- | * this amount of work isn't ideal for "just for one use case" ([[https:// | + | |
- | * It would have effected every SQL function, such as // | + | |
- | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | + | |
- | + | ||
- | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" | + | |
- | + | ||
- | ===== Related JavaScript Implementation ===== | + | |
- | + | ||
- | This RFC is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, to support the introduction of Trusted Types. | + | |
- | + | ||
- | https:// | + | |
- | https:// | + | |
- | + | ||
- | They are looking at " | + | |
- | + | ||
- | ===== Solution ===== | + | |
- | + | ||
- | Literals are safe values, defined within the PHP scripts, for example: | + | |
- | + | ||
- | <code php> | + | |
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = sprintf(' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $c = count($ids); | + | |
- | $a = 'WHERE id IN (' . implode(',', | + | |
- | is_literal($a); | + | |
- | + | ||
- | $limit = 10; | + | |
- | $a = 'LIMIT ' . ($limit + 1); | + | |
- | is_literal($a); | + | |
- | </ | + | |
- | + | ||
- | This uses a similar definition of [[https:// | + | |
- | + | ||
- | Thanks to [[https:// | + | |
- | + | ||
- | And thanks to [[https:// | + | |
- | + | ||
- | Commands can be checked to ensure they are a " | + | |
- | + | ||
- | This approach allows all systems/ | + | |
- | + | ||
- | Unlike the Taint extension, there must **not** be an equivalent // | + | |
- | + | ||
- | ==== Solution: SQL Injection ==== | + | |
- | + | ||
- | Database abstractions (e.g. ORMs) will be able to ensure they are provided with strings that are safe. | + | |
- | + | ||
- | [[https:// | + | |
<code php> | <code php> | ||
Line 191: | Line 28: | ||
-> | -> | ||
-> | -> | ||
- | |||
- | // example.php? | ||
</ | </ | ||
- | This mistake can be easily identified by: | + | Or this Twig [[https:// |
<code php> | <code php> | ||
- | public function where($predicates) | + | echo $twig-> |
- | { | + | |
- | if (function_exists('is_literal') && !is_literal($predicates)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | ... | + | |
- | } | + | |
</ | </ | ||
- | [[https:// | + | ===== Proposal ===== |
- | <code php> | + | Literals are safe values, defined within the PHP script, for example: |
- | $users = R:: | + | |
- | </ | + | |
- | + | ||
- | [[http:// | + | |
<code php> | <code php> | ||
- | $users = UserQuery:: | + | is_literal('Example'); // true |
- | </code> | + | |
- | The //is_literal()// | + | $a = ' |
+ | is_literal($a); // true | ||
- | ==== Solution: SQL Injection, Basic ==== | + | is_literal(4); |
+ | is_literal(0.3); | ||
+ | is_literal(' | ||
- | A simple example: | + | $a = 'A'; |
+ | $b = $a . ' B ' . 3; | ||
+ | is_literal($b); | ||
- | <code php> | + | is_literal($_GET[' |
- | $sql = 'SELECT * FROM table WHERE id = ?'; | + | |
- | $result = $db-> | + | is_literal(rand(0, 10)); // false |
- | </code> | + | |
- | Checked in the framework by: | + | is_literal(sprintf(' |
- | <code php> | + | $c = count($ids); |
- | class db { | + | $a = 'WHERE id IN (' . implode(',', array_fill(0, $c, '?' |
- | + | is_literal($a); // true, the one exception that involves functions. | |
- | public function exec($sql, $parameters | + | |
- | + | ||
- | if (!is_literal($sql)) { | + | |
- | throw new Exception('SQL must be a literal.'); | + | |
- | } | + | |
- | + | ||
- | $statement = $this-> | + | |
- | | + | |
- | return $statement-> | + | |
- | + | ||
- | } | + | |
- | + | ||
- | } | + | |
</ | </ | ||
- | This also works with string concatenation: | + | Ideally |
- | <code php> | + | Thanks to [[https:// |
- | define(' | + | |
- | $sql = ' | + | As an aside, [[https:// |
- | is_literal($sql); // Returns true | + | Unlike the Taint extension, there must **not** be an equivalent //untaint()// function, or support any kind of escaping. |
- | $sql .= ' AND id = ' . $mysqli-> | + | ===== Previous Work ===== |
- | is_literal($sql); // Returns false | + | There is the [[https:// |
- | </code> | + | |
- | ==== Solution: SQL Injection, ORDER BY ==== | + | Google currently uses a [[https:// |
- | To ensure | + | As noted be [[https://news-web.php.net/php.internals/109192|Tyson Andre]], it might be possible to use static analysis, for example [[https:// |
- | < | + | And there is the [[https:// |
- | $order_fields = [ | + | |
- | 'name', | + | |
- | ' | + | |
- | ' | + | |
- | ]; | + | |
- | $order_id = array_search(($_GET['sort'] ?? NULL), $order_fields); | + | * " |
+ | * this amount of work isn't ideal for "just for one use case" ([[https:// | ||
+ | * It would have effected every SQL function, such as // | ||
+ | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | ||
- | $sql = ' ORDER BY ' . $order_fields[$order_id]; | + | I also agree that "SQL injection is almost |
- | </ | + | |
- | + | ||
- | ==== Solution: | + | |
- | + | ||
- | Most SQL strings can be a simple concatenations of literal values, but //WHERE x IN (?,?,?)// needs to use a variable number of literal placeholders. | + | |
- | + | ||
- | There needs to be a special case for // | + | |
- | + | ||
- | <code php> | + | |
- | $in_sql = implode(',', | + | |
- | + | ||
- | $sql = ' | + | |
- | </ | + | |
- | + | ||
- | ==== Solution: CLI Injection ==== | + | |
- | + | ||
- | Rather than using functions such as: | + | |
- | + | ||
- | * // | + | |
- | * // | + | |
- | * // | + | |
- | * // | + | |
- | + | ||
- | Frameworks (or PHP) could introduce something similar to // | + | |
- | + | ||
- | Or, take a safe literal for the command, and use parameters for the arguments (like SQL does): | + | |
- | + | ||
- | <code php> | + | |
- | $output = parameterised_exec(' | + | |
- | ' | + | |
- | | + | |
- | </ | + | |
- | + | ||
- | Rough implementation: | + | |
- | + | ||
- | <code php> | + | |
- | function parameterised_exec($cmd, | + | |
- | + | ||
- | if (!is_literal($cmd)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $offset = 0; | + | |
- | $k = 0; | + | |
- | while (($pos = strpos($cmd, | + | |
- | if (!isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | $arg = escapeshellarg($args[$k]); | + | |
- | $cmd = substr($cmd, | + | |
- | $offset = ($pos + strlen($arg)); | + | |
- | $k++; | + | |
- | } | + | |
- | if (isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | + | ||
- | return exec($cmd); | + | |
- | + | ||
- | } | + | |
- | </ | + | |
- | + | ||
- | ==== Solution: HTML Injection ==== | + | |
- | + | ||
- | Template engines should receive variables separately from the raw HTML. | + | |
- | + | ||
- | Often the engine will get the HTML from static files (safe): | + | |
- | + | ||
- | <code php> | + | |
- | $html = file_get_contents(' | + | |
- | </ | + | |
- | + | ||
- | But small snippets of HTML are often easier to define as a literal within the PHP script: | + | |
- | + | ||
- | < | + | |
- | $template_html = ' | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | + | ||
- | Where the variables are supplied separately, in this example I'm using XPath: | + | |
- | + | ||
- | < | + | |
- | $values = [ | + | |
- | '// | + | |
- | NULL => ' | + | |
- | ' | + | |
- | ' | + | |
- | | + | |
- | '//a' => [ | + | |
- | ' | + | |
- | ], | + | |
- | ]; | + | |
- | + | ||
- | echo template_parse($template_html, | + | |
- | </ | + | |
- | + | ||
- | The templating engine can then accept and apply the supplied variables for the relevant context. | + | |
- | + | ||
- | As a simple example, this can be done with: | + | |
- | + | ||
- | <code php> | + | |
- | function template_parse($html, | + | |
- | + | ||
- | if (!is_literal($html)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $dom = new DomDocument(); | + | |
- | $dom-> | + | |
- | + | ||
- | $xpath = new DOMXPath($dom); | + | |
- | + | ||
- | foreach ($values as $query => $attributes) { | + | |
- | + | ||
- | if (!is_literal($query)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | foreach ($xpath-> | + | |
- | foreach ($attributes as $attribute => $value) { | + | |
- | + | ||
- | if (!is_literal($attribute)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | if ($attribute) { | + | |
- | $safe = false; | + | |
- | if ($attribute == ' | + | |
- | if (preg_match(' | + | |
- | $safe = true; // Not " | + | |
- | } | + | |
- | } else if ($attribute == ' | + | |
- | if (in_array($value, | + | |
- | $safe = true; // Only allow specific classes? | + | |
- | } | + | |
- | } else if (preg_match('/ | + | |
- | if (preg_match('/ | + | |
- | $safe = true; | + | |
- | } | + | |
- | } | + | |
- | if ($safe) { | + | |
- | $element-> | + | |
- | } | + | |
- | } else { | + | |
- | $element-> | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | + | ||
- | $html = ''; | + | |
- | $body = $dom-> | + | |
- | if ($body-> | + | |
- | foreach ($body-> | + | |
- | $html .= $dom-> | + | |
- | } | + | |
- | } | + | |
- | + | ||
- | return $html; | + | |
- | + | ||
- | } | + | |
- | </ | + | |
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
Line 473: | Line 116: | ||
On [[https:// | On [[https:// | ||
- | - Would this cause performance issues? | + | |
+ | | ||
- Can // | - Can // | ||
- | | + | - Systems/ |
- | | + | |
- | + | ||
- | ===== Alternatives ===== | + | |
- | + | ||
- | - The current Taint Extension (notes above) | + | |
- | - Using static analysis (not at runtime), for example [[https:// | + | |
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== | ||
Line 489: | Line 127: | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
- | Certain | + | As noted by [[https:// |
+ | |||
+ | **Phase 2** could introduce a way for programmers to specify that certain function arguments only accept safe literals, and/or specific value-objects their project trusts (this idea comes from [[https:// | ||
+ | |||
+ | For example, a project could require the second argument for // | ||
+ | |||
+ | **Phase 3** could set a default of 'only literals' | ||
- | PHP could also have a mode where output | + | And, for a bit of silliness (Spaß ist verboten), there could be a // |
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
Line 499: | Line 143: | ||
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
- | A volunteer is needed to help with implementation. | + | N/A |
===== Implementation ===== | ===== Implementation ===== | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | ===== References ===== | ||
N/A | N/A |
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis