rfc:is_literal
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
rfc:is_literal [2020/12/23 19:53] – New examples, with a focus on escaping craigfrancis | rfc:is_literal [2021/02/20 11:03] – Add examples and details from DanAck craigfrancis | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Is Literal Check ====== | ====== PHP RFC: Is Literal Check ====== | ||
- | * Version: 0.2 | + | * Version: 0.3 |
* Date: 2020-03-21 | * Date: 2020-03-21 | ||
- | * Updated: | + | * Updated: |
* Author: Craig Francis, craig# | * Author: Craig Francis, craig# | ||
* Status: Draft | * Status: Draft | ||
Line 17: | Line 17: | ||
This simple check can be used to warn or completely block SQL Injection, Command Line Injection, and many cases of HTML Injection (aka XSS). | This simple check can be used to warn or completely block SQL Injection, Command Line Injection, and many cases of HTML Injection (aka XSS). | ||
- | ===== The Problem ===== | + | See the [[https:// |
- | Escaping strings for SQL, HTML, Commands, etc is **very** error prone. | + | But, in short, abstractions like [[https:// |
- | The vast majority of programmers should never do this (mistakes will be made). | + | <code php> |
+ | $query = $em-> | ||
+ | </ | ||
- | Unsafe values (often user supplied) *must* be kept separate (e.g. parameterised SQL), or be processed by something that understands the context (e.g. a HTML Templating Engine). | + | ===== Proposal ===== |
- | This is primarily for security reasons, but it also causes data to be damaged (e.g. ASCII/UTF-8 issues). | + | Literals are safe values, defined within the PHP script, for example: |
- | Take these mistakes: | + | <code php> |
+ | is_literal(' | ||
- | echo "< | + | $a = 'Example'; |
+ | is_literal($a); | ||
- | Flawed because the attribute value is not quoted, e.g. //$url = '/ onerror=alert(1)'// | + | is_literal(4); |
+ | is_literal(0.3); // true | ||
+ | is_literal('a' . ' | ||
- | echo "< | + | $a = 'A'; |
+ | $b = $a . ' | ||
+ | is_literal($b); | ||
- | Flawed because // | + | is_literal($_GET['id']); // false |
- | echo '<a href="' | + | is_literal(rand(0, 10)); // false |
- | Flawed because a link can include JavaScript, e.g. //$url = ' | + | is_literal(sprintf('LIMIT %d', 3)); // false |
- | < | + | $c = count($ids); |
- | var url = "<?= addslashes($url) ?>"; | + | $a = 'WHERE id IN (' . implode(',', |
- | </script> | + | is_literal($a); // true, the one exception that involves functions. |
+ | </code> | ||
- | Flawed because | + | Ideally string concatenation would be allowed, but [[https://github.com/Danack/RfcLiteralString/issues/5|Danack]] suggested this might raise performance concerns, and an array implode like function could be used instead |
- | echo '<a href="/path/?name=' | + | This uses a similar definition of [[https://wiki.php.net/ |
- | Flawed because | + | Thanks to [[https://chat.stackoverflow.com/transcript/message/51565346# |
- | < | + | As an aside, [[https:// |
- | Flawed because | + | Unlike |
- | + | ||
- | Also flawed because some browsers (e.g. IE 11), if the charset isn't defined (header or meta tag), could guess the output as UTF-7, e.g. //$url = ' | + | |
- | + | ||
- | example.html: | + | |
- | <img src={{ url }} alt='' | + | |
- | + | ||
- | $loader = new \Twig\Loader\FilesystemLoader(' | + | |
- | $twig = new \Twig\Environment($loader, | + | |
- | + | ||
- | echo $twig-> | + | |
- | + | ||
- | Flawed because Twig is not context aware (in this case, an unquoted HTML attribute), e.g. //$url = '/ onerror=alert(1)'// | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | Flawed because the value has not been quoted, e.g. //$id = ' | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | Flawed if ' | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | Flawed if 'SET NAMES latin1' | + | |
- | $parameters | + | ===== Previous Work ===== |
- | + | ||
- | // $parameters | + | |
- | + | ||
- | mail(' | + | |
- | Flawed because it's not possible to safely escape values in //$additional_parameters// | + | There is the [[https://github.com/laruence/taint|Taint extension]] by Xinchen Hui, but this approach explicitly allows escaping, which doesn't address all issues. |
- | ===== Previous Solutions ===== | + | Google currently uses a [[https:// |
- | [[https://github.com/laruence/taint|Taint extension]] by Xinchen Hui, but this approach explicitly allows escaping, which doesn' | + | It might be possible to use static analysis, for example |
- | [[https:// | + | And there is the [[https:// |
* " | * " | ||
Line 99: | Line 80: | ||
I also agree that "SQL injection is almost a solved problem [by using] prepared statements" | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" | ||
- | |||
- | ===== Related JavaScript Implementation ===== | ||
- | |||
- | This RFC is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, to support the introduction of Trusted Types. | ||
- | |||
- | https:// | ||
- | https:// | ||
- | |||
- | They are looking at " | ||
- | |||
- | ===== Solution ===== | ||
- | |||
- | Literals are safe values, defined within the PHP scripts, for example: | ||
- | |||
- | $a = ' | ||
- | is_literal($a); | ||
- | | ||
- | $a = ' | ||
- | is_literal($a); | ||
- | | ||
- | $a = ' | ||
- | is_literal($a); | ||
- | | ||
- | $a = ' | ||
- | is_literal($a); | ||
- | | ||
- | $a = sprintf(' | ||
- | is_literal($a); | ||
- | | ||
- | $c = count($ids); | ||
- | $a = 'WHERE id IN (' . implode(',', | ||
- | is_literal($a); | ||
- | | ||
- | $limit = 10; | ||
- | $a = 'LIMIT ' . ($limit + 1); | ||
- | is_literal($a); | ||
- | |||
- | This uses a similar definition of [[https:// | ||
- | |||
- | Thanks to [[https:// | ||
- | |||
- | And thanks to [[https:// | ||
- | |||
- | Commands can be checked to ensure they are a " | ||
- | |||
- | This approach allows all systems/ | ||
- | |||
- | Unlike the Taint extension, there must **not** be an equivalent // | ||
- | |||
- | ==== Solution: SQL Injection ==== | ||
- | |||
- | Database abstractions (e.g. ORMs) will be able to ensure they are provided with strings that are safe. | ||
- | |||
- | [[https:// | ||
- | |||
- | $users = $queryBuilder | ||
- | -> | ||
- | -> | ||
- | -> | ||
- | -> | ||
- | -> | ||
- | | ||
- | // example.php? | ||
- | |||
- | This mistake can be easily identified by: | ||
- | |||
- | public function where($predicates) | ||
- | { | ||
- | if (function_exists(' | ||
- | throw new Exception(' | ||
- | } | ||
- | ... | ||
- | } | ||
- | |||
- | [[https:// | ||
- | |||
- | $users = R:: | ||
- | |||
- | [[http:// | ||
- | |||
- | $users = UserQuery:: | ||
- | |||
- | The // | ||
- | |||
- | ==== Solution: SQL Injection, Basic ==== | ||
- | |||
- | A simple example: | ||
- | |||
- | $sql = ' | ||
- | | ||
- | $result = $db-> | ||
- | |||
- | Checked in the framework by: | ||
- | |||
- | class db { | ||
- | | ||
- | public function exec($sql, $parameters = []) { | ||
- | | ||
- | if (!is_literal($sql)) { | ||
- | throw new Exception(' | ||
- | } | ||
- | | ||
- | $statement = $this-> | ||
- | $statement-> | ||
- | return $statement-> | ||
- | | ||
- | } | ||
- | | ||
- | } | ||
- | |||
- | This also works with string concatenation: | ||
- | |||
- | define(' | ||
- | | ||
- | $sql = ' | ||
- | | ||
- | is_literal($sql); | ||
- | | ||
- | $sql .= ' AND id = ' . $mysqli-> | ||
- | | ||
- | is_literal($sql); | ||
- | |||
- | ==== Solution: SQL Injection, ORDER BY ==== | ||
- | |||
- | To ensure //ORDER BY// can be set via the user, but only use acceptable values: | ||
- | |||
- | $order_fields = [ | ||
- | ' | ||
- | ' | ||
- | ' | ||
- | ]; | ||
- | | ||
- | $order_id = array_search(($_GET[' | ||
- | | ||
- | $sql = ' ORDER BY ' . $order_fields[$order_id]; | ||
- | |||
- | ==== Solution: SQL Injection, WHERE IN ==== | ||
- | |||
- | Most SQL strings can be a simple concatenations of literal values, but //WHERE x IN (?,?,?)// needs to use a variable number of literal placeholders. | ||
- | |||
- | There needs to be a special case for // | ||
- | |||
- | $in_sql = implode(',', | ||
- | | ||
- | $sql = ' | ||
- | |||
- | ==== Solution: CLI Injection ==== | ||
- | |||
- | Rather than using functions such as: | ||
- | |||
- | * //exec()// | ||
- | * // | ||
- | * // | ||
- | * // | ||
- | |||
- | Frameworks (or PHP) could introduce something similar to // | ||
- | |||
- | Or, take a safe literal for the command, and use parameters for the arguments (like SQL does): | ||
- | |||
- | $output = parameterised_exec(' | ||
- | ' | ||
- | ]); | ||
- | |||
- | Rough implementation: | ||
- | |||
- | function parameterised_exec($cmd, | ||
- | | ||
- | if (!is_literal($cmd)) { | ||
- | throw new Exception(' | ||
- | } | ||
- | | ||
- | $offset = 0; | ||
- | $k = 0; | ||
- | while (($pos = strpos($cmd, | ||
- | if (!isset($args[$k])) { | ||
- | throw new Exception(' | ||
- | exit(); | ||
- | } | ||
- | $arg = escapeshellarg($args[$k]); | ||
- | $cmd = substr($cmd, | ||
- | $offset = ($pos + strlen($arg)); | ||
- | $k++; | ||
- | } | ||
- | if (isset($args[$k])) { | ||
- | throw new Exception(' | ||
- | exit(); | ||
- | } | ||
- | | ||
- | return exec($cmd); | ||
- | | ||
- | } | ||
- | |||
- | ==== Solution: HTML Injection ==== | ||
- | |||
- | Template engines should receive variables separately from the raw HTML. | ||
- | |||
- | Often the engine will get the HTML from static files (safe): | ||
- | |||
- | $html = file_get_contents('/ | ||
- | |||
- | But small snippets of HTML are often easier to define as a literal within the PHP script: | ||
- | |||
- | $template_html = ' | ||
- | < | ||
- | < | ||
- | |||
- | Where the variables are supplied separately, in this example I'm using XPath: | ||
- | |||
- | $values = [ | ||
- | '// | ||
- | NULL => ' | ||
- | ' | ||
- | ' | ||
- | ], | ||
- | '// | ||
- | ' | ||
- | ], | ||
- | ]; | ||
- | | ||
- | echo template_parse($template_html, | ||
- | |||
- | The templating engine can then accept and apply the supplied variables for the relevant context. | ||
- | |||
- | As a simple example, this can be done with: | ||
- | |||
- | function template_parse($html, | ||
- | | ||
- | if (!is_literal($html)) { | ||
- | throw new Exception(' | ||
- | } | ||
- | | ||
- | $dom = new DomDocument(); | ||
- | $dom-> | ||
- | | ||
- | $xpath = new DOMXPath($dom); | ||
- | | ||
- | foreach ($values as $query => $attributes) { | ||
- | | ||
- | if (!is_literal($query)) { | ||
- | throw new Exception(' | ||
- | } | ||
- | | ||
- | foreach ($xpath-> | ||
- | foreach ($attributes as $attribute => $value) { | ||
- | | ||
- | if (!is_literal($attribute)) { | ||
- | throw new Exception(' | ||
- | } | ||
- | | ||
- | if ($attribute) { | ||
- | $safe = false; | ||
- | if ($attribute == ' | ||
- | if (preg_match('/ | ||
- | $safe = true; // Not " | ||
- | } | ||
- | } else if ($attribute == ' | ||
- | if (in_array($value, | ||
- | $safe = true; // Only allow specific classes? | ||
- | } | ||
- | } else if (preg_match('/ | ||
- | if (preg_match('/ | ||
- | $safe = true; | ||
- | } | ||
- | } | ||
- | if ($safe) { | ||
- | $element-> | ||
- | } | ||
- | } else { | ||
- | $element-> | ||
- | } | ||
- | | ||
- | } | ||
- | } | ||
- | | ||
- | } | ||
- | | ||
- | $html = ''; | ||
- | $body = $dom-> | ||
- | if ($body-> | ||
- | foreach ($body-> | ||
- | $html .= $dom-> | ||
- | } | ||
- | } | ||
- | | ||
- | return $html; | ||
- | | ||
- | } | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
Line 413: | Line 107: | ||
On [[https:// | On [[https:// | ||
- | - Would this cause performance issues? | + | |
+ | | ||
- Can // | - Can // | ||
- | | + | - Systems/ |
- | | + | |
- | + | ||
- | ===== Alternatives ===== | + | |
- | + | ||
- | - The current Taint Extension (notes above) | + | |
- | - Using static analysis (not at runtime), for example [[https:// | + | |
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== | ||
Line 429: | Line 118: | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
- | Certain | + | As noted by [[https:// |
- | PHP could also have a mode where output (e.g. //echo '< | + | This check could be used to throw an exception, or generate an error/ |
+ | |||
+ | PHP could have a mode where output (e.g. //echo '< | ||
+ | |||
+ | And, for a bit of silliness (Spaß ist verboten), there could be a // | ||
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
Line 439: | Line 132: | ||
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
- | A volunteer is needed to help with implementation. | + | N/A |
===== Implementation ===== | ===== Implementation ===== | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | ===== References ===== | ||
N/A | N/A |
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis