rfc:is_literal
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
rfc:is_literal [2021/01/14 14:39] – Add syntax highlighting craigfrancis | rfc:is_literal [2021/04/18 16:55] – some suggested rewording imsop | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Is Literal Check ====== | ====== PHP RFC: Is Literal Check ====== | ||
- | * Version: 0.2 | + | * Version: 0.4 |
* Date: 2020-03-21 | * Date: 2020-03-21 | ||
- | * Updated: | + | * Updated: |
* Author: Craig Francis, craig# | * Author: Craig Francis, craig# | ||
* Status: Draft | * Status: Draft | ||
Line 11: | Line 11: | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Add an // | + | This RFC proposes a new function, |
- | As in, at runtime, being able to check if a variable has been created by literals, defined within a PHP script, by a trusted developer. | + | The clearest example is a database library which supports parametrised queries |
- | This simple check can be used to warn or completely block SQL Injection, Command Line Injection, and many cases of HTML Injection (aka XSS). | ||
- | ===== The Problem | + | ===== Examples |
- | Escaping strings for SQL, HTML, Commands, etc is **very** error prone. | + | The [[https:// |
- | + | ||
- | The vast majority of programmers should never do this (mistakes will be made). | + | |
- | + | ||
- | Unsafe values (often user supplied) **must** be kept separate (e.g. parameterised SQL), or be processed by something that understands the context (e.g. a HTML Templating Engine). | + | |
- | + | ||
- | This is primarily | + | |
- | + | ||
- | Take these mistakes, where the value has come from the user: | + | |
<code php> | <code php> | ||
- | echo "< | + | // INSECURE |
+ | $qb-> | ||
+ | | ||
+ | | ||
</ | </ | ||
- | Flawed, | + | The definition of the '' |
<code php> | <code php> | ||
- | echo "< | + | $qb-> |
+ | | ||
+ | | ||
+ | ->setParameter(' | ||
</ | </ | ||
- | Flawed because the attribute value is not quoted, e.g. //$url = '/ onerror=alert(1)'/ | + | Similarly, Twig allows [[https:// |
<code php> | <code php> | ||
- | echo "<img src='" | + | // INSECURE |
+ | echo $twig-> | ||
</ | </ | ||
- | Flawed because // | + | If '' |
<code php> | <code php> | ||
- | echo '<a href="' | + | echo $twig-> |
</ | </ | ||
- | Flawed because a link can include JavaScript, e.g. // | + | ===== Proposal ===== |
- | <code html> | + | A literal |
- | < | + | |
- | var url = "<? | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | Flawed because // | + | |
<code php> | <code php> | ||
- | echo '<a href="/ | + | is_literal('Example'); // true |
- | </code> | + | |
- | Flawed because // | + | $a = 'Example'; |
+ | is_literal($a); | ||
- | <code html> | + | is_literal(4); |
- | < | + | is_literal(0.3); // true |
- | </code> | + | is_literal(' |
- | Flawed because the encoding is not guaranteed to be UTF-8 (or ISO-8859-1 before PHP 5.4), so the value could be corrupted. | + | $a = ' |
+ | $b = $a . ' B ' . 3; | ||
+ | is_literal($b); // true, ideally (more details below) | ||
- | Also flawed because some browsers | + | is_literal($_GET['id']); // false |
- | <code php> | + | is_literal(rand(0, |
- | example.html: | + | |
- | <img src={{ url }} alt='' | + | |
- | $loader = new \Twig\Loader\FilesystemLoader('./ | + | is_literal(sprintf('LIMIT %d', |
- | $twig = new \Twig\Environment($loader, [' | + | |
- | echo $twig-> | + | $c = count($ids); |
+ | $a = 'WHERE id IN (' . implode(',' | ||
+ | is_literal($a); // true, the one exception that involves functions. [TODO: this exception is controversial] | ||
</ | </ | ||
- | Flawed because Twig is not context aware (in this case, an unquoted HTML attribute), | + | Note that there is no way to manually mark a string as " |
- | <code php> | ||
- | $sql = ' | ||
- | </ | ||
- | Flawed because the value has not been quoted, e.g. //$id = ' | + | ===== Implementation Notes ===== |
- | <code php> | + | (Most of what's in this section probably doesn't need to be in the final RFC.) |
- | $sql = 'SELECT 1 FROM user WHERE id="' . $mysqli-> | + | |
- | </ | + | |
- | Flawed if ' | + | Ideally string concatenation would be allowed, but [[https://github.com/Danack/RfcLiteralString/issues/5|Danack]] suggested this might raise performance concerns, and an array implode like function could be used instead (or a query builder). |
- | <code php> | + | Thanks to [[https:// |
- | $sql = ' | + | |
- | </code> | + | |
- | Flawed if 'SET NAMES latin1' | + | As an aside, [[https:// |
- | <code php> | + | ===== Comparison to Taint Tracking ===== |
- | $parameters | + | |
- | // $parameters = ' | + | Some languages implement a "taint flag" which tracks whether values are considered " |
- | mail('a@example.com', ' | + | These solutions rely on the assumption that the output of an escaping function is safe for a particular context. This sounds reasonable in theory, but the operation of escaping functions, and the context for which their output is safe, are very hard to define. This leads to a feature that is both complex and unreliable. |
- | </ | + | |
+ | The current proposal avoids this complexity by addressing a different part of the problem: separating inputs supplied by the programmer from inputs supplied by the user. | ||
- | Flawed because it's not possible to safely escape values in // | + | ===== Previous Work ===== |
- | ===== Previous Solutions ===== | + | Google currently uses a [[https:// |
- | [[https:// | + | As noted be [[https:// |
- | [[https:// | + | And there is the [[https:// |
* " | * " | ||
Line 126: | Line 112: | ||
* Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | ||
- | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" | + | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" |
- | + | ||
- | ===== Related JavaScript Implementation ===== | + | |
- | + | ||
- | This RFC is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, to support the introduction of Trusted Types. | + | |
- | + | ||
- | https://github.com/ | + | |
- | https:// | + | |
- | + | ||
- | They are looking at " | + | |
- | + | ||
- | ===== Solution ===== | + | |
- | + | ||
- | Literals are safe values, defined within the PHP scripts, for example: | + | |
- | + | ||
- | <code php> | + | |
- | $a = ' | + | |
- | is_literal($a); // true | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = sprintf(' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $c = count($ids); | + | |
- | $a = 'WHERE id IN (' . implode(',', | + | |
- | is_literal($a); | + | |
- | + | ||
- | $limit = 10; | + | |
- | $a = 'LIMIT ' . ($limit + 1); | + | |
- | is_literal($a); | + | |
- | </ | + | |
- | + | ||
- | This uses a similar definition of [[https:// | + | |
- | + | ||
- | Thanks to [[https:// | + | |
- | + | ||
- | And thanks to [[https:// | + | |
- | + | ||
- | Commands can be checked to ensure they are a " | + | |
- | + | ||
- | This approach allows all systems/ | + | |
- | + | ||
- | Unlike the Taint extension, there must **not** be an equivalent // | + | |
- | + | ||
- | ==== Solution: SQL Injection ==== | + | |
- | + | ||
- | Database abstractions (e.g. ORMs) will be able to ensure they are provided with strings that are safe. | + | |
- | + | ||
- | [[https:// | + | |
- | + | ||
- | <code php> | + | |
- | $users = $queryBuilder | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | + | ||
- | // example.php? | + | |
- | </ | + | |
- | + | ||
- | This mistake can be easily identified by: | + | |
- | + | ||
- | <code php> | + | |
- | public function where($predicates) | + | |
- | { | + | |
- | if (function_exists(' | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | ... | + | |
- | } | + | |
- | </ | + | |
- | + | ||
- | [[https:// | + | |
- | + | ||
- | <code php> | + | |
- | $users = R:: | + | |
- | </ | + | |
- | + | ||
- | [[http:// | + | |
- | + | ||
- | <code php> | + | |
- | $users = UserQuery:: | + | |
- | </ | + | |
- | + | ||
- | The // | + | |
- | + | ||
- | ==== Solution: SQL Injection, Basic ==== | + | |
- | + | ||
- | A simple example: | + | |
- | + | ||
- | <code php> | + | |
- | $sql = ' | + | |
- | + | ||
- | $result = $db-> | + | |
- | </ | + | |
- | + | ||
- | Checked in the framework by: | + | |
- | + | ||
- | <code php> | + | |
- | class db { | + | |
- | + | ||
- | public function exec($sql, $parameters = []) { | + | |
- | + | ||
- | if (!is_literal($sql)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $statement = $this-> | + | |
- | $statement-> | + | |
- | return $statement-> | + | |
- | + | ||
- | } | + | |
- | + | ||
- | } | + | |
- | </ | + | |
- | + | ||
- | This also works with string concatenation: | + | |
- | + | ||
- | <code php> | + | |
- | define(' | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | is_literal($sql); | + | |
- | + | ||
- | $sql .= ' AND id = ' . $mysqli-> | + | |
- | + | ||
- | is_literal($sql); | + | |
- | </ | + | |
- | + | ||
- | ==== Solution: SQL Injection, ORDER BY ==== | + | |
- | + | ||
- | To ensure //ORDER BY// can be set via the user, but only use acceptable values: | + | |
- | + | ||
- | <code php> | + | |
- | $order_fields = [ | + | |
- | ' | + | |
- | ' | + | |
- | ' | + | |
- | ]; | + | |
- | + | ||
- | $order_id = array_search(($_GET[' | + | |
- | + | ||
- | $sql = ' ORDER BY ' . $order_fields[$order_id]; | + | |
- | </ | + | |
- | + | ||
- | ==== Solution: SQL Injection, WHERE IN ==== | + | |
- | + | ||
- | Most SQL strings can be a simple concatenations of literal values, but //WHERE x IN (?,?,?)// needs to use a variable number of literal placeholders. | + | |
- | + | ||
- | There needs to be a special case for // | + | |
- | + | ||
- | <code php> | + | |
- | $in_sql = implode(',', | + | |
- | + | ||
- | $sql = ' | + | |
- | </ | + | |
- | + | ||
- | ==== Solution: CLI Injection ==== | + | |
- | + | ||
- | Rather than using functions such as: | + | |
- | + | ||
- | * // | + | |
- | * // | + | |
- | * // | + | |
- | * // | + | |
- | + | ||
- | Frameworks (or PHP) could introduce something similar to // | + | |
- | + | ||
- | Or, take a safe literal for the command, and use parameters for the arguments (like SQL does): | + | |
- | + | ||
- | <code php> | + | |
- | $output = parameterised_exec(' | + | |
- | ' | + | |
- | ]); | + | |
- | </ | + | |
- | + | ||
- | Rough implementation: | + | |
- | + | ||
- | <code php> | + | |
- | function parameterised_exec($cmd, | + | |
- | + | ||
- | if (!is_literal($cmd)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $offset = 0; | + | |
- | $k = 0; | + | |
- | while (($pos = strpos($cmd, | + | |
- | if (!isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | $arg = escapeshellarg($args[$k]); | + | |
- | $cmd = substr($cmd, | + | |
- | $offset = ($pos + strlen($arg)); | + | |
- | $k++; | + | |
- | } | + | |
- | if (isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | + | ||
- | return exec($cmd); | + | |
- | + | ||
- | } | + | |
- | </ | + | |
- | + | ||
- | ==== Solution: HTML Injection ==== | + | |
- | + | ||
- | Template engines should receive variables separately from the raw HTML. | + | |
- | + | ||
- | Often the engine will get the HTML from static files (safe): | + | |
- | + | ||
- | <code php> | + | |
- | $html = file_get_contents('/ | + | |
- | </ | + | |
- | + | ||
- | But small snippets of HTML are often easier to define as a literal within the PHP script: | + | |
- | + | ||
- | <code php> | + | |
- | $template_html = ' | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | + | ||
- | Where the variables are supplied separately, in this example I'm using XPath: | + | |
- | + | ||
- | <code php> | + | |
- | $values = [ | + | |
- | '// | + | |
- | NULL => ' | + | |
- | ' | + | |
- | ' | + | |
- | ], | + | |
- | '// | + | |
- | ' | + | |
- | ], | + | |
- | ]; | + | |
- | + | ||
- | echo template_parse($template_html, | + | |
- | </ | + | |
- | + | ||
- | The templating engine can then accept and apply the supplied variables for the relevant context. | + | |
- | + | ||
- | As a simple example, this can be done with: | + | |
- | + | ||
- | <code php> | + | |
- | function template_parse($html, | + | |
- | + | ||
- | if (!is_literal($html)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $dom = new DomDocument(); | + | |
- | $dom-> | + | |
- | + | ||
- | $xpath = new DOMXPath($dom); | + | |
- | + | ||
- | foreach ($values as $query => $attributes) { | + | |
- | + | ||
- | if (!is_literal($query)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | foreach ($xpath-> | + | |
- | foreach ($attributes as $attribute => $value) { | + | |
- | + | ||
- | if (!is_literal($attribute)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | if ($attribute) { | + | |
- | $safe = false; | + | |
- | if ($attribute == ' | + | |
- | if (preg_match('/ | + | |
- | $safe = true; // Not " | + | |
- | } | + | |
- | } else if ($attribute == ' | + | |
- | if (in_array($value, | + | |
- | $safe = true; // Only allow specific classes? | + | |
- | } | + | |
- | } else if (preg_match('/ | + | |
- | if (preg_match('/ | + | |
- | $safe = true; | + | |
- | } | + | |
- | } | + | |
- | if ($safe) { | + | |
- | $element-> | + | |
- | } | + | |
- | } else { | + | |
- | $element-> | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | + | ||
- | $html = ''; | + | |
- | $body = $dom-> | + | |
- | if ($body-> | + | |
- | foreach ($body-> | + | |
- | $html .= $dom-> | + | |
- | } | + | |
- | } | + | |
- | + | ||
- | return $html; | + | |
- | + | ||
- | } | + | |
- | </ | + | |
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
Line 473: | Line 140: | ||
On [[https:// | On [[https:// | ||
- | - Would this cause performance issues? | + | |
+ | | ||
- Can // | - Can // | ||
- | | + | - Systems/ |
- | | + | |
- | + | ||
- | ===== Alternatives ===== | + | |
- | + | ||
- | - The current Taint Extension (notes above) | + | |
- | - Using static analysis (not at runtime), for example [[https:// | + | |
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== | ||
Line 489: | Line 151: | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
- | Certain | + | As noted by [[https:// |
+ | |||
+ | **Phase 2** could introduce a way for programmers | ||
+ | |||
+ | For example, | ||
- | PHP could also have a mode where output | + | **Phase 3** could set a default of 'only literals' |
+ | |||
+ | And, for a bit of silliness (Spaß ist verboten), there could be a //is_figurative()// function, which MarkR seems to [[https://chat.stackoverflow.com/transcript/message/ | ||
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
Line 499: | Line 167: | ||
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
- | A volunteer is needed to help with implementation. | + | N/A |
===== Implementation ===== | ===== Implementation ===== | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | ===== References ===== | ||
N/A | N/A |
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis