rfc:is_literal
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
rfc:is_literal [2020/03/23 17:44] – Add a note about "interned strings" craigfrancis | rfc:is_literal [2021/04/19 13:44] – Updated examples, and general tweaks craigfrancis | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Is Literal Check ====== | ====== PHP RFC: Is Literal Check ====== | ||
- | * Version: 0.1 | + | * Version: 0.5 |
* Date: 2020-03-21 | * Date: 2020-03-21 | ||
+ | * Updated: 2021-04-19 | ||
* Author: Craig Francis, craig# | * Author: Craig Francis, craig# | ||
* Status: Draft | * Status: Draft | ||
* First Published at: https:// | * First Published at: https:// | ||
+ | * GitHub Repo: https:// | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Add an // | + | This RFC proposes a new function, |
- | This function would allows developers/ | + | This addresses some of the same use cases as "taint flags", but is both simpler and stricter. It does not address how user data is transmitted or escaped, only whether it has been passed |
- | Commands can then be tested to ensure they are a " | + | The clearest example is a database library which supports parametrised queries at the driver level, where the programmer could use either of these: |
- | This will also allow systems/ | + | <code php> |
+ | $db-> | ||
- | Literals are values defined within the PHP scripts, for example: | + | $db-> |
+ | </ | ||
- | $a = ' | + | By rejecting the SQL that was not written as a literal |
- | $b = ' | + | |
- | is_literal($b); // Returns true | + | |
- | + | ||
- | $c = ' | + | |
- | is_literal($c); | + | |
- | ===== Related JavaScript Implementation | + | ===== Examples |
- | + | ||
- | This proposal is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, to support the introduction of Trusted Types. | + | |
- | + | ||
- | https:// | + | |
- | https:// | + | |
- | + | ||
- | They are looking at " | + | |
- | + | ||
- | ===== Taint Checking ===== | + | |
- | + | ||
- | Xinchen Hui has done some amazing work with the Taint extension: | + | |
- | + | ||
- | https:// | + | |
- | + | ||
- | Unfortunately this approach does not address all issues, mainly because it still allows string escaping, which is only " | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | // delete.php? | + | |
- | + | ||
- | // DELETE FROM table WHERE id = id | + | |
- | $html = '< | + | The [[https:// |
- | + | ||
- | | + | |
- | + | ||
- | | + | |
- | The Taint extension also [[https://github.com/ | + | <code php> |
+ | // INSECURE | ||
+ | $qb-> | ||
+ | | ||
+ | | ||
+ | </ | ||
- | ===== Previous RFC ===== | + | The definition of the //where()// method could check with // |
- | Matt Tait suggested [[https:// | + | < |
+ | $qb-> | ||
+ | | ||
+ | | ||
+ | | ||
+ | </code> | ||
- | It was noted that " | + | Similarly, Twig allows |
- | Where it would have effected every SQL function, such as // | + | <code php> |
+ | // INSECURE | ||
+ | echo $twig->createTemplate('< | ||
+ | </code> | ||
- | And each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https://news-web.php.net/php.internals/87406|1]]/[[https:// | + | If // |
- | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" | + | <code php> |
+ | echo $twig-> | ||
+ | </ | ||
===== Proposal ===== | ===== Proposal ===== | ||
- | Add an // | + | A literal is defined as a value (string) which has been written |
- | This uses a similar definition as the [[https:// | + | < |
+ | is_literal(' | ||
- | Thanks to [[https://news-web.php.net/ | + | $a = ' |
+ | is_literal($a); | ||
- | And thanks to [[https://chat.stackoverflow.com/transcript/message/ | + | is_literal(4); |
+ | is_literal(0.3); // true | ||
+ | is_literal(' | ||
- | Unlike the Taint extension, there is no need to provide an equivalent //untaint()// function. | + | $a = ' |
+ | $b = $a . ' B ' . 3; | ||
+ | is_literal($b); // true, ideally (more details below) | ||
- | ===== Examples ===== | + | is_literal($_GET[' |
- | ==== SQL Injection, Basic ==== | + | is_literal(rand(0, 10)); // false |
- | A simple example: | + | is_literal(sprintf(' |
+ | </ | ||
- | $sql = ' | + | Note that there is no way to manually mark a string as a literal (i.e. no equivalent to //untaint()//); as soon as the value has been manipulated in any way, it is no longer marked as a literal. |
- | + | ||
- | $result = $db-> | + | |
- | Checked in the framework by: | + | See the [[https:// |
- | class db { | + | ===== Comparison to Taint Tracking ===== |
- | + | ||
- | public function exec($sql, $parameters | + | |
- | + | ||
- | if (!is_literal($sql)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $statement | + | |
- | $statement-> | + | |
- | return $statement-> | + | |
- | + | ||
- | } | + | |
- | + | ||
- | } | + | |
- | It will also work with string concatenation: | + | Some languages implement a "taint flag" which tracks whether values are considered " |
- | define(' | + | These solutions rely on the assumption that the output of an escaping function is safe for a particular context. This sounds reasonable in theory, but the operation of escaping functions, and the context for which their output is safe, are very hard to define. This leads to a feature that is both complex and unreliable. |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | is_literal($sql); | + | |
- | + | ||
- | $sql .= ' AND id = ' . mysqli_real_escape_string($db, | + | |
- | + | ||
- | is_literal($sql); | + | |
- | ==== SQL Injection, ORDER BY ==== | + | This proposal avoids the complexity by addressing a different part of the problem: separating inputs supplied by the programmer, from inputs supplied by the user. |
- | To ensure //ORDER BY// can be set via the user, but only use acceptable values: | + | ===== Previous Work ===== |
- | $order_fields = [ | + | Google uses a [[https:// |
- | ' | + | |
- | ' | + | |
- | ' | + | |
- | | + | |
- | + | ||
- | $order_id = array_search(($_GET[' | + | |
- | + | ||
- | $sql = ' ORDER BY ' . $order_fields[$order_id]; | + | |
- | ==== SQL Injection, WHERE IN ==== | + | As noted by [[https:// |
- | Most SQL strings can be a concatenations | + | And there is the [[https:// |
- | So there //might// need to be a special case for //array_fill()//+//implode()// or //str_repeat()//+//substr()// to create something like '?,?,?' | + | * " |
+ | * this amount of work isn't ideal for "just for one use case" ([[https://news-web.php.net/php.internals/87647|Julien Pauli]]); | ||
+ | * It would have effected every SQL function, such as //mysqli_query()//, //$pdo-> | ||
+ | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | ||
- | $in_sql = implode(',', | + | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" |
- | + | ||
- | | + | |
- | + | ||
- | $in_sql = substr(str_repeat('?,', | + | |
- | To be used with: | + | ===== Usage ===== |
- | $sql = ' | + | By libraries: |
- | ==== SQL Injection, ORM Usage ==== | + | <code php> |
- | + | function | |
- | [[https:// | + | if (function_exists(' |
- | + | $level = 2; // Get from config, defaults to 1. | |
- | $users = $queryBuilder | + | |
- | ->select(' | + | // Programmer aware, and is choosing |
- | | + | |
- | -> | + | |
- | -> | + | } else { |
- | -> | + | throw new Exception(' |
- | + | ||
- | // example.php? | + | |
- | + | ||
- | Where this mistake could be identified by: | + | |
- | + | ||
- | public | + | |
- | | + | |
- | | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | ... | + | |
- | } | + | |
- | + | ||
- | [[https:// | + | |
- | + | ||
- | $users | + | |
- | + | ||
- | [[http://propelorm.org/ | + | |
- | + | ||
- | $users = UserQuery:: | + | |
- | + | ||
- | ==== SQL Injection, ORM Internal ==== | + | |
- | + | ||
- | The // | + | |
- | + | ||
- | This would avoid mistakes such as the ORDER BY issues in the Zend framework [[https:// | + | |
- | + | ||
- | ==== CLI Injection ==== | + | |
- | + | ||
- | Rather than using functions such as: | + | |
- | + | ||
- | * // | + | |
- | * // | + | |
- | * // | + | |
- | * // | + | |
- | + | ||
- | Frameworks (or PHP) could introduce something similar to // | + | |
- | + | ||
- | Or, take a verified literal for the command, and use parameters for the arguments (like SQL): | + | |
- | + | ||
- | $output = parameterised_exec(' | + | |
- | ' | + | |
- | ]); | + | |
- | + | ||
- | Rough implementation: | + | |
- | + | ||
- | function parameterised_exec($cmd, | + | |
- | | + | |
- | if (!is_literal($cmd)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $offset | + | |
- | $k = 0; | + | |
- | while (($pos | + | |
- | | + | |
- | throw new Exception('Missing parameter "' | + | |
- | exit(); | + | |
- | } | + | |
- | $arg = escapeshellarg($args[$k]); | + | |
- | $cmd = substr($cmd, 0, $pos) . $arg . substr($cmd, | + | |
- | $offset = ($pos + strlen($arg)); | + | |
- | $k++; | + | |
- | } | + | |
- | if (isset($args[$k])) | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
} | } | ||
- | | ||
- | return exec($cmd); | ||
- | | ||
} | } | ||
+ | } | ||
- | ==== HTML Injection ==== | + | function example($input) { |
+ | literal_check($input); | ||
+ | // ... | ||
+ | } | ||
- | Template engines should receive variables separately from the raw HTML. | + | example(' |
+ | example(strtoupper(' | ||
+ | </ | ||
- | Often the engine will get the HTML from static files: | + | Table and Fields in SQL, which cannot use parameters; for example //ORDER BY//: |
- | | + | <code php> |
+ | $order_fields | ||
+ | | ||
+ | ' | ||
+ | ' | ||
+ | ]; | ||
- | But small snippets of HTML are often easier to define as a literal within the PHP script: | + | $order_id = array_search(($_GET[' |
- | | + | $sql = ' |
- | < | + | </code> |
- | < | + | |
- | Where the variables are supplied separately, in this example | + | Undefined number of parameters; for example |
- | $values = [ | + | <code php> |
- | '// | + | function |
- | | + | $sql = '?'; |
- | ' | + | |
- | ' | + | $sql .= ',?'; |
- | ], | + | |
- | '// | + | |
- | ' | + | |
- | ], | + | |
- | ]; | + | |
- | + | ||
- | echo template_parse($template_html, | + | |
- | + | ||
- | Being sure the HTML does not contain unsafe variables, the templating engine can accept and apply the supplied variables for the relevant context, for example: | + | |
- | + | ||
- | | + | |
- | | + | |
- | if (!is_literal($html)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $dom = new DomDocument(); | + | |
- | $dom-> | + | |
- | | + | |
- | $xpath = new DOMXPath($dom); | + | |
- | + | ||
- | foreach ($values as $query | + | |
- | + | ||
- | if (!is_literal($query)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | foreach ($xpath-> | + | |
- | foreach ($attributes as $attribute => $value) { | + | |
- | + | ||
- | if (!is_literal($attribute)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | if ($attribute) { | + | |
- | $safe = false; | + | |
- | if ($attribute == ' | + | |
- | if (preg_match('/ | + | |
- | $safe = true; // Not " | + | |
- | } | + | |
- | } else if ($attribute == 'class' | + | |
- | if (in_array($value, [' | + | |
- | $safe = true; // Only allow specific classes? | + | |
- | } | + | |
- | } else if (preg_match('/ | + | |
- | if (preg_match('/ | + | |
- | $safe = true; | + | |
- | } | + | |
- | } | + | |
- | if ($safe) { | + | |
- | $element-> | + | |
- | } | + | |
- | } else { | + | |
- | $element-> | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | + | ||
- | $html = ''; | + | |
- | $body = $dom-> | + | |
- | if ($body-> | + | |
- | foreach ($body-> | + | |
- | $html .= $dom-> | + | |
- | } | + | |
- | } | + | |
- | + | ||
- | return $html; | + | |
- | | + | |
} | } | ||
+ | return $sql; | ||
+ | } | ||
+ | $sql = 'WHERE id IN (' . where_in_sql(count($ids)) . ' | ||
+ | </ | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | Not sure | + | None |
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
- | PHP 8? | + | PHP 8.1? |
===== RFC Impact ===== | ===== RFC Impact ===== | ||
Line 352: | Line 188: | ||
===== Open Issues ===== | ===== Open Issues ===== | ||
- | - Would this cause performance issues? | + | On [[https://github.com/ |
- | - Can // | + | |
- | - Should the function be named // | + | |
- | | + | |
- | ===== Alternatives ===== | + | |
- | + | - Would this cause performance issues? A [[https://github.com/craigfrancis/ | |
- | | + | - Systems/ |
- | - Using static analysis (not runtime), for example | + | |
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== | ||
Line 368: | Line 200: | ||
===== Future Scope ===== | ===== Future Scope ===== | ||
- | Certain | + | As noted by [[https:// |
+ | |||
+ | **Phase 2** could introduce a way for programmers | ||
+ | |||
+ | For example, | ||
+ | |||
+ | **Phase 3** could set a default of 'only literals' | ||
+ | |||
+ | And, for a bit of silliness (Spaß ist verboten), there could be a // | ||
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
- | Not sure | + | N/A |
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
- | A volunteer is needed to help with implementation. | + | N/A |
===== Implementation ===== | ===== Implementation ===== | ||
- | N/A | + | Joe Watkins has [[https:// |
+ | |||
+ | Dan Ackroyd also [[https:// | ||
===== References ===== | ===== References ===== | ||
- | - https:// | + | N/A |
===== Rejected Features ===== | ===== Rejected Features ===== | ||
N/A | N/A | ||
+ | |||
+ | ===== Thanks ===== | ||
+ | |||
+ | - **Dan Ackroyd**, DanAck, for surprising me with the first implementation, | ||
+ | - **Joe Watkins**, krakjoe, for finding how to set the literal flag, and creating the implementation that supports string concat. | ||
+ | - **Rowan Tommins**, IMSoP, for re-writing this RFC to focus on the key features, and putting it in context of how it can be used by libraries. | ||
+ | - **Nikita Popov**, NikiC, for suggesting where the literal flag could be stored. Initially this was going to be the [[https:// | ||
+ | - **MarkR, **for alternative ideas, and noting that " | ||
+ | - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, | ||
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis