rfc:is_literal
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
rfc:is_literal [2021/02/12 18:42] – Add notes about the implementation from Danack, GC Flags, and Future Scope craigfrancis | rfc:is_literal [2021/05/02 16:38] – Add performance stats, and notes from Dan craigfrancis | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Is Literal Check ====== | ====== PHP RFC: Is Literal Check ====== | ||
- | * Version: 0.2 | + | * Version: 0.6 |
* Date: 2020-03-21 | * Date: 2020-03-21 | ||
- | * Updated: | + | * Updated: |
* Author: Craig Francis, craig# | * Author: Craig Francis, craig# | ||
* Status: Draft | * Status: Draft | ||
Line 11: | Line 11: | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Add an // | + | A new function, |
- | As in, at runtime, being able to check if a variable has been created by literals, defined within a PHP script, by a trusted developer. | + | This takes the concept of "taint checking" |
- | This simple check can be used to warn or completely block SQL Injection, Command Line Injection, and many cases of HTML Injection | + | It does not allow a variable to be marked as untainted, and it does not allowing escaping |
- | ===== The Problem ===== | + | For example, a database library |
- | + | ||
- | Escaping strings for SQL, HTML, Commands, etc is **very** error prone. | + | |
- | + | ||
- | The vast majority of programmers should never do this (mistakes will be made). | + | |
- | + | ||
- | Unsafe values (often user supplied) **must** be kept separate (e.g. parameterised SQL), or be processed by something | + | |
- | + | ||
- | This is primarily for security reasons, but it can also cause data to be damaged (e.g. ASCII/UTF-8 issues). | + | |
- | + | ||
- | Take these mistakes, where the value has come from the user: | + | |
<code php> | <code php> | ||
- | echo "< | + | $db-> |
- | </code> | + | |
- | Flawed, and unfortunately very common, a classic XSS vulnerability. | + | $db->query(' |
- | + | ||
- | <code php> | + | |
- | echo "< | + | |
</ | </ | ||
- | Flawed because | + | By rejecting |
- | <code php> | + | This definition of an "inherently safe API" |
- | echo "<img src='" . htmlentities($url) . "' | + | |
- | </ | + | |
- | Flawed because // | + | By adding a way for libraries to check if the strings they receive came from the developer |
- | <code php> | + | ===== Why ===== |
- | echo '<a href="' | + | |
- | </ | + | |
- | Flawed because a link can include JavaScript, e.g. //$url = ' | + | The [[https://owasp.org/www-project-top-ten/|OWASP Top 10]] lists common vulnerabilities sorted by prevalence, exploitability, |
- | <code html> | + | **Injection vulnerabilities** remain at the top of the list (common prevalence, easy for attackers to detect/ |
- | < | + | |
- | var url = "<? | + | |
- | </script> | + | |
- | </ | + | |
- | Flawed | + | This is because |
- | <code php> | + | ===== Examples ===== |
- | echo '<a href="/ | + | |
- | </ | + | |
- | Flawed because | + | The [[https://www.doctrine-project.org/projects/doctrine-orm/en/current/reference/query-builder.html# |
- | + | ||
- | <code html> | + | |
- | < | + | |
- | </code> | + | |
- | + | ||
- | Flawed because the encoding is not guaranteed to be UTF-8 (or ISO-8859-1 before PHP 5.4), so the value could be corrupted. | + | |
- | + | ||
- | Also flawed because some browsers (e.g. IE 11), if the charset isn't defined (header or meta tag), could guess the output as UTF-7, e.g. //$url = ' | + | |
<code php> | <code php> | ||
- | example.html: | + | // INSECURE |
- | <img src={{ url }} alt='' | + | $qb-> |
- | + | ->from('User', 'u') | |
- | $loader = new \Twig\Loader\FilesystemLoader('./ | + | |
- | $twig = new \Twig\Environment($loader, ['autoescape' | + | |
- | + | ||
- | echo $twig->render('example.html', ['url' | + | |
</ | </ | ||
- | Flawed because Twig is not context aware (in this case, an unquoted HTML attribute), e.g. //$url = '/ onerror=alert(1)'// | + | The definition of the //where()// method could check with //is_literal()// and throw an exception, advising the programmer to replace it with a safer use of placeholders: |
<code php> | <code php> | ||
- | $sql = 'SELECT 1 FROM user WHERE id=' | + | $qb-> |
+ | | ||
+ | | ||
+ | ->setParameter(' | ||
</ | </ | ||
- | Flawed because the value has not been quoted, e.g. //$id = ' | + | Similarly, Twig allows [[https:// |
<code php> | <code php> | ||
- | $sql = 'SELECT 1 FROM user WHERE id="' . $mysqli->escape_string($id) . '"' | + | // INSECURE |
+ | echo $twig-> | ||
</ | </ | ||
- | Flawed if ' | + | If //createTemplate()// checked with //is_literal()//, the programmer could be advised to write this instead: |
<code php> | <code php> | ||
- | $sql = ' | + | echo $twig-> |
</ | </ | ||
- | Flawed if 'SET NAMES latin1' | + | ===== Failed Solutions ===== |
- | <code php> | + | ==== Education ==== |
- | $parameters | + | |
- | // $parameters = ' | + | Developer training has not worked, it simply does not scale (people start programming every day), and learning about every single issue is difficult. |
- | mail('a@example.com' | + | Keeping in mind that programmers will frequently do just enough to complete their task (busy), where they often copy/ |
- | </ | + | |
- | Flawed because it's not possible | + | We cannot keep saying they 'need to be careful' |
- | ===== Previous Solutions ===== | + | ==== Escaping |
- | [[https:// | + | Escaping is hard, and error prone. |
- | [[https://wiki.php.net/rfc/sql_injection_protection|Automatic SQL Injection Protection]] by Matt Tait, where it was noted: | + | We have a list of common |
- | * " | + | Developers should |
- | * this amount of work isn't ideal for "just for one use case" | + | |
- | * It would have effected every SQL function, such as // | + | |
- | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | + | |
- | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" | + | ==== Taint Checking ==== |
- | ===== Related Go Implementation ===== | + | Some languages implement a "taint flag" which tracks whether values are considered " |
- | As discussed by [[https://www.usenix.org/ | + | There is a [[https://github.com/laruence/taint|Taint extension for PHP]] by Xinchen Hui, and [[https://wiki.php.net/rfc/taint|a previous RFC proposing it be added to the language]]. |
- | The Go language can do this by checking | + | These solutions rely on the assumption that the output of an escaping function is safe for a particular context. This sounds reasonable in theory, but the operation of escaping functions, and the context for which their output is safe, are very hard to define. This leads to a feature that is both complex and unreliable. |
- | Google currently | + | This proposal |
- | ===== Related JavaScript Implementation ===== | + | ==== Static Analysis |
- | This RFC is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, | + | While I agree with [[https:// |
- | https:// | + | But they nearly always focus on other issues (type checking, basic logic flaws, code formatting, etc). |
- | https:// | + | |
- | They are looking at " | + | Those that attempt to address injection vulnerabilities, |
- | ===== Solution ===== | + | For a quick example, psalm, even in its strictest errorLevel (1), and/or running // |
- | + | ||
- | Literals are safe values, defined within | + | |
<code php> | <code php> | ||
- | $a = ' | + | $db = new mysqli(' |
- | is_literal($a); // true | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = 'Example ' . $_GET[' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = sprintf(' | + | |
- | is_literal($a); // false | + | |
- | $c = count($ids); | + | $id = (string) ($_GET[' |
- | $a = 'WHERE id IN (' . implode(',', | + | |
- | is_literal($a); // true, the odd one that involves functions. | + | |
- | $limit = 10; | + | $db-> |
- | $a = ' | + | |
- | is_literal($a); // false, but might need some discussion. | + | |
</ | </ | ||
- | This uses a similar definition | + | When psalm comes to taint checking the usage of a library (like Doctrine), it assumes all methods are safe, because none of them note the sinks (and even if they did, you're back to escaping being an issue). |
- | Thanks to [[https:// | + | But the biggest problem is that Static Analysis is simply not used by most developers, especially those who are new to programming (usage tends to be higher by those writing well tested libraries). |
- | As an aside, [[https:// | + | ===== Proposal ===== |
- | Commands can be checked to ensure they are a " | + | This RFC proposes adding three functions: |
- | This approach allows all systems/frameworks | + | * // |
+ | | ||
+ | * //literal_implode(string $glue, array $pieces): string// to implode an array of literals, with a literal. | ||
- | Unlike | + | A literal is defined as a value (string) which has been written by the programmer. The value may be passed between functions, as long as it is not modified in any way. |
- | ==== Solution: SQL Injection ==== | + | <code php> |
+ | is_literal(' | ||
- | Database abstractions (e.g. ORMs) will be able to ensure they are provided with strings that are safe. | + | $a = ' |
+ | $b = ' | ||
- | [[https://www.doctrine-project.org/projects/doctrine-orm/ | + | is_literal($a); |
+ | is_literal($a | ||
- | <code php> | + | is_literal(literal_combine($a, $b)); // true |
- | $users = $queryBuilder | + | |
- | ->select(' | + | |
- | ->from(' | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | // example.php? | + | is_literal($_GET[' |
+ | is_literal(' | ||
+ | is_literal(rand(0, | ||
+ | is_literal(sprintf(' | ||
</ | </ | ||
- | This mistake can be easily identified by: | + | There is no way to manually mark a string as a literal (i.e. no equivalent to // |
- | <code php> | + | *Technical detail: Strings that are concatenated in place at compile time are treated as a literal.* |
- | public function where($predicates) | + | |
- | { | + | |
- | if (function_exists(' | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | | + | |
- | } | + | |
- | </ | + | |
- | [[https:// | + | ===== Previous Work ===== |
- | <code php> | + | Google uses " |
- | $users = R:: | + | |
- | </code> | + | |
- | [[http://propelorm.org/Propel/ | + | Google also uses [[https://errorprone.info/|Error Prone]] in Java to augment the compiler' |
- | <code php> | + | Perl has a [[https:// |
- | $users = UserQuery::create()->where('id = ' . $_GET[' | + | |
- | </ | + | |
- | The // | + | [[https://github.com/tc39/proposal-array-is-template-object|JavaScript might get isTemplateObject]] to " |
- | ==== Solution: SQL Injection, Basic ==== | + | As noted above, there is the [[https:// |
- | A simple example: | + | And there is the [[https:// |
- | < | + | * " |
- | $sql = ' | + | * this amount of work isn't ideal for "just for one use case" ([[https:// |
+ | * It would have effected every SQL function, such as // | ||
+ | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | ||
- | $result = $db-> | + | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" |
- | </code> | + | |
- | Checked in the framework by: | + | ===== Usage ===== |
+ | |||
+ | By libraries: | ||
<code php> | <code php> | ||
class db { | class db { | ||
- | + | protected $level = 2; // Probably should default to 1 at first. | |
- | | + | function |
- | + | if (function_exists(' | |
- | if (!is_literal($sql)) { | + | |
- | throw new Exception(' | + | // Programmer aware, and is choosing to bypass this check. |
+ | } else if ($this-> | ||
+ | trigger_error(' | ||
+ | } else { | ||
+ | | ||
+ | } | ||
} | } | ||
- | |||
- | $statement = $this-> | ||
- | $statement-> | ||
- | return $statement-> | ||
- | |||
} | } | ||
+ | function unsafe_disable_injection_protection() { | ||
+ | $this-> | ||
+ | } | ||
+ | function where($sql, $parameters = []) { | ||
+ | $this-> | ||
+ | // ... | ||
+ | } | ||
} | } | ||
- | </ | ||
- | This also works with string concatenation: | + | $db->where('id = ?'); // OK |
- | + | $db-> | |
- | <code php> | + | |
- | define('TABLE', | + | |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | is_literal($sql); // Returns true | + | |
- | + | ||
- | $sql .= ' | + | |
- | + | ||
- | is_literal($sql); // Returns false | + | |
</ | </ | ||
- | ==== Solution: | + | Table and Fields in SQL, which cannot use parameters; for example |
- | + | ||
- | To ensure | + | |
<code php> | <code php> | ||
Line 286: | Line 216: | ||
$order_id = array_search(($_GET[' | $order_id = array_search(($_GET[' | ||
- | $sql = ' ORDER BY ' | + | $sql = literal_combine(' ORDER BY ', $order_fields[$order_id]); |
</ | </ | ||
- | ==== Solution: SQL Injection, WHERE IN ==== | + | Undefined number |
- | + | ||
- | Most SQL strings can be a simple concatenations | + | |
- | + | ||
- | There needs to be a special case for // | + | |
<code php> | <code php> | ||
- | $in_sql | + | function where_in_sql($count) { // Should check for 0 |
- | + | $sql = []; | |
- | $sql = 'SELECT * FROM table WHERE id IN (' | + | for ($k = 0; $k < $count; $k++) { |
+ | $sql[] = '?' | ||
+ | } | ||
+ | return literal_implode(',', | ||
+ | } | ||
+ | $sql = literal_combine('WHERE id IN (', where_in_sql(count($ids)), | ||
</ | </ | ||
- | ==== Solution: CLI Injection | + | ===== Considerations ===== |
- | Rather than using functions such as: | + | ==== Naming ==== |
- | * //exec()// | + | Literal string is the standard name for strings in source code. See https://www.google.com/search? |
- | * // | + | |
- | * // | + | |
- | * // | + | |
- | Frameworks (or PHP) could introduce something similar to // | + | > A string literal is the notation for representing a string value within the text of a computer program. In PHP, strings can be created with single quotes, double quotes or using the heredoc or the nowdoc syntax. ... The heredoc preserves the line breaks and other whitespace |
- | Or, take a safe literal for the command, and use parameters for the arguments | + | Alternatives suggestions have included // |
- | <code php> | + | ==== Supporting Int/Float/Boolean values. ==== |
- | $output | + | |
- | ' | + | |
- | ]); | + | |
- | </ | + | |
- | Rough implementation: | + | When converting to string, they aren't guaranteed (and often don't) have the exact same value they have in source code. |
- | <code php> | + | For example, //TRUE// and //true// when cast to string give " |
- | function parameterised_exec($cmd, $args = []) { | + | |
- | if (!is_literal($cmd)) { | + | It's also a very low value feature, where there might not be space for a flag to be added. |
- | throw new Exception('The first argument must be a literal' | + | |
- | } | + | |
- | $offset | + | ==== Supporting Concatenation |
- | $k = 0; | + | |
- | while (($pos | + | |
- | if (!isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | $arg = escapeshellarg($args[$k]); | + | |
- | $cmd = substr($cmd, | + | |
- | $offset | + | |
- | $k++; | + | |
- | } | + | |
- | if (isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | return exec($cmd); | + | Unfortunately early testing suggests there will be too much of a performance impact, and is why // |
- | } | + | It isn't needed for most libraries, like an ORM or Query Builder, where their methods nearly always take a small literal string. |
- | </ | + | |
- | ==== Solution: HTML Injection ==== | + | It was considered because it would have made it easier for existing projects currently using string concatenation to adopt. |
- | Template engines should receive variables separately from the raw HTML. | + | Joe Watkins has created a version that does support string concatenation, |
- | Often the engine will get the HTML from static files (safe): | + | Máté Kocsis did the [[https:// |
- | < | + | In my own [[https:// |
- | $html = file_get_contents(' | + | |
- | </code> | + | |
- | But small snippets | + | Dan Ackroyd also notes that the use of // |
<code php> | <code php> | ||
- | $template_html | + | $sortOrder |
- | < | + | |
- | < | + | |
- | </ | + | |
- | Where the variables are supplied separately, in this example I'm using XPath: | + | // 300 lines of code, or multiple function calls |
- | <code php> | + | $sql .= ' |
- | $values | + | |
- | | + | |
- | NULL => ' | + | |
- | ' | + | |
- | ' | + | |
- | ], | + | |
- | '// | + | |
- | ' | + | |
- | ], | + | |
- | ]; | + | |
- | echo template_parse($template_html, $values); | + | // 300 lines of code, or multiple function calls |
+ | |||
+ | $db-> | ||
</ | </ | ||
- | The templating engine can then accept and apply the supplied variables for the relevant context. | + | If a developer changed the literal //' |
- | + | ||
- | As a simple example, this can be done with: | + | |
<code php> | <code php> | ||
- | function template_parse($html, $values) { | + | $sql = literal_combine($sql, ' ORDER BY name ', $sortOrder); |
+ | </ | ||
- | if (!is_literal($html)) { | + | ==== Performance ==== |
- | throw new Exception(' | + | |
- | } | + | |
- | $dom = new DomDocument(); | + | The proposed implementation has: |
- | $dom-> | + | |
- | $xpath = new DOMXPath($dom); | + | [TODO] |
- | foreach ($values as $query => $attributes) { | + | Also, see the section above, where string concatenation support would have introduced a ~1% performance hit. |
- | if (!is_literal($query)) { | + | ==== Values from INI/ |
- | throw new Exception(' | + | |
- | } | + | |
- | foreach ($xpath->query($query) as $element) { | + | As noted by [[https:// |
- | foreach | + | |
- | if (!is_literal($attribute)) { | + | ==== Existing String Functions ==== |
- | throw new Exception(' | + | |
- | } | + | |
- | | + | Trying to determine |
- | $safe = false; | + | |
- | if ($attribute == ' | + | |
- | if (preg_match(' | + | |
- | $safe = true; // Not " | + | |
- | } | + | |
- | } else if ($attribute == ' | + | |
- | if (in_array($value, [' | + | |
- | $safe = true; // Only allow specific classes? | + | |
- | } | + | |
- | } else if (preg_match(' | + | |
- | if (preg_match('/ | + | |
- | $safe = true; | + | |
- | } | + | |
- | } | + | |
- | if ($safe) { | + | |
- | $element-> | + | |
- | } | + | |
- | } else { | + | |
- | $element-> | + | |
- | } | + | |
- | } | + | For any use-case where dynamic strings are required, it would be better to build those strings with an appropriate query builder, or by using // |
- | } | + | |
- | + | ||
- | } | + | |
- | + | ||
- | $html = ''; | + | |
- | $body = $dom-> | + | |
- | if ($body-> | + | |
- | foreach | + | |
- | $html .= $dom-> | + | |
- | } | + | |
- | } | + | |
- | + | ||
- | return $html; | + | |
- | + | ||
- | } | + | |
- | </code> | + | |
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | None | + | No known BC breaks, except for code-bases that already contain userland functions // |
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
- | PHP 8.1? | + | PHP 8.1 |
===== RFC Impact ===== | ===== RFC Impact ===== | ||
Line 467: | Line 314: | ||
==== To SAPIs ==== | ==== To SAPIs ==== | ||
- | Not sure | + | None known |
==== To Existing Extensions ==== | ==== To Existing Extensions ==== | ||
Line 479: | Line 326: | ||
===== Open Issues ===== | ===== Open Issues ===== | ||
- | On [[https:// | + | None |
- | + | ||
- | - Would this cause performance issues? Presumably not as bad a type checking. | + | |
- | - Can // | + | |
- | - Should the function be named // | + | |
- | - Systems/ | + | |
- | + | ||
- | ===== Alternatives ===== | + | |
- | + | ||
- | - The current Taint Extension (notes above) | + | |
- | - Using static analysis (not at runtime), for example [[https:// | + | |
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== | ||
- | Not sure | + | None known |
===== Future Scope ===== | ===== Future Scope ===== | ||
- | As noted by [[https:// | + | As noted by MarkR, the biggest |
- | This check could be used to throw an exception, or generate an error/ | + | **Phase 2** could introduce |
- | PHP could also have a mode where output | + | For example, |
- | And maybe there could be a [[https:// | + | **Phase 3** could set a default of 'only literals' |
+ | |||
+ | And, for a bit of silliness (Spaß ist verboten), MarkR would like a // | ||
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
- | N/A | + | Accept the RFC. Yes/No |
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
Line 515: | Line 354: | ||
===== Implementation ===== | ===== Implementation ===== | ||
- | [[https:// | + | Dan Ackroyd has [[https:// |
+ | |||
+ | Joe Watkins has [[https:// | ||
+ | |||
+ | ===== References ===== | ||
+ | |||
+ | N/A | ||
===== Rejected Features ===== | ===== Rejected Features ===== | ||
N/A | N/A | ||
+ | |||
+ | ===== Thanks ===== | ||
+ | |||
+ | - **Dan Ackroyd**, DanAck, for starting the first implementation (which made this a reality), and followup on the version that uses functions instead of string concat. | ||
+ | - **Joe Watkins**, krakjoe, for finding how to set the literal flag (tricky), and creating the implementation that does support string concat. | ||
+ | - **Rowan Tommins**, IMSoP, for re-writing this RFC to focus on the key features, and putting it in context of how it can be used by libraries. | ||
+ | - **Nikita Popov**, NikiC, for suggesting where the literal flag could be stored. Initially this was going to be the " | ||
+ | - **Mark Randall**, MarkR, for alternative ideas, and noting that " | ||
+ | - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, | ||
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis