rfc:is_literal
Differences
This shows you the differences between two versions of the page.
Next revision | Previous revisionNext revisionBoth sides next revision | ||
rfc:is_literal [2020/03/21 17:38] – created craigfrancis | rfc:is_literal [2021/05/04 05:29] – craigfrancis | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | https:// | ||
- | |||
====== PHP RFC: Is Literal Check ====== | ====== PHP RFC: Is Literal Check ====== | ||
- | * Version: 0.1 | + | * Version: 0.6 |
* Date: 2020-03-21 | * Date: 2020-03-21 | ||
+ | * Updated: 2021-04-30 | ||
* Author: Craig Francis, craig# | * Author: Craig Francis, craig# | ||
* Status: Draft | * Status: Draft | ||
* First Published at: https:// | * First Published at: https:// | ||
+ | * GitHub Repo: https:// | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Add an // | + | A new function, |
- | This allows developers/ | + | This takes the concept |
- | It allows commands | + | It does not allow a variable |
- | This will also allow systems/ | + | For example, take a database library that supports parametrised queries at the driver level, today a programmer could use either of these: |
- | ===== Related JavaScript Implementation ===== | + | <code php> |
+ | $db-> | ||
- | This proposal is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, to support the introduction of Trusted Types. | + | $db-> |
+ | </ | ||
- | https:// | + | If the library only accepted a literal SQL string (written by the programmer), |
- | https:// | + | |
- | They are looking at "Distinguishing strings | + | This definition of an "inherently safe API" comes from Christoph Kern, who did a talk in 2016 about [[https:// |
- | ===== Taint Checking ===== | + | By adding a way for libraries to check if the strings they receive came from the developer (from trusted PHP source code), it allows the library to check they are being used in a safe way. |
- | Xinchen Hui has done some amazing work with the Taint extension: | + | ===== Why ===== |
- | https://github.com/laruence/taint | + | The [[https://owasp.org/www-project-top-ten/|OWASP Top 10]] lists common vulnerabilities sorted by prevalence, exploitability, |
- | Unfortunately this approach does not address all issues, mainly because it still allows string escaping, which is only " | + | **A1: Injection** - common prevalence (2), easy for attackers to detect/exploit |
- | $sql = ' | + | **A7: XSS** - widespread prevalence |
- | + | ||
- | | + | |
- | + | ||
- | // DELETE FROM table WHERE id = id | + | |
- | $html = '< | + | And these two have always been listed: 2003 (A6/A4), 2004 (A6/A4), 2007 (A2/A1), 2010 (A1/A2), 2013 (A1/A3), 2017 (A1/A7). |
- | + | ||
- | | + | |
- | + | ||
- | // <img src=x onerror=alert(cookie) /> | + | |
- | The Taint extension also [[https:// | + | It's because these mistakes are very easy to make, and hard to identify - is_literal() directly addresses this problem. |
- | ===== Previous RFC ===== | + | ===== Examples |
- | Matt Tait suggested | + | The [[https://www.doctrine-project.org/projects/doctrine-orm/ |
- | It was noted that " | + | <code php> |
+ | // INSECURE | ||
+ | $qb-> | ||
+ | | ||
+ | | ||
+ | </ | ||
- | Where it would have effected every SQL function, such as //mysqli_query()//, //$pdo-> | + | The definition of the //where()// method could check with //is_literal()// and throw an exception, advising the programmer to replace it with a safer use of placeholders: |
- | And each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" | + | <code php> |
+ | $qb-> | ||
+ | ->from(' | ||
+ | | ||
+ | -> | ||
+ | </ | ||
- | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" | + | Similarly, Twig allows |
- | ===== Proposal ===== | + | <code php> |
+ | // INSECURE | ||
+ | echo $twig-> | ||
+ | </ | ||
- | Add an //is_literal()// function to check if a given variable has only been created by Literal(s). | + | If //createTemplate()// checked with // |
- | This uses a similar definition as the [[https:// | + | < |
+ | echo $twig-> | ||
+ | </ | ||
- | Thanks to [[https:// | + | ===== Failed Solutions ===== |
- | Unlike the Taint extension, there is no need to provide an equivalent // | + | ==== Education ==== |
- | ===== Examples ===== | + | Developer training has not worked, it simply does not scale (people start programming every day), and learning about every single issue is difficult. |
- | ==== SQL Injection, Basic ==== | + | Keeping in mind that programmers will frequently do just enough to complete their task (busy), where they often copy/paste a solution to their problem they find online (risky), modify it for their needs (risky), then move on. |
- | A simple example: | + | We cannot keep saying they 'need to be careful', |
- | $sql = ' | + | ==== Escaping ==== |
- | + | ||
- | $result | + | |
- | Checked in the framework by: | + | Escaping is hard, and error prone. |
- | class db { | + | We have a list of common [[https:// |
- | + | ||
- | public function exec($sql, $parameters = []) { | + | |
- | + | ||
- | if (!is_literal($sql)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $statement = $this->pdo-> | + | |
- | $statement-> | + | |
- | return $statement-> | + | |
- | + | ||
- | } | + | |
- | + | ||
- | } | + | |
- | It will also work with string concatenation: | + | Developers should use parameterised queries (e.g. SQL), or a well tested library that knows how to escape values based on their context (e.g. HTML). |
- | define(' | + | ==== Taint Checking ==== |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | is_literal($sql); | + | |
- | + | ||
- | $sql .= ' AND id = ' . mysqli_real_escape_string($db, | + | |
- | + | ||
- | is_literal($sql); | + | |
- | ==== SQL Injection, ORDER BY ==== | + | Some languages implement a "taint flag" which tracks whether values are considered " |
- | To ensure | + | There is a [[https://github.com/laruence/taint|Taint extension for PHP]] by Xinchen Hui, and [[https:// |
- | $order_fields = [ | + | These solutions rely on the assumption that the output of an escaping function is safe for a particular context. This sounds reasonable in theory, but the operation of escaping functions, and the context for which their output is safe, are very hard to define. This leads to a feature that is both complex and unreliable. |
- | ' | + | |
- | ' | + | |
- | ' | + | |
- | ]; | + | |
- | + | ||
- | $order_id = array_search(($_GET[' | + | |
- | + | ||
- | $sql = ' ORDER BY ' | + | |
- | ==== SQL Injection, WHERE IN ==== | + | This proposal avoids the complexity by addressing a different part of the problem: separating inputs supplied by the programmer, from inputs supplied by the user. |
- | Most SQL strings can be a concatenations of literal values, but //WHERE x IN (?,?,?)// need to use a variable number of literal placeholders. | + | ==== Static Analysis ==== |
- | So there //might// need to be a special case for // | + | While I agree with [[https://news-web.php.net/php.internals/109192|Tyson Andre]], it is highly recommended |
- | $in_sql = implode(',', array_fill(0, | + | But they nearly always focus on other issues |
- | + | ||
- | // or | + | |
- | + | ||
- | $in_sql = substr(str_repeat('?,', | + | |
- | To be used with: | + | Those that attempt to address injection vulnerabilities, |
- | $sql = ' | + | For a quick example, psalm, even in its strictest errorLevel |
- | ==== SQL Injection, ORM Usage ==== | + | <code php> |
+ | $db = new mysqli(' | ||
- | [[https://www.doctrine-project.org/ | + | $id = (string) ($_GET[' |
- | | + | $db->prepare('SELECT * FROM users WHERE id = ' . $db->real_escape_string($id)); |
- | | + | </code> |
- | -> | + | |
- | -> | + | |
- | | + | |
- | -> | + | |
- | + | ||
- | | + | |
- | Where this mistake could be identified by: | + | When psalm comes to taint checking the usage of a library (like Doctrine), it assumes all methods are safe, because none of them note the sinks (and even if they did, you're back to escaping being an issue). |
- | public function where($predicates) | + | But the biggest problem is that Static Analysis is simply not used by most developers, especially those who are new to programming |
- | { | + | |
- | if (!is_literal($predicates)) { | + | |
- | throw new Exception('Can only accept a literal' | + | |
- | } | + | |
- | ... | + | |
- | } | + | |
- | [[https:// | + | ===== Proposal ===== |
- | $users = R:: | + | This RFC proposes adding four functions: |
- | [[http://propelorm.org/Propel/reference/model-criteria.html# | + | * // |
+ | * //literal_implode(string $glue, array $pieces): string// - implode an array of literals, with a literal. | ||
+ | * //literal_combine(string | ||
+ | * // | ||
- | $users = UserQuery:: | + | A literal is defined as a value (string) which has been written by the programmer. The value may be passed between functions, as long as it is not modified in any way. |
- | ==== SQL Injection, ORM Internal ==== | + | <code php> |
+ | is_literal(' | ||
- | The // | + | $a = ' |
+ | $b = ' | ||
- | This would avoid mistakes such as the ORDER BY issues in the Zend framework [[https://framework.zend.com/ | + | is_literal($a); |
+ | is_literal($a | ||
- | ==== CLI Injection ==== | + | $c = literal_combine($a, |
+ | is_literal($c); | ||
- | Rather than using functions such as: | + | is_literal($_GET[' |
+ | is_literal(' | ||
+ | is_literal(rand(0, | ||
+ | is_literal(sprintf(' | ||
+ | </ | ||
- | * //exec()// | + | There is no way to manually mark a string as a literal |
- | * // | + | |
- | * //system()// | + | |
- | * // | + | |
- | Frameworks (or PHP) could introduce something similar to // | + | ===== Previous Work ===== |
- | Or, take a verified literal for the command, and use parameters for the arguments (like SQL): | + | Google uses " |
- | $output = parameterised_exec(' | + | Google also uses [[https://errorprone.info/|Error Prone]] in Java to augment the compiler's type analysis, where [[https:// |
- | ' | + | |
- | | + | |
- | Rough implementation: | + | Perl has a [[https:// |
- | function parameterised_exec($cmd, $args = []) { | + | [[https:// |
- | + | ||
- | if (!is_literal($cmd)) { | + | As noted above, there is the [[https:// |
- | throw new Exception('The first argument must be a literal' | + | |
- | } | + | And there is the [[https:// |
- | + | ||
- | $offset | + | * " |
- | $k = 0; | + | |
- | | + | * It would have effected every SQL function, such as // |
- | if (!isset($args[$k])) { | + | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// |
- | | + | |
- | | + | I also agree that "SQL injection is almost |
+ | |||
+ | ===== Usage ===== | ||
+ | |||
+ | By libraries: | ||
+ | |||
+ | <code php> | ||
+ | class db { | ||
+ | protected | ||
+ | | ||
+ | | ||
+ | if ($this-> | ||
+ | | ||
+ | } else if ($this-> | ||
+ | trigger_error('Non-literal detected!', E_USER_WARNING); | ||
+ | } else { | ||
+ | | ||
} | } | ||
- | $arg = escapeshellarg($args[$k]); | ||
- | $cmd = substr($cmd, | ||
- | $offset = ($pos + strlen($arg)); | ||
- | $k++; | ||
} | } | ||
- | if (isset($args[$k])) { | ||
- | throw new Exception(' | ||
- | exit(); | ||
- | } | ||
- | | ||
- | return exec($cmd); | ||
- | | ||
} | } | ||
+ | function unsafe_disable_injection_protection() { | ||
+ | $this-> | ||
+ | } | ||
+ | function where($sql, $parameters = []) { | ||
+ | $this-> | ||
+ | // ... | ||
+ | } | ||
+ | } | ||
- | ==== HTML Injection ==== | + | $db-> |
+ | $db-> | ||
+ | </ | ||
- | Template engines should receive variables separately from the raw HTML. | + | Table and Fields in SQL, which cannot use parameters; for example //ORDER BY//: |
- | Often the engine will get the HTML from static files: | + | <code php> |
+ | $order_fields = [ | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | ]; | ||
- | | + | $order_id |
- | But small snippets of HTML are often easier to define as a literal within the PHP script: | + | $sql = literal_combine(' |
+ | </ | ||
- | $template_html = ' | + | Undefined number of parameters; for example |
- | < | + | |
- | < | + | |
- | Where the variables are supplied separately, in this example I'm using XPaths: | + | <code php> |
+ | function where_in_sql($count) { // Should check for 0 | ||
+ | $sql = []; | ||
+ | for ($k = 0; $k < $count; $k++) { | ||
+ | $sql[] = '?'; | ||
+ | } | ||
+ | return literal_implode(' | ||
+ | } | ||
+ | $sql = literal_combine(' | ||
+ | </ | ||
- | $values | + | ===== Considerations ===== |
- | '// | + | |
- | NULL | + | |
- | ' | + | |
- | ' | + | |
- | ], | + | |
- | '// | + | |
- | ' | + | |
- | ], | + | |
- | ]; | + | |
- | + | ||
- | echo template_parse($template_html, | + | |
- | Being sure the HTML does not contain unsafe variables, the templating engine can accept and apply the supplied variables for the relevant context, for example: | + | ==== Naming ==== |
- | function template_parse($html, $values) { | + | Literal string is the standard name for strings in source code. See [[https:// |
- | + | ||
- | | + | > A string literal is the notation for representing a string value within the text of a computer program. In PHP, strings can be created with single quotes, double quotes or using the heredoc or the nowdoc syntax... |
- | throw new Exception(' | + | |
- | } | + | Alternatives suggestions have included // |
- | + | ||
- | $dom = new DomDocument(); | + | ==== Supporting Int/ |
- | | + | |
- | + | When converting to string, they aren't guaranteed | |
- | | + | |
- | + | For example, //TRUE// and //true// when cast to string give "1". | |
- | | + | |
- | + | It's also a very low value feature, where there might not be space for a flag to be added. | |
- | if (!is_literal($query)) { | + | |
- | throw new Exception(' | + | ==== Supporting Concatenation ==== |
- | } | + | |
- | + | Máté Kocsis has done some [[https:// | |
- | | + | |
- | | + | In my own [[https:// |
- | + | ||
- | if (!is_literal($attribute)) { | + | Laravel Demo App: +0.30% with, vs +0.18% without concat. |
- | throw new Exception('Invalid Template Attribute.'); | + | |
- | } | + | My Concat Test: |
- | + | ||
- | if ($attribute) { | + | In my basic test, I used a RAM Disk, and disabled the processors Turbo Boost. With the Demo Apps, I used /// |
- | $safe = false; | + | |
- | | + | There is still a small impact without concat because the // |
- | if (preg_match(' | + | |
- | $safe = true; // Not " | + | Technically string concat isn't needed for most libraries, like an ORM or Query Builder, where their methods nearly always take a small literal string. But it would make adoption of is_literal() easier for existing projects that are currently using string concat for their SQL, HTML Templates, etc. |
- | } | + | |
- | } else if ($attribute == 'class') { | + | And supporting runtime concat would make the literal check easier to understand, as it would be consistent |
- | if (in_array($value, ['admin', ' | + | |
- | | + | The non-concat version would use // |
- | } | + | |
- | } else if (preg_match(' | + | <code php> |
- | if (preg_match('/^[a-z0-9 | + | $sortOrder |
- | $safe = true; | + | |
- | } | + | // 300 lines of code, or multiple function calls |
- | } | + | |
- | if ($safe) { | + | $sql .= ' |
- | $element-> | + | |
- | } | + | // 300 lines of code, or multiple function calls |
- | } else { | + | |
- | | + | $db-> |
- | } | + | </code> |
- | + | ||
- | } | + | If a developer changed the literal //'ASC'// to //$_GET[' |
- | } | + | |
- | + | <code php> | |
- | } | + | $sql = literal_combine($sql, ' ORDER BY name ', $sortOrder); |
- | + | </ | |
- | $html = ''; | + | |
- | $body = $dom-> | + | ==== Performance ==== |
- | if ($body-> | + | |
- | foreach | + | TBC |
- | $html .= $dom-> | + | |
- | } | + | See the section above. |
- | } | + | |
- | + | ==== Values from INI/ | |
- | | + | |
- | + | As noted by [[https:// | |
- | } | + | |
+ | ==== Existing String Functions ==== | ||
+ | |||
+ | Trying to determine if the // | ||
+ | |||
+ | For any use-case where dynamic strings are required, it would be better to build those strings with an appropriate query builder, or by using // | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | Not sure | + | No known BC breaks, except for code-bases that already contain userland functions // |
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
- | PHP 8? | + | PHP 8.1 |
===== RFC Impact ===== | ===== RFC Impact ===== | ||
Line 331: | Line 320: | ||
==== To SAPIs ==== | ==== To SAPIs ==== | ||
- | Not sure | + | None known |
==== To Existing Extensions ==== | ==== To Existing Extensions ==== | ||
Line 343: | Line 332: | ||
===== Open Issues ===== | ===== Open Issues ===== | ||
- | - Can // | + | None |
- | - Systems/ | + | |
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== | ||
- | Not sure | + | None known |
===== Future Scope ===== | ===== Future Scope ===== | ||
- | Certain | + | As noted by MarkR, the biggest benefit will come when it can be used by PDO and similar |
+ | |||
+ | **Phase 2** could introduce a way for programmers | ||
+ | |||
+ | For example, | ||
+ | |||
+ | **Phase 3** could set a default of 'only literals' | ||
+ | |||
+ | And, for a bit of silliness (Spaß ist verboten), MarkR would like a // | ||
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
- | Not sure | + | Accept the RFC. Yes/No |
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
- | A volunteer is needed to help with implementation. | + | N/A |
===== Implementation ===== | ===== Implementation ===== | ||
- | N/A | + | Dan Ackroyd has [[https:// |
+ | |||
+ | Joe Watkins has [[https:// | ||
===== References ===== | ===== References ===== | ||
- | - https:// | + | N/A |
===== Rejected Features ===== | ===== Rejected Features ===== | ||
N/A | N/A | ||
+ | |||
+ | ===== Thanks ===== | ||
+ | |||
+ | - **Dan Ackroyd**, DanAck, for starting the first implementation (which made this a reality), and followup on the version that uses functions instead of string concat. | ||
+ | - **Joe Watkins**, krakjoe, for finding how to set the literal flag (tricky), and creating the implementation that supports string concat. | ||
+ | - **Máté Kocsis**, mate-kocsis, | ||
+ | - **Rowan Tommins**, IMSoP, for re-writing this RFC to focus on the key features, and putting it in context of how it can be used by libraries. | ||
+ | - **Nikita Popov**, NikiC, for suggesting where the literal flag could be stored. Initially this was going to be the " | ||
+ | - **Mark Randall**, MarkR, for alternative ideas, and noting that " | ||
+ | - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, | ||
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis