rfc:is_literal
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
rfc:is_literal [2020/03/21 17:38] – Remove edit link craigfrancis | rfc:is_literal [2021/05/03 19:38] – Don't write off the concat version (inc more perf stats) craigfrancis | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Is Literal Check ====== | ====== PHP RFC: Is Literal Check ====== | ||
- | * Version: 0.1 | + | * Version: 0.6 |
* Date: 2020-03-21 | * Date: 2020-03-21 | ||
+ | * Updated: 2021-04-30 | ||
* Author: Craig Francis, craig# | * Author: Craig Francis, craig# | ||
* Status: Draft | * Status: Draft | ||
* First Published at: https:// | * First Published at: https:// | ||
+ | * GitHub Repo: https:// | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Add an // | + | A new function, |
- | This allows developers/ | + | This takes the concept |
- | It allows commands | + | It does not allow a variable |
- | This will also allow systems/ | + | For example, take a database library that supports parametrised queries at the driver level, today a programmer could use either of these: |
- | ===== Related JavaScript Implementation ===== | + | <code php> |
+ | $db-> | ||
- | This proposal is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, to support the introduction of Trusted Types. | + | $db-> |
+ | </ | ||
- | https:// | + | If the library only accepted a literal SQL string (written by the programmer), |
- | https:// | + | |
- | They are looking at "Distinguishing strings | + | This definition of an "inherently safe API" comes from Christoph Kern, who did a talk in 2016 about [[https:// |
- | ===== Taint Checking ===== | + | By adding a way for libraries to check if the strings they receive came from the developer (from trusted PHP source code), it allows the library to check they are being used in a safe way. |
- | Xinchen Hui has done some amazing work with the Taint extension: | + | ===== Why ===== |
- | https://github.com/laruence/taint | + | The [[https://owasp.org/www-project-top-ten/|OWASP Top 10]] lists common vulnerabilities sorted by prevalence, exploitability, |
- | Unfortunately this approach does not address all issues, mainly because it still allows string escaping, which is only " | + | **A1: Injection** - common prevalence (2), easy for attackers to detect/exploit |
- | $sql = ' | + | **A7: XSS** - widespread prevalence |
- | + | ||
- | | + | |
- | + | ||
- | // DELETE FROM table WHERE id = id | + | |
- | $html = '< | + | And these two have always been listed: 2003 (A6/A4), 2004 (A6/A4), 2007 (A2/A1), 2010 (A1/A2), 2013 (A1/A3), 2017 (A1/A7). |
- | + | ||
- | | + | |
- | + | ||
- | // <img src=x onerror=alert(cookie) /> | + | |
- | The Taint extension also [[https:// | + | It's because these mistakes are very easy to make, and hard to identify - is_literal() directly addresses this problem. |
- | ===== Previous RFC ===== | + | ===== Examples |
- | Matt Tait suggested | + | The [[https://www.doctrine-project.org/projects/doctrine-orm/ |
- | It was noted that " | + | <code php> |
+ | // INSECURE | ||
+ | $qb-> | ||
+ | | ||
+ | | ||
+ | </ | ||
- | Where it would have effected every SQL function, such as //mysqli_query()//, //$pdo-> | + | The definition of the //where()// method could check with //is_literal()// and throw an exception, advising the programmer to replace it with a safer use of placeholders: |
- | And each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" | + | <code php> |
+ | $qb-> | ||
+ | ->from(' | ||
+ | | ||
+ | -> | ||
+ | </ | ||
- | I also agree that "SQL injection | + | Similarly, Twig allows [[https:// |
+ | |||
+ | <code php> | ||
+ | // INSECURE | ||
+ | echo $twig-> | ||
+ | </ | ||
+ | |||
+ | If // | ||
+ | |||
+ | <code php> | ||
+ | echo $twig-> | ||
+ | </ | ||
+ | |||
+ | ===== Failed Solutions ===== | ||
+ | |||
+ | ==== Education ==== | ||
+ | |||
+ | Developer training has not worked, it simply does not scale (people start programming every day), and learning about every single issue is difficult. | ||
+ | |||
+ | Keeping in mind that programmers will frequently do just enough to complete their task (busy), where they often copy/paste a solution to their problem they find online (risky), modify it for their needs (risky), then move on. | ||
+ | |||
+ | We cannot keep saying they 'need to be careful', | ||
+ | |||
+ | ==== Escaping ==== | ||
+ | |||
+ | Escaping is hard, and error prone. | ||
+ | |||
+ | We have a list of common [[https:// | ||
+ | |||
+ | Developers should use parameterised queries (e.g. SQL), or a well tested library that knows how to escape values based on their context (e.g. HTML). | ||
+ | |||
+ | ==== Taint Checking ==== | ||
+ | |||
+ | Some languages implement a "taint flag" which tracks whether values are considered " | ||
+ | |||
+ | There is a [[https:// | ||
+ | |||
+ | These solutions rely on the assumption that the output of an escaping function is safe for a particular context. This sounds reasonable in theory, but the operation of escaping functions, and the context for which their output is safe, are very hard to define. This leads to a feature that is both complex and unreliable. | ||
+ | |||
+ | This proposal avoids the complexity by addressing a different part of the problem: separating inputs supplied by the programmer, from inputs supplied by the user. | ||
+ | |||
+ | ==== Static Analysis ==== | ||
+ | |||
+ | While I agree with [[https:// | ||
+ | |||
+ | But they nearly always focus on other issues (type checking, basic logic flaws, code formatting, etc). | ||
+ | |||
+ | Those that attempt to address injection vulnerabilities, do so via Taint Checking (see above), and are [[https:// | ||
+ | |||
+ | For a quick example, psalm, even in its strictest errorLevel (1), and/or running // | ||
+ | |||
+ | <code php> | ||
+ | $db = new mysqli(' | ||
+ | |||
+ | $id = (string) ($_GET[' | ||
+ | |||
+ | $db-> | ||
+ | </ | ||
+ | |||
+ | When psalm comes to taint checking the usage of a library (like Doctrine), it assumes all methods are safe, because none of them note the sinks (and even if they did, you're back to escaping being an issue). | ||
+ | |||
+ | But the biggest problem is that Static Analysis is simply not used by most developers, especially those who are new to programming (usage tends to be higher by those writing well tested libraries). | ||
===== Proposal ===== | ===== Proposal ===== | ||
- | Add an // | + | This RFC proposes adding four functions: |
- | This uses a similar definition as the [[https://wiki.php.net/rfc/sql_injection_protection# | + | * // |
+ | * // | ||
+ | * //literal_combine(string $piece, string ...$pieces): string// - allow concatenating literal strings. | ||
+ | * // | ||
- | Thanks to [[https:// | + | A literal is defined as a value (string) which has been written by the programmer. The value may be passed between functions, as long as it is not modified in any way. |
- | Unlike the Taint extension, there is no need to provide an equivalent //untaint()// function. | + | <code php> |
+ | is_literal(' | ||
- | ===== Examples ===== | + | $a = ' |
+ | $b = ' | ||
- | ==== SQL Injection, Basic ==== | + | is_literal($a); |
+ | is_literal($a . $b); // TBC, details below. | ||
- | A simple example: | + | $c = literal_combine($a, |
+ | is_literal($c); | ||
- | | + | is_literal($_GET[' |
- | + | is_literal(' | |
- | $result | + | is_literal(rand(0, |
+ | is_literal(sprintf(' | ||
+ | </ | ||
- | Checked in the framework by: | + | There is no way to manually mark a string as a literal (i.e. no equivalent to // |
- | class db { | + | ===== Previous Work ===== |
- | + | ||
- | | + | Google uses " |
- | + | ||
- | if (!is_literal($sql)) { | + | Google also uses [[https:// |
- | throw new Exception(' | + | |
+ | Perl has a [[https:// | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | As noted above, there is the [[https:// | ||
+ | |||
+ | And there is the [[https:// | ||
+ | |||
+ | | ||
+ | * this amount of work isn't ideal for "just for one use case" ([[https:// | ||
+ | * It would have effected every SQL function, such as // | ||
+ | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | ||
+ | |||
+ | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" | ||
+ | |||
+ | ===== Usage ===== | ||
+ | |||
+ | By libraries: | ||
+ | |||
+ | <code php> | ||
+ | class db { | ||
+ | | ||
+ | | ||
+ | | ||
+ | if ($this-> | ||
+ | // Programmer aware, and is choosing to bypass this check. | ||
+ | } else if ($this-> | ||
+ | trigger_error(' | ||
+ | } else { | ||
+ | throw new Exception(' | ||
} | } | ||
- | | ||
- | $statement = $this-> | ||
- | $statement-> | ||
- | return $statement-> | ||
- | | ||
} | } | ||
- | | ||
} | } | ||
+ | function unsafe_disable_injection_protection() { | ||
+ | $this-> | ||
+ | } | ||
+ | function where($sql, $parameters = []) { | ||
+ | $this-> | ||
+ | // ... | ||
+ | } | ||
+ | } | ||
- | It will also work with string concatenation: | + | $db-> |
+ | $db-> | ||
+ | </ | ||
- | define(' | + | Table and Fields in SQL, which cannot use parameters; for example //ORDER BY//: |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | is_literal($sql); | + | |
- | + | ||
- | $sql .= ' AND id = ' . mysqli_real_escape_string($db, | + | |
- | + | ||
- | is_literal($sql); | + | |
- | ==== SQL Injection, ORDER BY ==== | + | <code php> |
+ | $order_fields | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | ]; | ||
- | To ensure //ORDER BY// can be set via the user, but only use acceptable values: | + | $order_id = array_search(($_GET[' |
- | | + | $sql = literal_combine(' ORDER BY ', $order_fields[$order_id]); |
- | ' | + | </ |
- | ' | + | |
- | ' | + | |
- | ]; | + | |
- | + | ||
- | $order_id | + | |
- | + | ||
- | $sql = ' ORDER BY ' | + | |
- | ==== SQL Injection, | + | Undefined number of parameters; for example //WHERE IN//: |
- | Most SQL strings can be a concatenations of literal values, but // | + | <code php> |
+ | function where_in_sql($count) { // Should check for 0 | ||
+ | $sql = []; | ||
+ | for ($k = 0; $k < $count; $k++) { | ||
+ | $sql[] = '?'; | ||
+ | } | ||
+ | return literal_implode(',', | ||
+ | } | ||
+ | $sql = literal_combine(' | ||
+ | </code> | ||
- | So there //might// need to be a special case for // | + | ===== Considerations ===== |
- | $in_sql | + | ==== Naming ==== |
- | + | ||
- | // or | + | |
- | + | ||
- | $in_sql | + | |
- | To be used with: | + | Literal string is the standard name for strings in source code. See [[https:// |
- | $sql = ' | + | > A string literal is the notation for representing a string value within the text of a computer program. In PHP, strings can be created with single quotes, double quotes or using the heredoc or the nowdoc syntax... |
- | ==== SQL Injection, ORM Usage ==== | + | Alternatives suggestions have included // |
- | [[https://www.doctrine-project.org/ | + | ==== Supporting Int/Float/Boolean values. ==== |
- | $users = $queryBuilder | + | When converting to string, they aren't guaranteed |
- | -> | + | |
- | -> | + | |
- | ->where('u.id = ' . $_GET[' | + | |
- | -> | + | |
- | -> | + | |
- | + | ||
- | // example.php? | + | |
- | Where this mistake could be identified by: | + | For example, //TRUE// and //true// when cast to string give " |
- | public function where($predicates) | + | It's also a very low value feature, where there might not be space for a flag to be added. |
- | { | + | |
- | if (!is_literal($predicates)) { | + | |
- | throw new Exception('Can only accept | + | |
- | } | + | |
- | ... | + | |
- | } | + | |
- | [[https:// | + | ==== Supporting Concatenation ==== |
- | $users = R:: | + | This is the big question. |
- | [[http://propelorm.org/Propel/reference/model-criteria.html# | + | Máté Kocsis has done some [[https://github.com/craigfrancis/php-is-literal-rfc/ |
- | $users = UserQuery::create()->where('id = ' | + | In my own [[https:// |
- | ==== SQL Injection, ORM Internal ==== | + | Laravel Demo App: +0.30% with, vs +0.18% without concat. |
+ | Symfony Demo App: +0.06% with, vs +0.06% without concat. | ||
+ | My Concat Test: | ||
- | The //is_literal()// function could be used by ORM developers, so they can be sure they have created an SQL string out of literals. | + | In my basic test, I used a RAM Disk, and disabled the processors Turbo Boost. With the Demo Apps, I used /// |
- | This would avoid mistakes such as the ORDER BY issues in the Zend framework [[https://framework.zend.com/security/advisory/ZF2014-04|1]]/[[https://framework.zend.com/ | + | There is still a small impact without concat because |
- | ==== CLI Injection ==== | + | Technically string concat isn't needed for most libraries, like an ORM or Query Builder, where their methods nearly always take a small literal string. But it would make adoption of is_literal() easier for existing projects that are currently using string concat for their SQL, HTML Templates, etc. |
- | Rather than using functions such as: | + | And supporting runtime concat would make the literal check easier to understand, |
- | * //exec()// | + | The non-concat version would use //literal_combine()// or //literal_implode()// as special functions to avoid most of the work during runtime contact. Where Dan Ackroyd notes that these functions would make it easier to identify exactly where mistakes are made, rather than it being picked up at the end of a potentially long script, after multiple string concatenations, |
- | * //shell_exec()// | + | |
- | * // | + | |
- | * //passthru()// | + | |
- | Frameworks (or PHP) could introduce something similar to // | + | <code php> |
+ | $sortOrder = ' | ||
- | Or, take a verified literal for the command, and use parameters for the arguments (like SQL): | + | // 300 lines of code, or multiple function calls |
- | | + | $sql .= ' |
- | ' | + | |
- | ]); | + | |
- | Rough implementation: | + | // 300 lines of code, or multiple function calls |
- | function parameterised_exec($cmd, $args = []) { | + | $db-> |
- | + | </ | |
- | if (!is_literal($cmd)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $offset = 0; | + | |
- | $k = 0; | + | |
- | while (($pos = strpos($cmd, | + | |
- | if (!isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | $arg = escapeshellarg($args[$k]); | + | |
- | $cmd = substr($cmd, | + | |
- | $offset = ($pos + strlen($arg)); | + | |
- | $k++; | + | |
- | } | + | |
- | if (isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | + | ||
- | return exec($cmd); | + | |
- | + | ||
- | } | + | |
- | ==== HTML Injection ==== | + | If a developer changed the literal //' |
- | Template engines should receive variables separately from the raw HTML. | + | <code php> |
+ | $sql = literal_combine($sql, | ||
+ | </ | ||
- | Often the engine will get the HTML from static files: | + | ==== Performance ==== |
- | $html = file_get_contents('/ | + | TBC |
- | But small snippets of HTML are often easier to define as a literal within | + | See the section above. |
- | $template_html | + | ==== Values from INI/JSON/YAML ==== |
- | < | + | |
- | < | + | |
- | Where the variables | + | As noted by [[https:// |
- | $values | + | ==== Existing String Functions |
- | '// | + | |
- | NULL | + | |
- | ' | + | |
- | ' | + | |
- | ], | + | |
- | '// | + | |
- | ' | + | |
- | ], | + | |
- | ]; | + | |
- | + | ||
- | echo template_parse($template_html, | + | |
- | Being sure the HTML does not contain unsafe variables, the templating engine can accept and apply the supplied variables for the relevant context, for example: | + | Trying to determine if the // |
- | function template_parse($html, $values) { | + | For any use-case where dynamic strings are required, it would be better to build those strings with an appropriate |
- | + | ||
- | if (!is_literal($html)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $dom = new DomDocument(); | + | |
- | $dom-> | + | |
- | + | ||
- | $xpath = new DOMXPath($dom); | + | |
- | + | ||
- | foreach ($values as $query => $attributes) { | + | |
- | + | ||
- | if (!is_literal($query)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | foreach ($xpath-> | + | |
- | foreach ($attributes as $attribute => $value) { | + | |
- | + | ||
- | if (!is_literal($attribute)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | if ($attribute) { | + | |
- | $safe = false; | + | |
- | if ($attribute == ' | + | |
- | if (preg_match('/ | + | |
- | $safe = true; // Not " | + | |
- | } | + | |
- | } else if ($attribute == ' | + | |
- | if (in_array($value, | + | |
- | $safe = true; // Only allow specific classes? | + | |
- | } | + | |
- | } else if (preg_match(' | + | |
- | if (preg_match(' | + | |
- | $safe = true; | + | |
- | } | + | |
- | } | + | |
- | if ($safe) { | + | |
- | $element-> | + | |
- | } | + | |
- | } else { | + | |
- | $element-> | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | + | ||
- | $html = ''; | + | |
- | $body = $dom-> | + | |
- | if ($body-> | + | |
- | foreach ($body-> | + | |
- | $html .= $dom-> | + | |
- | } | + | |
- | } | + | |
- | + | ||
- | return $html; | + | |
- | + | ||
- | } | + | |
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | Not sure | + | No known BC breaks, except for code-bases that already contain userland functions // |
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
- | PHP 8? | + | PHP 8.1 |
===== RFC Impact ===== | ===== RFC Impact ===== | ||
Line 329: | Line 322: | ||
==== To SAPIs ==== | ==== To SAPIs ==== | ||
- | Not sure | + | None known |
==== To Existing Extensions ==== | ==== To Existing Extensions ==== | ||
Line 341: | Line 334: | ||
===== Open Issues ===== | ===== Open Issues ===== | ||
- | - Can // | + | None |
- | - Systems/ | + | |
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== | ||
- | Not sure | + | None known |
===== Future Scope ===== | ===== Future Scope ===== | ||
- | Certain | + | As noted by MarkR, the biggest benefit will come when it can be used by PDO and similar |
+ | |||
+ | **Phase 2** could introduce a way for programmers | ||
+ | |||
+ | For example, | ||
+ | |||
+ | **Phase 3** could set a default of 'only literals' | ||
+ | |||
+ | And, for a bit of silliness (Spaß ist verboten), MarkR would like a // | ||
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
- | Not sure | + | Accept the RFC. Yes/No |
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
- | A volunteer is needed to help with implementation. | + | N/A |
===== Implementation ===== | ===== Implementation ===== | ||
- | N/A | + | Dan Ackroyd has [[https:// |
+ | |||
+ | Joe Watkins has [[https:// | ||
===== References ===== | ===== References ===== | ||
- | - https:// | + | N/A |
===== Rejected Features ===== | ===== Rejected Features ===== | ||
N/A | N/A | ||
+ | |||
+ | ===== Thanks ===== | ||
+ | |||
+ | - **Dan Ackroyd**, DanAck, for starting the first implementation (which made this a reality), and followup on the version that uses functions instead of string concat. | ||
+ | - **Joe Watkins**, krakjoe, for finding how to set the literal flag (tricky), and creating the implementation that supports string concat. | ||
+ | - **Máté Kocsis**, mate-kocsis, | ||
+ | - **Rowan Tommins**, IMSoP, for re-writing this RFC to focus on the key features, and putting it in context of how it can be used by libraries. | ||
+ | - **Nikita Popov**, NikiC, for suggesting where the literal flag could be stored. Initially this was going to be the " | ||
+ | - **Mark Randall**, MarkR, for alternative ideas, and noting that " | ||
+ | - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, | ||
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis