rfc:is_literal
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
rfc:is_literal [2020/03/23 17:44] – Add a note about "interned strings" craigfrancis | rfc:is_literal [2021/05/02 16:38] – Add performance stats, and notes from Dan craigfrancis | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Is Literal Check ====== | ====== PHP RFC: Is Literal Check ====== | ||
- | * Version: 0.1 | + | * Version: 0.6 |
* Date: 2020-03-21 | * Date: 2020-03-21 | ||
+ | * Updated: 2021-04-30 | ||
* Author: Craig Francis, craig# | * Author: Craig Francis, craig# | ||
* Status: Draft | * Status: Draft | ||
* First Published at: https:// | * First Published at: https:// | ||
+ | * GitHub Repo: https:// | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Add an // | + | A new function, |
- | This function would allows developers/ | + | This takes the concept |
- | Commands can then be tested to ensure they are a " | + | It does not allow a variable to be marked as untainted, and it does not allowing escaping |
- | This will also allow systems/ | + | For example, a database library that supports parametrised queries at the driver level, today a programmer could use either of these: |
- | Literals are values defined within the PHP scripts, for example: | + | <code php> |
+ | $db-> | ||
- | | + | $db-> |
- | $b = 'Example ' . $a; | + | </code> |
- | is_literal($b); | + | |
- | + | ||
- | $c = ' | + | |
- | | + | |
- | ===== Related JavaScript Implementation ===== | + | By rejecting the SQL that was not written as a literal (first example), and only accepting a literal string (written by the programmer), |
- | This proposal is taking some ideas from TC39, where a similar | + | This definition of an " |
- | https:// | + | By adding a way for libraries to check if the strings they receive came from the developer (from trusted PHP source code), it allows the library to check they are being used in a safe way. |
- | https:// | + | |
- | They are looking at " | + | ===== Why ===== |
- | ===== Taint Checking ===== | + | The [[https:// |
- | Xinchen Hui has done some amazing work with the Taint extension: | + | **Injection vulnerabilities** remain at the top of the list (common prevalence, easy for attackers to detect/ |
- | https:// | + | This is because injection mistakes are very easy to make, and hard to identify - is_literal() addresses this. |
- | Unfortunately this approach does not address all issues, mainly because it still allows string escaping, which is only " | + | ===== Examples ===== |
- | $sql = ' | + | The [[https://www.doctrine-project.org/projects/doctrine-orm/ |
- | + | ||
- | | + | |
- | + | ||
- | | + | |
- | | + | <code php> |
- | + | // INSECURE | |
- | // example.php? | + | $qb-> |
- | + | | |
- | // <img src=x onerror=alert(cookie) | + | |
+ | </code> | ||
- | The Taint extension also [[https://github.com/laruence/taint/blob/4a6c4cb2613e27f5604d2021802c144a954caff8/ | + | The definition of the //where()// method could check with //is_literal()// and throw an exception, advising the programmer to replace it with a safer use of placeholders: |
- | ===== Previous RFC ===== | + | <code php> |
+ | $qb-> | ||
+ | | ||
+ | | ||
+ | | ||
+ | </ | ||
- | Matt Tait suggested | + | Similarly, Twig allows |
- | It was noted that " | + | <code php> |
+ | // INSECURE | ||
+ | echo $twig-> | ||
+ | </code> | ||
- | Where it would have effected every SQL function, such as //mysqli_query()//, //$pdo-> | + | If //createTemplate()// checked with //is_literal()//, the programmer could be advised to write this instead: |
- | And each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" | + | <code php> |
+ | echo $twig-> | ||
+ | </code> | ||
- | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" | + | ===== Failed Solutions ===== |
- | ===== Proposal ===== | + | ==== Education |
- | Add an // | + | Developer training |
- | This uses a similar definition as the [[https:// | + | Keeping in mind that programmers will frequently do just enough to complete their task (busy), where they often copy/paste a solution |
- | Thanks | + | We cannot keep saying they ' |
- | And thanks to [[https:// | + | ==== Escaping ==== |
- | Unlike the Taint extension, there is no need to provide an equivalent // | + | Escaping |
- | ===== Examples ===== | + | We have a list of common [[https:// |
- | ==== SQL Injection, Basic ==== | + | Developers should use parameterised queries (e.g. SQL), or a well tested library that knows how to escape values based on their context (e.g. HTML). |
- | A simple example: | + | ==== Taint Checking ==== |
- | $sql = ' | + | Some languages implement a "taint flag" which tracks whether values are considered " |
- | + | ||
- | $result = $db-> | + | |
- | Checked in the framework | + | There is a [[https:// |
- | class db { | + | These solutions rely on the assumption that the output of an escaping |
- | + | ||
- | public | + | |
- | + | ||
- | if (!is_literal($sql)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $statement = $this-> | + | |
- | $statement-> | + | |
- | return $statement-> | + | |
- | + | ||
- | } | + | |
- | + | ||
- | } | + | |
- | It will also work with string concatenation: | + | This proposal avoids the complexity by addressing a different part of the problem: separating inputs supplied by the programmer, from inputs supplied by the user. |
- | define(' | + | ==== Static Analysis ==== |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | is_literal($sql); | + | |
- | + | ||
- | $sql .= ' AND id = ' . mysqli_real_escape_string($db, | + | |
- | + | ||
- | is_literal($sql); | + | |
- | ==== SQL Injection, ORDER BY ==== | + | While I agree with [[https:// |
- | To ensure //ORDER BY// can be set via the user, but only use acceptable values: | + | But they nearly always focus on other issues (type checking, basic logic flaws, code formatting, etc). |
- | $order_fields = [ | + | Those that attempt to address injection vulnerabilities, do so via Taint Checking |
- | ' | + | |
- | ' | + | |
- | ' | + | |
- | ]; | + | |
- | + | ||
- | $order_id = array_search(($_GET[' | + | |
- | + | ||
- | $sql = ' ORDER BY ' | + | |
- | ==== SQL Injection, WHERE IN ==== | + | For a quick example, psalm, even in its strictest errorLevel (1), and/or running // |
- | Most SQL strings can be a concatenations of literal values, but //WHERE x IN (?,?,?)// need to use a variable number of literal placeholders. | + | <code php> |
+ | $db = new mysqli('...'); | ||
- | So there //might// need to be a special case for // | + | $id = (string) ($_GET[' |
- | | + | $db-> |
- | + | </code> | |
- | | + | |
- | + | ||
- | $in_sql = substr(str_repeat('?,', | + | |
- | To be used with: | + | When psalm comes to taint checking the usage of a library (like Doctrine), it assumes all methods are safe, because none of them note the sinks (and even if they did, you're back to escaping being an issue). |
- | $sql = ' | + | But the biggest problem is that Static Analysis is simply not used by most developers, especially those who are new to programming |
- | ==== SQL Injection, ORM Usage ==== | + | ===== Proposal ===== |
- | [[https:// | + | This RFC proposes adding three functions: |
- | | + | |
- | ->select(' | + | * // |
- | | + | * // |
- | | + | |
- | -> | + | |
- | -> | + | |
- | + | ||
- | | + | |
- | Where this mistake could be identified | + | A literal is defined as a value (string) which has been written |
- | public function where($predicates) | + | <code php> |
- | { | + | is_literal(' |
- | if (!is_literal($predicates)) { | + | |
- | throw new Exception('Can only accept a literal'); | + | |
- | } | + | |
- | ... | + | |
- | } | + | |
- | [[https:// | + | $a = ' |
+ | $b = ' | ||
- | | + | is_literal($a); // true |
+ | is_literal($a . $b); // false, no concat at run time (details below). | ||
- | [[http:// | + | is_literal(literal_combine($a, $b)); // true |
- | $users = UserQuery:: | + | is_literal($_GET[' |
+ | is_literal('WHERE id = ' . intval($_GET[' | ||
+ | is_literal(rand(0, 10)); // false | ||
+ | is_literal(sprintf(' | ||
+ | </ | ||
- | ==== SQL Injection, ORM Internal ==== | + | There is no way to manually mark a string as a literal (i.e. no equivalent to // |
- | The // | + | *Technical detail: Strings that are concatenated in place at compile time are treated as a literal.* |
- | This would avoid mistakes such as the ORDER BY issues in the Zend framework [[https:// | + | ===== Previous Work ===== |
- | ==== CLI Injection ==== | + | Google uses " |
- | Rather than using functions such as: | + | Google also uses [[https:// |
- | * //exec()// | + | Perl has a [[https://perldoc.perl.org/perlsec# |
- | * // | + | |
- | * //system()// | + | |
- | * // | + | |
- | Frameworks | + | [[https:// |
- | Or, take a verified literal for the command, and use parameters | + | As noted above, there is the [[https:// |
- | $output = parameterised_exec(' | + | And there is the [[https://wiki.php.net/rfc/ |
- | ' | + | |
- | ]); | + | |
- | Rough implementation: | + | * " |
+ | * this amount of work isn't ideal for "just for one use case" ([[https:// | ||
+ | * It would have effected every SQL function, such as // | ||
+ | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | ||
- | function parameterised_exec($cmd, $args = []) { | + | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" |
- | + | ||
- | if (!is_literal($cmd)) { | + | ===== Usage ===== |
- | throw new Exception(' | + | |
- | } | + | By libraries: |
- | + | ||
- | $offset | + | <code php> |
- | $k = 0; | + | class db { |
- | | + | protected |
- | if (!isset($args[$k])) { | + | |
- | | + | |
- | | + | if ($this-> |
+ | | ||
+ | } else if ($this-> | ||
+ | trigger_error('Non-literal detected!', E_USER_WARNING); | ||
+ | } else { | ||
+ | | ||
} | } | ||
- | $arg = escapeshellarg($args[$k]); | ||
- | $cmd = substr($cmd, | ||
- | $offset = ($pos + strlen($arg)); | ||
- | $k++; | ||
} | } | ||
- | if (isset($args[$k])) { | ||
- | throw new Exception(' | ||
- | exit(); | ||
- | } | ||
- | | ||
- | return exec($cmd); | ||
- | | ||
} | } | ||
+ | function unsafe_disable_injection_protection() { | ||
+ | $this-> | ||
+ | } | ||
+ | function where($sql, $parameters = []) { | ||
+ | $this-> | ||
+ | // ... | ||
+ | } | ||
+ | } | ||
- | ==== HTML Injection ==== | + | $db-> |
+ | $db-> | ||
+ | </ | ||
- | Template engines should receive variables separately from the raw HTML. | + | Table and Fields in SQL, which cannot use parameters; for example //ORDER BY//: |
- | Often the engine will get the HTML from static files: | + | <code php> |
+ | $order_fields = [ | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | ]; | ||
- | | + | $order_id |
- | But small snippets of HTML are often easier to define as a literal within the PHP script: | + | $sql = literal_combine(' |
+ | </ | ||
- | $template_html = ' | + | Undefined number of parameters; for example |
- | < | + | |
- | < | + | |
- | Where the variables are supplied separately, in this example I'm using XPaths: | + | <code php> |
+ | function where_in_sql($count) { // Should check for 0 | ||
+ | $sql = []; | ||
+ | for ($k = 0; $k < $count; $k++) { | ||
+ | $sql[] = '?'; | ||
+ | } | ||
+ | return literal_implode(' | ||
+ | } | ||
+ | $sql = literal_combine(' | ||
+ | </ | ||
- | $values | + | ===== Considerations ===== |
- | '// | + | |
- | NULL | + | |
- | ' | + | |
- | ' | + | |
- | ], | + | |
- | '// | + | |
- | ' | + | |
- | ], | + | |
- | ]; | + | |
- | + | ||
- | echo template_parse($template_html, | + | |
- | Being sure the HTML does not contain unsafe variables, the templating engine can accept and apply the supplied variables for the relevant context, for example: | + | ==== Naming ==== |
- | function template_parse($html, | + | Literal string is the standard name for strings in source code. See https:// |
- | + | ||
- | | + | > A string literal is the notation for representing a string value within the text of a computer program. In PHP, strings can be created with single quotes, double quotes or using the heredoc or the nowdoc syntax. ... The heredoc preserves the line breaks and other whitespace |
- | throw new Exception(' | + | |
- | } | + | Alternatives suggestions have included // |
- | + | ||
- | $dom = new DomDocument(); | + | ==== Supporting Int/ |
- | $dom-> | + | |
- | + | When converting to string, they aren't guaranteed | |
- | | + | |
- | + | For example, //TRUE// and //true// when cast to string give " | |
- | foreach ($values | + | |
- | + | It's also a very low value feature, where there might not be space for a flag to be added. | |
- | | + | |
- | throw new Exception(' | + | ==== Supporting Concatenation ==== |
- | } | + | |
- | + | Unfortunately early testing suggests there will be too much of a performance impact, and is why // | |
- | | + | |
- | | + | It isn't needed for most libraries, like an ORM or Query Builder, where their methods nearly always take a small literal string. |
- | + | ||
- | | + | It was considered because it would have made it easier for existing projects currently using string concatenation to adopt. |
- | throw new Exception('Invalid Template Attribute.'); | + | |
- | } | + | Joe Watkins has created a version that does support string concatenation, |
- | + | ||
- | if ($attribute) { | + | Máté Kocsis did the [[https://github.com/craigfrancis/ |
- | $safe = false; | + | |
- | if ($attribute == 'href' | + | In my own [[https://github.com/ |
- | if (preg_match('/ | + | |
- | $safe = true; // Not " | + | Dan Ackroyd also notes that the use of // |
- | } | + | |
- | } else if ($attribute == ' | + | <code php> |
- | if (in_array($value, [' | + | $sortOrder |
- | $safe = true; // Only allow specific classes? | + | |
- | } | + | // 300 lines of code, or multiple function calls |
- | } else if (preg_match('/ | + | |
- | if (preg_match('/^[a-z0-9 | + | $sql .= ' |
- | $safe = true; | + | |
- | } | + | // 300 lines of code, or multiple function calls |
- | } | + | |
- | if ($safe) { | + | $db-> |
- | $element-> | + | </ |
- | } | + | |
- | } else { | + | If a developer changed the literal //'ASC'// to //$_GET[' |
- | | + | |
- | } | + | <code php> |
- | + | $sql = literal_combine($sql, ' ORDER BY name ', $sortOrder); | |
- | } | + | </ |
- | } | + | |
- | + | ==== Performance ==== | |
- | } | + | |
- | + | The proposed implementation has: | |
- | | + | |
- | $body = $dom-> | + | |
- | if ($body-> | + | |
- | foreach | + | Also, see the section above, where string concatenation support would have introduced a ~1% performance hit. |
- | $html .= $dom-> | + | |
- | } | + | ==== Values from INI/ |
- | } | + | |
- | + | As noted by [[https:// | |
- | | + | |
- | + | ==== Existing String Functions ==== | |
- | } | + | |
+ | Trying to determine if the // | ||
+ | |||
+ | For any use-case where dynamic strings are required, it would be better to build those strings with an appropriate query builder, or by using // | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | Not sure | + | No known BC breaks, except for code-bases that already contain userland functions // |
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
- | PHP 8? | + | PHP 8.1 |
===== RFC Impact ===== | ===== RFC Impact ===== | ||
Line 340: | Line 314: | ||
==== To SAPIs ==== | ==== To SAPIs ==== | ||
- | Not sure | + | None known |
==== To Existing Extensions ==== | ==== To Existing Extensions ==== | ||
Line 352: | Line 326: | ||
===== Open Issues ===== | ===== Open Issues ===== | ||
- | - Would this cause performance issues? | + | None |
- | - Can // | + | |
- | - Should the function be named // | + | |
- | - Systems/ | + | |
- | + | ||
- | ===== Alternatives ===== | + | |
- | + | ||
- | - The current Taint Extension (notes above) | + | |
- | - Using static analysis (not runtime), for example [[https:// | + | |
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== | ||
- | Not sure | + | None known |
===== Future Scope ===== | ===== Future Scope ===== | ||
- | Certain | + | As noted by MarkR, the biggest benefit will come when it can be used by PDO and similar |
+ | |||
+ | **Phase 2** could introduce a way for programmers | ||
+ | |||
+ | For example, | ||
+ | |||
+ | **Phase 3** could set a default of 'only literals' | ||
+ | |||
+ | And, for a bit of silliness (Spaß ist verboten), MarkR would like a // | ||
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
- | Not sure | + | Accept the RFC. Yes/No |
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
- | A volunteer is needed to help with implementation. | + | N/A |
===== Implementation ===== | ===== Implementation ===== | ||
- | N/A | + | Dan Ackroyd has [[https:// |
+ | |||
+ | Joe Watkins has [[https:// | ||
===== References ===== | ===== References ===== | ||
- | - https:// | + | N/A |
===== Rejected Features ===== | ===== Rejected Features ===== | ||
N/A | N/A | ||
+ | |||
+ | ===== Thanks ===== | ||
+ | |||
+ | - **Dan Ackroyd**, DanAck, for starting the first implementation (which made this a reality), and followup on the version that uses functions instead of string concat. | ||
+ | - **Joe Watkins**, krakjoe, for finding how to set the literal flag (tricky), and creating the implementation that does support string concat. | ||
+ | - **Rowan Tommins**, IMSoP, for re-writing this RFC to focus on the key features, and putting it in context of how it can be used by libraries. | ||
+ | - **Nikita Popov**, NikiC, for suggesting where the literal flag could be stored. Initially this was going to be the " | ||
+ | - **Mark Randall**, MarkR, for alternative ideas, and noting that " | ||
+ | - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, | ||
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis