rfc:is_literal
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
rfc:is_literal [2021/03/20 10:18] – Update Future Scope to suggest the use of value objects craigfrancis | rfc:is_literal [2021/05/02 16:38] – Add performance stats, and notes from Dan craigfrancis | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Is Literal Check ====== | ====== PHP RFC: Is Literal Check ====== | ||
- | * Version: 0.3 | + | * Version: 0.6 |
* Date: 2020-03-21 | * Date: 2020-03-21 | ||
- | * Updated: 2021-02-19 | + | * Updated: 2021-04-30 |
* Author: Craig Francis, craig# | * Author: Craig Francis, craig# | ||
* Status: Draft | * Status: Draft | ||
Line 11: | Line 11: | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Add an // | + | A new function, |
- | As in, at runtime, being able to check if a variable has been created by literals, defined within a PHP script, by a trusted developer. | + | This takes the concept of "taint checking" |
- | This simple check can be used to warn or completely block SQL Injection, Command Line Injection, and many cases of HTML Injection | + | It does not allow a variable to be marked as untainted, and it does not allowing escaping |
- | See the [[https:// | + | For example, a database library that supports parametrised queries at the driver level, today a programmer could use either of these: |
- | In short, abstractions like Doctrine could protect against | + | <code php> |
+ | $db-> | ||
+ | |||
+ | $db-> | ||
+ | </ | ||
+ | |||
+ | By rejecting the SQL that was not written as a literal (first example), and only accepting a literal string (written by the programmer), | ||
+ | |||
+ | This definition of an " | ||
+ | |||
+ | By adding a way for libraries to check if the strings they receive came from the developer (from trusted PHP source code), it allows the library to check they are being used in a safe way. | ||
+ | |||
+ | ===== Why ===== | ||
+ | |||
+ | The [[https://owasp.org/ | ||
+ | |||
+ | **Injection vulnerabilities** remain at the top of the list (common prevalence, easy for attackers to detect/ | ||
+ | |||
+ | This is because injection mistakes are very easy to make, and hard to identify - is_literal() addresses | ||
+ | |||
+ | ===== Examples ===== | ||
+ | |||
+ | The [[https:// | ||
<code php> | <code php> | ||
- | $users = $queryBuilder | + | // INSECURE |
- | | + | $qb-> |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
</ | </ | ||
- | Or this Twig [[https://twig.symfony.com/doc/2.x/recipes.html# | + | The definition of the //where()// method could check with // |
<code php> | <code php> | ||
- | echo $twig->createTemplate('<p>Hi ' | + | $qb->select('u') |
+ | ->from('User', 'u') | ||
+ | ->where(' | ||
+ | ->setParameter(' | ||
</ | </ | ||
- | ===== Proposal ===== | + | Similarly, Twig allows [[https:// |
+ | |||
+ | <code php> | ||
+ | // INSECURE | ||
+ | echo $twig-> | ||
+ | </ | ||
- | Literals are safe values, defined within | + | If // |
<code php> | <code php> | ||
- | is_literal('Example'); // true | + | echo $twig-> |
+ | </code> | ||
- | $a = ' | + | ===== Failed Solutions ===== |
- | is_literal($a); | + | |
- | is_literal(4); | + | ==== Education ==== |
- | is_literal(0.3); | + | |
- | is_literal(' | + | |
- | $a = ' | + | Developer training has not worked, it simply does not scale (people start programming every day), and learning about every single issue is difficult. |
- | $b = $a . ' B ' . 3; | + | |
- | is_literal($b); // true, ideally (more details below) | + | |
- | is_literal($_GET[' | + | Keeping in mind that programmers will frequently do just enough to complete their task (busy), where they often copy/paste a solution to their problem they find online (risky), modify it for their needs (risky), then move on. |
- | is_literal(rand(0, 10)); // false | + | We cannot keep saying they 'need to be careful', |
+ | |||
+ | ==== Escaping ==== | ||
+ | |||
+ | Escaping is hard, and error prone. | ||
+ | |||
+ | We have a list of common [[https:// | ||
+ | |||
+ | Developers should use parameterised queries | ||
+ | |||
+ | ==== Taint Checking ==== | ||
+ | |||
+ | Some languages implement a "taint flag" which tracks whether values are considered " | ||
+ | |||
+ | There is a [[https:// | ||
+ | |||
+ | These solutions rely on the assumption that the output of an escaping function is safe for a particular context. This sounds reasonable in theory, but the operation of escaping functions, and the context for which their output is safe, are very hard to define. This leads to a feature that is both complex and unreliable. | ||
+ | |||
+ | This proposal avoids the complexity by addressing a different part of the problem: separating inputs supplied by the programmer, from inputs supplied by the user. | ||
+ | |||
+ | ==== Static Analysis ==== | ||
+ | |||
+ | While I agree with [[https:// | ||
+ | |||
+ | But they nearly always focus on other issues (type checking, basic logic flaws, code formatting, etc). | ||
+ | |||
+ | Those that attempt to address injection vulnerabilities, | ||
+ | |||
+ | For a quick example, psalm, even in its strictest errorLevel (1), and/or running // | ||
+ | |||
+ | <code php> | ||
+ | $db = new mysqli(' | ||
- | is_literal(sprintf('LIMIT %d', 3)); // false | + | $id = (string) |
- | $c = count($ids); | + | $db-> |
- | $a = 'WHERE id IN (' . implode(',', | + | |
- | is_literal($a); | + | |
</ | </ | ||
- | Ideally string concatenation would be allowed, but [[https:// | + | When psalm comes to taint checking the usage of a library (like Doctrine), it assumes all methods are safe, because none of them note the sinks (and even if they did, you're back to escaping being an issue). |
- | Thanks to [[https:// | + | But the biggest problem is that Static Analysis is simply not used by most developers, especially those who are new to programming (usage tends to be higher by those writing well tested libraries). |
- | As an aside, [[https:// | + | ===== Proposal ===== |
- | Unlike the Taint extension, there must **not** | + | This RFC proposes adding three functions: |
+ | |||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | A literal is defined as a value (string) which has been written by the programmer. The value may be passed between functions, as long as it is not modified in any way. | ||
+ | |||
+ | <code php> | ||
+ | is_literal(' | ||
+ | |||
+ | $a = ' | ||
+ | $b = ' | ||
+ | |||
+ | is_literal($a); | ||
+ | is_literal($a . $b); // false, no concat at run time (details below). | ||
+ | |||
+ | is_literal(literal_combine($a, | ||
+ | |||
+ | is_literal($_GET[' | ||
+ | is_literal(' | ||
+ | is_literal(rand(0, | ||
+ | is_literal(sprintf(' | ||
+ | </ | ||
+ | |||
+ | There is no way to manually mark a string as a literal (i.e. no equivalent | ||
+ | |||
+ | *Technical detail: Strings that are concatenated in place at compile time are treated as a literal.* | ||
===== Previous Work ===== | ===== Previous Work ===== | ||
- | There is the [[https:// | + | Google uses " |
+ | |||
+ | Google also uses [[https://errorprone.info/|Error Prone]] in Java to augment the compiler' | ||
- | Google currently uses a [[https://github.com/craigfrancis/ | + | Perl has a [[https://perldoc.perl.org/perlsec#Taint-mode|Taint Mode]], via the -T flag, where all input is marked as "tainted", and cannot be used by some methods (like commands that modify files), unless you use a regular expression to match and return known-good values (where regular expressions are easy to get wrong). |
- | And there are discussions about [[https:// | + | [[https:// |
- | As noted be [[https://news-web.php.net/php.internals/109192|Tyson Andre]], it might be possible to use static analysis, | + | As noted above, there is the [[https://github.com/laruence/taint|Taint extension |
And there is the [[https:// | And there is the [[https:// | ||
Line 90: | Line 172: | ||
* Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | ||
- | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" | + | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" |
+ | |||
+ | ===== Usage ===== | ||
+ | |||
+ | By libraries: | ||
+ | |||
+ | <code php> | ||
+ | class db { | ||
+ | protected $level = 2; // Probably should default to 1 at first. | ||
+ | function literal_check($var) { | ||
+ | if (function_exists(' | ||
+ | if ($this-> | ||
+ | // Programmer aware, and is choosing to bypass this check. | ||
+ | } else if ($this-> | ||
+ | trigger_error(' | ||
+ | } else { | ||
+ | throw new Exception(' | ||
+ | } | ||
+ | } | ||
+ | } | ||
+ | function unsafe_disable_injection_protection() { | ||
+ | $this-> | ||
+ | } | ||
+ | function where($sql, $parameters = []) { | ||
+ | $this-> | ||
+ | // ... | ||
+ | } | ||
+ | } | ||
+ | |||
+ | $db-> | ||
+ | $db-> | ||
+ | </ | ||
+ | |||
+ | Table and Fields in SQL, which cannot use parameters; for example //ORDER BY//: | ||
+ | |||
+ | <code php> | ||
+ | $order_fields = [ | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | ]; | ||
+ | |||
+ | $order_id = array_search(($_GET[' | ||
+ | |||
+ | $sql = literal_combine(' | ||
+ | </ | ||
+ | |||
+ | Undefined number of parameters; for example //WHERE IN//: | ||
+ | |||
+ | <code php> | ||
+ | function where_in_sql($count) { // Should check for 0 | ||
+ | $sql = []; | ||
+ | for ($k = 0; $k < $count; $k++) { | ||
+ | $sql[] = '?'; | ||
+ | } | ||
+ | return literal_implode(',', | ||
+ | } | ||
+ | $sql = literal_combine(' | ||
+ | </ | ||
+ | |||
+ | ===== Considerations ===== | ||
+ | |||
+ | ==== Naming ==== | ||
+ | |||
+ | Literal string is the standard name for strings in source code. See https:// | ||
+ | |||
+ | > A string literal is the notation for representing a string value within the text of a computer program. In PHP, strings can be created with single quotes, double quotes or using the heredoc or the nowdoc syntax. ... The heredoc preserves the line breaks and other whitespace (including indentation) in the text. | ||
+ | |||
+ | Alternatives suggestions have included // | ||
+ | |||
+ | ==== Supporting Int/ | ||
+ | |||
+ | When converting to string, they aren't guaranteed (and often don't) have the exact same value they have in source code. | ||
+ | |||
+ | For example, //TRUE// and //true// when cast to string give " | ||
+ | |||
+ | It's also a very low value feature, where there might not be space for a flag to be added. | ||
+ | |||
+ | ==== Supporting Concatenation ==== | ||
+ | |||
+ | Unfortunately early testing suggests there will be too much of a performance impact, and is why // | ||
+ | |||
+ | It isn't needed for most libraries, like an ORM or Query Builder, where their methods nearly always take a small literal string. | ||
+ | |||
+ | It was considered because it would have made it easier for existing projects currently using string concatenation to adopt. | ||
+ | |||
+ | Joe Watkins has created a version that does support string concatenation, | ||
+ | |||
+ | Máté Kocsis did the [[https:// | ||
+ | |||
+ | In my own [[https:// | ||
+ | |||
+ | Dan Ackroyd also notes that the use of // | ||
+ | |||
+ | <code php> | ||
+ | $sortOrder = ' | ||
+ | |||
+ | // 300 lines of code, or multiple function calls | ||
+ | |||
+ | $sql .= ' ORDER BY name ' . $sortOrder; | ||
+ | |||
+ | // 300 lines of code, or multiple function calls | ||
+ | |||
+ | $db-> | ||
+ | </ | ||
+ | |||
+ | If a developer changed the literal //' | ||
+ | |||
+ | <code php> | ||
+ | $sql = literal_combine($sql, | ||
+ | </ | ||
+ | |||
+ | ==== Performance ==== | ||
+ | |||
+ | The proposed implementation has: | ||
+ | |||
+ | [TODO] | ||
+ | |||
+ | Also, see the section above, where string concatenation support would have introduced a ~1% performance hit. | ||
+ | |||
+ | ==== Values from INI/ | ||
+ | |||
+ | As noted by [[https:// | ||
+ | |||
+ | ==== Existing String Functions ==== | ||
+ | |||
+ | Trying to determine if the // | ||
+ | |||
+ | For any use-case where dynamic strings are required, it would be better to build those strings with an appropriate query builder, or by using // | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | None | + | No known BC breaks, except for code-bases that already contain userland functions // |
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
- | PHP 8.1? | + | PHP 8.1 |
===== RFC Impact ===== | ===== RFC Impact ===== | ||
Line 104: | Line 314: | ||
==== To SAPIs ==== | ==== To SAPIs ==== | ||
- | Not sure | + | None known |
==== To Existing Extensions ==== | ==== To Existing Extensions ==== | ||
Line 116: | Line 326: | ||
===== Open Issues ===== | ===== Open Issues ===== | ||
- | On [[https:// | + | None |
- | + | ||
- | - Should this be named something else? ([[https:// | + | |
- | - Would this cause performance issues? | + | |
- | - Can // | + | |
- | - Systems/ | + | |
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== | ||
- | Not sure | + | None known |
===== Future Scope ===== | ===== Future Scope ===== | ||
- | As noted by [[https:// | + | As noted by MarkR, the biggest benefit will come when it can be used by PDO and similar functions (// |
- | **Phase 2** could introduce a way for certain function arguments | + | **Phase 2** could introduce a way for programmers to specify |
- | For example, a project could require the second argument for // | + | For example, a project could require the second argument for // |
**Phase 3** could set a default of 'only literals' | **Phase 3** could set a default of 'only literals' | ||
- | And, for a bit of silliness (Spaß ist verboten), | + | And, for a bit of silliness (Spaß ist verboten), |
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
- | N/A | + | Accept the RFC. Yes/No |
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
Line 149: | Line 354: | ||
===== Implementation ===== | ===== Implementation ===== | ||
- | [[https:// | + | Dan Ackroyd has [[https:// |
+ | |||
+ | Joe Watkins has [[https:// | ||
===== References ===== | ===== References ===== | ||
Line 158: | Line 365: | ||
N/A | N/A | ||
+ | |||
+ | ===== Thanks ===== | ||
+ | |||
+ | - **Dan Ackroyd**, DanAck, for starting the first implementation (which made this a reality), and followup on the version that uses functions instead of string concat. | ||
+ | - **Joe Watkins**, krakjoe, for finding how to set the literal flag (tricky), and creating the implementation that does support string concat. | ||
+ | - **Rowan Tommins**, IMSoP, for re-writing this RFC to focus on the key features, and putting it in context of how it can be used by libraries. | ||
+ | - **Nikita Popov**, NikiC, for suggesting where the literal flag could be stored. Initially this was going to be the " | ||
+ | - **Mark Randall**, MarkR, for alternative ideas, and noting that " | ||
+ | - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, | ||
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis