rfc:is_literal
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revisionNext revisionBoth sides next revision | ||
rfc:is_literal [2020/12/23 19:53] – New examples, with a focus on escaping craigfrancis | rfc:is_literal [2021/05/03 19:38] – Don't write off the concat version (inc more perf stats) craigfrancis | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Is Literal Check ====== | ====== PHP RFC: Is Literal Check ====== | ||
- | * Version: 0.2 | + | * Version: 0.6 |
* Date: 2020-03-21 | * Date: 2020-03-21 | ||
- | * Updated: | + | * Updated: |
* Author: Craig Francis, craig# | * Author: Craig Francis, craig# | ||
* Status: Draft | * Status: Draft | ||
Line 11: | Line 11: | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | Add an // | + | A new function, |
- | As in, at runtime, being able to check if a variable has been created by literals, defined within a PHP script, by a trusted developer. | + | This takes the concept of "taint checking" |
- | This simple check can be used to warn or completely block SQL Injection, Command Line Injection, and many cases of HTML Injection | + | It does not allow a variable to be marked as untainted, and it does not allow escaping |
- | ===== The Problem ===== | + | For example, take a database library that supports parametrised queries at the driver level, today a programmer could use either of these: |
- | Escaping strings for SQL, HTML, Commands, etc is **very** error prone. | + | <code php> |
+ | $db-> | ||
- | The vast majority of programmers should never do this (mistakes will be made). | + | $db-> |
+ | </ | ||
- | Unsafe values | + | If the library only accepted a literal SQL string |
- | This is primarily for security reasons, but it also causes data to be damaged | + | This definition of an " |
- | Take these mistakes: | + | By adding a way for libraries to check if the strings they receive came from the developer (from trusted PHP source code), it allows the library to check they are being used in a safe way. |
- | echo "< | + | ===== Why ===== |
- | Flawed because the attribute value is not quoted, e.g. //$url = '/ onerror=alert(1)' | + | The [[https://owasp.org/www-project-top-ten/|OWASP Top 10]] lists common vulnerabilities sorted by prevalence, exploitability, |
- | echo "< | + | **A1: Injection** - common prevalence |
- | Flawed because // | + | **A7: XSS** - widespread prevalence |
- | echo '<a href="' | + | And these two have always been listed: 2003 (A6/A4), 2004 (A6/A4), 2007 (A2/A1), 2010 (A1/A2), 2013 (A1/A3), 2017 (A1/A7). |
- | Flawed | + | It' |
- | < | + | ===== Examples ===== |
- | var url = "<?= addslashes($url) ?>"; | + | |
- | </ | + | |
- | Flawed because | + | The [[https://www.doctrine-project.org/projects/doctrine-orm/en/current/reference/query-builder.html# |
- | echo '<a href="/path/?name=' | + | <code php> |
+ | // INSECURE | ||
+ | $qb-> | ||
+ | | ||
+ | | ||
+ | </code> | ||
- | Flawed because | + | The definition of the //where()// method could check with //is_literal()// and throw an exception, advising the programmer to replace it with a safer use of placeholders: |
- | | + | <code php> |
+ | $qb-> | ||
+ | | ||
+ | | ||
+ | | ||
+ | </code> | ||
- | Flawed because the encoding is not guaranteed to be UTF-8 (or ISO-8859-1 before PHP 5.4), so the value could be corrupted. | + | Similarly, Twig allows [[https:// |
- | Also flawed because some browsers | + | <code php> |
+ | // INSECURE | ||
+ | echo $twig-> | ||
+ | </code> | ||
- | example.html: | + | If // |
- | <img src={{ url }} alt='' | + | |
- | + | ||
- | $loader = new \Twig\Loader\FilesystemLoader('./templates/'); | + | |
- | $twig = new \Twig\Environment($loader, [' | + | |
- | + | ||
- | echo $twig-> | + | |
- | Flawed because Twig is not context aware (in this case, an unquoted HTML attribute), e.g. //$url = '/ | + | <code php> |
+ | echo $twig-> | ||
+ | </code> | ||
- | $sql = ' | + | ===== Failed Solutions ===== |
- | Flawed because the value has not been quoted, e.g. //$id = ' | + | ==== Education ==== |
- | $sql = ' | + | Developer training has not worked, it simply does not scale (people start programming every day), and learning about every single issue is difficult. |
- | Flawed if ' | + | Keeping in mind that programmers will frequently do just enough to complete their task (busy), where they often copy/paste a solution to their problem they find online (risky), modify it for their needs (risky), then move on. |
- | $sql = 'INSERT INTO user (name) VALUES ("' . $mysqli-> | + | We cannot keep saying they 'need to be careful', and relying on them to never make a mistake. |
- | Flawed if 'SET NAMES latin1' | + | ==== Escaping ==== |
- | $parameters = " | + | Escaping is hard, and error prone. |
- | + | ||
- | // $parameters = ' | + | |
- | + | ||
- | mail(' | + | |
- | Flawed because it's not possible to safely escape values in //$additional_parameters// for //mail()//, e.g. //$email = ' | + | We have a list of common [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification.md#common-mistakes|escaping mistakes]]. |
- | ===== Previous Solutions ===== | + | Developers should use parameterised queries (e.g. SQL), or a well tested library that knows how to escape values based on their context (e.g. HTML). |
- | [[https:// | + | ==== Taint Checking ==== |
- | [[https:// | + | Some languages implement a "taint flag" which tracks whether values are considered " |
- | * " | + | There is a [[https://github.com/laruence/taint|Taint extension |
- | * this amount of work isn't ideal for "just for one use case" ([[https:// | + | |
- | * It would have effected every SQL function, such as // | + | |
- | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | + | |
- | I also agree that "SQL injection | + | These solutions rely on the assumption |
- | ===== Related JavaScript Implementation ===== | + | This proposal avoids the complexity by addressing a different part of the problem: separating inputs supplied by the programmer, from inputs supplied by the user. |
- | This RFC is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, to support the introduction of Trusted Types. | + | ==== Static Analysis ==== |
- | https://github.com/tc39/proposal-array-is-template-object\\ | + | While I agree with [[https://news-web.php.net/php.internals/109192|Tyson Andre]], it is highly recommended to use Static Analysis. |
- | https:// | + | |
- | They are looking at " | + | But they nearly always focus on other issues (type checking, basic logic flaws, code formatting, etc). |
- | ===== Solution ===== | + | Those that attempt to address injection vulnerabilities, |
- | Literals are safe values, defined within | + | For a quick example, psalm, even in its strictest errorLevel (1), and/or running // |
- | $a = ' | + | <code php> |
- | | + | $db = new mysqli(' |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); // true | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = ' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $a = sprintf(' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $c = count($ids); | + | |
- | $a = 'WHERE id IN (' | + | |
- | is_literal($a); | + | |
- | + | ||
- | $limit = 10; | + | |
- | $a = 'LIMIT ' . ($limit + 1); | + | |
- | is_literal($a); | + | |
- | This uses a similar definition of [[https:// | + | $id = (string) ($_GET[' |
- | Thanks to [[https:// | + | $db-> |
+ | </code> | ||
- | And thanks | + | When psalm comes to taint checking |
- | Commands can be checked to ensure they are a " | + | But the biggest problem is that Static Analysis is simply not used by most developers, especially those who are new to programming |
- | This approach allows all systems/ | + | ===== Proposal ===== |
- | Unlike the Taint extension, there must **not** be an equivalent // | + | This RFC proposes adding four functions: |
- | ==== Solution: SQL Injection ==== | + | * // |
+ | * // | ||
+ | * // | ||
+ | * // | ||
- | Database abstractions | + | A literal is defined as a value (string) which has been written by the programmer. The value may be passed between functions, as long as it is not modified in any way. |
- | [[https:// | + | <code php> |
+ | is_literal(' | ||
- | | + | $a = 'Hello'; |
- | -> | + | $b = 'World'; |
- | | + | |
- | -> | + | |
- | -> | + | |
- | -> | + | |
- | + | ||
- | // example.php? | + | |
- | This mistake can be easily identified by: | + | is_literal($a); |
+ | is_literal($a . $b); // TBC, details below. | ||
- | public function where($predicates) | + | $c = literal_combine($a, $b); |
- | { | + | is_literal($c); // true |
- | if (function_exists(' | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | ... | + | |
- | } | + | |
- | [[https://redbeanphp.com/index.php? | + | is_literal($_GET[' |
+ | is_literal(' | ||
+ | is_literal(rand(0, | ||
+ | is_literal(sprintf(' | ||
+ | </code> | ||
- | $users = R::find(' | + | There is no way to manually mark a string as a literal |
- | [[http:// | + | ===== Previous Work ===== |
- | $users = UserQuery:: | + | Google uses " |
- | The // | + | Google |
- | ==== Solution: SQL Injection, Basic ==== | + | Perl has a [[https:// |
- | A simple example: | + | [[https:// |
- | $sql = ' | + | As noted above, there is the [[https:// |
- | + | ||
- | $result = $db-> | + | |
- | Checked in the framework | + | And there is the [[https:// |
- | | + | |
- | + | | |
- | | + | * It would have effected every SQL function, such as // |
- | + | * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https:// | |
- | if (!is_literal($sql)) { | + | |
- | throw new Exception(' | + | I also agree that "SQL injection is almost a solved problem [by using] prepared statements" |
+ | |||
+ | ===== Usage ===== | ||
+ | |||
+ | By libraries: | ||
+ | |||
+ | <code php> | ||
+ | class db { | ||
+ | | ||
+ | | ||
+ | | ||
+ | if ($this-> | ||
+ | // Programmer aware, and is choosing to bypass this check. | ||
+ | } else if ($this-> | ||
+ | trigger_error(' | ||
+ | } else { | ||
+ | throw new Exception(' | ||
} | } | ||
- | | ||
- | $statement = $this-> | ||
- | $statement-> | ||
- | return $statement-> | ||
- | | ||
} | } | ||
- | | ||
} | } | ||
+ | function unsafe_disable_injection_protection() { | ||
+ | $this-> | ||
+ | } | ||
+ | function where($sql, $parameters = []) { | ||
+ | $this-> | ||
+ | // ... | ||
+ | } | ||
+ | } | ||
- | This also works with string concatenation: | + | $db-> |
+ | $db-> | ||
+ | </ | ||
- | define(' | + | Table and Fields in SQL, which cannot use parameters; for example //ORDER BY//: |
- | + | ||
- | $sql = ' | + | |
- | + | ||
- | is_literal($sql); | + | |
- | + | ||
- | $sql .= ' AND id = ' . $mysqli-> | + | |
- | + | ||
- | is_literal($sql); | + | |
- | ==== Solution: SQL Injection, ORDER BY ==== | + | <code php> |
+ | $order_fields | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | ]; | ||
- | To ensure //ORDER BY// can be set via the user, but only use acceptable values: | + | $order_id = array_search(($_GET[' |
- | | + | $sql = literal_combine(' ORDER BY ', $order_fields[$order_id]); |
- | ' | + | </ |
- | ' | + | |
- | ' | + | |
- | ]; | + | |
- | + | ||
- | $order_id | + | |
- | + | ||
- | $sql = ' ORDER BY ' | + | |
- | ==== Solution: SQL Injection, | + | Undefined number of parameters; for example //WHERE IN//: |
- | Most SQL strings can be a simple concatenations of literal values, but // | + | <code php> |
+ | function where_in_sql($count) { // Should check for 0 | ||
+ | $sql = []; | ||
+ | for ($k = 0; $k < $count; $k++) { | ||
+ | $sql[] = '?'; | ||
+ | } | ||
+ | return literal_implode(',', | ||
+ | } | ||
+ | $sql = literal_combine(' | ||
+ | </code> | ||
- | There needs to be a special case for // | + | ===== Considerations ===== |
- | $in_sql | + | ==== Naming ==== |
- | + | ||
- | $sql = ' | + | |
- | ==== Solution: CLI Injection ==== | + | Literal string is the standard name for strings in source code. See [[https:// |
- | Rather than using functions such as: | + | > A string literal is the notation for representing a string value within the text of a computer program. In PHP, strings can be created with single quotes, double quotes or using the heredoc or the nowdoc syntax... |
- | * //exec()// | + | Alternatives suggestions have included |
- | * //shell_exec()// | + | |
- | * //system()// | + | |
- | * //passthru()// | + | |
- | Frameworks (or PHP) could introduce something similar to //pcntl_exec()//, | + | ==== Supporting Int/Float/Boolean values. ==== |
- | Or, take a safe literal for the command, and use parameters for the arguments (like SQL does): | + | When converting to string, they aren't guaranteed (and often don't) have the exact same value they have in source code. |
- | $output = parameterised_exec(' | + | For example, |
- | ' | + | |
- | ]); | + | |
- | Rough implementation: | + | It's also a very low value feature, where there might not be space for a flag to be added. |
- | function parameterised_exec($cmd, | + | ==== Supporting Concatenation |
- | + | ||
- | if (!is_literal($cmd)) { | + | |
- | throw new Exception(' | + | |
- | } | + | |
- | + | ||
- | $offset | + | |
- | $k = 0; | + | |
- | while (($pos | + | |
- | if (!isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | $arg = escapeshellarg($args[$k]); | + | |
- | $cmd = substr($cmd, | + | |
- | $offset = ($pos + strlen($arg)); | + | |
- | $k++; | + | |
- | } | + | |
- | if (isset($args[$k])) { | + | |
- | throw new Exception(' | + | |
- | exit(); | + | |
- | } | + | |
- | + | ||
- | return exec($cmd); | + | |
- | + | ||
- | } | + | |
- | ==== Solution: HTML Injection ==== | + | This is the big question. |
- | Template engines should receive variables separately from the raw HTML. | + | Máté Kocsis has done some [[https:// |
- | Often the engine will get the HTML from static files (safe): | + | In my own [[https:// |
- | | + | Laravel Demo App: +0.30% with, vs +0.18% without concat. |
+ | Symfony Demo App: +0.06% with, vs +0.06% without concat. | ||
+ | My Concat Test: | ||
- | But small snippets of HTML are often easier to define as a literal within | + | In my basic test, I used a RAM Disk, and disabled |
- | $template_html = ' | + | There is still a small impact without concat because the // |
- | < | + | |
- | < | + | |
- | Where the variables are supplied separately, in this example I'm using XPath: | + | Technically string concat isn't needed for most libraries, like an ORM or Query Builder, where their methods nearly always take a small literal string. But it would make adoption of is_literal() easier for existing projects that are currently |
- | $values = [ | + | And supporting runtime concat would make the literal check easier to understand, as it would be consistent (e.g. compiler vs runtime concat, where the compiler can concat two strings to create |
- | '// | + | |
- | NULL => ' | + | |
- | ' | + | |
- | ' | + | |
- | ], | + | |
- | '//a' => [ | + | |
- | ' | + | |
- | ], | + | |
- | ]; | + | |
- | + | ||
- | echo template_parse($template_html, | + | |
- | The templating engine can then accept and apply the supplied variables for the relevant context. | + | The non-concat version would use // |
- | As a simple example, this can be done with: | + | <code php> |
+ | $sortOrder = ' | ||
- | | + | // 300 lines of code, or multiple |
- | + | ||
- | if (!is_literal($html)) { | + | $sql .= ' ORDER BY name ' . $sortOrder; |
- | throw new Exception('Invalid Template HTML.'); | + | |
- | } | + | // 300 lines of code, or multiple function calls |
- | + | ||
- | $dom = new DomDocument(); | + | $db->query($sql); |
- | $dom->loadHTML('<? | + | </code> |
- | + | ||
- | $xpath = new DOMXPath($dom); | + | If a developer changed the literal //' |
- | + | ||
- | foreach ($values as $query => $attributes) { | + | <code php> |
- | + | $sql = literal_combine($sql, ' | |
- | if (!is_literal($query)) { | + | </ |
- | throw new Exception('Invalid Template XPath.'); | + | |
- | } | + | ==== Performance ==== |
- | + | ||
- | foreach ($xpath->query($query) as $element) { | + | TBC |
- | | + | |
- | + | See the section above. | |
- | if (!is_literal($attribute)) { | + | |
- | throw new Exception('Invalid Template Attribute.'); | + | ==== Values from INI/JSON/YAML ==== |
- | } | + | |
- | + | As noted by [[https://news-web.php.net/ | |
- | if ($attribute) { | + | |
- | $safe = false; | + | ==== Existing String Functions ==== |
- | if ($attribute | + | |
- | if (preg_match('/ | + | Trying to determine |
- | $safe = true; // Not " | + | |
- | } | + | For any use-case where dynamic strings are required, it would be better to build those strings with an appropriate query builder, or by using // |
- | } else if ($attribute | + | |
- | if (in_array($value, | + | |
- | $safe = true; // Only allow specific classes? | + | |
- | } | + | |
- | } else if (preg_match(' | + | |
- | if (preg_match(' | + | |
- | | + | |
- | } | + | |
- | } | + | |
- | | + | |
- | $element-> | + | |
- | } | + | |
- | } else { | + | |
- | $element-> | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | } | + | |
- | + | ||
- | } | + | |
- | + | ||
- | $html = ''; | + | |
- | $body = $dom-> | + | |
- | if ($body-> | + | |
- | foreach | + | |
- | $html .= $dom-> | + | |
- | } | + | |
- | } | + | |
- | + | ||
- | return $html; | + | |
- | + | ||
- | } | + | |
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | None | + | No known BC breaks, except for code-bases that already contain userland functions // |
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
- | PHP 8.1? | + | PHP 8.1 |
===== RFC Impact ===== | ===== RFC Impact ===== | ||
Line 399: | Line 322: | ||
==== To SAPIs ==== | ==== To SAPIs ==== | ||
- | Not sure | + | None known |
==== To Existing Extensions ==== | ==== To Existing Extensions ==== | ||
Line 411: | Line 334: | ||
===== Open Issues ===== | ===== Open Issues ===== | ||
- | On [[https:// | + | None |
- | - Would this cause performance issues? Presumably not as bad a type checking. | + | ===== Unaffected PHP Functionality ===== |
- | - Can // | + | |
- | - Should the function be named // | + | |
- | - Systems/ | + | |
- | ===== Alternatives ===== | + | None known |
- | - The current Taint Extension (notes above) | + | ===== Future Scope ===== |
- | - Using static analysis (not at runtime), for example [[https:// | + | |
- | ===== Unaffected PHP Functionality ===== | + | As noted by MarkR, the biggest benefit will come when it can be used by PDO and similar functions (// |
- | Not sure | + | **Phase 2** could introduce a way for programmers to specify certain PHP function/ |
- | ===== Future Scope ===== | + | For example, a project could require the second argument for // |
- | Certain functions (// | + | **Phase 3** could set a default of 'only literals' |
- | PHP could also have a mode where output | + | And, for a bit of silliness |
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
- | N/A | + | Accept the RFC. Yes/No |
===== Patches and Tests ===== | ===== Patches and Tests ===== | ||
- | A volunteer is needed to help with implementation. | + | N/A |
===== Implementation ===== | ===== Implementation ===== | ||
+ | |||
+ | Dan Ackroyd has [[https:// | ||
+ | |||
+ | Joe Watkins has [[https:// | ||
+ | |||
+ | ===== References ===== | ||
N/A | N/A | ||
Line 448: | Line 373: | ||
N/A | N/A | ||
+ | |||
+ | ===== Thanks ===== | ||
+ | |||
+ | - **Dan Ackroyd**, DanAck, for starting the first implementation (which made this a reality), and followup on the version that uses functions instead of string concat. | ||
+ | - **Joe Watkins**, krakjoe, for finding how to set the literal flag (tricky), and creating the implementation that supports string concat. | ||
+ | - **Máté Kocsis**, mate-kocsis, | ||
+ | - **Rowan Tommins**, IMSoP, for re-writing this RFC to focus on the key features, and putting it in context of how it can be used by libraries. | ||
+ | - **Nikita Popov**, NikiC, for suggesting where the literal flag could be stored. Initially this was going to be the " | ||
+ | - **Mark Randall**, MarkR, for alternative ideas, and noting that " | ||
+ | - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, | ||
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis