rfc:is_literal

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
Next revisionBoth sides next revision
rfc:is_literal [2021/06/17 11:02] – Updated note about integers (could be included), why not Boolean/Float, and new 'Faking It' section craigfrancisrfc:is_literal [2021/07/04 22:16] craigfrancis
Line 1: Line 1:
-====== PHP RFC: Is Literal Check ======+====== PHP RFC: Is_Literal ======
  
-  * Version: 0.8+  * Version: 1.1
   * Date: 2020-03-21   * Date: 2020-03-21
-  * Updated: 2021-06-06+  * Updated: 2021-07-04
   * Author: Craig Francis, craig#at#craigfrancis.co.uk   * Author: Craig Francis, craig#at#craigfrancis.co.uk
-  * Contributors: Joe Watkins, Dan Ackroyd, Máté Kocsis+  * Contributors: Joe Watkins, Máté Kocsis
   * Status: Under Discussion   * Status: Under Discussion
   * First Published at: https://wiki.php.net/rfc/is_literal   * First Published at: https://wiki.php.net/rfc/is_literal
Line 12: Line 12:
 ===== Introduction ===== ===== Introduction =====
  
-Add the function //is_literal(string $string)//, so strings can be tested to ensure they were written by the developer (defined in the PHP source code, not containing any user input).+Add the function //is_literal()//, a lightweight and effective way to identify if a string was written by developer, removing the risk of a variable containing an Injection Vulnerability.
  
-This flag provides lightweight, simple, and very effective way to identify common Injection Vulnerabilities.+It's a simple process where a flag is set internally on strings that have been written by developer (as opposed to a user)where the flag persists through concatenation with other 'literal' strings. The function checks the flag is present and thus no user data is included.
  
-It avoids the "false sense of security" that comes with the flawed "Taint Checking" approach, [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/escaping.php?ts=4|because escaping is very difficult to get right]].+It avoids the "false sense of security" that comes with the flawed "Taint Checking" approach, [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/escaping.php?ts=4|because escaping is very difficult to get right]]. It's much safer for developers to use parameterised queries, and well-tested libraries.
  
-Developers should not escape anything themselves; they should use parameterised queries, and/or well-tested libraries. +//is_literal()// can be used by libraries to deal with a difficult problem - developers using them incorrectlyLibraries expect certain sensitive values to only come from the developer; but because it's [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php?ts=4|easy to incorrectly include user values]], Injection Vulnerabilities are still introduced by the thousands of developers using these libraries incorrectly. You will notice the linked examples are based on examples found in the Libraries' official documentation, they still "work", and are typically shorter/easier than doing it correctly (I've found many of them on live websites, and it's why I'm here). A simple Query Builder example being:
- +
-These libraries require certain sensitive strings to only come from the developer; but because it's [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php?ts=4|easy to incorrectly include user values]], Injection Vulnerabilities are still introduced by the thousands of developers using these libraries incorrectly. You will notice the linked examples are based on examples found in the Libraries' official documentation, they still "work", and are typically shorter/easier than doing it correctly (I've found many of them on live websites, and it's why I'm here). A simple Query Builder example being:+
  
 <code php> <code php>
Line 28: Line 26:
 </code> </code>
  
-The "Future Scope" section explains how native functions will be able to use //is_literal()//.+(The "Future Scope" section explains why a dedicated type should come later, and how native functions could use the //is_literal// flag as well.)
  
 ===== Background ===== ===== Background =====
Line 36: Line 34:
 Injection and Cross-Site Scripting (XSS) vulnerabilities are **easy to make**, **hard to identify**, and **very common**. Injection and Cross-Site Scripting (XSS) vulnerabilities are **easy to make**, **hard to identify**, and **very common**.
  
-We like to think every developer reads the documentation, and would never directly include (inject) user values into their SQL/HTML/CLI - but we all know that's not the case.+With SQL Injection, it just takes 1 mistake, and the attacker can usually read everything in the database (SQL Map, Havij, jSQL, etc). 
 + 
 +When it comes to coding, we like to think every developer reads the documentation, and would never directly include (inject) user values into their SQL/HTML/CLI - but we all know that's not the case.
  
 It's why these two issues have **always** been on the [[https://owasp.org/www-project-top-ten/|OWASP Top 10]]; a list designed to raise awareness of common issues, ranked on their prevalence, exploitability, detectability, and impact: It's why these two issues have **always** been on the [[https://owasp.org/www-project-top-ten/|OWASP Top 10]]; a list designed to raise awareness of common issues, ranked on their prevalence, exploitability, detectability, and impact:
Line 58: Line 58:
 ==== Usage in PHP ==== ==== Usage in PHP ====
  
-Libraries would be able to use //is_literal()// immediately, allowing them to warn developers about Injection Issues as soon as they receive any non-literal strings, for example:+Libraries would be able to use //is_literal()// immediately, allowing them to warn developers about Injection Issues as soon as they receive any non-literal values. Some already plan to implement this, for example:
  
 **Propel** (Mark Scherer): "given that this would help to more safely work with user input, I think this syntax would really help in Propel." **Propel** (Mark Scherer): "given that this would help to more safely work with user input, I think this syntax would really help in Propel."
Line 68: Line 68:
 ===== Proposal ===== ===== Proposal =====
  
-Add //is_literal(string $string): bool// to check if a variable contains a string defined in the PHP script.+Add the function //is_literal()//
 + 
 +A string shall pass the //is_literal// check if it was defined by the programmer in source code, or is the result of a function or instruction whose inputs would all pass the //is_literal// check. 
 + 
 +Concatenation instructions and the following string functions are therefore able to produce literals: 
 + 
 +  - //str_repeat()// 
 +  - //str_pad()// 
 +  - //implode()// 
 +  - //join()// 
 + 
 +(Namespaces constructed for the programmer by the compiler will also be marked literal for convenience.)
  
 <code php> <code php>
Line 81: Line 92:
  
 is_literal($_GET['id']); // false is_literal($_GET['id']); // false
-is_literal(rand(0, 10)); // false +is_literal(sprintf('Hi %s', $_GET['name'])); // false
-is_literal(sprintf('Example %d', true)); // false+
 is_literal('/bin/rm -rf ' . $_GET['path']); // false is_literal('/bin/rm -rf ' . $_GET['path']); // false
 is_literal('<img src=' . htmlentities($_GET['src']) . ' />'); // false is_literal('<img src=' . htmlentities($_GET['src']) . ' />'); // false
 is_literal('WHERE id = ' . $db->real_escape_string($_GET['id'])); // false is_literal('WHERE id = ' . $db->real_escape_string($_GET['id'])); // false
-is_literal(sprintf('LIMIT %d', 3)); // false 
  
 function example($input) { function example($input) {
   if (!is_literal($input)) {   if (!is_literal($input)) {
-    throw new Exception('Non-literal detected!');+    throw new Exception('Non-literal value detected!');
   }   }
   return $input;   return $input;
Line 96: Line 105:
  
 example($a); // OK example($a); // OK
-example(example($a)); // OK, still the same literal.+example(example($a)); // OK, still the same literal value.
 example(strtoupper($a)); // Exception thrown. example(strtoupper($a)); // Exception thrown.
 </code> </code>
  
-Because so much existing code uses string concatenation, and because it does not modify what the programmer has written, concatenated literals will keep the literal flag. This includes the use of //str_repeat()//, //str_pad()//, //implode()//, //join()//, //array_pad()//, and //array_fill()//+===== Try It =====
- +
-===== Try it =====+
  
-[[https://3v4l.org/#focus=rfc.literals|Have a play with it on 3v4l.org]]+[[https://3v4l.org/#focus=rfc.literals|Test it out on 3v4l.org]]
  
 [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4|How it can be used by libraries]] - Notice how this example library just raises a warning, to simply let the developer know about the issue, **without breaking anything**. And it provides an //"unsafe_value"// value-object to bypass the //is_literal()// check, but none of the examples need to use it (can be useful as a temporary thing, but there are much safer/better solutions, which developers are/should already be using). [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4|How it can be used by libraries]] - Notice how this example library just raises a warning, to simply let the developer know about the issue, **without breaking anything**. And it provides an //"unsafe_value"// value-object to bypass the //is_literal()// check, but none of the examples need to use it (can be useful as a temporary thing, but there are much safer/better solutions, which developers are/should already be using).
Line 112: Line 119:
 ==== Taint Checking ==== ==== Taint Checking ====
  
-**Taint checking is flawed, isn't this the same?** It is not the same.+**Taint checking is flawed, isn't this the same?**
  
-Taint Checking incorrectly assumes the output of an escaping function is "safe" for a particular context. While it sounds reasonable in theory, the operation of escaping functions, and the context for which their output is safe, is very hard to define. This leads to a feature that is both complex and unreliable.+It is not the same. Taint Checking incorrectly assumes the output of an escaping function is "safe" for a particular context. While it sounds reasonable in theory, the operation of escaping functions, and the context for which their output is safe, is very hard to define and led to a feature that is both complex and unreliable.
  
 <code php> <code php>
Line 122: Line 129:
 </code> </code>
  
-All three examples would be incorrectly considered "untainted". The first two need the values to be quoted. The third example, //htmlentities()// does not escape single quotes by default before PHP 8.1 ([[https://github.com/php/php-src/commit/50eca61f68815005f3b0f808578cc1ce3b4297f0|fixed]]), and it does not consider the issue of 'javascript:' URLs.+All three examples would be incorrectly considered "safe(untainted). The first two need the values to be quoted. The third example, //htmlentities()// does not escape single quotes by default before PHP 8.1 ([[https://github.com/php/php-src/commit/50eca61f68815005f3b0f808578cc1ce3b4297f0|fixed]]), and it does not consider the issue of 'javascript:' URLs.
  
-In comparison, //is_literal()// doesn't have an equivalent of //untaint()//, or support escaping. Instead PHP will set the literal flag, and as soon as the value has been manipulated or includes anything that is not from a literal (e.g. user data), the literal flag is lost.+In comparison, //is_literal()// doesn't have an equivalent of //untaint()//, or support escaping. Instead PHP will set the //is_literal// flag, and as soon as the value has been manipulated or includes anything that is not a literal (e.g. user data), the //is_literal// flag is removed.
  
-This allows libraries to use //is_literal()// to identify when they are provided a sensitive value that should not include user input. Then it's up to the library to handle the escaping (if it's even needed). The "Future Scope" section notes how native functions will be able to use the literal flag as well.+This allows libraries to use //is_literal()// to check the sensitive values they receive from the developer. Then it's up to the library to handle the escaping (if it's even needed). The "Future Scope" section notes how native functions would be able to use the //is_literal// flag as well.
  
 ==== Education ==== ==== Education ====
  
-**Why not educate everyone?** You can't - developer training simply does not scale, and mistakes still happen.+**Why not educate everyone instead?** 
 + 
 +You can't - developer training simply does not scale, and mistakes still happen.
  
 We cannot expect everyone to have formal training, know everything from day 1, and consider programming a full time job. We want new programmers, with a variety of experiences, ages, and backgrounds. Everyone should be guided to do the right thing, and notified as soon as they make a mistake (we all make mistakes). We also need to acknowledge that many programmers are busy, do copy/paste code, don't necessarily understand what it does, edit it for their needs, then simply move on to their next task. We cannot expect everyone to have formal training, know everything from day 1, and consider programming a full time job. We want new programmers, with a variety of experiences, ages, and backgrounds. Everyone should be guided to do the right thing, and notified as soon as they make a mistake (we all make mistakes). We also need to acknowledge that many programmers are busy, do copy/paste code, don't necessarily understand what it does, edit it for their needs, then simply move on to their next task.
Line 136: Line 145:
 ==== Static Analysis ==== ==== Static Analysis ====
  
-**Why not use static analysis?** It will never be used by most developers.+**Why not use static analysis?** 
 + 
 +Ultimately it will never be used by most developers.
  
 I still agree with [[https://news-web.php.net/php.internals/109192|Tyson Andre]], you should use Static Analysis, but it's an extra step that most programmers cannot be bothered to do, especially those who are new to programming (its usage tends to be higher among those writing well-tested libraries). I still agree with [[https://news-web.php.net/php.internals/109192|Tyson Andre]], you should use Static Analysis, but it's an extra step that most programmers cannot be bothered to do, especially those who are new to programming (its usage tends to be higher among those writing well-tested libraries).
  
-Also, these tools currently focus on other issues (type checking, basic logic flaws, code formatting, etc), rarely attempting to address injection vulnerabilities. Those that do are [[https://github.com/vimeo/psalm/commit/2122e4a1756dac68a83ec3f5abfbc60331630781|often incomplete]], need sinks specified on all library methods (unlikely to happen), and are not enabled by default. For example, Psalm, even in its strictest errorLevel (1), and running //--taint-analysis// (I bet you don't use this), will not notice the missing quote marks in this SQL, and incorrectly assume it's safe:+Also, these tools currently focus on other issues (type checking, basic logic flaws, code formatting, etc), rarely attempting to address Injection Vulnerabilities. Those that do are [[https://github.com/vimeo/psalm/commit/2122e4a1756dac68a83ec3f5abfbc60331630781|often incomplete]], need sinks specified on all library methods (unlikely to happen), and are not enabled by default. For example, Psalm, even in its strictest errorLevel (1), and running //--taint-analysis// (rarely used), will not notice the missing quote marks in this SQL, and incorrectly assume it's safe:
  
 <code php> <code php>
Line 152: Line 163:
 ==== Performance ==== ==== Performance ====
  
-**What about the performance impact?** Máté Kocsis has created a [[https://github.com/kocsismate/php-version-benchmarks/|php benchmark]] to replicate the old [[https://01.org/node/3774|Intel Tests]], and the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/results/with-concat/kocsismate.pdf|preliminary testing on this implementation]] has found a 0.124% performance hit for the Laravel Demo app, and 0.161% for Symfony (rounds 4-6, which involved 5000 requests). These tests do not connect to a database, as the variability introduced makes it impossible to measure the difference.+**What about the performance impact?**
  
-There is more severe 3.719% when running this [[https://github.com/kocsismate/php-version-benchmarks/blob/main/app/zend/concat.php#L25|concat test]], but this is not representative of typical PHP script (it's not normal to concatenate strings5 million times, with no other actions).+Máté Kocsis has created a [[https://github.com/kocsismate/php-version-benchmarks/|php benchmark]] to replicate the old [[https://01.org/node/3774|Intel Tests]], and the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/results/with-concat/kocsismate.pdf|preliminary testing on this implementation]] has found 0.124% performance hit for the Laravel Demo app, and 0.161% for Symfony (rounds 4-6which involved 5000 requests). These tests do not connect to a database, as the variability introduced makes it impossible to measure that low of a difference.
  
-Joe Watkins has also noted that further optimisations are possible (the implementation has focused on making it work).+The full stress-test is 3.719% when running this [[https://github.com/kocsismate/php-version-benchmarks/blob/main/app/zend/concat.php#L25|concat test]], but this is not representative of a typical PHP script (it's not normal to concatenate 4 strings, 5 million times, with no other actions).
  
 ==== String Concatenation ==== ==== String Concatenation ====
Line 162: Line 173:
 **Is string concatenation supported?** **Is string concatenation supported?**
  
-Yes. The literal flag is preserved when two literal strings are concatenated; this makes it easier to use //is_literal()//, especially by developers that use concatenation for their SQL/HTML/CLI/etc.+Yes. The //is_literal// flag is preserved when two literal values are concatenated; this makes it easier to use //is_literal()//, especially by developers that use concatenation for their SQL/HTML/CLI/etc.
  
-Previously we tried a version that only supported concatenation at compile-time (not run-time), to see if it would reduce the performance impact even further, and doing so might help developers with debugging. The idea was to require everyone to use special //literal_concat()// and //literal_implode()// functions, which would raise exceptions to highlight where mistakes were made. These two functions can still be implemented by developers themselves (see [[#support_functions|Support Functions]] below), as they can be useful; but requiring everyone to use them would have required big changes to existing projects, and exceptions are not a graceful way of handling mistakes.+Previously we tried a version that only supported concatenation at compile-time (not run-time), to see if it would reduce the performance impact even further. The idea was to require everyone to use special //literal_concat()// and //literal_implode()// functions, which would raise exceptions to highlight where mistakes were made. These two functions can still be implemented by developers themselves (see [[#support_functions|Support Functions]] below), as they can be useful; but requiring everyone to use them would have required big changes to existing projects, and exceptions are not a graceful way of handling mistakes.
  
 Performance wise, my [[https://github.com/craigfrancis/php-is-literal-rfc/tree/main/tests|simplistic testing]] found there was still [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/results/with-concat/local.pdf|a small impact without run-time concat]]: Performance wise, my [[https://github.com/craigfrancis/php-is-literal-rfc/tree/main/tests|simplistic testing]] found there was still [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/results/with-concat/local.pdf|a small impact without run-time concat]]:
Line 174: Line 185:
     Website with 22 SQL queries: Inconclusive, too variable.     Website with 22 SQL queries: Inconclusive, too variable.
  
-(This is because //concat_function()// in "zend_operators.c" uses //zend_string_extend()// which needs to remove the literal flag. Also "zend_vm_def.h" does the same; and supports a quick concat with an empty string (x2), which would need its flag removed as well).+(Under The Hood: This is because //concat_function()// in "zend_operators.c" uses //zend_string_extend()// which needs to remove the //is_literal// flag. Also "zend_vm_def.h" does the same; and supports a quick concat with an empty string (x2), which would need its flag removed as well).
  
 And by supporting both forms of concatenation, it makes it easier for developers to understand (many are not aware of the difference). And by supporting both forms of concatenation, it makes it easier for developers to understand (many are not aware of the difference).
Line 180: Line 191:
 ==== String Splitting ==== ==== String Splitting ====
  
-**Why don't you support string splitting?** In short, we can't find any use cases (security features should try to keep the implementation as simple as possible).+**Why don't you support string splitting then?**
  
-Also, the security considerations are different. Concatenation joins known/fixed units together, whereas if you're starting with a "developer created string" (which is trusted), and the program allows the evil user to split the string (e.g. setting the length in substr), then they get considerable control over the result (it creates an untrusted modification).+In short, we can't find any real use cases (security features should try to keep the implementation as simple as possible). 
 + 
 +Also, the security considerations are different. Concatenation joins known/fixed units together, whereas if you're starting with a literal string, and the program allows the Evil-User to split the string (e.g. setting the length in substr), then they get considerable control over the result (it creates an untrusted modification).
  
 These are unlikely to be written by a programmer, but consider these: These are unlikely to be written by a programmer, but consider these:
Line 194: Line 207:
 If that URL was used in a Content-Security-Policy, then it's necessary to remove the query string, but as more of the string is removed, the more resources can be included ("https:" basically allows resources from anywhere). With the HTML example, moving from the tag content to the attribute can be a problem (technically the HTML Templating Engine should be fine, but unfortunately libraries like Twig are not currently context aware, so you need to change from the default 'html' encoding to explicitly using 'html_attr' encoding). If that URL was used in a Content-Security-Policy, then it's necessary to remove the query string, but as more of the string is removed, the more resources can be included ("https:" basically allows resources from anywhere). With the HTML example, moving from the tag content to the attribute can be a problem (technically the HTML Templating Engine should be fine, but unfortunately libraries like Twig are not currently context aware, so you need to change from the default 'html' encoding to explicitly using 'html_attr' encoding).
  
-Or in other words; trying to determine if the //literal// flag should be passed through functions like //substr()// is difficult. Having a security feature be difficult to reason about, gives a much higher chance of mistakes.+Or in other words; trying to determine if the //is_literal// flag should be passed through functions like //substr()// is complex. Having a security feature be difficult to reason about, gives a much higher chance of mistakes.
  
 Krzysztof Kotowicz has confirmed that, at Google, with "go-safe-html", splitting is explicitly not supported because it "can cause issues"; for example, "arbitrary split position of a HTML string can change the context". Krzysztof Kotowicz has confirmed that, at Google, with "go-safe-html", splitting is explicitly not supported because it "can cause issues"; for example, "arbitrary split position of a HTML string can change the context".
Line 200: Line 213:
 ==== WHERE IN ==== ==== WHERE IN ====
  
-**What about an undefined number of parameters, e.g. //WHERE id IN (?, ?, ?)//?** You should already be following the advice from [[https://stackoverflow.com/a/23641033/538216|Levi Morrison]], [[https://www.php.net/manual/en/pdostatement.execute.php#example-1012|PDO Execute]], and [[https://www.drupal.org/docs/7/security/writing-secure-code/database-access#s-multiple-arguments|Drupal Multiple Arguments]]:+**What about an undefined number of parameters, e.g. //WHERE id IN (?, ?, ?)//?** 
 + 
 +You can follow the advice from [[https://stackoverflow.com/a/23641033/538216|Levi Morrison]], [[https://www.php.net/manual/en/pdostatement.execute.php#example-1012|PDO Execute]], and [[https://www.drupal.org/docs/7/security/writing-secure-code/database-access#s-multiple-arguments|Drupal Multiple Arguments]], and implement as such:
  
 <code php> <code php>
Line 206: Line 221:
 </code> </code>
  
-Or, if you prefer to use concatenation:+Or, you could use concatenation:
  
 <code php> <code php>
Line 215: Line 230:
 </code> </code>
  
-This pushes everyone to use parameters properly; rather than using implode() on user values, and including them directly in the SQL (which is easy to get wrong).+And libraries can easily abstract this for the developer.
  
 ==== Non-Parameterised Values ==== ==== Non-Parameterised Values ====
  
-**How can this work with Table and Field names in SQL, which cannot use parameters?** They are often in variables written as literals anyway (so no changes needed); and if they are dependent on user input, you //can// and //should// use literals:+**How can this work with Table and Field names in SQL, which cannot use parameters?** 
 + 
 +They are often in variables written as literal strings anyway (so no changes needed); and if they are dependent on user input, in most cases you can (and shoulduse literals:
  
 <code php> <code php>
Line 237: Line 254:
 ==== Non-Literal Values ==== ==== Non-Literal Values ====
  
-**How does this work in cases where you can't use literals?**+**How does this work in cases where you can't use literal values?**
  
-For example [[https://news-web.php.net/php.internals/87667|Dennis Birkholz]] noted that some Systems/Frameworks currently define some variables (e.g. table name prefixes) without the use of a literal (e.g. ini/json/yaml).+For example [[https://news-web.php.net/php.internals/87667|Dennis Birkholz]] noted that some Systems/Frameworks currently define some variables (e.g. table name prefixes) without the use of a literal (e.g. ini/json/yaml). And Larry Garfield noted that in Drupal's ORM "the table name itself is user-defined" (not in the PHP script).
  
-And Larry Garfield noted that in Drupal's ORM "the table name itself is user-defined" (not in the PHP script). +While most systems can use literal values entirely, these special non-literal values should still be handled separately (and carefully). This approach allows the library to ensure the majority of the input (SQL) is a literal, and then it can consistently check/escape those special values (e.g. does it match a valid table/field name, which can be included safely).
- +
-While most systems can use literals entirely, these special non-literal values should still be handled separately (and carefully). This approach allows the library to ensure the majority of the input (SQL) is a literal, then it can consistently check/escape those special values (e.g. does it match a valid table/field name, which can be included safely).+
  
 [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4#L194|How this can be done with aliases]], or the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4#L229|example Query Builder]]. [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4#L194|How this can be done with aliases]], or the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4#L229|example Query Builder]].
  
-==== Usage by Libraries ====+==== Faking It ====
  
-**Could libraries use is_literal() internally?** Yes, they could.+**What if I really really need to mark a value as a literal?**
  
-It would be fantastic if they did use additional //is_literal()// checks after receiving the strings from developers (it ensures they haven't introduced vulnerability either); but this isn't a prioritysimply because libraries are rarely the source of Injection Vulnerabilities.+This implementation does not provide a way for a developer to mark anything they want as a literal. This is on purpose. We do not want to recreate the biggest flaw of Taint Checking. It would be very easy for a naive developer to mark all escaped values as a literal (seeing it as safe valuewhich is [[#taint_checking|wrong]]).
  
-That said, consider the Drupalgeddon vulnerability; where //$db->expandArguments()// allowed unsafe/non-literal values to be used as placeholders with //IN (:arg_0, :arg_1)//By using something like the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4#L229|example Query Builder]], //is_literal()// would have been used to check the raw SQLand the field/parameter names (which are not literals in this case) get checked and appended separately.+That said, we do not pretend there aren't ways around this (e.g. using [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/is-literal-bypass.php|var_export]])but doing so is clearly the developer doing something wrong. We want to provide safety railsbut there is nothing stopping the developer from jumping over them if that's their choice.
  
-Zend also had a couple of issues with ORDER BY, where it didn't check the inputs either ([[https://framework.zend.com/security/advisory/ZF2014-04|1]]/[[https://framework.zend.com/security/advisory/ZF2016-03|2]]).+==== Usage by Libraries ====
  
-==== Naming ====+**How can libraries use is_literal()?**
  
-**Why is it called is_literal()?** A "Literal String" is the standard name for strings in source code. See [[https://www.google.com/search?q=what+is+literal+string+in+php|Google]].+The main focus is on values that developers provide to the library, this [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4|example library]] shows how certain sensitive values are checked as they are received, where it just uses basic warnings by default, could raise exceptions, or have the checks turned off on a per query basis (or entirely). Libraries could choose to only run these checks in development mode (and turned off in production), or do additional checks to see if the value is likely to be an issue (e.g. value matches a field name), or write to a log, or report via an API/email, etc.
  
-> A string literal is the notation for representing a string value within the text of computer program. In PHPstrings can be created with single quotes, double quotes or using the heredoc or the nowdoc syntax...+They could also use additional //is_literal()// checks later in the process (internally), to ensure the library hasn't introduced vulnerability either; but this isn't a prioritysimply because libraries are rarely the source of Injection Vulnerabilities.
  
-Alternative suggestions have included //is_from_literal()// from [[https://news-web.php.net/php.internals/109197|Jakob Givoni]], and references to alternative implementations like "compile time constants" and "code string".+==== Integer Values ====
  
-We cannot call it //is_safe_string()//, because we cannot say that string is safe:+We wanted to flag integers defined in the source code, in the same way we are doing with strings. Unfortunately [[https://news-web.php.net/php.internals/114964|it would require big change to add a literal flag on integers]]. Changing how integers work internally would have made a big performance impact, and potentially affected every part of PHP (including extensions).
  
-<code php> +Due to this limitation, we considered an approach to trust all integers. It was noted that existing code and tutorials already use integers directly. While this is not as philosophically pure, we continued to explore this possibility because we could not find any way that an Injection Vulnerability could be introduced with integers in SQL, HTML, CLIand other contexts as well (e.gpreg, mail additional_params, XPath query, and even eval).
-$cli = 'rm -rf ?'; +
-$sql = 'DELETE FROM my_table WHERE my_date >= ?'; +
-eval('$name = "' $_GET['name''";'); // INSECURE +
-</code>+
  
-While the first two cannot include Injection Vulnerabilities, the parameters could be set to "/" or "0000-00-00" (providing a nice vanishing magic trick); and the last onewellthey have much bigger issues to worry about (it'clearly irresponsible, and intentionally dangerous).+We could not find any character encoding issues either (The closest we could find was EBCDICan old IBM character encoding, which encodes the 0-9 characters differently; which anyone using it would need to re-encode either way, and [[https://www.php.net/manual/en/migration80.other-changes.php#migration80.other-changes.ebcdic|EBCDIC is not supported by PHP]]). And we could not find any issue with a 64bit PHP server sending a large number to a 32bit databasebecause the number is being encoded as characters in a stringso that'also fine.
  
-Alsothis name is unlikely to clash with any userland functions.+Howeverthe feedback received on the Internals mailing list was that while safe from Injection Vulnerabilities it might cause developers to assume them to be safe from developer/logic errors, and ultimately the preference was the simpler approach, that did not allow integers from any source.
  
-==== Integer Values ====+==== Other Values ====
  
-**Can you support Integer values?** This is being considered. +**Why don'you support Boolean/Float values?**
- +
-Matthew Brown wants to support integer values, simply because so much code already includes them, and we cannot find a single way that integers can cause issues from an Injection Vulnerability point of view (but if anyone can, we absolutely welcome their input). +
- +
-==== Other Values ====+
  
-**Why don't you support Boolean/Float values?** It's a very low value feature, and we cannot be sure of the security implications.+It's a very low-value feature, and we cannot be sure of the security implications.
  
-For example, the value you put in often is not always the same as what you get out:+For example, the value you put in is not always the same as what you get out:
  
 <code php> <code php>
Line 299: Line 306:
  // Pre 8.0 this also happened with string casting.  // Pre 8.0 this also happened with string casting.
 </code> </code>
 +
 +==== Naming ====
 +
 +**Why is it called is_literal()?**
 +
 +A "Literal String" is the standard name for strings in source code. See [[https://www.google.com/search?q=what+is+literal+string+in+php|Google]].
 +
 +> A string literal is the notation for representing a string value within the text of a computer program. In PHP, strings can be created with single quotes, double quotes or using the heredoc or the nowdoc syntax.
 +
 +We also need to keep to a single word name (to support a dedicated type in the future).
  
 ==== Support Functions ==== ==== Support Functions ====
  
-**What about other support functions?** We did consider //literal_concat()// and //literal_implode()// functions (see [[#string_concatenation|String Concatenation]] above), but these can be userland functions:+**What about other support functions?** 
 + 
 +We did consider //literal_concat()// and //literal_implode()// functions (see [[#string_concatenation|String Concatenation]] above), but these can be userland functions:
  
 <code php> <code php>
Line 310: Line 329:
       // You will probably only want to raise       // You will probably only want to raise
       // an exception on your development server.       // an exception on your development server.
-    throw new Exception('Non-literal detected!');+    throw new Exception('Non-literal value detected!');
   }   }
   return $return;   return $return;
Line 334: Line 353:
 </code> </code>
  
-If a developer changed the literal //'ASC'// to //$_GET['order']//, the error would be noticed by //$db->query()//, but it's not clear where the non-literal value was introduced. Whereas, if they used //literal_concat()//, that would raise an exception much earlier, and highlight exactly where the mistake happened:+If a developer changed the literal //'ASC'// to //$_GET['order']//, the error would be noticed by //$db->query()//, but it's not clear where the non-literal value was introduced. Whereas, if they used //literal_concat()//, that would raise an exception much earlier, stopping script execution, and highlight exactly where the mistake happened:
  
 <code php> <code php>
Line 342: Line 361:
 ==== Other Functions ==== ==== Other Functions ====
  
-**Why not support other string functions?** We might do, but like [[#string_splitting|String Splitting]], we can't find any use cases, and don't want to make this complicated (just identifying strings defined in the PHP source code). For example //strtoupper()// might be reasonable, but we will need to consider how it would be used (good and bad), and check for any oddities (e.g. output varying based on the current locale). Also, functions like //str_shuffle()// create unpredictable results.+**Why not support other string functions?**
  
-==== Faking it ====+Like [[#string_splitting|String Splitting]], we can't find any real use cases, and don't want to make this complicated. For example //strtoupper()// might be reasonable, but we would need to consider how it would be used, and check for any oddities (e.g. output varying based on the current locale). Also, functions like //str_shuffle()// create unpredictable results.
  
-**What happens if I really want a non-literal to appear as one?**+==== Limitations ====
  
-This implementation does not provide a way for a developer to mark anything they want as a literal. This is on purpose. We do not want to recreate the biggest flaw of Taint Checking. It would be very easy for a naive developer to mark escaped values as a literal, incorrectly seeing this as a "safe" flag.+**Does this mean the value is completely safe?**
  
-That said, we do not pretend there aren't ways around this (e.gusing var_export), but doing so is clearly the developer doing something wrongWe want to provide safety railsthere is nothing stopping the developer from jumping over them.+While these values are not at risk of containing an Injection Vulnerability, obviously they cannot be completely safe from every kind of developer/logic issue, For example: 
 + 
 +<code php> 
 +$cli = 'rm -rf ?'; // RISKY 
 +$sql = 'DELETE FROM my_table WHERE my_date >= ?'; // RISKY 
 +</code> 
 + 
 +The parameters could be set to "/" or "0000-00-00", which can result in deleting a lot more data than expected. 
 + 
 +There's no single RFC that can completely solve all developer errors, but this takes one of the biggest ones off the table. 
 + 
 +==== Compiler Optimisations ==== 
 + 
 +The implementation has been updated to avoid situations that could have confused the developer
 + 
 +<code php> 
 +$one = 1; 
 +$a = 'A' $one; // false, flag removed because it's being concatenated with an integer. 
 +$b = 'A' . 1; // Was true, as the compiler optimised this to the literal 'A1'
 + 
 +$a = "Hello "; 
 +$b = $a . 2; // Was trueas the 2 was coerced to the string '2' (to optimise the concatenation). 
 + 
 +$a = implode("-", [1, 2, 3]); // Was true with OPcache, as it could optimise this to the literal '1-2-3' 
 + 
 +$a = chr(97); // Was true, due to the use of Interned Strings. 
 +</code> 
 + 
 +This has been achieved by using the Lexer to mark strings as a literal (i.e. earlier in the process).
  
 ==== Extensions ==== ==== Extensions ====
  
-**Extensions create and manipulate strings, won't this break the literal flag?** Strings have multiple flags already, and are off by defaultthis is the correct behaviour when extensions create their own strings (should not be considered a literal). If an extension is found to be changing the literal flag incorrectly (unlikely), that's the same as any new flag being introduced, and will need to be fixed in the same way.+**Extensions create and manipulate strings, won't this break the flag on strings?** 
 + 
 +Strings have multiple flags already that are off by default this is the correct behaviour when extensions create their own strings (should not be flagged as a literal). If an extension is found to be already using the flag we're using for is_literal (unlikely), that's the same as any new flag being introduced into PHP, and will need to be updated in the same way.
  
 ==== Reflection API ==== ==== Reflection API ====
  
-**Why don't you use the reflection API?** It currently allows you to "introspect classes, interfaces, functions, methods and extensions"; it's not currently set up for object methods to inspect the code calling it. Even if that was to be added (unlikely), it could only check if the literal was defined there, it couldn't handle variables (tracking back to their source), nor could it provide any future scope for these checks happening in native functions (see "Future Scope").+**Why don't you use the Reflection API?** 
 + 
 +This allows you to "introspect classes, interfaces, functions, methods and extensions"; it's not currently set up for object methods to inspect the code calling it. Even if that was to be added (unlikely), it could only check if the literal value was defined there, it couldn't handle variables (tracking back to their source), nor could it provide any future scope for a dedicated type, nor could native functions work with this (see "Future Scope").
  
-===== Previous Work =====+===== Previous Examples =====
  
 **Go** programs can use "ScriptFromConstant" to express the concept of a "compile time constant" ([[https://blogtitle.github.io/go-safe-html/|more details]]). **Go** programs can use "ScriptFromConstant" to express the concept of a "compile time constant" ([[https://blogtitle.github.io/go-safe-html/|more details]]).
Line 407: Line 458:
 ===== Open Issues ===== ===== Open Issues =====
  
-  - Supporting Integers/Interned values. +None
-  - The name. +
- +
-===== Unaffected PHP Functionality ===== +
- +
-None known+
  
 ===== Future Scope ===== ===== Future Scope =====
  
-1) As noted by someniatko and Matthew Brown, having a dedicated type would be useful in the future, as "it would serve clearer intent", which can be used by IDEs, Static Analysis, etc. It was agreed to do via a separate RFC as it leads into the next point...+1) As noted by someniatko and Matthew Brown, having a dedicated type would be useful in the future, as "it would serve clearer intent", which can be used by IDEs, Static Analysis, etc. It was [[https://externals.io/message/114835#114847|agreed we would add this type later]], via a separate RFC, so this RFC can focus on the //is_literal// flag, and provide libraries a simple backwards-compatible function, where they can decide how to handle non-literal values.
  
 2) As noted by MarkR, the biggest benefit will come when this flag can be used by PDO and similar functions (//mysqli_query//, //preg_match//, //exec//, etc). 2) As noted by MarkR, the biggest benefit will come when this flag can be used by PDO and similar functions (//mysqli_query//, //preg_match//, //exec//, etc).
  
-First we need libraries to start using //is_literal()// to check their inputs, and use the appropriate escaping. This can result in strings that are no longer literals, but can still be trusted.+However, first we need libraries to start using //is_literal()// to check their inputs. The library can then do their thing, and apply the appropriate escaping, which can result in a value that no longer has the //is_literal// flag set, but is perfectly safe for the native functions.
  
-Then, with a future RFC, we can introduce checks for the native functions. By using the [[https://web.dev/trusted-types/|Trusted Types]] concept from JavaScript (which protects [[https://www.youtube.com/watch?v=po6GumtHRmU&t=92s|60+ Injection Sinks]], like innerHTML), the libraries create a stringable object as their output. These objects would be marked as "trusted" (because the library is sure they do not contain any Injection Vulnerabilities). The native functions can then **warn** developers when they do not receive a literal, or one of these trusted objects. These warnings would not **break anything**, they just make developers aware of the mistakes they have made.+With a future RFC, we could potentially introduce checks for the native functions. For example, if we use the [[https://web.dev/trusted-types/|Trusted Types]] concept from JavaScript (which protects [[https://www.youtube.com/watch?v=po6GumtHRmU&t=92s|60+ Injection Sinks]], like innerHTML), the libraries create a stringable object as their output. These objects can be added to a list of safe objects for the relevant native functions. The native functions could then **warn** developers when they do not receive a value with the //is_literal// flag, or one of the safe objects. These warnings would **not break anything**, they just make developers aware of the mistakes they have made, and we will always need a way of switching them off entirely (e.g. phpMyAdmin).
  
 ===== Proposed Voting Choices ===== ===== Proposed Voting Choices =====
Line 438: Line 484:
 ===== Rejected Features ===== ===== Rejected Features =====
  
-N/A+  - [[#integer_values|Supporting Integers]]
  
 ===== Thanks ===== ===== Thanks =====
  
-  - **Dan Ackroyd**, DanAck, for starting the [[https://github.com/php/php-src/compare/master...Danack:is_literal_attempt_two|first implementation]]which made this a reality, providing //literal_concat()// and //literal_implode()//, and followup on how it should work. +  - **Joe Watkins**, krakjoe, for writing the full implementation, including support for concatenation and integers, and helping me though the RFC process.
-  - **Joe Watkins**, krakjoe, for finding how to set the literal flag, and creating the implementation that supports string concat.+
   - **Máté Kocsis**, mate-kocsis, for setting up and doing the performance testing.   - **Máté Kocsis**, mate-kocsis, for setting up and doing the performance testing.
 +  - **Scott Arciszewski**, CiPHPerCoder, for checking over the RFC, and provided text on how we could implement integer support under a //is_noble()// name.
 +  - **Dan Ackroyd**, DanAck, for starting the [[https://github.com/php/php-src/compare/master...Danack:is_literal_attempt_two|first implementation]], which made this a reality, providing //literal_concat()// and //literal_implode()//, and followup on how it should work.
   - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, and string flags, the implementation will become easier" [[https://news-web.php.net/php.internals/87396|source]].   - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, and string flags, the implementation will become easier" [[https://news-web.php.net/php.internals/87396|source]].
   - **Rowan Francis**, for proof-reading, and helping me make an RFC that contains readable English.   - **Rowan Francis**, for proof-reading, and helping me make an RFC that contains readable English.
   - **Rowan Tommins**, IMSoP, for re-writing this RFC to focus on the key features, and putting it in context of how it can be used by libraries.   - **Rowan Tommins**, IMSoP, for re-writing this RFC to focus on the key features, and putting it in context of how it can be used by libraries.
-  - **Nikita Popov**, NikiC, for suggesting where the literal flag could be stored. Initially this was going to be the "GC_PROTECTED flag for strings", which allowed Dan to start the first implementation.+  - **Nikita Popov**, NikiC, for suggesting where the flag could be stored. Initially this was going to be the "GC_PROTECTED flag for strings", which allowed Dan to start the first implementation.
   - **Mark Randall**, MarkR, for suggestions, and noting that "interned strings in PHP have a flag", which started the conversation on how this could be implemented.   - **Mark Randall**, MarkR, for suggestions, and noting that "interned strings in PHP have a flag", which started the conversation on how this could be implemented.
   - **Sara Golemon**, SaraMG, for noting how this RFC had to explain how //is_literal()// is different to the flawed Taint Checking approach, so we don't get "a false sense of security or require far too much escape hatching".   - **Sara Golemon**, SaraMG, for noting how this RFC had to explain how //is_literal()// is different to the flawed Taint Checking approach, so we don't get "a false sense of security or require far too much escape hatching".
  
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis