rfc:is_literal

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
rfc:is_literal [2021/07/04 16:34] krakjoerfc:is_literal [2022/02/14 00:36] (current) – Add some more examples from other languages craigfrancis
Line 1: Line 1:
 ====== PHP RFC: Is_Literal ====== ====== PHP RFC: Is_Literal ======
  
-  * Version: 1.0 +  * Version: 1.1 
-  * Date: 2020-03-21 +  * Voting Start: 2021-07-05 19:30 BST / 18:30 UTC 
-  * Updated: 2021-06-06+  * Voting End: 2021-07-19 19:30 BST / 18:30 UTC 
 +  * RFC Started: 2020-03-21 
 +  * RFC Updated: 2021-07-04
   * Author: Craig Francis, craig#at#craigfrancis.co.uk   * Author: Craig Francis, craig#at#craigfrancis.co.uk
   * Contributors: Joe Watkins, Máté Kocsis   * Contributors: Joe Watkins, Máté Kocsis
-  * Status: Under Discussion+  * Status: Voting
   * First Published at: https://wiki.php.net/rfc/is_literal   * First Published at: https://wiki.php.net/rfc/is_literal
   * GitHub Repo: https://github.com/craigfrancis/php-is-literal-rfc   * GitHub Repo: https://github.com/craigfrancis/php-is-literal-rfc
 +  * Implementation: https://github.com/php/php-src/compare/master...krakjoe:literals
  
 ===== Introduction ===== ===== Introduction =====
  
-Add the function //is_literal()//, a lightweight and effective way to identify if a string was written by a developer, to remove the risk of a variable containing an Injection Vulnerability.+Add the function //is_literal()//, a lightweight and effective way to identify if a string was written by a developer, removing the risk of a variable containing an Injection Vulnerability.
  
-It's a simple process where a flag is set internally on strings that have been written by a developer (as opposed to a user) and the flag persists through concatenation with other 'literal' strings. The function checks the flag is present and thus no user data is included.+It's a simple process where a flag is set internally on strings that have been written by a developer (as opposed to a user), where the flag persists through concatenation with other 'literal' strings. The function checks the flag is present and thus no user data is included.
  
 It avoids the "false sense of security" that comes with the flawed "Taint Checking" approach, [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/escaping.php?ts=4|because escaping is very difficult to get right]]. It's much safer for developers to use parameterised queries, and well-tested libraries. It avoids the "false sense of security" that comes with the flawed "Taint Checking" approach, [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/escaping.php?ts=4|because escaping is very difficult to get right]]. It's much safer for developers to use parameterised queries, and well-tested libraries.
  
-//is_literal()// would be used by libraries, as they require certain sensitive values to only come from the developer; but because it's [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php?ts=4|easy to incorrectly include user values]], Injection Vulnerabilities are still introduced by the thousands of developers using these libraries incorrectly. You will notice the linked examples are based on examples found in the Libraries' official documentation, they still "work", and are typically shorter/easier than doing it correctly (I've found many of them on live websites, and it's why I'm here). A simple Query Builder example being:+//is_literal()// can be used by libraries to deal with a difficult problem - developers using them incorrectly. Libraries expect certain sensitive values to only come from the developer; but because it's [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php?ts=4|easy to incorrectly include user values]], Injection Vulnerabilities are still introduced by the thousands of developers using these libraries incorrectly. You will notice the linked examples are based on examples found in the Libraries' official documentation, they still "work", and are typically shorter/easier than doing it correctly (I've found many of them on live websites, and it's why I'm here). A simple Query Builder example being:
  
 <code php> <code php>
Line 34: Line 37:
 Injection and Cross-Site Scripting (XSS) vulnerabilities are **easy to make**, **hard to identify**, and **very common**. Injection and Cross-Site Scripting (XSS) vulnerabilities are **easy to make**, **hard to identify**, and **very common**.
  
-We like to think every developer reads the documentation, and would never directly include (inject) user values into their SQL/HTML/CLI - but we all know that's not the case.+With SQL Injection, it just takes 1 mistake, and the attacker can usually read everything in the database (SQL Map, Havij, jSQL, etc). 
 + 
 +When it comes to coding, we like to think every developer reads the documentation, and would never directly include (inject) user values into their SQL/HTML/CLI - but we all know that's not the case.
  
 It's why these two issues have **always** been on the [[https://owasp.org/www-project-top-ten/|OWASP Top 10]]; a list designed to raise awareness of common issues, ranked on their prevalence, exploitability, detectability, and impact: It's why these two issues have **always** been on the [[https://owasp.org/www-project-top-ten/|OWASP Top 10]]; a list designed to raise awareness of common issues, ranked on their prevalence, exploitability, detectability, and impact:
Line 62: Line 67:
 **RedBean** (Gabor de Mooij): "You can list RedBeanPHP as a supporter, we will implement this into the core." **RedBean** (Gabor de Mooij): "You can list RedBeanPHP as a supporter, we will implement this into the core."
  
-**Psalm** (Matthew Brown): "I've just added support for a //literal-string// type to Psalm: https://psalm.dev/r/9440908f39"+**Psalm** (Matthew Brown): 13th June "I was skeptical about the first draft of this RFC when I saw it last month, but now I see the light (especially with the concat changes)". Then on the 14th June, "I've just added support for a //literal-string// type to Psalm: https://psalm.dev/r/9440908f39" ([[https://github.com/vimeo/psalm/releases/tag/4.8.0|4.8.0]]) 
 + 
 +**PHPStan** (Ondřej Mirtes): 1st September, has been implemented in [[https://github.com/phpstan/phpstan/releases/tag/0.12.97|0.12.97]].
  
 ===== Proposal ===== ===== Proposal =====
Line 68: Line 75:
 Add the function //is_literal()//. Add the function //is_literal()//.
  
-A string shall pass the is_literal check if it was defined by the programmer in source code, or is the result of a function or instruction whose inputs would all pass the is_literal check.+A string shall pass the //is_literal// check if it was defined by the programmer in source code, or is the result of a function or instruction whose inputs would all pass the //is_literal// check.
  
 Concatenation instructions and the following string functions are therefore able to produce literals: Concatenation instructions and the following string functions are therefore able to produce literals:
  
-   //str_repeat()// +  - //str_repeat()// 
-   * //str_pad()// +  //str_pad()// 
-   * //implode()// +  //implode()// 
-   * //join()//+  //join()//
  
-//Namespaces constructed for the programmer by the compiler will also be marked literal for convenience.//+(Namespaces constructed for the programmer by the compiler will also be marked literal for convenience.)
  
 <code php> <code php>
Line 109: Line 116:
 ===== Try It ===== ===== Try It =====
  
-[[https://3v4l.org/#focus=rfc.literals|Test it out on 3v4l.org]] - Currently using the function name //is_noble()//, while we tested integer support.+[[https://3v4l.org/#focus=rfc.literals|Test it out on 3v4l.org]]
  
 [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4|How it can be used by libraries]] - Notice how this example library just raises a warning, to simply let the developer know about the issue, **without breaking anything**. And it provides an //"unsafe_value"// value-object to bypass the //is_literal()// check, but none of the examples need to use it (can be useful as a temporary thing, but there are much safer/better solutions, which developers are/should already be using). [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4|How it can be used by libraries]] - Notice how this example library just raises a warning, to simply let the developer know about the issue, **without breaking anything**. And it provides an //"unsafe_value"// value-object to bypass the //is_literal()// check, but none of the examples need to use it (can be useful as a temporary thing, but there are much safer/better solutions, which developers are/should already be using).
Line 163: Line 170:
 **What about the performance impact?** **What about the performance impact?**
  
-Máté Kocsis has created a [[https://github.com/kocsismate/php-version-benchmarks/|php benchmark]] to replicate the old [[https://01.org/node/3774|Intel Tests]], and the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/results/with-concat/kocsismate.pdf|preliminary testing on this implementation]] has found a 0.124performance hit for the Laravel Demo app, and 0.161% for Symfony (rounds 4-6, which involved 5000 requests). These tests do not connect to a database, as the variability introduced makes it impossible to measure that low of a difference+Máté Kocsis has created a [[https://github.com/kocsismate/php-version-benchmarks/|php benchmark]] to replicate the old [[https://01.org/node/3774|Intel Tests]], the preliminary results found a 0.47impact with the Symfony demo app (it did not connect to a database, as the variability introduced would make it impossible to measure the difference).
- +
-The full stress-test is 3.719% when running this [[https://github.com/kocsismate/php-version-benchmarks/blob/main/app/zend/concat.php#L25|concat test]], but this is not representative of a typical PHP script (it's not normal to concatenate 4 strings, 5 million times, with no other actions).+
  
 ==== String Concatenation ==== ==== String Concatenation ====
Line 175: Line 180:
 Previously we tried a version that only supported concatenation at compile-time (not run-time), to see if it would reduce the performance impact even further. The idea was to require everyone to use special //literal_concat()// and //literal_implode()// functions, which would raise exceptions to highlight where mistakes were made. These two functions can still be implemented by developers themselves (see [[#support_functions|Support Functions]] below), as they can be useful; but requiring everyone to use them would have required big changes to existing projects, and exceptions are not a graceful way of handling mistakes. Previously we tried a version that only supported concatenation at compile-time (not run-time), to see if it would reduce the performance impact even further. The idea was to require everyone to use special //literal_concat()// and //literal_implode()// functions, which would raise exceptions to highlight where mistakes were made. These two functions can still be implemented by developers themselves (see [[#support_functions|Support Functions]] below), as they can be useful; but requiring everyone to use them would have required big changes to existing projects, and exceptions are not a graceful way of handling mistakes.
  
-Performance wise, my [[https://github.com/craigfrancis/php-is-literal-rfc/tree/main/tests|simplistic testing]] found there was still [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/results/with-concat/local.pdf|a small impact without run-time concat]]+Performance wise, my [[https://github.com/craigfrancis/php-is-literal-rfc/tree/main/tests|simplistic testing]] found there was still [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/results/with-concat/local.pdf|a small impact without run-time concat]].
- +
-    Laravel Demo App: +0.30% with, vs +0.18% without. +
-    Symfony Demo App: +0.06% with, vs +0.06% without. +
-    My Concat Test:   +4.36% with, vs +2.23% without. +
-    - +
-    Website with 22 SQL queries: Inconclusive, too variable.+
  
 > (Under The Hood: This is because //concat_function()// in "zend_operators.c" uses //zend_string_extend()// which needs to remove the //is_literal// flag. Also "zend_vm_def.h" does the same; and supports a quick concat with an empty string (x2), which would need its flag removed as well). > (Under The Hood: This is because //concat_function()// in "zend_operators.c" uses //zend_string_extend()// which needs to remove the //is_literal// flag. Also "zend_vm_def.h" does the same; and supports a quick concat with an empty string (x2), which would need its flag removed as well).
Line 227: Line 226:
 } }
 </code> </code>
 +
 +And libraries can easily abstract this for the developer.
  
 ==== Non-Parameterised Values ==== ==== Non-Parameterised Values ====
Line 264: Line 265:
 This implementation does not provide a way for a developer to mark anything they want as a literal. This is on purpose. We do not want to recreate the biggest flaw of Taint Checking. It would be very easy for a naive developer to mark all escaped values as a literal (seeing it as a safe value, which is [[#taint_checking|wrong]]). This implementation does not provide a way for a developer to mark anything they want as a literal. This is on purpose. We do not want to recreate the biggest flaw of Taint Checking. It would be very easy for a naive developer to mark all escaped values as a literal (seeing it as a safe value, which is [[#taint_checking|wrong]]).
  
-That said, we do not pretend there aren't ways around this (e.g. using [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/is-trusted-bypass.php|var_export]]), but doing so is clearly the developer doing something wrong. We want to provide safety rails, but there is nothing stopping the developer from jumping over them if that's their choice.+That said, we do not pretend there aren't ways around this (e.g. using [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/is-literal-bypass.php|var_export]]), but doing so is clearly the developer doing something wrong. We want to provide safety rails, but there is nothing stopping the developer from jumping over them if that's their choice.
  
 ==== Usage by Libraries ==== ==== Usage by Libraries ====
Line 349: Line 350:
 </code> </code>
  
-If a developer changed the literal //'ASC'// to //$_GET['order']//, the error would be noticed by //$db->query()//, but it's not clear where the non-literal value was introduced. Whereas, if they used //literal_concat()//, that would raise an exception much earlier, and highlight exactly where the mistake happened:+If a developer changed the literal //'ASC'// to //$_GET['order']//, the error would be noticed by //$db->query()//, but it's not clear where the non-literal value was introduced. Whereas, if they used //literal_concat()//, that would raise an exception much earlier, stopping script execution, and highlight exactly where the mistake happened:
  
 <code php> <code php>
Line 375: Line 376:
  
 There's no single RFC that can completely solve all developer errors, but this takes one of the biggest ones off the table. There's no single RFC that can completely solve all developer errors, but this takes one of the biggest ones off the table.
 +
 +==== Compiler Optimisations ====
 +
 +The implementation has been updated to avoid situations that could have confused the developer:
 +
 +<code php>
 +$one = 1;
 +$a = 'A' . $one; // false, flag removed because it's being concatenated with an integer.
 +$b = 'A' . 1; // Was true, as the compiler optimised this to the literal 'A1'.
 +
 +$a = "Hello ";
 +$b = $a . 2; // Was true, as the 2 was coerced to the string '2' (to optimise the concatenation).
 +
 +$a = implode("-", [1, 2, 3]); // Was true with OPcache, as it could optimise this to the literal '1-2-3'
 +
 +$a = chr(97); // Was true, due to the use of Interned Strings.
 +</code>
 +
 +This has been achieved by using the Lexer to mark strings as a literal (i.e. earlier in the process).
  
 ==== Extensions ==== ==== Extensions ====
Line 390: Line 410:
 ===== Previous Examples ===== ===== Previous Examples =====
  
-**Go** programs can use "ScriptFromConstantto express the concept of "compile time constant" ([[https://blogtitle.github.io/go-safe-html/|more details]]).+**Go** can use an "[[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/go/index.go|un-exported string type]]"technique which is used by [[https://blogtitle.github.io/go-safe-html/|go-safe-html]]
 + 
 +**C++** can use a "[[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/cpp/index.cpp|consteval annotation]]"
 + 
 +**Rust** can use a "[[https://github.com/craigfrancis/php-is-literal-rfc/tree/main/others/rust|procedural macro]]", to check the provided value is a literal at compile time (a bit complicated)
 + 
 +**Java** can use a "[[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/java/src/main/java/com/example/isliteral/index.java|@CompileTimeConstant annotation]]" from [[https://errorprone.info/bugpattern/CompileTimeConstant|Error Prone]] to ensure method parameters can only use "compile-time constant expressions".
  
-**Java** can use [[https://errorprone.info/|Error Prone]] with [[https://errorprone.info/bugpattern/CompileTimeConstant|@CompileTimeConstant]] to ensure method parameters can only use "compile-time constant expressions".+**Node** has the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/npm/index.js|is-template-object polyfill]], which checks a tag function was provided a "tagged template literal" (this technique is used in [[https://www.npmjs.com/package/safesql|safesql]], via [[https://www.npmjs.com/package/template-tag-common|template-tag-common]]). Alternatively Node developers can use [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/npm-closure-library/index.js|goog.string.Const]] from Google's Closure Library.
  
 **JavaScript** is getting [[https://github.com/tc39/proposal-array-is-template-object|isTemplateObject]], for "Distinguishing strings from a trusted developer from strings that may be attacker controlled" (intended to be [[https://github.com/mikewest/tc39-proposal-literals|used with Trusted Types]]). **JavaScript** is getting [[https://github.com/tc39/proposal-array-is-template-object|isTemplateObject]], for "Distinguishing strings from a trusted developer from strings that may be attacker controlled" (intended to be [[https://github.com/mikewest/tc39-proposal-literals|used with Trusted Types]]).
  
-**Perl** has a [[https://perldoc.perl.org/perlsec#Taint-mode|Taint Mode]], via the -T flag, where all input is marked as "tainted", and cannot be used by some methods (like commands that modify files), unless you use a regular expression to match and return known-good values (where regular expressions are easy to get wrong).+**Perl** has a [[https://perldoc.perl.org/perlsec#Taint-mode|Taint Mode]], via the -T flag, where all input is marked as "tainted", and cannot be used by some methods (like commands that modify files), unless you use a regular expression to match and return known-good values (regular expressions are easy to get wrong).
  
 There is a [[https://github.com/laruence/taint|Taint extension for PHP]] by Xinchen Hui, and [[https://wiki.php.net/rfc/taint|a previous RFC proposing it be added to the language]] by Wietse Venema. There is a [[https://github.com/laruence/taint|Taint extension for PHP]] by Xinchen Hui, and [[https://wiki.php.net/rfc/taint|a previous RFC proposing it be added to the language]] by Wietse Venema.
Line 435: Line 461:
 ===== Open Issues ===== ===== Open Issues =====
  
-==== Inconsistencies ==== +None
- +
-While compile-time and run-time concatenation with literal strings is consistent and works, when it comes to some developer-defined values (like integers, [[#integer_values|which we cannot flag]]), the compiling process can optimise a value so it's seen as a literal. +
- +
-As these optimisations happen during the compilation process, they cannot contain user data; but developers may wonder why a developer defined integer can sometimes be seen as a literal, e.g. +
- +
-<code php> +
-$one = 1; +
-is_literal('A' . $one); // false, flag removed with runtime concatenation of an integer. +
-is_literal('A' . 1); // true, compiler optimises this to the literal 'A1'+
- +
-$a = "Hello "; +
-$b = $a . 2; // The 2 is coerced to the string '2' to optimise the concatenation. +
-is_literal($b); // true +
- +
-$a = implode("-", [1, 2, 3]); +
-is_literal($a); // Normally false, but OPcache can optimise to the literal '1-2-3' +
-</code> +
- +
-==== Interned Strings ==== +
- +
-Output from //chr()// appears to be a literal. This was noticed by Claude Pache, and on a technical level is due to the [[https://news-web.php.net/php.internals/114877|use of Interned Strings]], an optimisation used by //RETURN_CHAR// that re-uses single character values. While this could be used as a way to intentionally [[#faking_it|fake a literal string]], it's unlikely to be used to create sensitive strings.+
  
 ===== Future Scope ===== ===== Future Scope =====
Line 468: Line 473:
 With a future RFC, we could potentially introduce checks for the native functions. For example, if we use the [[https://web.dev/trusted-types/|Trusted Types]] concept from JavaScript (which protects [[https://www.youtube.com/watch?v=po6GumtHRmU&t=92s|60+ Injection Sinks]], like innerHTML), the libraries create a stringable object as their output. These objects can be added to a list of safe objects for the relevant native functions. The native functions could then **warn** developers when they do not receive a value with the //is_literal// flag, or one of the safe objects. These warnings would **not break anything**, they just make developers aware of the mistakes they have made, and we will always need a way of switching them off entirely (e.g. phpMyAdmin). With a future RFC, we could potentially introduce checks for the native functions. For example, if we use the [[https://web.dev/trusted-types/|Trusted Types]] concept from JavaScript (which protects [[https://www.youtube.com/watch?v=po6GumtHRmU&t=92s|60+ Injection Sinks]], like innerHTML), the libraries create a stringable object as their output. These objects can be added to a list of safe objects for the relevant native functions. The native functions could then **warn** developers when they do not receive a value with the //is_literal// flag, or one of the safe objects. These warnings would **not break anything**, they just make developers aware of the mistakes they have made, and we will always need a way of switching them off entirely (e.g. phpMyAdmin).
  
-===== Proposed Voting Choices =====+===== Voting =====
  
-Accept the RFCYes/No+Accept the RFC 
 + 
 +<doodle title="is_literal" auth="craigfrancis" voteType="single" closed="true"> 
 +   Yes 
 +   No 
 +</doodle>
  
 ===== Implementation ===== ===== Implementation =====
  
 [[https://github.com/php/php-src/compare/master...krakjoe:literals|Joe Watkin's implementation]] [[https://github.com/php/php-src/compare/master...krakjoe:literals|Joe Watkin's implementation]]
- 
-===== References ===== 
- 
-N/A 
  
 ===== Rejected Features ===== ===== Rejected Features =====
  
-N/A+  - [[#integer_values|Supporting Integers]]
  
 ===== Thanks ===== ===== Thanks =====
rfc/is_literal.1625416479.txt.gz · Last modified: 2021/07/04 16:34 by krakjoe