rfc:is_literal

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
rfc:is_literal [2021/06/25 21:37] – Back to is_literal() craigfrancisrfc:is_literal [2022/02/14 00:36] (current) – Add some more examples from other languages craigfrancis
Line 1: Line 1:
 ====== PHP RFC: Is_Literal ====== ====== PHP RFC: Is_Literal ======
  
-  * Version: 1.0 +  * Version: 1.1 
-  * Date: 2020-03-21 +  * Voting Start: 2021-07-05 19:30 BST / 18:30 UTC 
-  * Updated: 2021-06-06+  * Voting End: 2021-07-19 19:30 BST / 18:30 UTC 
 +  * RFC Started: 2020-03-21 
 +  * RFC Updated: 2021-07-04
   * Author: Craig Francis, craig#at#craigfrancis.co.uk   * Author: Craig Francis, craig#at#craigfrancis.co.uk
-  * Contributors: Joe Watkins, Máté Kocsis, Dan Ackroyd +  * Contributors: Joe Watkins, Máté Kocsis 
-  * Status: Under Discussion+  * Status: Voting
   * First Published at: https://wiki.php.net/rfc/is_literal   * First Published at: https://wiki.php.net/rfc/is_literal
   * GitHub Repo: https://github.com/craigfrancis/php-is-literal-rfc   * GitHub Repo: https://github.com/craigfrancis/php-is-literal-rfc
 +  * Implementation: https://github.com/php/php-src/compare/master...krakjoe:literals
  
 ===== Introduction ===== ===== Introduction =====
  
-Add the function //is_literal()//, a lightweight and effective way to identify if a string was written by a developer, to remove the risk of a variable containing an Injection Vulnerability.+Add the function //is_literal()//, a lightweight and effective way to identify if a string was written by a developer, removing the risk of a variable containing an Injection Vulnerability.
  
-It's a simple process where a flag is set internally on strings that have been written by a developer (as opposed to a user) and the flag persists through concatenation with other 'literal' strings. The function checks the flag is present and thus no user data is included.+It's a simple process where a flag is set internally on strings that have been written by a developer (as opposed to a user), where the flag persists through concatenation with other 'literal' strings. The function checks the flag is present and thus no user data is included.
  
 It avoids the "false sense of security" that comes with the flawed "Taint Checking" approach, [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/escaping.php?ts=4|because escaping is very difficult to get right]]. It's much safer for developers to use parameterised queries, and well-tested libraries. It avoids the "false sense of security" that comes with the flawed "Taint Checking" approach, [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/escaping.php?ts=4|because escaping is very difficult to get right]]. It's much safer for developers to use parameterised queries, and well-tested libraries.
  
-//is_literal()// would be used by libraries, as they require certain sensitive values to only come from the developer; but because it's [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php?ts=4|easy to incorrectly include user values]], Injection Vulnerabilities are still introduced by the thousands of developers using these libraries incorrectly. You will notice the linked examples are based on examples found in the Libraries' official documentation, they still "work", and are typically shorter/easier than doing it correctly (I've found many of them on live websites, and it's why I'm here). A simple Query Builder example being:+//is_literal()// can be used by libraries to deal with a difficult problem - developers using them incorrectly. Libraries expect certain sensitive values to only come from the developer; but because it's [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php?ts=4|easy to incorrectly include user values]], Injection Vulnerabilities are still introduced by the thousands of developers using these libraries incorrectly. You will notice the linked examples are based on examples found in the Libraries' official documentation, they still "work", and are typically shorter/easier than doing it correctly (I've found many of them on live websites, and it's why I'm here). A simple Query Builder example being:
  
 <code php> <code php>
Line 34: Line 37:
 Injection and Cross-Site Scripting (XSS) vulnerabilities are **easy to make**, **hard to identify**, and **very common**. Injection and Cross-Site Scripting (XSS) vulnerabilities are **easy to make**, **hard to identify**, and **very common**.
  
-We like to think every developer reads the documentation, and would never directly include (inject) user values into their SQL/HTML/CLI - but we all know that's not the case.+With SQL Injection, it just takes 1 mistake, and the attacker can usually read everything in the database (SQL Map, Havij, jSQL, etc). 
 + 
 +When it comes to coding, we like to think every developer reads the documentation, and would never directly include (inject) user values into their SQL/HTML/CLI - but we all know that's not the case.
  
 It's why these two issues have **always** been on the [[https://owasp.org/www-project-top-ten/|OWASP Top 10]]; a list designed to raise awareness of common issues, ranked on their prevalence, exploitability, detectability, and impact: It's why these two issues have **always** been on the [[https://owasp.org/www-project-top-ten/|OWASP Top 10]]; a list designed to raise awareness of common issues, ranked on their prevalence, exploitability, detectability, and impact:
Line 62: Line 67:
 **RedBean** (Gabor de Mooij): "You can list RedBeanPHP as a supporter, we will implement this into the core." **RedBean** (Gabor de Mooij): "You can list RedBeanPHP as a supporter, we will implement this into the core."
  
-**Psalm** (Matthew Brown): "I've just added support for a //literal-string// type to Psalm: https://psalm.dev/r/9440908f39"+**Psalm** (Matthew Brown): 13th June "I was skeptical about the first draft of this RFC when I saw it last month, but now I see the light (especially with the concat changes)". Then on the 14th June, "I've just added support for a //literal-string// type to Psalm: https://psalm.dev/r/9440908f39" ([[https://github.com/vimeo/psalm/releases/tag/4.8.0|4.8.0]]) 
 + 
 +**PHPStan** (Ondřej Mirtes): 1st September, has been implemented in [[https://github.com/phpstan/phpstan/releases/tag/0.12.97|0.12.97]].
  
 ===== Proposal ===== ===== Proposal =====
  
-Add the function //is_literal()//, where a literal is defined as:+Add the function //is_literal()//
 + 
 +A string shall pass the //is_literal// check if it was defined by the programmer in source codeor is the result of a function or instruction whose inputs would all pass the //is_literal// check.
  
-  - Strings defined by the programmer (literals+Concatenation instructions and the following string functions are therefore able to produce literals:
-  - Strings defined by the engine itself (called interned strings)+
  
-(Under The Hood: Interned strings do not include input values; instead interned strings are: Strings defined by the programmer, strings defined in the source code of php, and strings defined by the engine (either at compile or runtime), with known values.)+  - //str_repeat()// 
 +  - //str_pad()// 
 +  - //implode()// 
 +  - //join()//
  
-Any function or instruction that is aware of "literal" strings shall produce a "literal" string if all input would pass //is_literal()//This includes //sprintf()//, //str_repeat()//, //str_pad()//, //implode()//, //join()//, //array_pad()//, and //array_fill()//.+(Namespaces constructed for the programmer by the compiler will also be marked literal for convenience.)
  
 <code php> <code php>
Line 103: Line 114:
 </code> </code>
  
-===== Try it =====+===== Try It =====
  
-[[https://3v4l.org/#focus=rfc.literals|Test it out on 3v4l.org]] - Currently using the function name //is_noble()//, while we tested integer support.+[[https://3v4l.org/#focus=rfc.literals|Test it out on 3v4l.org]]
  
 [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4|How it can be used by libraries]] - Notice how this example library just raises a warning, to simply let the developer know about the issue, **without breaking anything**. And it provides an //"unsafe_value"// value-object to bypass the //is_literal()// check, but none of the examples need to use it (can be useful as a temporary thing, but there are much safer/better solutions, which developers are/should already be using). [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4|How it can be used by libraries]] - Notice how this example library just raises a warning, to simply let the developer know about the issue, **without breaking anything**. And it provides an //"unsafe_value"// value-object to bypass the //is_literal()// check, but none of the examples need to use it (can be useful as a temporary thing, but there are much safer/better solutions, which developers are/should already be using).
Line 159: Line 170:
 **What about the performance impact?** **What about the performance impact?**
  
-Máté Kocsis has created a [[https://github.com/kocsismate/php-version-benchmarks/|php benchmark]] to replicate the old [[https://01.org/node/3774|Intel Tests]], and the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/results/with-concat/kocsismate.pdf|preliminary testing on this implementation]] has found a 0.124performance hit for the Laravel Demo app, and 0.161% for Symfony (rounds 4-6, which involved 5000 requests). These tests do not connect to a database, as the variability introduced makes it impossible to measure that low of a difference+Máté Kocsis has created a [[https://github.com/kocsismate/php-version-benchmarks/|php benchmark]] to replicate the old [[https://01.org/node/3774|Intel Tests]], the preliminary results found a 0.47impact with the Symfony demo app (it did not connect to a database, as the variability introduced would make it impossible to measure the difference).
- +
-The full stress-test is 3.719% when running this [[https://github.com/kocsismate/php-version-benchmarks/blob/main/app/zend/concat.php#L25|concat test]], but this is not representative of a typical PHP script (it's not normal to concatenate 4 strings, 5 million times, with no other actions).+
  
 ==== String Concatenation ==== ==== String Concatenation ====
Line 171: Line 180:
 Previously we tried a version that only supported concatenation at compile-time (not run-time), to see if it would reduce the performance impact even further. The idea was to require everyone to use special //literal_concat()// and //literal_implode()// functions, which would raise exceptions to highlight where mistakes were made. These two functions can still be implemented by developers themselves (see [[#support_functions|Support Functions]] below), as they can be useful; but requiring everyone to use them would have required big changes to existing projects, and exceptions are not a graceful way of handling mistakes. Previously we tried a version that only supported concatenation at compile-time (not run-time), to see if it would reduce the performance impact even further. The idea was to require everyone to use special //literal_concat()// and //literal_implode()// functions, which would raise exceptions to highlight where mistakes were made. These two functions can still be implemented by developers themselves (see [[#support_functions|Support Functions]] below), as they can be useful; but requiring everyone to use them would have required big changes to existing projects, and exceptions are not a graceful way of handling mistakes.
  
-Performance wise, my [[https://github.com/craigfrancis/php-is-literal-rfc/tree/main/tests|simplistic testing]] found there was still [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/results/with-concat/local.pdf|a small impact without run-time concat]]+Performance wise, my [[https://github.com/craigfrancis/php-is-literal-rfc/tree/main/tests|simplistic testing]] found there was still [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/results/with-concat/local.pdf|a small impact without run-time concat]].
- +
-    Laravel Demo App: +0.30% with, vs +0.18% without. +
-    Symfony Demo App: +0.06% with, vs +0.06% without. +
-    My Concat Test:   +4.36% with, vs +2.23% without. +
-    - +
-    Website with 22 SQL queries: Inconclusive, too variable.+
  
 > (Under The Hood: This is because //concat_function()// in "zend_operators.c" uses //zend_string_extend()// which needs to remove the //is_literal// flag. Also "zend_vm_def.h" does the same; and supports a quick concat with an empty string (x2), which would need its flag removed as well). > (Under The Hood: This is because //concat_function()// in "zend_operators.c" uses //zend_string_extend()// which needs to remove the //is_literal// flag. Also "zend_vm_def.h" does the same; and supports a quick concat with an empty string (x2), which would need its flag removed as well).
Line 223: Line 226:
 } }
 </code> </code>
 +
 +And libraries can easily abstract this for the developer.
  
 ==== Non-Parameterised Values ==== ==== Non-Parameterised Values ====
Line 254: Line 259:
 [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4#L194|How this can be done with aliases]], or the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4#L229|example Query Builder]]. [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4#L194|How this can be done with aliases]], or the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4#L229|example Query Builder]].
  
-==== Faking it ====+==== Faking It ====
  
 **What if I really really need to mark a value as a literal?** **What if I really really need to mark a value as a literal?**
Line 260: Line 265:
 This implementation does not provide a way for a developer to mark anything they want as a literal. This is on purpose. We do not want to recreate the biggest flaw of Taint Checking. It would be very easy for a naive developer to mark all escaped values as a literal (seeing it as a safe value, which is [[#taint_checking|wrong]]). This implementation does not provide a way for a developer to mark anything they want as a literal. This is on purpose. We do not want to recreate the biggest flaw of Taint Checking. It would be very easy for a naive developer to mark all escaped values as a literal (seeing it as a safe value, which is [[#taint_checking|wrong]]).
  
-That said, we do not pretend there aren't ways around this (e.g. using [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/is-trusted-bypass.php|var_export]]), but doing so is clearly the developer doing something wrong. We want to provide safety rails, but there is nothing stopping the developer from jumping over them if that's their choice.+That said, we do not pretend there aren't ways around this (e.g. using [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/is-literal-bypass.php|var_export]]), but doing so is clearly the developer doing something wrong. We want to provide safety rails, but there is nothing stopping the developer from jumping over them if that's their choice.
  
 ==== Usage by Libraries ==== ==== Usage by Libraries ====
Line 345: Line 350:
 </code> </code>
  
-If a developer changed the literal //'ASC'// to //$_GET['order']//, the error would be noticed by //$db->query()//, but it's not clear where the non-literal value was introduced. Whereas, if they used //literal_concat()//, that would raise an exception much earlier, and highlight exactly where the mistake happened:+If a developer changed the literal //'ASC'// to //$_GET['order']//, the error would be noticed by //$db->query()//, but it's not clear where the non-literal value was introduced. Whereas, if they used //literal_concat()//, that would raise an exception much earlier, stopping script execution, and highlight exactly where the mistake happened:
  
 <code php> <code php>
Line 371: Line 376:
  
 There's no single RFC that can completely solve all developer errors, but this takes one of the biggest ones off the table. There's no single RFC that can completely solve all developer errors, but this takes one of the biggest ones off the table.
 +
 +==== Compiler Optimisations ====
 +
 +The implementation has been updated to avoid situations that could have confused the developer:
 +
 +<code php>
 +$one = 1;
 +$a = 'A' . $one; // false, flag removed because it's being concatenated with an integer.
 +$b = 'A' . 1; // Was true, as the compiler optimised this to the literal 'A1'.
 +
 +$a = "Hello ";
 +$b = $a . 2; // Was true, as the 2 was coerced to the string '2' (to optimise the concatenation).
 +
 +$a = implode("-", [1, 2, 3]); // Was true with OPcache, as it could optimise this to the literal '1-2-3'
 +
 +$a = chr(97); // Was true, due to the use of Interned Strings.
 +</code>
 +
 +This has been achieved by using the Lexer to mark strings as a literal (i.e. earlier in the process).
  
 ==== Extensions ==== ==== Extensions ====
Line 384: Line 408:
 This allows you to "introspect classes, interfaces, functions, methods and extensions"; it's not currently set up for object methods to inspect the code calling it. Even if that was to be added (unlikely), it could only check if the literal value was defined there, it couldn't handle variables (tracking back to their source), nor could it provide any future scope for a dedicated type, nor could native functions work with this (see "Future Scope"). This allows you to "introspect classes, interfaces, functions, methods and extensions"; it's not currently set up for object methods to inspect the code calling it. Even if that was to be added (unlikely), it could only check if the literal value was defined there, it couldn't handle variables (tracking back to their source), nor could it provide any future scope for a dedicated type, nor could native functions work with this (see "Future Scope").
  
-==== Interned Strings ====+===== Previous Examples =====
  
-**Why does the output from //chr()// appear as a literal?**+**Go** can use an "[[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/go/index.go|un-exported string type]]", a technique which is used by [[https://blogtitle.github.io/go-safe-html/|go-safe-html]].
  
-This was noticed by Claude Pache, and on technical level is due to the [[https://news-web.php.net/php.internals/114877|use of Interned Strings]], an optimisation used by //RETURN_CHAR// that re-uses single character valuesIt's effectively the same as calling //sprintf('%c', $i)//, which is also not an issue, as the developer is choosing to do this.+**C++** can use "[[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/cpp/index.cpp|consteval annotation]]".
  
-===== Previous Examples =====+**Rust** can use a "[[https://github.com/craigfrancis/php-is-literal-rfc/tree/main/others/rust|procedural macro]]", to check the provided value is a literal at compile time (a bit complicated).
  
-**Go** programs can use "ScriptFromConstant" to express the concept of a "compile time constant" ([[https://blogtitle.github.io/go-safe-html/|more details]]).+**Java** can use a "[[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/java/src/main/java/com/example/isliteral/index.java|@CompileTimeConstant annotation]]" from [[https://errorprone.info/bugpattern/CompileTimeConstant|Error Prone]] to ensure method parameters can only use "compile-time constant expressions".
  
-**Java** can use [[https://errorprone.info/|Error Prone]] with [[https://errorprone.info/bugpattern/CompileTimeConstant|@CompileTimeConstant]] to ensure method parameters can only use "compile-time constant expressions".+**Node** has the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/npm/index.js|is-template-object polyfill]], which checks a tag function was provided a "tagged template literal" (this technique is used in [[https://www.npmjs.com/package/safesql|safesql]], via [[https://www.npmjs.com/package/template-tag-common|template-tag-common]]). Alternatively Node developers can use [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/npm-closure-library/index.js|goog.string.Const]] from Google's Closure Library.
  
 **JavaScript** is getting [[https://github.com/tc39/proposal-array-is-template-object|isTemplateObject]], for "Distinguishing strings from a trusted developer from strings that may be attacker controlled" (intended to be [[https://github.com/mikewest/tc39-proposal-literals|used with Trusted Types]]). **JavaScript** is getting [[https://github.com/tc39/proposal-array-is-template-object|isTemplateObject]], for "Distinguishing strings from a trusted developer from strings that may be attacker controlled" (intended to be [[https://github.com/mikewest/tc39-proposal-literals|used with Trusted Types]]).
  
-**Perl** has a [[https://perldoc.perl.org/perlsec#Taint-mode|Taint Mode]], via the -T flag, where all input is marked as "tainted", and cannot be used by some methods (like commands that modify files), unless you use a regular expression to match and return known-good values (where regular expressions are easy to get wrong).+**Perl** has a [[https://perldoc.perl.org/perlsec#Taint-mode|Taint Mode]], via the -T flag, where all input is marked as "tainted", and cannot be used by some methods (like commands that modify files), unless you use a regular expression to match and return known-good values (regular expressions are easy to get wrong).
  
 There is a [[https://github.com/laruence/taint|Taint extension for PHP]] by Xinchen Hui, and [[https://wiki.php.net/rfc/taint|a previous RFC proposing it be added to the language]] by Wietse Venema. There is a [[https://github.com/laruence/taint|Taint extension for PHP]] by Xinchen Hui, and [[https://wiki.php.net/rfc/taint|a previous RFC proposing it be added to the language]] by Wietse Venema.
Line 449: Line 473:
 With a future RFC, we could potentially introduce checks for the native functions. For example, if we use the [[https://web.dev/trusted-types/|Trusted Types]] concept from JavaScript (which protects [[https://www.youtube.com/watch?v=po6GumtHRmU&t=92s|60+ Injection Sinks]], like innerHTML), the libraries create a stringable object as their output. These objects can be added to a list of safe objects for the relevant native functions. The native functions could then **warn** developers when they do not receive a value with the //is_literal// flag, or one of the safe objects. These warnings would **not break anything**, they just make developers aware of the mistakes they have made, and we will always need a way of switching them off entirely (e.g. phpMyAdmin). With a future RFC, we could potentially introduce checks for the native functions. For example, if we use the [[https://web.dev/trusted-types/|Trusted Types]] concept from JavaScript (which protects [[https://www.youtube.com/watch?v=po6GumtHRmU&t=92s|60+ Injection Sinks]], like innerHTML), the libraries create a stringable object as their output. These objects can be added to a list of safe objects for the relevant native functions. The native functions could then **warn** developers when they do not receive a value with the //is_literal// flag, or one of the safe objects. These warnings would **not break anything**, they just make developers aware of the mistakes they have made, and we will always need a way of switching them off entirely (e.g. phpMyAdmin).
  
-===== Proposed Voting Choices =====+===== Voting =====
  
-Accept the RFCYes/No+Accept the RFC 
 + 
 +<doodle title="is_literal" auth="craigfrancis" voteType="single" closed="true"> 
 +   Yes 
 +   No 
 +</doodle>
  
 ===== Implementation ===== ===== Implementation =====
  
 [[https://github.com/php/php-src/compare/master...krakjoe:literals|Joe Watkin's implementation]] [[https://github.com/php/php-src/compare/master...krakjoe:literals|Joe Watkin's implementation]]
- 
-===== References ===== 
- 
-N/A 
  
 ===== Rejected Features ===== ===== Rejected Features =====
  
-N/A+  - [[#integer_values|Supporting Integers]]
  
 ===== Thanks ===== ===== Thanks =====
Line 469: Line 494:
   - **Joe Watkins**, krakjoe, for writing the full implementation, including support for concatenation and integers, and helping me though the RFC process.   - **Joe Watkins**, krakjoe, for writing the full implementation, including support for concatenation and integers, and helping me though the RFC process.
   - **Máté Kocsis**, mate-kocsis, for setting up and doing the performance testing.   - **Máté Kocsis**, mate-kocsis, for setting up and doing the performance testing.
 +  - **Scott Arciszewski**, CiPHPerCoder, for checking over the RFC, and provided text on how we could implement integer support under a //is_noble()// name.
   - **Dan Ackroyd**, DanAck, for starting the [[https://github.com/php/php-src/compare/master...Danack:is_literal_attempt_two|first implementation]], which made this a reality, providing //literal_concat()// and //literal_implode()//, and followup on how it should work.   - **Dan Ackroyd**, DanAck, for starting the [[https://github.com/php/php-src/compare/master...Danack:is_literal_attempt_two|first implementation]], which made this a reality, providing //literal_concat()// and //literal_implode()//, and followup on how it should work.
   - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, and string flags, the implementation will become easier" [[https://news-web.php.net/php.internals/87396|source]].   - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, and string flags, the implementation will become easier" [[https://news-web.php.net/php.internals/87396|source]].
rfc/is_literal.1624657049.txt.gz · Last modified: 2021/06/25 21:37 by craigfrancis