rfc:is_literal

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
Next revisionBoth sides next revision
rfc:is_literal [2021/06/25 21:37] – Back to is_literal() craigfrancisrfc:is_literal [2021/07/05 18:12] – Start voting craigfrancis
Line 1: Line 1:
 ====== PHP RFC: Is_Literal ====== ====== PHP RFC: Is_Literal ======
  
-  * Version: 1.0 +  * Version: 1.1 
-  * Date: 2020-03-21 +  * Voting Start: 2021-07-05 19:30 BST / 18:30 UTC 
-  * Updated: 2021-06-06+  * Voting End: 2021-07-19 19:30 BST / 18:30 UTC 
 +  * RFC Started: 2020-03-21 
 +  * RFC Updated: 2021-07-04
   * Author: Craig Francis, craig#at#craigfrancis.co.uk   * Author: Craig Francis, craig#at#craigfrancis.co.uk
-  * Contributors: Joe Watkins, Máté Kocsis, Dan Ackroyd +  * Contributors: Joe Watkins, Máté Kocsis 
-  * Status: Under Discussion+  * Status: Voting
   * First Published at: https://wiki.php.net/rfc/is_literal   * First Published at: https://wiki.php.net/rfc/is_literal
   * GitHub Repo: https://github.com/craigfrancis/php-is-literal-rfc   * GitHub Repo: https://github.com/craigfrancis/php-is-literal-rfc
 +  * Implementation: https://github.com/php/php-src/compare/master...krakjoe:literals
  
 ===== Introduction ===== ===== Introduction =====
  
-Add the function //is_literal()//, a lightweight and effective way to identify if a string was written by a developer, to remove the risk of a variable containing an Injection Vulnerability.+Add the function //is_literal()//, a lightweight and effective way to identify if a string was written by a developer, removing the risk of a variable containing an Injection Vulnerability.
  
-It's a simple process where a flag is set internally on strings that have been written by a developer (as opposed to a user) and the flag persists through concatenation with other 'literal' strings. The function checks the flag is present and thus no user data is included.+It's a simple process where a flag is set internally on strings that have been written by a developer (as opposed to a user), where the flag persists through concatenation with other 'literal' strings. The function checks the flag is present and thus no user data is included.
  
 It avoids the "false sense of security" that comes with the flawed "Taint Checking" approach, [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/escaping.php?ts=4|because escaping is very difficult to get right]]. It's much safer for developers to use parameterised queries, and well-tested libraries. It avoids the "false sense of security" that comes with the flawed "Taint Checking" approach, [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/escaping.php?ts=4|because escaping is very difficult to get right]]. It's much safer for developers to use parameterised queries, and well-tested libraries.
  
-//is_literal()// would be used by libraries, as they require certain sensitive values to only come from the developer; but because it's [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php?ts=4|easy to incorrectly include user values]], Injection Vulnerabilities are still introduced by the thousands of developers using these libraries incorrectly. You will notice the linked examples are based on examples found in the Libraries' official documentation, they still "work", and are typically shorter/easier than doing it correctly (I've found many of them on live websites, and it's why I'm here). A simple Query Builder example being:+//is_literal()// can be used by libraries to deal with a difficult problem - developers using them incorrectly. Libraries expect certain sensitive values to only come from the developer; but because it's [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php?ts=4|easy to incorrectly include user values]], Injection Vulnerabilities are still introduced by the thousands of developers using these libraries incorrectly. You will notice the linked examples are based on examples found in the Libraries' official documentation, they still "work", and are typically shorter/easier than doing it correctly (I've found many of them on live websites, and it's why I'm here). A simple Query Builder example being:
  
 <code php> <code php>
Line 34: Line 37:
 Injection and Cross-Site Scripting (XSS) vulnerabilities are **easy to make**, **hard to identify**, and **very common**. Injection and Cross-Site Scripting (XSS) vulnerabilities are **easy to make**, **hard to identify**, and **very common**.
  
-We like to think every developer reads the documentation, and would never directly include (inject) user values into their SQL/HTML/CLI - but we all know that's not the case.+With SQL Injection, it just takes 1 mistake, and the attacker can usually read everything in the database (SQL Map, Havij, jSQL, etc). 
 + 
 +When it comes to coding, we like to think every developer reads the documentation, and would never directly include (inject) user values into their SQL/HTML/CLI - but we all know that's not the case.
  
 It's why these two issues have **always** been on the [[https://owasp.org/www-project-top-ten/|OWASP Top 10]]; a list designed to raise awareness of common issues, ranked on their prevalence, exploitability, detectability, and impact: It's why these two issues have **always** been on the [[https://owasp.org/www-project-top-ten/|OWASP Top 10]]; a list designed to raise awareness of common issues, ranked on their prevalence, exploitability, detectability, and impact:
Line 66: Line 71:
 ===== Proposal ===== ===== Proposal =====
  
-Add the function //is_literal()//, where a literal is defined as:+Add the function //is_literal()//
 + 
 +A string shall pass the //is_literal// check if it was defined by the programmer in source codeor is the result of a function or instruction whose inputs would all pass the //is_literal// check.
  
-  - Strings defined by the programmer (literals+Concatenation instructions and the following string functions are therefore able to produce literals:
-  - Strings defined by the engine itself (called interned strings)+
  
-(Under The Hood: Interned strings do not include input values; instead interned strings are: Strings defined by the programmer, strings defined in the source code of php, and strings defined by the engine (either at compile or runtime), with known values.)+  - //str_repeat()// 
 +  - //str_pad()// 
 +  - //implode()// 
 +  - //join()//
  
-Any function or instruction that is aware of "literal" strings shall produce a "literal" string if all input would pass //is_literal()//This includes //sprintf()//, //str_repeat()//, //str_pad()//, //implode()//, //join()//, //array_pad()//, and //array_fill()//.+(Namespaces constructed for the programmer by the compiler will also be marked literal for convenience.)
  
 <code php> <code php>
Line 103: Line 112:
 </code> </code>
  
-===== Try it =====+===== Try It =====
  
-[[https://3v4l.org/#focus=rfc.literals|Test it out on 3v4l.org]] - Currently using the function name //is_noble()//, while we tested integer support.+[[https://3v4l.org/#focus=rfc.literals|Test it out on 3v4l.org]]
  
 [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4|How it can be used by libraries]] - Notice how this example library just raises a warning, to simply let the developer know about the issue, **without breaking anything**. And it provides an //"unsafe_value"// value-object to bypass the //is_literal()// check, but none of the examples need to use it (can be useful as a temporary thing, but there are much safer/better solutions, which developers are/should already be using). [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4|How it can be used by libraries]] - Notice how this example library just raises a warning, to simply let the developer know about the issue, **without breaking anything**. And it provides an //"unsafe_value"// value-object to bypass the //is_literal()// check, but none of the examples need to use it (can be useful as a temporary thing, but there are much safer/better solutions, which developers are/should already be using).
Line 159: Line 168:
 **What about the performance impact?** **What about the performance impact?**
  
-Máté Kocsis has created a [[https://github.com/kocsismate/php-version-benchmarks/|php benchmark]] to replicate the old [[https://01.org/node/3774|Intel Tests]], and the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/results/with-concat/kocsismate.pdf|preliminary testing on this implementation]] has found a 0.124performance hit for the Laravel Demo app, and 0.161% for Symfony (rounds 4-6, which involved 5000 requests). These tests do not connect to a database, as the variability introduced makes it impossible to measure that low of a difference+Máté Kocsis has created a [[https://github.com/kocsismate/php-version-benchmarks/|php benchmark]] to replicate the old [[https://01.org/node/3774|Intel Tests]], the preliminary results found a 0.47impact with the Symfony demo app (it did not connect to a database, as the variability introduced would make it impossible to measure the difference).
- +
-The full stress-test is 3.719% when running this [[https://github.com/kocsismate/php-version-benchmarks/blob/main/app/zend/concat.php#L25|concat test]], but this is not representative of a typical PHP script (it's not normal to concatenate 4 strings, 5 million times, with no other actions).+
  
 ==== String Concatenation ==== ==== String Concatenation ====
Line 223: Line 230:
 } }
 </code> </code>
 +
 +And libraries can easily abstract this for the developer.
  
 ==== Non-Parameterised Values ==== ==== Non-Parameterised Values ====
Line 254: Line 263:
 [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4#L194|How this can be done with aliases]], or the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4#L229|example Query Builder]]. [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4#L194|How this can be done with aliases]], or the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4#L229|example Query Builder]].
  
-==== Faking it ====+==== Faking It ====
  
 **What if I really really need to mark a value as a literal?** **What if I really really need to mark a value as a literal?**
Line 260: Line 269:
 This implementation does not provide a way for a developer to mark anything they want as a literal. This is on purpose. We do not want to recreate the biggest flaw of Taint Checking. It would be very easy for a naive developer to mark all escaped values as a literal (seeing it as a safe value, which is [[#taint_checking|wrong]]). This implementation does not provide a way for a developer to mark anything they want as a literal. This is on purpose. We do not want to recreate the biggest flaw of Taint Checking. It would be very easy for a naive developer to mark all escaped values as a literal (seeing it as a safe value, which is [[#taint_checking|wrong]]).
  
-That said, we do not pretend there aren't ways around this (e.g. using [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/is-trusted-bypass.php|var_export]]), but doing so is clearly the developer doing something wrong. We want to provide safety rails, but there is nothing stopping the developer from jumping over them if that's their choice.+That said, we do not pretend there aren't ways around this (e.g. using [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/is-literal-bypass.php|var_export]]), but doing so is clearly the developer doing something wrong. We want to provide safety rails, but there is nothing stopping the developer from jumping over them if that's their choice.
  
 ==== Usage by Libraries ==== ==== Usage by Libraries ====
Line 345: Line 354:
 </code> </code>
  
-If a developer changed the literal //'ASC'// to //$_GET['order']//, the error would be noticed by //$db->query()//, but it's not clear where the non-literal value was introduced. Whereas, if they used //literal_concat()//, that would raise an exception much earlier, and highlight exactly where the mistake happened:+If a developer changed the literal //'ASC'// to //$_GET['order']//, the error would be noticed by //$db->query()//, but it's not clear where the non-literal value was introduced. Whereas, if they used //literal_concat()//, that would raise an exception much earlier, stopping script execution, and highlight exactly where the mistake happened:
  
 <code php> <code php>
Line 371: Line 380:
  
 There's no single RFC that can completely solve all developer errors, but this takes one of the biggest ones off the table. There's no single RFC that can completely solve all developer errors, but this takes one of the biggest ones off the table.
 +
 +==== Compiler Optimisations ====
 +
 +The implementation has been updated to avoid situations that could have confused the developer:
 +
 +<code php>
 +$one = 1;
 +$a = 'A' . $one; // false, flag removed because it's being concatenated with an integer.
 +$b = 'A' . 1; // Was true, as the compiler optimised this to the literal 'A1'.
 +
 +$a = "Hello ";
 +$b = $a . 2; // Was true, as the 2 was coerced to the string '2' (to optimise the concatenation).
 +
 +$a = implode("-", [1, 2, 3]); // Was true with OPcache, as it could optimise this to the literal '1-2-3'
 +
 +$a = chr(97); // Was true, due to the use of Interned Strings.
 +</code>
 +
 +This has been achieved by using the Lexer to mark strings as a literal (i.e. earlier in the process).
  
 ==== Extensions ==== ==== Extensions ====
Line 383: Line 411:
  
 This allows you to "introspect classes, interfaces, functions, methods and extensions"; it's not currently set up for object methods to inspect the code calling it. Even if that was to be added (unlikely), it could only check if the literal value was defined there, it couldn't handle variables (tracking back to their source), nor could it provide any future scope for a dedicated type, nor could native functions work with this (see "Future Scope"). This allows you to "introspect classes, interfaces, functions, methods and extensions"; it's not currently set up for object methods to inspect the code calling it. Even if that was to be added (unlikely), it could only check if the literal value was defined there, it couldn't handle variables (tracking back to their source), nor could it provide any future scope for a dedicated type, nor could native functions work with this (see "Future Scope").
- 
-==== Interned Strings ==== 
- 
-**Why does the output from //chr()// appear as a literal?** 
- 
-This was noticed by Claude Pache, and on a technical level is due to the [[https://news-web.php.net/php.internals/114877|use of Interned Strings]], an optimisation used by //RETURN_CHAR// that re-uses single character values. It's effectively the same as calling //sprintf('%c', $i)//, which is also not an issue, as the developer is choosing to do this. 
  
 ===== Previous Examples ===== ===== Previous Examples =====
Line 451: Line 473:
 ===== Proposed Voting Choices ===== ===== Proposed Voting Choices =====
  
-Accept the RFCYes/No+Accept the RFC 
 + 
 +<doodle title="is_literal" auth="craigfrancis" voteType="single" closed="false"> 
 +   Yes 
 +   No 
 +</doodle>
  
 ===== Implementation ===== ===== Implementation =====
Line 463: Line 490:
 ===== Rejected Features ===== ===== Rejected Features =====
  
-N/A+  - [[#integer_values|Supporting Integers]]
  
 ===== Thanks ===== ===== Thanks =====
Line 469: Line 496:
   - **Joe Watkins**, krakjoe, for writing the full implementation, including support for concatenation and integers, and helping me though the RFC process.   - **Joe Watkins**, krakjoe, for writing the full implementation, including support for concatenation and integers, and helping me though the RFC process.
   - **Máté Kocsis**, mate-kocsis, for setting up and doing the performance testing.   - **Máté Kocsis**, mate-kocsis, for setting up and doing the performance testing.
 +  - **Scott Arciszewski**, CiPHPerCoder, for checking over the RFC, and provided text on how we could implement integer support under a //is_noble()// name.
   - **Dan Ackroyd**, DanAck, for starting the [[https://github.com/php/php-src/compare/master...Danack:is_literal_attempt_two|first implementation]], which made this a reality, providing //literal_concat()// and //literal_implode()//, and followup on how it should work.   - **Dan Ackroyd**, DanAck, for starting the [[https://github.com/php/php-src/compare/master...Danack:is_literal_attempt_two|first implementation]], which made this a reality, providing //literal_concat()// and //literal_implode()//, and followup on how it should work.
   - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, and string flags, the implementation will become easier" [[https://news-web.php.net/php.internals/87396|source]].   - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, and string flags, the implementation will become easier" [[https://news-web.php.net/php.internals/87396|source]].
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis