rfc:is_trusted

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
rfc:is_trusted [2021/06/21 15:57] craigfrancisrfc:is_trusted [2021/06/21 19:36] (current) craigfrancis
Line 7: Line 7:
   * Contributors: Joe Watkins, Máté Kocsis, Dan Ackroyd   * Contributors: Joe Watkins, Máté Kocsis, Dan Ackroyd
   * Status: Under Discussion   * Status: Under Discussion
-  * First Published at: https://wiki.php.net/rfc/is_trusted+  * First Published at: https://wiki.php.net/rfc/is_literal
   * GitHub Repo: https://github.com/craigfrancis/php-is-literal-rfc   * GitHub Repo: https://github.com/craigfrancis/php-is-literal-rfc
  
Line 16: Line 16:
 By trusting integers and literals (strings defined by the programmer), we can provide a lightweight, simple, and very effective way to prevent Injection Vulnerabilities. By trusting integers and literals (strings defined by the programmer), we can provide a lightweight, simple, and very effective way to prevent Injection Vulnerabilities.
  
-It avoids the "false sense of security" that comes with the flawed "Taint Checking" approach, [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/escaping.php?ts=4|because escaping is very difficult to get right]]. Instead it's much safer for developers to use parameterised queries, and/or well-tested libraries.+It avoids the "false sense of security" that comes with the flawed "Taint Checking" approach, [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/escaping.php?ts=4|because escaping is very difficult to get right]]. It's much safer for developers to use parameterised queries, and well-tested libraries.
  
 These libraries require certain sensitive values to only come from the developer; but because it's [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php?ts=4|easy to incorrectly include user values]], Injection Vulnerabilities are still introduced by the thousands of developers using these libraries incorrectly. You will notice the linked examples are based on examples found in the Libraries' official documentation, they still "work", and are typically shorter/easier than doing it correctly (I've found many of them on live websites, and it's why I'm here). A simple Query Builder example being: These libraries require certain sensitive values to only come from the developer; but because it's [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php?ts=4|easy to incorrectly include user values]], Injection Vulnerabilities are still introduced by the thousands of developers using these libraries incorrectly. You will notice the linked examples are based on examples found in the Libraries' official documentation, they still "work", and are typically shorter/easier than doing it correctly (I've found many of them on live websites, and it's why I'm here). A simple Query Builder example being:
Line 110: Line 110:
 ===== Try it ===== ===== Try it =====
  
-[[https://3v4l.org/#focus=rfc.literals|Have a play with it on 3v4l.org]] - Note, the function has not yet been re-named and is still //is_literal()//, but all current functionality is the same.+[[https://3v4l.org/#focus=rfc.literals|Have a play with it on 3v4l.org]]
  
 [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4|How it can be used by libraries]] - Notice how this example library just raises a warning, to simply let the developer know about the issue, **without breaking anything**. And it provides an //"unsafe_value"// value-object to bypass the //is_trusted()// check, but none of the examples need to use it (can be useful as a temporary thing, but there are much safer/better solutions, which developers are/should already be using). [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4|How it can be used by libraries]] - Notice how this example library just raises a warning, to simply let the developer know about the issue, **without breaking anything**. And it provides an //"unsafe_value"// value-object to bypass the //is_trusted()// check, but none of the examples need to use it (can be useful as a temporary thing, but there are much safer/better solutions, which developers are/should already be using).
Line 118: Line 118:
 ==== Taint Checking ==== ==== Taint Checking ====
  
-**Taint checking is flawed, isn't this the same?** It is not the same.+**Taint checking is flawed, isn't this the same?**
  
-Taint Checking incorrectly assumes the output of an escaping function is "safe" for a particular context. While it sounds reasonable in theory, the operation of escaping functions, and the context for which their output is safe, is very hard to define and led to a feature that is both complex and unreliable.+It is not the same. Taint Checking incorrectly assumes the output of an escaping function is "safe" for a particular context. While it sounds reasonable in theory, the operation of escaping functions, and the context for which their output is safe, is very hard to define and led to a feature that is both complex and unreliable.
  
 <code php> <code php>
Line 136: Line 136:
 ==== Education ==== ==== Education ====
  
-**Why not educate everyone?** You can't - developer training simply does not scale, and mistakes still happen.+**Why not educate everyone?** 
 + 
 +You can't - developer training simply does not scale, and mistakes still happen.
  
 We cannot expect everyone to have formal training, know everything from day 1, and consider programming a full time job. We want new programmers, with a variety of experiences, ages, and backgrounds. Everyone should be guided to do the right thing, and notified as soon as they make a mistake (we all make mistakes). We also need to acknowledge that many programmers are busy, do copy/paste code, don't necessarily understand what it does, edit it for their needs, then simply move on to their next task. We cannot expect everyone to have formal training, know everything from day 1, and consider programming a full time job. We want new programmers, with a variety of experiences, ages, and backgrounds. Everyone should be guided to do the right thing, and notified as soon as they make a mistake (we all make mistakes). We also need to acknowledge that many programmers are busy, do copy/paste code, don't necessarily understand what it does, edit it for their needs, then simply move on to their next task.
Line 142: Line 144:
 ==== Static Analysis ==== ==== Static Analysis ====
  
-**Why not use static analysis?** It will never be used by most developers.+**Why not use static analysis?** 
 + 
 +It will never be used by most developers.
  
 I still agree with [[https://news-web.php.net/php.internals/109192|Tyson Andre]], you should use Static Analysis, but it's an extra step that most programmers cannot be bothered to do, especially those who are new to programming (its usage tends to be higher among those writing well-tested libraries). I still agree with [[https://news-web.php.net/php.internals/109192|Tyson Andre]], you should use Static Analysis, but it's an extra step that most programmers cannot be bothered to do, especially those who are new to programming (its usage tends to be higher among those writing well-tested libraries).
Line 158: Line 162:
 ==== Performance ==== ==== Performance ====
  
-**What about the performance impact?** Máté Kocsis has created a [[https://github.com/kocsismate/php-version-benchmarks/|php benchmark]] to replicate the old [[https://01.org/node/3774|Intel Tests]], and the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/results/with-concat/kocsismate.pdf|preliminary testing on this implementation]] has found a 0.124% performance hit for the Laravel Demo app, and 0.161% for Symfony (rounds 4-6, which involved 5000 requests). These tests do not connect to a database, as the variability introduced makes it impossible to measure that low of a difference.+**What about the performance impact?** 
 + 
 +These stats from an early version of the implementation (new tests will be completed soon). 
 + 
 +Máté Kocsis has created a [[https://github.com/kocsismate/php-version-benchmarks/|php benchmark]] to replicate the old [[https://01.org/node/3774|Intel Tests]], and the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/results/with-concat/kocsismate.pdf|preliminary testing on this implementation]] has found a 0.124% performance hit for the Laravel Demo app, and 0.161% for Symfony (rounds 4-6, which involved 5000 requests). These tests do not connect to a database, as the variability introduced makes it impossible to measure that low of a difference.
  
 The full stress-test is 3.719% when running this [[https://github.com/kocsismate/php-version-benchmarks/blob/main/app/zend/concat.php#L25|concat test]], but this is not representative of a typical PHP script (it's not normal to concatenate 4 strings, 5 million times, with no other actions). The full stress-test is 3.719% when running this [[https://github.com/kocsismate/php-version-benchmarks/blob/main/app/zend/concat.php#L25|concat test]], but this is not representative of a typical PHP script (it's not normal to concatenate 4 strings, 5 million times, with no other actions).
Line 166: Line 174:
 ==== String Concatenation ==== ==== String Concatenation ====
  
-**Is string concatenation supported?** Yes.+**Is string concatenation supported?**
  
-The trusted flag is preserved when two trusted values are concatenated; this makes it easier to use //is_trusted()//, especially by developers that use concatenation for their SQL/HTML/CLI/etc.+Yes. The trusted flag is preserved when two trusted values are concatenated; this makes it easier to use //is_trusted()//, especially by developers that use concatenation for their SQL/HTML/CLI/etc.
  
 Previously we tried a version that only supported concatenation at compile-time (not run-time), to see if it would reduce the performance impact even further. The idea was to require everyone to use special //trusted_concat()// and //trusted_implode()// functions, which would raise exceptions to highlight where mistakes were made. These two functions can still be implemented by developers themselves (see [[#support_functions|Support Functions]] below), as they can be useful; but requiring everyone to use them would have required big changes to existing projects, and exceptions are not a graceful way of handling mistakes. Previously we tried a version that only supported concatenation at compile-time (not run-time), to see if it would reduce the performance impact even further. The idea was to require everyone to use special //trusted_concat()// and //trusted_implode()// functions, which would raise exceptions to highlight where mistakes were made. These two functions can still be implemented by developers themselves (see [[#support_functions|Support Functions]] below), as they can be useful; but requiring everyone to use them would have required big changes to existing projects, and exceptions are not a graceful way of handling mistakes.
Line 186: Line 194:
 ==== String Splitting ==== ==== String Splitting ====
  
-**Why don't you support string splitting?** In short, we can't find any real use cases (security features should try to keep the implementation as simple as possible).+**Why don't you support string splitting?** 
 + 
 +In short, we can't find any real use cases (security features should try to keep the implementation as simple as possible).
  
 Also, the security considerations are different. Concatenation joins known/fixed units together, whereas if you're starting with a trusted string, and the program allows the Evil-User to split the string (e.g. setting the length in substr), then they get considerable control over the result (it creates an untrusted modification). Also, the security considerations are different. Concatenation joins known/fixed units together, whereas if you're starting with a trusted string, and the program allows the Evil-User to split the string (e.g. setting the length in substr), then they get considerable control over the result (it creates an untrusted modification).
Line 227: Line 237:
 ==== Non-Parameterised Values ==== ==== Non-Parameterised Values ====
  
-**How can this work with Table and Field names in SQL, which cannot use parameters?** They are often in variables written as literal strings anyway (so no changes needed); and if they are dependent on user input, in most cases you can (and should) use literals:+**How can this work with Table and Field names in SQL, which cannot use parameters?** 
 + 
 +They are often in variables written as literal strings anyway (so no changes needed); and if they are dependent on user input, in most cases you can (and should) use literals:
  
 <code php> <code php>
Line 255: Line 267:
 ==== Usage by Libraries ==== ==== Usage by Libraries ====
  
-**Could libraries use is_trusted() internally?** Yes, they could.+**Could libraries use is_trusted() internally?** 
 + 
 +Yes, they could.
  
 It would be fantastic if they did use additional //is_trusted()// checks after receiving the values from developers (it ensures the library hasn't introduced a vulnerability either); but this isn't a priority, simply because libraries are rarely the source of Injection Vulnerabilities. It would be fantastic if they did use additional //is_trusted()// checks after receiving the values from developers (it ensures the library hasn't introduced a vulnerability either); but this isn't a priority, simply because libraries are rarely the source of Injection Vulnerabilities.
Line 265: Line 279:
 ==== Integer Values ==== ==== Integer Values ====
  
-**Can you support Integer values?** Yes, support for integers is now included.+**Can you support Integer values?** 
 + 
 +Yes, support for integers is now included.
  
 It was noted by Matthew Brown (and others) that a lot of existing code and tutorials uses integers directly, and they do not cause a security issue. It was noted by Matthew Brown (and others) that a lot of existing code and tutorials uses integers directly, and they do not cause a security issue.
Line 281: Line 297:
 ==== Other Values ==== ==== Other Values ====
  
-**Why don't you support Boolean/Float values?** It's a very low value feature, and we cannot be sure of the security implications.+**Why don't you support Boolean/Float values?** 
 + 
 +It's a very low value feature, and we cannot be sure of the security implications.
  
 For example, the value you put in often is not always the same as what you get out: For example, the value you put in often is not always the same as what you get out:
Line 302: Line 320:
 First, there is no perfect name. First, there is no perfect name.
  
-We did start with //is_literal()// as a placeholder name (at a time we only trusted literals). This name wasn't perfect, but it would have allowed developers to search and get an idea of what a literal was. When [[#integer_values|integer values]] were deemed necessary to help adoption, the name became more of a problem. We also need to keep to a single word name (to support a dedicated type in the future). This is where //is_trusted()// and //is_known()// was proposed. We had a [[https://strawpoll.com/bd2qed2xs/r|vote on the name]], which gave us a 18 to 3 result in favour of //is_trusted()//.+We did start with //is_literal()// as a placeholder name (at a time we only trusted literals). This name wasn't perfect, but it would have allowed developers to search and get an idea of what a literal was. When [[#integer_values|integer values]] were deemed necessary to help adoption, the name became more of a problem. We also need to keep to a single word name (to support a dedicated type in the future). This is where //is_trusted()// and //is_known()// was proposed. We had a [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/name/2021-07-20.png|vote on the name]], which gave us a 18 to 3 result in favour of //is_trusted()//.
  
 ==== Support Functions ==== ==== Support Functions ====
  
-**What about other support functions?** We did consider //trusted_concat()// and //trusted_implode()// functions (see [[#string_concatenation|String Concatenation]] above), but these can be userland functions:+**What about other support functions?** 
 + 
 +We did consider //trusted_concat()// and //trusted_implode()// functions (see [[#string_concatenation|String Concatenation]] above), but these can be userland functions:
  
 <code php> <code php>
Line 346: Line 366:
 ==== Other Functions ==== ==== Other Functions ====
  
-**Why not support other string functions?** Like [[#string_splitting|String Splitting]], we can't find any real use cases, and don't want to make this complicated. For example //strtoupper()// might be reasonable, but we will need to consider how it would be used (good and bad), and check for any oddities (e.g. output varying based on the current locale). Also, functions like //str_shuffle()// create unpredictable results.+**Why not support other string functions?** 
 + 
 +Like [[#string_splitting|String Splitting]], we can't find any real use cases, and don't want to make this complicated. For example //strtoupper()// might be reasonable, but we will need to consider how it would be used (good and bad), and check for any oddities (e.g. output varying based on the current locale). Also, functions like //str_shuffle()// create unpredictable results.
  
 ==== Limitations ==== ==== Limitations ====
  
-**Why can't we consider the value safe?**+**Does this mean the value is completely safe?**
  
-While these values can be trusted to not contain an Injection Vulnerability, they cannot be safe from every kind of issueFor example:+While these values can be trusted to not contain an Injection Vulnerability, they cannot be completely safe from every kind of issueFor example:
  
 <code php> <code php>
Line 360: Line 382:
  
 The parameters could be set to "/" or "0000-00-00", which can result in deleting a lot more data than expected. The parameters could be set to "/" or "0000-00-00", which can result in deleting a lot more data than expected.
 +
 +There's no single RFC that can completely solve all developer errors, but this takes one of the biggest ones off the table.
  
 ==== Faking it ==== ==== Faking it ====
Line 371: Line 395:
 ==== Extensions ==== ==== Extensions ====
  
-**Extensions create and manipulate strings, won't this break the flag on strings?** Strings have multiple flags already that are off by default - this is the correct behaviour when extensions create their own strings (should not be flagged as trusted). If an extension is found to be already using the flag we're using for is_trusted (unlikely), that's the same as any new flag being introduced into PHP, and will need to be updated in the same way.+**Extensions create and manipulate strings, won't this break the flag on strings?** 
 + 
 +Strings have multiple flags already that are off by default - this is the correct behaviour when extensions create their own strings (should not be flagged as trusted). If an extension is found to be already using the flag we're using for is_trusted (unlikely), that's the same as any new flag being introduced into PHP, and will need to be updated in the same way.
  
 ==== Reflection API ==== ==== Reflection API ====
  
-**Why don't you use the Reflection API?** This allows you to "introspect classes, interfaces, functions, methods and extensions"; it's not currently set up for object methods to inspect the code calling it. Even if that was to be added (unlikely), it could only check if the trusted value was defined there, it couldn't handle variables (tracking back to their source), nor could it provide any future scope for a dedicated type, nor could native functions work with this (see "Future Scope").+**Why don't you use the Reflection API?** 
 + 
 +This allows you to "introspect classes, interfaces, functions, methods and extensions"; it's not currently set up for object methods to inspect the code calling it. Even if that was to be added (unlikely), it could only check if the trusted value was defined there, it couldn't handle variables (tracking back to their source), nor could it provide any future scope for a dedicated type, nor could native functions work with this (see "Future Scope").
  
 ==== Interned Strings ==== ==== Interned Strings ====
  
-**Why does the output from //chr()// appear as trusted?** This was noticed by Claude Pache, and on a technical level is due to the [[https://news-web.php.net/php.internals/114877|use of Interned Strings]], an optimisation used by //RETURN_CHAR// that re-uses single character values. It's effectively the same as calling //sprintf('%c', $i)//, which is also not an issue, as the developer is choosing to do this.+**Why does the output from //chr()// appear as trusted?** 
 + 
 +This was noticed by Claude Pache, and on a technical level is due to the [[https://news-web.php.net/php.internals/114877|use of Interned Strings]], an optimisation used by //RETURN_CHAR// that re-uses single character values. It's effectively the same as calling //sprintf('%c', $i)//, which is also not an issue, as the developer is choosing to do this.
  
 ===== Previous Work ===== ===== Previous Work =====
rfc/is_trusted.txt · Last modified: 2021/06/21 19:36 by craigfrancis