rfc:literal_string

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
rfc:literal_string [2023/02/19 11:46] – Update examples craigfrancisrfc:literal_string [2023/04/20 12:18] (current) – Notes on integer/string-concat in Python/Go | Note XHP for templating | Note use of eval | Note future scope for LiteralInteger craigfrancis
Line 5: Line 5:
   * Voting End: ???   * Voting End: ???
   * RFC Started: 2022-12-27   * RFC Started: 2022-12-27
-  * RFC Updated: 2022-12-27+  * RFC Updated: 2023-03-16
   * Author: Craig Francis, craig#at#craigfrancis.co.uk   * Author: Craig Francis, craig#at#craigfrancis.co.uk
   * Contributors: Joe Watkins, Máté Kocsis   * Contributors: Joe Watkins, Máté Kocsis
Line 15: Line 15:
 ===== Introduction ===== ===== Introduction =====
  
-Add //LiteralString// type, and //is_literal_string()//, to check that variable contains a "developer defined string".+Add //LiteralString// type, and //is_literal_string()//, to "distinguish strings from trusted developer, from strings that may be attacker controlled".
  
-This ensures the value cannot be a source of an Injection Vulnerabilitybecause it does not contain user input.+The vast majority of Injection Vulnerabilities involving libraries (e.g. database abstractions) are due to programmers using the library incorrectly. A simple LiteralString check would allow libraries to easily identify these mistakeswithout needing to make massive changes.
  
-This technique is used at Google (as described in "Building Secure and Reliable Systems", see [[https://static.googleusercontent.com/media/sre.google/en//static/pdf/building_secure_and_reliable_systems.pdf#page=287|Common Security Vulnerabilities, pages 251-255]], which shows how "developer-controlled input" prevents these issues in Go); it's used by FaceBook developers (ref [[https://eiv.dev/python-pyre/|pyre type-checker]], which has been added to Python 3.11 via [[https://peps.python.org/pep-0675/|PEP 675]]); and Christoph Kern discussed it in 2016 with [[https://www.youtube.com/watch?v=ccfEu-Jj0as|Preventing Security Bugs through Software Design]]. Also explained at [[https://www.usenix.org/conference/usenixsecurity15/symposium-program/presentation/kern|USENIX Security 2015]], [[https://www.youtube.com/watch?v=06_suQAAfBc|OWASP AppSec US 2021]], and summarised at [[https://eiv.dev/|eiv.dev]].+It also allows developers to easily check their Parameterised Queries. 
 + 
 +The //LiteralString// type has been added to Python 3.11 via [[https://peps.python.org/pep-0675/|PEP 675]]. 
 + 
 +This technique is used at Google (as described in "Building Secure and Reliable Systems", see [[https://static.googleusercontent.com/media/sre.google/en//static/pdf/building_secure_and_reliable_systems.pdf#page=287|Common Security Vulnerabilities, pages 251-255]], which shows how "developer-controlled input" prevents these issues in Go); it's used by FaceBook developers (ref [[https://eiv.dev/python-pyre/|pyre type-checker]]); and Christoph Kern discussed it in 2016 with [[https://www.youtube.com/watch?v=ccfEu-Jj0as|Preventing Security Bugs through Software Design]]. Also explained at [[https://www.usenix.org/conference/usenixsecurity15/symposium-program/presentation/kern|USENIX Security 2015]], [[https://www.youtube.com/watch?v=06_suQAAfBc|OWASP AppSec US 2021]], and summarised at [[https://eiv.dev/|eiv.dev]].
  
 ===== The Problem ===== ===== The Problem =====
  
-Injection and Cross-Site Scripting (XSS) vulnerabilities are **easy to make****hard to identify**, and **very common**.+Developers often believe Database Abstractions or Parameterised Queries have completely solved Injection and Cross-Site Scripting (XSS) vulnerabilities; and while the situation has improvedmistakes still happen:
  
 <code php> <code php>
Line 108: Line 112:
 </code> </code>
  
-Most libraries will probably use something like //example2()// to test the values they receive, partially for backwards compatibility reasons (can use //function_exists//), but also because it allows them to easily choose how mistakes are handled. For example, I would suggest libraries used logged warnings by default, with an option to throw exceptions for those developers who are confident their code is ready or when it's in development mode, or they could provide a way to disable checks on a per query basis, or entirely for legacy projects ([[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4|example]]).+Most libraries will probably use something like //example2()// to test the values they receive, partially for backwards compatibility reasons (can use //function_exists//), but also because it allows them to easily choose how mistakes are handled. For example, I would suggest libraries used logged warnings by default, with an option to throw exceptions for those programmers who are confident their code is ready or when it's in development mode, or they could provide a way to disable checks on a per query basis, or entirely for legacy projects ([[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4|example]]).
  
-Libraries could also check their output (e.g. SQL to a database) is still a LiteralString, but this isn't a priority (libraries are rarely the source of Injection Vulnerabilities, it's usually the developer using them incorrectly).+Libraries could also check their output (e.g. SQL to a database) is still a LiteralString, but this isn't a priority (libraries are rarely the source of Injection Vulnerabilities, it's usually the programmer using them incorrectly).
  
 You can test it at [[https://3v4l.org/sLmC9/rfc#vrfc.literals|3v4l.org]] using the previous "is_literal()" function name. You can test it at [[https://3v4l.org/sLmC9/rfc#vrfc.literals|3v4l.org]] using the previous "is_literal()" function name.
- 
-<code php> 
-class sqli_protected_db { 
-  private $db; 
-  public function __construct() { 
-    $this->db = new mysqli('localhost', 'username', 'password', 'database'); 
-  } 
-  public function query(LiteralString $sql, Array $parameters = [], Array $aliases = []) { 
-    foreach ($aliases as $name => $value) { 
-      $sql = str_replace('{' . $name . '}', '`' . str_replace('`', '``', $value) . '`', $sql); 
-    } 
-    print_r($sql . "\n"); 
-    print_r(iterator_to_array($this->db->execute_query($sql, $parameters))); 
-  } 
-} 
- 
-$db = new sqli_protected_db(); 
- 
-$db->query('SELECT name FROM user WHERE id = ?', [$_GET['id']]); 
-$db->query('SELECT name FROM user WHERE id = ' . $_GET['id']); // TypeError 
- 
-$db->query('SELECT name FROM user ORDER BY {order}', [], ['order' => $_GET['order']]); 
-$db->query('SELECT name FROM user ORDER BY ' . $_GET['order']); // TypeError 
-</code> 
- 
-<code php> 
-class query_builder { 
-  public function where(LiteralString $column, ?LiteralString $operator = null, $value = null) { 
-    print_r($column . ' ' . $operator . ' ?' . "\n"); 
-  } 
-} 
- 
-$qb = new query_builder(); 
- 
-$qb->where('CONCAT(name_first, " ", name_last)', 'LIKE', $_GET['name']); 
-$qb->where($_GET['field'], '=', $_GET['value']); // TypeError 
-</code> 
  
 ===== Considerations ===== ===== Considerations =====
Line 161: Line 128:
 When two LiteralString values are concatenated, the result is also a LiteralString. When two LiteralString values are concatenated, the result is also a LiteralString.
  
-Some people may believe that not supporting concatenation might help debugging, with the thought being, in a long complex script, which only checks if a variable is a LiteralString at the end, it's harder to identify the source of the problem. However, over the last year I've simply not found this to be the case (usual debug techniques work fine), whereas it would be nigh-on-impossible to update every library and all existing code to not use concatenation (e.g. to use a query builder). That said, someone who really wants this strict way of working could use:+It's been suggested that not supporting concatenation might help debugging, with the thought being, in a long complex script, which only checks if a variable is a LiteralString at the end, it's harder to identify the source of the problem. However, over the last year the feedback has been that the usual debug techniques work fine (if anythingprogrammers want sprintf support); whereas it would be nigh-on-impossible to update every library and all existing code to not use concatenation (e.g. to use a query builder). That said, someone who really wants this strict way of working could use:
  
 <code php> <code php>
Line 176: Line 143:
 } }
 </code> </code>
 +
 +[[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/python/integers.py|Python]] and [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/go/integers.go|Go]] support string concatenation as well.
  
 (On a technical note, we did test an implementation that didn't support concatenation, primarily to see if this would help reduce the performance impact even further. However, the PHP engine can sometimes still concatenate values automatically at compile-time (so concatenation appears to work in some contexts), and it didn't make much (if any) difference in regards to performance, because //concat_function()// in "zend_operators.c" uses //zend_string_extend()// (which needs to remove the //LiteralString// flag) and "zend_vm_def.h" does the same; by supporting a quick concat with an empty string (x2), which would need its flag removed as well). (On a technical note, we did test an implementation that didn't support concatenation, primarily to see if this would help reduce the performance impact even further. However, the PHP engine can sometimes still concatenate values automatically at compile-time (so concatenation appears to work in some contexts), and it didn't make much (if any) difference in regards to performance, because //concat_function()// in "zend_operators.c" uses //zend_string_extend()// (which needs to remove the //LiteralString// flag) and "zend_vm_def.h" does the same; by supporting a quick concat with an empty string (x2), which would need its flag removed as well).
Line 218: Line 187:
 </code> </code>
  
-Libraries can also abstract this for the developer, e.g. WordPress should support the following in the future ([[https://core.trac.wordpress.org/ticket/54042|#54042]]):+Libraries can also abstract this, e.g. WordPress should support the following in the future ([[https://core.trac.wordpress.org/ticket/54042|#54042]]):
  
 <code php> <code php>
Line 255: Line 224:
 $sql .= ' ORDER BY ' . ($fields[$sort] ?? 'u.full_name'); // A LiteralString $sql .= ' ORDER BY ' . ($fields[$sort] ?? 'u.full_name'); // A LiteralString
 </code> </code>
 +
 +This approach stops the attacker specifying a private field (e.g. //telephone_number//, where they can determine every users telephone number by updating their own account, and seeing how that affects the order).
  
 There may be some exceptions, see the next section. There may be some exceptions, see the next section.
Line 269: Line 240:
  
 <code php> <code php>
-$sql = "+$sql = '
   SELECT   SELECT
-    t.name+    u.name
-    t.f1+
   FROM   FROM
-    {my_table} AS t+    user AS u
   WHERE   WHERE
-    t.id = ?"; // A LiteralString+    u.type = ? 
 +  ORDER BY 
 +    {field}'; // A LiteralString
  
 $parameters = [ $parameters = [
-    $_GET['id'],+    $_GET['type'],
   ];   ];
  
 $identifiers = [ $identifiers = [
-    'my_table' => $_GET['table'],+    'field' => $_GET['field'],
   ];   ];
  
Line 292: Line 264:
  
 <code php> <code php>
-$wpdb->prepare('SELECT * FROM %i', $table_name);+$wpdb->prepare('ORDER BY %i', $field);
 </code> </code>
  
Line 299: Line 271:
 ==== FAQ: Bypassing It ==== ==== FAQ: Bypassing It ====
  
-This implementation does not provide an easy way for a developer to mark anything they want as a LiteralString, this is on purpose - we do not want to re-create one of the problems with Taint Checking, by pretending the LiteralString is a flag to say the value is "safe".+This implementation does not provide an easy way for programmers to mark anything they want as a LiteralString, this is on purpose - we do not want to re-create one of the problems with Taint Checking, by pretending the LiteralString is a flag to say the value is "safe".
  
 Some libraries may want to support their own way to bypass these checks, e.g. a ValueObject: Some libraries may want to support their own way to bypass these checks, e.g. a ValueObject:
Line 326: Line 298:
 </code> </code>
  
-But we do not pretend there aren't ways around this (e.g. using [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/is-literal-bypass.php|eval]]), but in doing so the developer is clearly choosing to do something wrong. We want to provide safety rails, but there is nothing stopping the developer from intentionally jumping over them.+But we do not pretend there aren't ways around this (e.g. using [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/is-literal-bypass.php|eval]]), but in doing so the programmer is clearly choosing to do something wrong. We want to provide safety rails, but there is nothing stopping the programmer from intentionally jumping over them.
  
 ==== FAQ: Integer Values ==== ==== FAQ: Integer Values ====
Line 332: Line 304:
 We wanted to flag integers defined in the source code, in the same way we are doing with strings. Unfortunately [[https://news-web.php.net/php.internals/114964|it would require a big change to add a literal flag on integers]]. Changing how integers work internally would have made a big performance impact, and potentially affected every part of PHP (including extensions). We wanted to flag integers defined in the source code, in the same way we are doing with strings. Unfortunately [[https://news-web.php.net/php.internals/114964|it would require a big change to add a literal flag on integers]]. Changing how integers work internally would have made a big performance impact, and potentially affected every part of PHP (including extensions).
  
-Due to this limitation, we did consider an approach to trust all integers, where Scott Arciszewski suggested the name //is_noble()//. While this is not as philosophically pure, we continued to explore this possibility because we could not find any way an Injection Vulnerability could be introduced with integers in SQL, HTML, CLI; and other contexts as well (e.g. preg, mail additional_params, XPath query, and even eval). We could not find any character encoding issues either (The closest we could find was EBCDIC, an old IBM character encoding, which encodes the 0-9 characters differently; which anyone using it would need to re-encode either way, and [[https://www.php.net/manual/en/migration80.other-changes.php#migration80.other-changes.ebcdic|EBCDIC is not supported by PHP]]). And we could not find any issue with a 64bit PHP server sending a large number to a 32bit database, because the number is being encoded as characters in a string (so that's also fine). However, the feedback received was that while safe from Injection Vulnerabilities, it becomes a more complex concept, one that might cause developers to assume it is also safe from developer/logic errors. Ultimately the preference was the simpler approach, that did not allow any integers (which is reinforced with the name LiteralString).+Due to this limitation, we did consider an approach to trust all integers, where Scott Arciszewski suggested the name //is_noble()//. While this is not as philosophically pure, we continued to explore this possibility because we could not find any way an Injection Vulnerability could be introduced with integers in SQL, HTML, CLI; and other contexts as well (e.g. preg, mail additional_params, XPath query, and even eval). We could not find any character encoding issues either (The closest we could find was EBCDIC, an old IBM character encoding, which encodes the 0-9 characters differently; which anyone using it would need to re-encode either way, and [[https://www.php.net/manual/en/migration80.other-changes.php#migration80.other-changes.ebcdic|EBCDIC is not supported by PHP]]). And we could not find any issue with a 64bit PHP server sending a large number to a 32bit database, because the number is being encoded as characters in a string (so that's also fine). However, the feedback received was that while safe from Injection Vulnerabilities, it becomes a more complex concept, one that might cause programmers to assume it is also safe from programmer/logic errors. Ultimately the preference was the simpler approach, that did not allow any integers (which is reinforced with the name LiteralString)
 + 
 +[[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/python/integers.py|Python]] and [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/go/integers.go|Go]] do not support integers either.
  
 ==== FAQ: Other Values ==== ==== FAQ: Other Values ====
Line 355: Line 329:
 We made the decision to only support 4 functions that concatenated strings. We made the decision to only support 4 functions that concatenated strings.
  
-There are a lot of other candidates; e.g. adding //strtoupper()// might be reasonable, however we would need to consider the effect of every function and context, making the concept of a LiteralString more complex (e.g. //str_shuffle()// creating unpredictable results, or output varying based on the current locale).+There are a lot of other candidates; e.g. adding //strtoupper()// might be reasonable, however we would need to consider the effect of every function and context, making the concept of a LiteralString more complex (e.g. output varying based on the current locale, //str_shuffle()// creating unpredictable results, etc).
  
 The main request that's come up over the last year is to support //sprintf()//. While this is reasonable for basic concatenation (e.g. only using "%s"), it gets more complicated when coercing values to a different type, or when using formatting. That said, a future RFC might consider changing this (with the main focus being on the implications/risks). The main request that's come up over the last year is to support //sprintf()//. While this is reasonable for basic concatenation (e.g. only using "%s"), it gets more complicated when coercing values to a different type, or when using formatting. That said, a future RFC might consider changing this (with the main focus being on the implications/risks).
Line 380: Line 354:
  
 Existing libraries will probably focus on using //is_literal_string()//, as it allows them to easily choose how mistakes are handled, and //function_exists()// makes supporting PHP 8.2 and below very easy. Existing libraries will probably focus on using //is_literal_string()//, as it allows them to easily choose how mistakes are handled, and //function_exists()// makes supporting PHP 8.2 and below very easy.
 +
 +**Psalm** (Matthew Brown): 13th June 2021 "I was skeptical about the first draft of this RFC when I saw it last month, but now I see the light (especially with the concat changes)". Then on the 14th June, "I've just added support for a //literal-string// type to Psalm: https://psalm.dev/r/9440908f39" ([[https://github.com/vimeo/psalm/releases/tag/4.8.0|4.8.0]])
 +
 +**PHPStan** (Ondřej Mirtes): 1st September 2021, has been implemented in [[https://github.com/phpstan/phpstan/releases/tag/0.12.97|0.12.97]].
 +
 +**PhpStorm**: 2022.3 recognises the //literal-string// type ([[https://youtrack.jetbrains.com/issue/WI-64109/literal-string-support-in-phpdoc|WI-64109]]).
  
 **WordPress**: After adding support for escaping field/table names (identifiers) with //%i// ([[https://core.trac.wordpress.org/ticket/52506|#52506]]), and to make //IN (?,?,?)// easier with //%...d// ([[https://core.trac.wordpress.org/ticket/54042|#54042]]), a LiteralString check will be added to the //$query// parameter in //wpdb::prepare()//. **WordPress**: After adding support for escaping field/table names (identifiers) with //%i// ([[https://core.trac.wordpress.org/ticket/52506|#52506]]), and to make //IN (?,?,?)// easier with //%...d// ([[https://core.trac.wordpress.org/ticket/54042|#54042]]), a LiteralString check will be added to the //$query// parameter in //wpdb::prepare()//.
 +
 +**Nettle** (David Grudl): "the literal-string type [is used] with nette/database" ([[https://github.com/nette/database/commit/fb2476b2f7937053a99d30b53c7e5731f6f7b96c|patch]]).
  
 **Doctrine**: While not part of the official Doctrine project, the [[https://github.com/phpstan/phpstan-doctrine|phpstan-doctrine]] extension adds experimental support via bleedingEdge (will probably use a separate flag in the future). **Doctrine**: While not part of the official Doctrine project, the [[https://github.com/phpstan/phpstan-doctrine|phpstan-doctrine]] extension adds experimental support via bleedingEdge (will probably use a separate flag in the future).
Line 388: Line 370:
  
 **RedBean** (Gabor de Mooij): "You can list RedBeanPHP as a supporter, we will implement this into the core." ([[https://github.com/gabordemooij/redbean/pull/873/files|example]]). **RedBean** (Gabor de Mooij): "You can list RedBeanPHP as a supporter, we will implement this into the core." ([[https://github.com/gabordemooij/redbean/pull/873/files|example]]).
- 
-**PhpStorm**: 2022.3 recognises the //literal-string// type ([[https://youtrack.jetbrains.com/issue/WI-64109/literal-string-support-in-phpdoc|WI-64109]]). 
- 
-**Psalm** (Matthew Brown): 13th June 2021 "I was skeptical about the first draft of this RFC when I saw it last month, but now I see the light (especially with the concat changes)". Then on the 14th June, "I've just added support for a //literal-string// type to Psalm: https://psalm.dev/r/9440908f39" ([[https://github.com/vimeo/psalm/releases/tag/4.8.0|4.8.0]]) 
- 
-**PHPStan** (Ondřej Mirtes): 1st September 2021, has been implemented in [[https://github.com/phpstan/phpstan/releases/tag/0.12.97|0.12.97]]. 
  
 ===== Alternatives ===== ===== Alternatives =====
Line 401: Line 377:
 Both [[https://github.com/vimeo/psalm/releases/tag/4.8.0|Psalm]] and [[https://github.com/phpstan/phpstan/releases/tag/0.12.97|PHPStan]] have supported the //literal-string// type since September 2021. Both [[https://github.com/vimeo/psalm/releases/tag/4.8.0|Psalm]] and [[https://github.com/phpstan/phpstan/releases/tag/0.12.97|PHPStan]] have supported the //literal-string// type since September 2021.
  
-While I want more developers to use Static Analysis, it's not realistic to expect all PHP developers to always use these tools, and for all PHP code to be updated so Static Analysis can run the strictest checks, and use no baseline (using the JetBrains surveys; in [[https://www.jetbrains.com/lp/devecosystem-2021/php/#PHP_do-you-use-static-analysis|2021]] only 33% used Static Analysis; and [[https://www.jetbrains.com/lp/devecosystem-2022/php/#what-additional-quality-tools-do-you-use-regularly-if-any-|2022]] shows a similar story with 33% (at best) using PHPStan/Psalm/Phan; where the selected developers for both surveys are 3 times more likely to use Laravel than WordPress).+While I want more programmers to use Static Analysis, it's not realistic to expect all PHP programmers to always use these tools, and for all PHP code to be updated so Static Analysis can run the strictest checks, and use no baseline (using the JetBrains surveys; in [[https://www.jetbrains.com/lp/devecosystem-2021/php/#PHP_do-you-use-static-analysis|2021]] only 33% used Static Analysis; and [[https://www.jetbrains.com/lp/devecosystem-2022/php/#what-additional-quality-tools-do-you-use-regularly-if-any-|2022]] shows a similar story with 33% (at best) using PHPStan/Psalm/Phan; where the selected programmers for both surveys are 3 times more likely to use Laravel than WordPress).
  
-Also, it can be tricky to get current Static Analysis tools to cover every case. For example, they don't currently support [[https://stackoverflow.com/questions/71861442/php-static-analysis-and-recursive-type-checking|recursive type checking]], or [[https://stackoverflow.com/questions/72231302/phpstan-extension-dynamic-return-types-with-value-objects|get a value-object to conditionally return a type]]. In contrast, both are [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/examples/sql-orm.php#L60|easy with the is_literal_string() function]].+Also, it can be tricky to get current Static Analysis tools to cover every case. For example, they don't currently support [[https://stackoverflow.com/questions/71861442/php-static-analysis-and-recursive-type-checking|recursive type checking]], or [[https://stackoverflow.com/questions/72231302/phpstan-extension-dynamic-return-types-with-value-objects|get a value-object to conditionally return a type]]. In contrast, both are [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/examples/sql-orm.php#L60|easy with the LiteralString type]].
  
 ==== Taint Checking ==== ==== Taint Checking ====
Line 417: Line 393:
 All three examples would be incorrectly considered "safe" (untainted). The first two need the values to be quoted. The third example, //htmlentities()// does not escape single quotes by default before PHP 8.1 ([[https://github.com/php/php-src/commit/50eca61f68815005f3b0f808578cc1ce3b4297f0|fixed]]), and it does not consider the issue of 'javascript:' URLs. All three examples would be incorrectly considered "safe" (untainted). The first two need the values to be quoted. The third example, //htmlentities()// does not escape single quotes by default before PHP 8.1 ([[https://github.com/php/php-src/commit/50eca61f68815005f3b0f808578cc1ce3b4297f0|fixed]]), and it does not consider the issue of 'javascript:' URLs.
  
-This is why Psalm, which supports Taint Checking, clearly notes these [[https://psalm.dev/docs/security_analysis/#limitations|limitations]].+This is why Psalm notes these [[https://psalm.dev/docs/security_analysis/#limitations|Taint Checking Limitations]], and suggests using the //literal-string// type. 
 + 
 +==== Abstractions ==== 
 + 
 +Libraries currently accept LiteralStrings like the following: 
 + 
 +<code php> 
 +->field_add('LEFT(ref, (LENGTH(ref) - 3))'
 +</code> 
 + 
 +But the library has no idea when a programmer does something like: 
 + 
 +<code php> 
 +->field_add('LEFT(ref, (LENGTH(ref) - ' . $_GET['cut'] . '))') // INSECURE 
 +</code> 
 + 
 +A LiteralString check would easily identify these mistakes; but an alternative approach would be to replace these simple strings with a full abstraction, where //every// part is either represented by an object, or checked/quoted as appropriate; for example: 
 + 
 +<code php> 
 +->field_add(new Func('LEFT', 'ref', new Calc(new Func('LENGTH', 'ref'), '-', new Value(3)))) 
 +</code> 
 + 
 +The [[https://github.com/tpetry/laravel-query-expressions|Laravel Query Expressions]] package does this. 
 + 
 +While this does allow for additional checks (e.g. static analysis), it's unlikely many programmers will adopt, as it's difficult to write (and later read); in the same way developers are more likely to use //DOMDocument::loadHTML()// rather than add every element via //DOMDocument::createElement()//, //DOMDocument::createAttribute()//, etc. 
 + 
 +==== Tagged Templates ==== 
 + 
 +In JavaScript, there is a form of Template Literal known as [[https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates|Tagged Templates]]. 
 + 
 +Available since ~2015 (Firefox 34, Chrome 41, NodeJS 4); where libraries should use [[https://github.com/tc39/proposal-array-is-template-object|isTemplateObject]] (NodeJS can use [[https://www.npmjs.com/package/is-template-object|is-template-object]]) to ensure the function is called correctly ([[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/npm/index.js|example]]). 
 + 
 +<code javascript> 
 +function example(strings, ...values) { 
 +    if (isTemplateObject(strings)) { 
 +       throw new Error('Not a Tagged Template'); 
 +    } 
 +    return strings[0] + values[0] + strings[1] + values[1] + strings[2]; 
 +
 + 
 +var id = 123, 
 +    field = 'name', 
 +    sql = example`WHERE id = ${id} ORDER BY ${field}`; // The Template 
 + 
 +console.log(sql); 
 +</code> 
 + 
 +PHP cannot use //`// (execute shell command), but could use //```// (which can be tricky for MarkDown). 
 + 
 +Instead of calling a function directly, PHP could create a //TemplateLiteral// object, providing methods like //getStringParts()// and //getValues()//, so the object can be passed to a library to check and use. 
 + 
 +By using a //TemplateLiteral// object, it would be possible to concatenate with //$a = ```{$a} b```// (e.g. to conditionally add SQL/HTML, or help readability); but other forms of concatenation would be up for debate, e.g. 
 + 
 +<code php> 
 +$sql = ```{$sql} AND category = {$category}```; 
 + 
 +$sql = ```deleted ``` . ($archive ? ```IS NOT NULL``` :  ```IS NULL```); // Maybe? 
 + 
 +if ($name) { 
 +  $sql .= ``` AND name = {$name}```; // Maybe? 
 +
 +</code> 
 + 
 +Tagged Templates might be a nice feature to have (sometimes they can be easier to read), but assuming a //__toString()// method is provided, we must also consider mis-use; e.g. in JavaScript, basic Template Literals have made it much easier for developers to create XSS vulnerabilities, where developers often don't think about HTML encoding in this context: 
 + 
 +<code javascript> 
 +p.innerHTML = `Hi ${name}`; // INSECURE 
 +</code> 
 + 
 +Consideration would be needed on if/how Tagged Templates could protect functions like //mysqli_query()//; e.g. only accept if the Tagged Template uses no variables? or could PDO, MySQLi, ODBC, etc provide Value-Objects for Identifiers? In comparison, a LiteralString can simply be accepted - so code that already uses LiteralString's would not need any modification (see Future Scope for special cases). 
 + 
 +Also, considering developers often (incorrectly) believe their Database Abstractions or Parameterised Queries have completely solved Injection Vulnerabilities, it would be very unlikely to get //all// developers to replace //all// of their existing LiteralStrings with Tagged Templates (note how few libraries use this in NodeJS). 
 + 
 +While changing the quote character is fairly easy, it's tricky to automate, time-consuming, and risky for those without tests (a typical project can easily require thousands of lines of code to be changed). Any escaping functions would still need to be removed (so no advantage there). Variables for Identifiers (e.g. field-name) in SQL Tagged Templates would need to be considered, and developers will need to wait until PHP 8.X is their minimum supported version. 
 + 
 +[[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/alternatives/tagged-templates.php|Example]] / [[https://github.com/craigfrancis/php-is-literal-rfc/commit/1dc5f4fb425009d03a640036a1022f88c4a0533d?diff=unified|Diff]] 
 + 
 +[[https://docs.hhvm.com/hack/XHP/introduction|XHP]] in Hack / HHVM is similar, where it introduces an XML-like syntax that can be used for HTML templating. 
 + 
 +==== Macros ==== 
 + 
 +In Rust it's possible to use [[https://github.com/craigfrancis/php-is-literal-rfc/tree/main/others/rust|procedural macros]], e.g. 
 + 
 +<code rust> 
 +html_add!("<p>Hello <span>?</span></p>"); 
 +</code> 
 + 
 +Macros are run during compilation (when user values are not present), and can replace the code within the brackets. In this case the macro could check the contents, and if it's considered safe, change the code to call a method provided by the library with "unsafe" in its name. While developers could call the unsafe method directly, they are at least aware they are doing something unsafe, and can be easily found during an audit. 
 + 
 +Macros might be a nice feature to have; but it can get complicated for libraries to check the AST; getting developers to replace their existing LiteralStrings to use Macros is unlikely (as noted with Tagged Templates); and without operator overloads ([[https://wiki.php.net/rfc/user_defined_operator_overloads|1]]/[[https://wiki.php.net/rfc/userspace_operator_overloading|2]]), concatenation would need to be handled within the macro: 
 + 
 +<code diff> 
 +- $where_sql .= ' AND deleted IS NULL'; 
 ++ $where_sql = sql!($where_sql . ' AND deleted IS NULL'); 
 +or 
 ++ sql!($where_sql .= ' AND deleted IS NULL'); 
 +</code> 
 + 
 +[[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/alternatives/macro.php|Example]] / [[https://github.com/craigfrancis/php-is-literal-rfc/commit/1f2baaebaf1dea6d5886c7e6e14e2b4f6dd179a5?diff=unified|Diff]]
  
 ==== Education ==== ==== Education ====
  
-Developer training simply does not scale, and mistakes still happen.+Training simply does not scale, and mistakes still happen.
  
 We cannot expect everyone to have formal training, know everything from Day 1, and consider programming a full time job. We want new programmers, with a variety of experiences, ages, and backgrounds. Everyone should be guided to do the right thing, and notified as soon as they make a mistake (we all make mistakes). We also need to acknowledge that many programmers are busy, do copy/paste code, don't necessarily understand what it does, edit it for their needs, then simply move on to their next task. We cannot expect everyone to have formal training, know everything from Day 1, and consider programming a full time job. We want new programmers, with a variety of experiences, ages, and backgrounds. Everyone should be guided to do the right thing, and notified as soon as they make a mistake (we all make mistakes). We also need to acknowledge that many programmers are busy, do copy/paste code, don't necessarily understand what it does, edit it for their needs, then simply move on to their next task.
Line 441: Line 515:
 **Rust** can use a "[[https://github.com/craigfrancis/php-is-literal-rfc/tree/main/others/rust|procedural macro]]", to check the provided value is a literal at compile-time. **Rust** can use a "[[https://github.com/craigfrancis/php-is-literal-rfc/tree/main/others/rust|procedural macro]]", to check the provided value is a literal at compile-time.
  
-**Node** has the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/npm/index.js|is-template-object polyfill]], which checks a tag function was provided a "tagged template literal" (this technique is used in [[https://www.npmjs.com/package/safesql|safesql]], via [[https://www.npmjs.com/package/template-tag-common|template-tag-common]]). Alternatively Node developers can use [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/npm-closure-library/index.js|goog.string.Const]] from Google's Closure Library.+**Node** has the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/npm/index.js|is-template-object polyfill]], which checks a tag function was provided a "tagged template literal" (this technique is used in [[https://www.npmjs.com/package/safesql|safesql]], via [[https://www.npmjs.com/package/template-tag-common|template-tag-common]]). Alternatively Node programmers can use [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/npm-closure-library/index.js|goog.string.Const]] from Google's Closure Library.
  
 **JavaScript** is getting [[https://github.com/tc39/proposal-array-is-template-object|isTemplateObject]], for "Distinguishing strings from a trusted developer from strings that may be attacker controlled" (intended to be [[https://github.com/mikewest/tc39-proposal-literals|used with Trusted Types]]). **JavaScript** is getting [[https://github.com/tc39/proposal-array-is-template-object|isTemplateObject]], for "Distinguishing strings from a trusted developer from strings that may be attacker controlled" (intended to be [[https://github.com/mikewest/tc39-proposal-literals|used with Trusted Types]]).
Line 458: Line 532:
   * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https://news-web.php.net/php.internals/87406|1]]/[[https://news-web.php.net/php.internals/87446|2]]).   * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https://news-web.php.net/php.internals/87406|1]]/[[https://news-web.php.net/php.internals/87446|2]]).
  
-Last year I wrote the [[https://wiki.php.net/rfc/is_literal|is_literal() RFC]], where the feedback was:+In 2021 I wrote the [[https://wiki.php.net/rfc/is_literal|is_literal() RFC]], where the feedback was:
  
   * "Ideally we would want to assign a variable to be of 'literal' type." [[https://externals.io/message/115306#115308|George P. Banyard]] (covered by this RFC).   * "Ideally we would want to assign a variable to be of 'literal' type." [[https://externals.io/message/115306#115308|George P. Banyard]] (covered by this RFC).
Line 468: Line 542:
   * "Bad code should better be fixed through better documentation." (we've tried that, and mistakes still happen).   * "Bad code should better be fixed through better documentation." (we've tried that, and mistakes still happen).
   * "I think libraries are very unlikely to adopt" (they have, see above).   * "I think libraries are very unlikely to adopt" (they have, see above).
-  * "you can't even trust is_literal() [due to] file_put_contents("data.php", "<?php return $_GET[id];"); $id = require "data.php";" (I doubt any developer will do this by accident).+  * "you can't even trust is_literal() [due to] file_put_contents("data.php", "<?php return $_GET[id];"); $id = require "data.php";" (I doubt any programmer will do this by accident).
   * "I don't believe we should expect security or maintainability without (all together): proper education + peer reviewing + static analysis." (these should still happen).   * "I don't believe we should expect security or maintainability without (all together): proper education + peer reviewing + static analysis." (these should still happen).
   * "in the real world, you're going to start seeing cases where something is "literal enough", but doesn't pass the is_literal test" (examples were asked for, but no response).   * "in the real world, you're going to start seeing cases where something is "literal enough", but doesn't pass the is_literal test" (examples were asked for, but no response).
Line 474: Line 548:
 I also agree with [[https://news-web.php.net/php.internals/87400|Scott Arciszewski]], "SQL injection is almost a solved problem [by using] prepared statements", where LiteralString identifies when user input is accidentally included in the SQL string. I also agree with [[https://news-web.php.net/php.internals/87400|Scott Arciszewski]], "SQL injection is almost a solved problem [by using] prepared statements", where LiteralString identifies when user input is accidentally included in the SQL string.
  
-On a technical note, the implementation avoids situations that could have confused the developer, by using the Lexer to mark strings as a LiteralString (i.e. earlier in the process):+On a technical note, the implementation avoids situations that could have confused the programmer, by using the Lexer to mark strings as a LiteralString (i.e. earlier in the process):
  
 <code php> <code php>
Line 513: Line 587:
 ===== Open Issues ===== ===== Open Issues =====
  
-None+Additional testing of the final implementation; including extensions like [[https://www.swoole.com/|Swoole]] or [[https://openswoole.com/|OpenSwoole]]. 
 + 
 +Should //eval()// be unable to create a LiteralString, or is too similar to: 
 + 
 +<code php> 
 +$id = ($_GET['id'] ?? NULL); 
 +$file = tempnam(sys_get_temp_dir(), 'literal-string'); 
 +file_put_contents($file, '<'.'?php return '.var_export(strval($id),true).';'); 
 +$id = require($file); 
 +unlink($file); 
 +</code>
  
 ===== Future Scope ===== ===== Future Scope =====
Line 519: Line 603:
 1) We might re-look at //sprintf()// being able to return a LiteralString. 1) We might re-look at //sprintf()// being able to return a LiteralString.
  
-2) As noted by MarkR, the biggest benefit will come when this flag can be used by PDO and similar functions (//mysqli_query//, //preg_match//, //exec//, etc).+2) We might re-look at //LiteralInteger//. While this is unlikely, as it would change the zval structure, it might be possible if there is enough demand. It would also need a discussion on what happens with other operations, e.g. integer addition. 
 + 
 +3) As noted by MarkR, the biggest benefit will come when this flag can be used by PDO and similar functions (//mysqli_query//, //preg_match//, //exec//, etc).
  
 However, first we need libraries to start checking the relevant inputs are a LiteralString. The library can then do their thing, and apply the appropriate escaping, which can result in a value that no longer has the LiteralString flag set, but is perfectly safe for the native functions. However, first we need libraries to start checking the relevant inputs are a LiteralString. The library can then do their thing, and apply the appropriate escaping, which can result in a value that no longer has the LiteralString flag set, but is perfectly safe for the native functions.
  
-With a future RFC, we could introduce checks for the native functions. For example, if we use the [[https://eiv.dev/trusted-types/|Trusted Types]] concept from JavaScript, the libraries could create a stringable ValueObject as their output. These objects can be added to a list of safe objects for the relevant native functions. The native functions could then **warn** developers when they do not receive a value with the LiteralString flag, or one of the safe objects. These warnings would **not break anything**, they just make developers aware of any mistakes they have made, and we will always need a way of switching them off entirely (e.g. phpMyAdmin).+With a future RFC, we could introduce checks for the native functions. For example, if we use the [[https://eiv.dev/trusted-types/|Trusted Types]] concept from JavaScript, the libraries could create a stringable ValueObject as their output. These objects can be added to a list of safe objects for the relevant native functions. The native functions could then **warn** programmers when they do not receive a value with the LiteralString flag, or one of the safe objects. These warnings would **not break anything**, they just make programmers aware of any mistakes they have made, and we will always need a way of switching them off entirely (e.g. phpMyAdmin).
  
 ===== Voting ===== ===== Voting =====
rfc/literal_string.1676807183.txt.gz · Last modified: 2023/02/19 11:46 by craigfrancis