rfc:literal_string

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
rfc:literal_string [2023/03/13 10:34] – Update on Tagged Template concatenation craigfrancisrfc:literal_string [2023/04/20 12:18] (current) – Notes on integer/string-concat in Python/Go | Note XHP for templating | Note use of eval | Note future scope for LiteralInteger craigfrancis
Line 5: Line 5:
   * Voting End: ???   * Voting End: ???
   * RFC Started: 2022-12-27   * RFC Started: 2022-12-27
-  * RFC Updated: 2022-12-27+  * RFC Updated: 2023-03-16
   * Author: Craig Francis, craig#at#craigfrancis.co.uk   * Author: Craig Francis, craig#at#craigfrancis.co.uk
   * Contributors: Joe Watkins, Máté Kocsis   * Contributors: Joe Watkins, Máté Kocsis
Line 17: Line 17:
 Add //LiteralString// type, and //is_literal_string()//, to "distinguish strings from a trusted developer, from strings that may be attacker controlled". Add //LiteralString// type, and //is_literal_string()//, to "distinguish strings from a trusted developer, from strings that may be attacker controlled".
  
-This ensures the value cannot be a source of an Injection Vulnerabilitybecause it does not contain user input.+The vast majority of Injection Vulnerabilities involving libraries (e.g. database abstractions) are due to programmers using the library incorrectly. A simple LiteralString check would allow libraries to easily identify these mistakeswithout needing to make massive changes. 
 + 
 +It also allows developers to easily check their Parameterised Queries.
  
 The //LiteralString// type has been added to Python 3.11 via [[https://peps.python.org/pep-0675/|PEP 675]]. The //LiteralString// type has been added to Python 3.11 via [[https://peps.python.org/pep-0675/|PEP 675]].
Line 25: Line 27:
 ===== The Problem ===== ===== The Problem =====
  
-Developers often believe Database Abstractions or Parameterised Queries have completely solved Injection and Cross-Site Scripting (XSS) vulnerabilities; and while the situation has improved, they still happen:+Developers often believe Database Abstractions or Parameterised Queries have completely solved Injection and Cross-Site Scripting (XSS) vulnerabilities; and while the situation has improved, mistakes still happen:
  
 <code php> <code php>
Line 141: Line 143:
 } }
 </code> </code>
 +
 +[[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/python/integers.py|Python]] and [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/go/integers.go|Go]] support string concatenation as well.
  
 (On a technical note, we did test an implementation that didn't support concatenation, primarily to see if this would help reduce the performance impact even further. However, the PHP engine can sometimes still concatenate values automatically at compile-time (so concatenation appears to work in some contexts), and it didn't make much (if any) difference in regards to performance, because //concat_function()// in "zend_operators.c" uses //zend_string_extend()// (which needs to remove the //LiteralString// flag) and "zend_vm_def.h" does the same; by supporting a quick concat with an empty string (x2), which would need its flag removed as well). (On a technical note, we did test an implementation that didn't support concatenation, primarily to see if this would help reduce the performance impact even further. However, the PHP engine can sometimes still concatenate values automatically at compile-time (so concatenation appears to work in some contexts), and it didn't make much (if any) difference in regards to performance, because //concat_function()// in "zend_operators.c" uses //zend_string_extend()// (which needs to remove the //LiteralString// flag) and "zend_vm_def.h" does the same; by supporting a quick concat with an empty string (x2), which would need its flag removed as well).
Line 220: Line 224:
 $sql .= ' ORDER BY ' . ($fields[$sort] ?? 'u.full_name'); // A LiteralString $sql .= ' ORDER BY ' . ($fields[$sort] ?? 'u.full_name'); // A LiteralString
 </code> </code>
 +
 +This approach stops the attacker specifying a private field (e.g. //telephone_number//, where they can determine every users telephone number by updating their own account, and seeing how that affects the order).
  
 There may be some exceptions, see the next section. There may be some exceptions, see the next section.
Line 299: Line 305:
  
 Due to this limitation, we did consider an approach to trust all integers, where Scott Arciszewski suggested the name //is_noble()//. While this is not as philosophically pure, we continued to explore this possibility because we could not find any way an Injection Vulnerability could be introduced with integers in SQL, HTML, CLI; and other contexts as well (e.g. preg, mail additional_params, XPath query, and even eval). We could not find any character encoding issues either (The closest we could find was EBCDIC, an old IBM character encoding, which encodes the 0-9 characters differently; which anyone using it would need to re-encode either way, and [[https://www.php.net/manual/en/migration80.other-changes.php#migration80.other-changes.ebcdic|EBCDIC is not supported by PHP]]). And we could not find any issue with a 64bit PHP server sending a large number to a 32bit database, because the number is being encoded as characters in a string (so that's also fine). However, the feedback received was that while safe from Injection Vulnerabilities, it becomes a more complex concept, one that might cause programmers to assume it is also safe from programmer/logic errors. Ultimately the preference was the simpler approach, that did not allow any integers (which is reinforced with the name LiteralString). Due to this limitation, we did consider an approach to trust all integers, where Scott Arciszewski suggested the name //is_noble()//. While this is not as philosophically pure, we continued to explore this possibility because we could not find any way an Injection Vulnerability could be introduced with integers in SQL, HTML, CLI; and other contexts as well (e.g. preg, mail additional_params, XPath query, and even eval). We could not find any character encoding issues either (The closest we could find was EBCDIC, an old IBM character encoding, which encodes the 0-9 characters differently; which anyone using it would need to re-encode either way, and [[https://www.php.net/manual/en/migration80.other-changes.php#migration80.other-changes.ebcdic|EBCDIC is not supported by PHP]]). And we could not find any issue with a 64bit PHP server sending a large number to a 32bit database, because the number is being encoded as characters in a string (so that's also fine). However, the feedback received was that while safe from Injection Vulnerabilities, it becomes a more complex concept, one that might cause programmers to assume it is also safe from programmer/logic errors. Ultimately the preference was the simpler approach, that did not allow any integers (which is reinforced with the name LiteralString).
 +
 +[[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/python/integers.py|Python]] and [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/go/integers.go|Go]] do not support integers either.
  
 ==== FAQ: Other Values ==== ==== FAQ: Other Values ====
Line 371: Line 379:
 While I want more programmers to use Static Analysis, it's not realistic to expect all PHP programmers to always use these tools, and for all PHP code to be updated so Static Analysis can run the strictest checks, and use no baseline (using the JetBrains surveys; in [[https://www.jetbrains.com/lp/devecosystem-2021/php/#PHP_do-you-use-static-analysis|2021]] only 33% used Static Analysis; and [[https://www.jetbrains.com/lp/devecosystem-2022/php/#what-additional-quality-tools-do-you-use-regularly-if-any-|2022]] shows a similar story with 33% (at best) using PHPStan/Psalm/Phan; where the selected programmers for both surveys are 3 times more likely to use Laravel than WordPress). While I want more programmers to use Static Analysis, it's not realistic to expect all PHP programmers to always use these tools, and for all PHP code to be updated so Static Analysis can run the strictest checks, and use no baseline (using the JetBrains surveys; in [[https://www.jetbrains.com/lp/devecosystem-2021/php/#PHP_do-you-use-static-analysis|2021]] only 33% used Static Analysis; and [[https://www.jetbrains.com/lp/devecosystem-2022/php/#what-additional-quality-tools-do-you-use-regularly-if-any-|2022]] shows a similar story with 33% (at best) using PHPStan/Psalm/Phan; where the selected programmers for both surveys are 3 times more likely to use Laravel than WordPress).
  
-Also, it can be tricky to get current Static Analysis tools to cover every case. For example, they don't currently support [[https://stackoverflow.com/questions/71861442/php-static-analysis-and-recursive-type-checking|recursive type checking]], or [[https://stackoverflow.com/questions/72231302/phpstan-extension-dynamic-return-types-with-value-objects|get a value-object to conditionally return a type]]. In contrast, both are [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/examples/sql-orm.php#L60|easy with the is_literal_string() function]].+Also, it can be tricky to get current Static Analysis tools to cover every case. For example, they don't currently support [[https://stackoverflow.com/questions/71861442/php-static-analysis-and-recursive-type-checking|recursive type checking]], or [[https://stackoverflow.com/questions/72231302/phpstan-extension-dynamic-return-types-with-value-objects|get a value-object to conditionally return a type]]. In contrast, both are [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/examples/sql-orm.php#L60|easy with the LiteralString type]].
  
 ==== Taint Checking ==== ==== Taint Checking ====
Line 401: Line 409:
 </code> </code>
  
-While a LiteralString check would easily identify these mistakes; an alternative approach would be to replace these simple strings with a full abstraction, where every part is either represented by an object, or checked/quoted as appropriate; for example:+LiteralString check would easily identify these mistakes; but an alternative approach would be to replace these simple strings with a full abstraction, where //every// part is either represented by an object, or checked/quoted as appropriate; for example:
  
 <code php> <code php>
Line 407: Line 415:
 </code> </code>
  
-I'm fairly sure this won't be adopted by many programmers, as it'too difficult to write (and later read); in the same way developers are more likely to use //DOMDocument::loadHTML()// rather than add every element via //DOMDocument::createElement()//, //DOMDocument::createAttribute()//, etc.+The [[https://github.com/tpetry/laravel-query-expressions|Laravel Query Expressions]] package does this
 + 
 +While this does allow for additional checks (e.g. static analysis), it's unlikely many programmers will adopt, as it's difficult to write (and later read); in the same way developers are more likely to use //DOMDocument::loadHTML()// rather than add every element via //DOMDocument::createElement()//, //DOMDocument::createAttribute()//, etc.
  
 ==== Tagged Templates ==== ==== Tagged Templates ====
Line 413: Line 423:
 In JavaScript, there is a form of Template Literal known as [[https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates|Tagged Templates]]. In JavaScript, there is a form of Template Literal known as [[https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates|Tagged Templates]].
  
-Available since Firefox 34, Chrome 41, NodeJS 4 (~2015); where libraries need to use [[https://github.com/tc39/proposal-array-is-template-object|isTemplateObject]] (NodeJS can use [[https://www.npmjs.com/package/is-template-object|is-template-object]]) to make sure the function is called correctly ([[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/npm/index.js|example]]).+Available since ~2015 (Firefox 34, Chrome 41, NodeJS 4); where libraries should use [[https://github.com/tc39/proposal-array-is-template-object|isTemplateObject]] (NodeJS can use [[https://www.npmjs.com/package/is-template-object|is-template-object]]) to ensure the function is called correctly ([[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/others/npm/index.js|example]]).
  
 <code javascript> <code javascript>
Line 425: Line 435:
 var id = 123, var id = 123,
     field = 'name',     field = 'name',
-    sql = example`WHERE id = ${id} ORDER BY ${field}`;+    sql = example`WHERE id = ${id} ORDER BY ${field}`; // The Template
  
 console.log(sql); console.log(sql);
 </code> </code>
  
-PHP cannot use //`// (execute shell command), but could use //```//.+PHP cannot use //`// (execute shell command), but could use //```// (which can be tricky for MarkDown).
  
 Instead of calling a function directly, PHP could create a //TemplateLiteral// object, providing methods like //getStringParts()// and //getValues()//, so the object can be passed to a library to check and use. Instead of calling a function directly, PHP could create a //TemplateLiteral// object, providing methods like //getStringParts()// and //getValues()//, so the object can be passed to a library to check and use.
  
-Concatenation can help readability, and to conditionally add SQL/HTML. By using a TemplateLiteral object, it would be possible to concatenate Templates with //$a = ```$a b```//, and supporting the concatenation in other ways would be up for debate:+By using a //TemplateLiteral// object, it would be possible to concatenate with //$a = ```{$ab```// (e.g. to conditionally add SQL/HTMLor help readability); but other forms of concatenation would be up for debate, e.g.
  
 <code php> <code php>
-$sql = ```$sql AND category = $category```;+$sql = ```{$sqlAND category = {$category}```;
  
-$sql = ```deleted ``` . ($archive ? ```IS NOT NULL``` :  ```IS NULL```);+$sql = ```deleted ``` . ($archive ? ```IS NOT NULL``` :  ```IS NULL```); // Maybe?
  
 if ($name) { if ($name) {
-    $sql .= ``` AND name = $name```;+  $sql .= ``` AND name = {$name}```; // Maybe?
 } }
 </code> </code>
  
-Tagged Templates might be a nice feature to have, but they are unlikely to be used by many developers to solve Injection Vulnerabilities (as is the case in NodeJS). Developers often believe their Database Abstractions or Parameterised Queries have completely solved Injection Vulnerabilities (but mistakes happen). Thereforegetting //all// developers to replace //all// of their sensitive LiteralStrings with Templates is unlikely. While changing the quote character is fairly easy, it is time-consuming, and risky for those without tests. In both cases any escaping functions would need to be removed (so no advantage there). Also, libraries wouldn'be able to use until PHP 8.X is the minimum supported version; and consideration would need to be given for things like identifying field-name variables in SQL.+Tagged Templates might be a nice feature to have (sometimes they can be easier to read), but assuming a //__toString()// method is provided, we must also consider mis-use; e.g. in JavaScript, basic Template Literals have made it much easier for developers to create XSS vulnerabilities, where developers often don't think about HTML encoding in this context: 
 + 
 +<code javascript> 
 +p.innerHTML = `Hi ${name}`; // INSECURE 
 +</code> 
 + 
 +Consideration would be needed on if/how Tagged Templates could protect functions like //mysqli_query()//; e.g. only accept if the Tagged Template uses no variables? or could PDO, MySQLi, ODBC, etc provide Value-Objects for Identifiers? In comparison, a LiteralString can simply be accepted - so code that already uses LiteralString's would not need any modification (see Future Scope for special cases). 
 + 
 +Also, considering developers often (incorrectly) believe their Database Abstractions or Parameterised Queries have completely solved Injection Vulnerabilities, it would be very unlikely to get //all// developers to replace //all// of their existing LiteralStrings with Tagged Templates (note how few libraries use this in NodeJS). 
 + 
 +While changing the quote character is fairly easy, it's tricky to automate, time-consuming, and risky for those without tests (a typical project can easily require thousands of lines of code to be changed)Any escaping functions would still need to be removed (so no advantage there). Variables for Identifiers (e.g. field-name) in SQL Tagged Templates would need to be considered, and developers will need to wait until PHP 8.X is their minimum supported version.
  
 [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/alternatives/tagged-templates.php|Example]] / [[https://github.com/craigfrancis/php-is-literal-rfc/commit/1dc5f4fb425009d03a640036a1022f88c4a0533d?diff=unified|Diff]] [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/alternatives/tagged-templates.php|Example]] / [[https://github.com/craigfrancis/php-is-literal-rfc/commit/1dc5f4fb425009d03a640036a1022f88c4a0533d?diff=unified|Diff]]
 +
 +[[https://docs.hhvm.com/hack/XHP/introduction|XHP]] in Hack / HHVM is similar, where it introduces an XML-like syntax that can be used for HTML templating.
  
 ==== Macros ==== ==== Macros ====
Line 460: Line 482:
 Macros are run during compilation (when user values are not present), and can replace the code within the brackets. In this case the macro could check the contents, and if it's considered safe, change the code to call a method provided by the library with "unsafe" in its name. While developers could call the unsafe method directly, they are at least aware they are doing something unsafe, and can be easily found during an audit. Macros are run during compilation (when user values are not present), and can replace the code within the brackets. In this case the macro could check the contents, and if it's considered safe, change the code to call a method provided by the library with "unsafe" in its name. While developers could call the unsafe method directly, they are at least aware they are doing something unsafe, and can be easily found during an audit.
  
-But, checking the AST can get complicated for libraries; getting developers to replace their existing LiteralStrings to use Macros is unlikely (same issue as Tagged Templates); and without operator overloads ([[https://wiki.php.net/rfc/user_defined_operator_overloads|1]]/[[https://wiki.php.net/rfc/userspace_operator_overloading|2]]), concatenation would need to be handled within the macro:+Macros might be a nice feature to have; but it can get complicated for libraries to check the AST; getting developers to replace their existing LiteralStrings to use Macros is unlikely (as noted with Tagged Templates); and without operator overloads ([[https://wiki.php.net/rfc/user_defined_operator_overloads|1]]/[[https://wiki.php.net/rfc/userspace_operator_overloading|2]]), concatenation would need to be handled within the macro:
  
 <code diff> <code diff>
 - $where_sql .= ' AND deleted IS NULL'; - $where_sql .= ' AND deleted IS NULL';
 + $where_sql = sql!($where_sql . ' AND deleted IS NULL'); + $where_sql = sql!($where_sql . ' AND deleted IS NULL');
 +or
 ++ sql!($where_sql .= ' AND deleted IS NULL');
 </code> </code>
  
Line 508: Line 532:
   * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https://news-web.php.net/php.internals/87406|1]]/[[https://news-web.php.net/php.internals/87446|2]]).   * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https://news-web.php.net/php.internals/87406|1]]/[[https://news-web.php.net/php.internals/87446|2]]).
  
-Last year I wrote the [[https://wiki.php.net/rfc/is_literal|is_literal() RFC]], where the feedback was:+In 2021 I wrote the [[https://wiki.php.net/rfc/is_literal|is_literal() RFC]], where the feedback was:
  
   * "Ideally we would want to assign a variable to be of 'literal' type." [[https://externals.io/message/115306#115308|George P. Banyard]] (covered by this RFC).   * "Ideally we would want to assign a variable to be of 'literal' type." [[https://externals.io/message/115306#115308|George P. Banyard]] (covered by this RFC).
Line 563: Line 587:
 ===== Open Issues ===== ===== Open Issues =====
  
-None+Additional testing of the final implementation; including extensions like [[https://www.swoole.com/|Swoole]] or [[https://openswoole.com/|OpenSwoole]]. 
 + 
 +Should //eval()// be unable to create a LiteralString, or is too similar to: 
 + 
 +<code php> 
 +$id = ($_GET['id'] ?? NULL); 
 +$file = tempnam(sys_get_temp_dir(), 'literal-string'); 
 +file_put_contents($file, '<'.'?php return '.var_export(strval($id),true).';'); 
 +$id = require($file); 
 +unlink($file); 
 +</code>
  
 ===== Future Scope ===== ===== Future Scope =====
Line 569: Line 603:
 1) We might re-look at //sprintf()// being able to return a LiteralString. 1) We might re-look at //sprintf()// being able to return a LiteralString.
  
-2) As noted by MarkR, the biggest benefit will come when this flag can be used by PDO and similar functions (//mysqli_query//, //preg_match//, //exec//, etc).+2) We might re-look at //LiteralInteger//. While this is unlikely, as it would change the zval structure, it might be possible if there is enough demand. It would also need a discussion on what happens with other operations, e.g. integer addition. 
 + 
 +3) As noted by MarkR, the biggest benefit will come when this flag can be used by PDO and similar functions (//mysqli_query//, //preg_match//, //exec//, etc).
  
 However, first we need libraries to start checking the relevant inputs are a LiteralString. The library can then do their thing, and apply the appropriate escaping, which can result in a value that no longer has the LiteralString flag set, but is perfectly safe for the native functions. However, first we need libraries to start checking the relevant inputs are a LiteralString. The library can then do their thing, and apply the appropriate escaping, which can result in a value that no longer has the LiteralString flag set, but is perfectly safe for the native functions.
rfc/literal_string.1678703661.txt.gz · Last modified: 2023/03/13 10:34 by craigfrancis