rfc:is_literal

Differences

This shows you the differences between two versions of the page.

rfc:is_literal [2020/03/21 17:38] – Remove edit link craigfrancis
rfc:is_literal [2021/04/18 16:55] – some suggested rewording imsop
Line 1: Line 1:
 ====== PHP RFC: Is Literal Check ====== ====== PHP RFC: Is Literal Check ======
  
-  * Version: 0.1
 +  * Version: 0.4
   * Date: 2020-03-21   * Date: 2020-03-21
 +  * Updated: 2021-02-19
   * Author: Craig Francis, craig#at#craigfrancis.co.uk   * Author: Craig Francis, craig#at#craigfrancis.co.uk
   * Status: Draft   * Status: Draft
   * First Published at: https://wiki.php.net/rfc/is_literal   * First Published at: https://wiki.php.net/rfc/is_literal
 +  * GitHub Repo: https://github.com/craigfrancis/php-is-literal-rfc
  
 ===== Introduction ===== ===== Introduction =====
  
-Add an //is_literal()// function, so developers/frameworks can be sure they are working with a safe value, one created from one or more literals, defined within PHP scripts.
 +This RFC proposes a new function, //is_literal(string $string)//, to help enforce separation of hard-coded logic from user-supplied data. This addresses some of the same use cases as "taint flags", but is both simpler and stricter: it does not address how user data is transmitted or escaped, only whether it has been passed to a particular library function separately from the fixed data.
  
-This allows developers/frameworks, at runtime, to warn or block SQL Injection, Command Line Injection, and many cases of HTML Injection.
 +The clearest example is a database library which supports parametrised queries at the driver level. The correct usage would be something like ''$db->query("Select * From users Where id = ?", [$_GET['id']]);'' but the user could also write ''$db->query("Select * From users Where id = " . $_GET['id']);''. By rejecting the SQL if it was not written as a literal, the library can provide extra protection against this incorrect use.
  
-It allows commands to be tested, to ensure they are a "programmer supplied constant/static/validated string", and all other unsafe variables are provided separately (as noted by [[https://news-web.php.net/php.internals/87725|Yasuo Ohgaki]]). 
  
-This will also allow systems/frameworks to decide if they want to **block**, **educate** (via a notice), or **ignore** these issues (to avoid the "don't nanny" concern raised by [[https://news-web.php.net/php.internals/87383|Lester Caine]]).
 +===== Examples =====
- +
-===== Related JavaScript Implementation ===== +
- +
-This proposal is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, to support the introduction of Trusted Types. +
- +
-https://github.com/tc39/proposal-array-is-template-object\\ +
-https://github.com/mikewest/tc39-proposal-literals +
- +
-They are looking at "Distinguishing strings from a trusted developer, from strings that may be attacker controlled"+
- +
-===== Taint Checking ===== +
- +
-Xinchen Hui has done some amazing work with the Taint extension: +
- +
-https://github.com/laruence/taint +
- +
-Unfortunately this approach does not address all issues, mainly because it still allows string escaping, which is only "[[https://www.php.net/manual/en/pdo.quote.php|Theoretically Safe]]" (typically due to character encoding issues), nor does it address issues such as missing quotes: +
- +
-  $sql = 'DELETE FROM table WHERE id = ' . mysqli_real_escape_string($db, $_GET['id']); +
-   +
-  // delete.php?id=id +
-   +
-  // DELETE FROM table WHERE id = id+
  
-  $html = '<img src=' . htmlentities($_GET['url']) . ' />';
 +The [[https://www.doctrine-project.org/projects/doctrine-orm/en/current/reference/query-builder.html#high-level-api-methods|Doctrine Query Builder]] allows custom Where clauses to be provided as strings. This is intended for use with literals and placeholders, but does not protect against this simple mistake:
-   +
-  // example.php?url=x%20onerror=alert(cookie) +
-   +
-  // <img src=x onerror=alert(cookie) />+
  
-The Taint extension also [[https://github.com/laruence/taint/blob/4a6c4cb2613e27f5604d2021802c144a954caff8/taint.c#L63|conflicts with XDebug]] (sorry Derick).
 +<code php>
 +// INSECURE
 +$qb->select('u')
 +   ->from('User', 'u')
 +   ->where('u.id = ' . $_GET['id']);
 +</code>
  
-===== Previous RFC =====
 +The definition of the ''where'' method could check with ''is_literal'' and throw an exception advising the programmer to replace it with a safer use of placeholders:
  
-Matt Tait suggested [[https://wiki.php.net/rfc/sql_injection_protection|Automatic SQL Injection Protection]].
 +<code php>
 +$qb->select('u')
 +   ->from('User', 'u')
 +   ->where('u.id = :identifier')
 +   ->setParameter('identifier', $_GET['id']);
 +</code>
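
A minimal sketch of how such a check might sit inside a ''where'' method. Because //is_literal()// is only proposed here and is not available in PHP, the check is injected as a callable. The ''QueryBuilderSketch'' class and the stand-in checker are hypothetical names for illustration, not Doctrine's API.

```php
<?php
// Hypothetical sketch, not Doctrine's real QueryBuilder.
final class QueryBuilderSketch
{
    /** @var callable takes a string, returns bool */
    private $isLiteral;
    /** @var string[] */
    private $predicates = [];
    /** @var array */
    private $parameters = [];

    public function __construct(callable $isLiteral)
    {
        $this->isLiteral = $isLiteral;
    }

    public function where(string $predicate): self
    {
        // Reject any predicate the checker does not recognise as a literal.
        if (!($this->isLiteral)($predicate)) {
            throw new InvalidArgumentException(
                'where() expects a literal; pass user data via setParameter().'
            );
        }
        $this->predicates[] = $predicate;
        return $this;
    }

    public function setParameter(string $name, $value): self
    {
        $this->parameters[$name] = $value;
        return $this;
    }

    public function getPredicates(): array
    {
        return $this->predicates;
    }

    public function getParameters(): array
    {
        return $this->parameters;
    }
}

// Stand-in for the proposed is_literal(): it only knows the strings listed here.
// If the RFC were implemented, 'is_literal' itself would be passed instead.
$knownLiterals = ['u.id = :identifier'];
$isLiteral = function (string $value) use ($knownLiterals): bool {
    return in_array($value, $knownLiterals, true);
};

$qb = new QueryBuilderSketch($isLiteral);
$qb->where('u.id = :identifier')->setParameter('identifier', '42'); // accepted

try {
    $qb->where('u.id = ' . '42 OR 1=1'); // simulates concatenated user input
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```

Injecting the checker keeps the sketch runnable today, and leaves the library free to decide whether a failed check should throw, log a notice, or be ignored.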
  
-It was noted that "unfiltered input can affect way more than only SQL" ([[https://news-web.php.net/php.internals/87355|Pierre Joye]]), and this amount of work isn't ideal for "just for one use case" ([[https://news-web.php.net/php.internals/87647|Julien Pauli]]).
 +Similarly, Twig allows [[https://twig.symfony.com/doc/2.x/recipes.html#loading-a-template-from-a-string|loading a template from a string]], which could allow accidentally skipping the default escaping functionality:
  
-Where it would have affected every SQL function, such as //mysqli_query()//, //$pdo->query()//, //odbc_exec()//, etc (concerns raised by [[https://news-web.php.net/php.internals/87436|Lester Caine]] and [[https://news-web.php.net/php.internals/87650|Anthony Ferrara]]).
 +<code php>
 +// INSECURE
 +echo $twig->createTemplate('<p>Hi ' . $_GET['name'] . '</p>')->render();
 +</code>
  
-And each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https://news-web.php.net/php.internals/87406|1]]/[[https://news-web.php.net/php.internals/87446|2]]).
 +If ''createTemplate'' checked with ''is_literal'', the programmer could be advised to write this instead:
  
-I also agree that "SQL injection is almost a solved problem [by using] prepared statements" ([[https://news-web.php.net/php.internals/87400|Scott Arciszewski]]), but we do need something that can identify mistakes, ideally at runtime.
 +<code php>
 +echo $twig->createTemplate('<p>Hi {{ name }}</p>')->render(['name' => $_GET['name']]);
 +</code>
  
 ===== Proposal ===== ===== Proposal =====
  
-Add an //is_literal()// function to check if a given variable has only been created by Literal(s).
 +A literal is defined as any value which is entirely under the control of the programmer. The value may be passed between functions, as long as it is not modified in any way other than string concatenation.
  
-This uses a similar definition as the [[https://wiki.php.net/rfc/sql_injection_protection#safeconst|SafeConst]] by Matt Tait, but it does not need to accept Integer or FloatingPoint variables as safe (unless it makes the implementation easier), nor should it affect any existing functions.
 +<code php>
 +is_literal('Example'); // true
 
-Thanks to [[https://news-web.php.net/php.internals/87396|Xinchen Hui]], we know the PHP5 Taint extension was complex, but "with PHP7's new zend_string, and string flags, the implementation will become easier".
 +$a = 'Example';
 +is_literal($a); // true
 
-Unlike the Taint extension, there is no need to provide an equivalent //untaint()// function.
 +is_literal(4); // true
 +is_literal(0.3); // true
 +is_literal('a' . 'b'); // true, compiler can concatenate
 
-===== Examples =====
 +$a = 'A';
 +$b = $a . ' B ' . 3;
 +is_literal($b); // true, ideally (more details below)
 
-==== SQL Injection, Basic ====
 +is_literal($_GET['id']); // false
 
-A simple example:
 +is_literal(rand(0, 10)); // false
 
-  $sql = 'SELECT * FROM table WHERE id = ?';
 +is_literal(sprintf('LIMIT %d', 3)); // false
-   +
-  $result = $db->exec($sql, [$id]); +
 
-Checked in the framework by:
 +$c = count($ids);
 +$a = 'WHERE id IN (' . implode(',', array_fill(0, $c, '?')) . ')';
 +is_literal($a); // true, the one exception that involves functions. [TODO: this exception is controversial]
 +</code>
  
-  class db {
 +Note that there is no way to manually mark a string as "safe" (i.e. no equivalent to ''untaint()''); as soon as the value has been manipulated in any way other than string concatenation, it is no longer marked as a literal.
-   +
-    public function exec($sql, $parameters = []) { +
-   +
-      if (!is_literal($sql)) { +
-        throw new Exception('SQL must be a literal.'); +
-      } +
-   +
-      $statement = $this->pdo->prepare($sql); +
-      $statement->execute($parameters); +
-      return $statement->fetchAll(); +
-   +
-    } +
-   +
-  }+
  
-It will also work with string concatenation: 
  
-  define('TABLE', 'example');
 +===== Implementation Notes =====
-   +
-  $sql = 'SELECT * FROM ' . TABLE . ' WHERE id = ?'; +
-   +
-    is_literal($sql); // Returns true +
-   +
-  $sql .= ' AND id = ' . mysqli_real_escape_string($db, $_GET['id']); +
-   +
-    is_literal($sql); // Returns false
  
-==== SQL Injection, ORDER BY ====
 +(Most of what's in this section probably doesn't need to be in the final RFC.)
 
-To ensure //ORDER BY// can be set via the user, but only use acceptable values:
 +Ideally string concatenation would be allowed, but [[https://github.com/Danack/RfcLiteralString/issues/5|Danack]] suggested this might raise performance concerns, and an //implode()//-like function taking an array could be used instead (or a query builder).
 
-  $order_fields = [
 +Thanks to [[https://chat.stackoverflow.com/transcript/message/51565346#51565346|NikiC]], it looks like we can reuse the GC_PROTECTED flag for strings.
-      'name', +
-      'created', +
-      'admin', +
-    ]; +
-   +
-  $order_id = array_search(($_GET['sort'] ?? NULL), $order_fields); +
-   +
-  $sql = ' ORDER BY ' . $order_fields[$order_id];
 
-==== SQL Injection, WHERE IN ====
 +As an aside, [[https://news-web.php.net/php.internals/87396|Xinchen Hui]] found the Taint extension was complex in PHP5, but "with PHP7's new zend_string, and string flags, the implementation will become easier". Also, [[https://chat.stackoverflow.com/transcript/message/48927813#48927813|MarkR]] suggested that it might be possible to use the fact that "interned strings in PHP have a flag", which is there because these "can't be freed".
  
-Most SQL strings can be a concatenation of literal values, but //WHERE x IN (?,?,?)// needs to use a variable number of literal placeholders.
 +===== Comparison to Taint Tracking =====
 
-So there //might// need to be a special case for //array_fill()//+//implode()// or //str_repeat()//+//substr()// to create something like '?,?,?'
 +Some languages implement a "taint flag" which tracks whether values are considered "safe". There is a [[https://github.com/laruence/taint|Taint extension for PHP]] by Xinchen Hui, and [[rfc/taint|a previous RFC proposing it be added to the language]].
 
-  $in_sql = implode(',', array_fill(0, count($ids), '?'));
 +These solutions rely on the assumption that the output of an escaping function is safe for a particular context. This sounds reasonable in theory, but the operation of escaping functions, and the context for which their output is safe, are very hard to define. This leads to a feature that is both complex and unreliable.
-   +
-  // or +
-   +
-  $in_sql = substr(str_repeat('?,', count($ids)), 0, -1);
 
-To be used with:
 +The current proposal avoids this complexity by addressing a different part of the problem: separating inputs supplied by the programmer from inputs supplied by the user.
 
-  $sql = 'SELECT * FROM table WHERE id IN (' . $in_sql . ')';
 +===== Previous Work =====
  
-==== SQL Injection, ORM Usage ====
 +Google currently uses a [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification.md#go-implementation|similar approach in Go]] which uses "compile time constants", [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification.md#perl-implementation|Perl has a Taint Mode]] (but uses regular expressions to un-taint data), and there are discussions about [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification.md#javascript-implementation|adding it to JavaScript]] to support Trusted Types.
 
-[[https://www.doctrine-project.org/projects/doctrine-orm/en/2.7/reference/query-builder.html#high-level-api-methods|Doctrine]] could use this to ensure //$predicates// is a literal:
 +As noted by [[https://news-web.php.net/php.internals/109192|Tyson Andre]], it might be possible to use static analysis, for example [[https://psalm.dev/|psalm]]. But I can't find any tools which do these checks by default, they [[https://github.com/vimeo/psalm/commit/2122e4a1756dac68a83ec3f5abfbc60331630781|can be incomplete]], they are likely to miss things (especially at runtime), and we can't expect all programmers to use static analysis (especially those who are new to programming, who need this more than developers who know the concepts and just make the odd mistake).
 
-  $users = $queryBuilder
 +And there is the [[https://wiki.php.net/rfc/sql_injection_protection|Automatic SQL Injection Protection]] RFC by Matt Tait, where this RFC uses a concept similar to the [[https://wiki.php.net/rfc/sql_injection_protection#safeconst|SafeConst]]. When Matt's RFC was being discussed, it was noted:
-    ->select('u') +
-    ->from('User', 'u') +
-    ->where('u.id = ' . $_GET['id']) +
-    ->getQuery() +
-    ->getResult(); +
-   +
-  // example.php?id=u.id +
  
-Where this mistake could be identified by:
 +  * "unfiltered input can affect way more than only SQL" ([[https://news-web.php.net/php.internals/87355|Pierre Joye]]);
 +  * this amount of work isn't ideal for "just for one use case" ([[https://news-web.php.net/php.internals/87647|Julien Pauli]]);
 +  * It would have affected every SQL function, such as //mysqli_query()//, //$pdo->query()//, //odbc_exec()//, etc (concerns raised by [[https://news-web.php.net/php.internals/87436|Lester Caine]] and [[https://news-web.php.net/php.internals/87650|Anthony Ferrara]]);
 +  * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https://news-web.php.net/php.internals/87406|1]]/[[https://news-web.php.net/php.internals/87446|2]]).
 
-  public function where($predicates)
 +I also agree that "SQL injection is almost a solved problem [by using] prepared statements" ([[https://news-web.php.net/php.internals/87400|Scott Arciszewski]]), but we still need //is_literal()// to identify mistakes.
-  { +
-      if (!is_literal($predicates)) { +
-          throw new Exception('Can only accept a literal'); +
-      } +
-      ... +
-  } +
- +
-[[https://redbeanphp.com/index.php?p=/finding|RedBean]] could check //$sql// is a literal: +
- +
-  $users = R::find('user', 'id = ' . $_GET['id']); +
- +
-[[http://propelorm.org/Propel/reference/model-criteria.html#relational-api|PropelORM]] could check //$clause// is a literal: +
- +
-  $users = UserQuery::create()->where('id = ' . $_GET['id'])->find(); +
- +
-==== SQL Injection, ORM Internal ==== +
- +
-The //is_literal()// function could be used by ORM developers, so they can be sure they have created an SQL string out of literals. +
- +
-This would avoid mistakes such as the ORDER BY issues in the Zend framework [[https://framework.zend.com/security/advisory/ZF2014-04|1]]/[[https://framework.zend.com/security/advisory/ZF2016-03|2]]+
- +
-==== CLI Injection ==== +
- +
-Rather than using functions such as: +
- +
-  * //exec()// +
-  * //shell_exec()// +
-  * //system()// +
-  * //passthru()// +
- +
-Frameworks (or PHP) could introduce something similar to //pcntl_exec()//, where arguments are provided separately. +
- +
-Or, take a verified literal for the command, and use parameters for the arguments (like SQL): +
- +
-  $output = parameterised_exec('grep ? /path/to/file | wc -l', [ +
-      'example', +
-    ]); +
- +
-Rough implementation: +
- +
-  function parameterised_exec($cmd, $args = []) { +
-   +
-    if (!is_literal($cmd)) { +
-      throw new Exception('The first argument must be a literal'); +
-    } +
-   +
-    $offset = 0; +
-    $k = 0; +
-    while (($pos = strpos($cmd, '?', $offset)) !== false) { +
-      if (!isset($args[$k])) { +
-        throw new Exception('Missing parameter "' . ($k + 1) . '"'); +
-        exit(); +
-      } +
-      $arg = escapeshellarg($args[$k]); +
-      $cmd = substr($cmd, 0, $pos) . $arg . substr($cmd, ($pos + 1)); +
-      $offset = ($pos + strlen($arg)); +
-      $k++; +
-    } +
-    if (isset($args[$k])) { +
-      throw new Exception('Unused parameter "' . ($k + 1) . '"'); +
-      exit(); +
-    } +
-   +
-    return exec($cmd); +
-   +
-  } +
- +
-==== HTML Injection ==== +
- +
-Template engines should receive variables separately from the raw HTML. +
- +
-Often the engine will get the HTML from static files: +
- +
-  $html = file_get_contents('/path/to/template.html'); +
- +
-But small snippets of HTML are often easier to define as a literal within the PHP script: +
- +
-  $template_html = ' +
-    <p>Hello <span id="username"></span></p> +
-    <p><a>Website</a></p>'; +
- +
-Where the variables are supplied separately, in this example I'm using XPaths: +
- +
-  $values = [ +
-      '//span[@id="username"]' => [ +
-          NULL      => 'Name', // The textContent +
-          'class'   => 'admin', +
-          'data-id' => '123', +
-        ], +
-      '//a' => [ +
-          'href' => 'https://example.com', +
-        ], +
-    ]; +
-   +
-  echo template_parse($template_html, $values); +
- +
-Being sure the HTML does not contain unsafe variables, the templating engine can accept and apply the supplied variables for the relevant context, for example: +
- +
-  function template_parse($html, $values) { +
-   +
-    if (!is_literal($html)) { +
-      throw new Exception('Invalid Template HTML.'); +
-    } +
-   +
-    $dom = new DomDocument(); +
-    $dom->loadHTML('<?xml encoding="UTF-8">' . $html); +
-   +
-    $xpath = new DOMXPath($dom); +
-   +
-    foreach ($values as $query => $attributes) { +
-   +
-      if (!is_literal($query)) { +
-        throw new Exception('Invalid Template XPath.'); +
-      } +
-   +
-      foreach ($xpath->query($query) as $element) { +
-        foreach ($attributes as $attribute => $value) { +
-   +
-          if (!is_literal($attribute)) { +
-            throw new Exception('Invalid Template Attribute.'); +
-          } +
-   +
-          if ($attribute) { +
-            $safe = false; +
-            if ($attribute == 'href') { +
-              if (preg_match('/^https?:\/\//', $value)) { +
-                $safe = true; // Not "javascript:..." +
-              } +
-            } else if ($attribute == 'class') { +
-              if (in_array($value, ['admin', 'important'])) { +
-                $safe = true; // Only allow specific classes? +
-              } +
-            } else if (preg_match('/^data-[a-z]+$/', $attribute)) { +
-              if (preg_match('/^[a-z0-9 ]+$/i', $value)) { +
-                $safe = true; +
-              } +
-            } +
-            if ($safe) { +
-              $element->setAttribute($attribute, $value); +
-            } +
-          } else { +
-            $element->textContent = $value; +
-          } +
-   +
-        } +
-      } +
-   +
-    } +
-   +
-    $html = ''; +
-    $body = $dom->documentElement->firstChild; +
-    if ($body->hasChildNodes()) { +
-      foreach ($body->childNodes as $node) { +
-        $html .= $dom->saveXML($node); +
-      } +
-    } +
-   +
-    return $html; +
-   +
-  }+
  
 ===== Backward Incompatible Changes ===== ===== Backward Incompatible Changes =====
  
-Not sure
 +None
  
 ===== Proposed PHP Version(s) ===== ===== Proposed PHP Version(s) =====
  
-PHP 8?
 +PHP 8.1?
  
 ===== RFC Impact ===== ===== RFC Impact =====
Line 341: Line 138:
 ===== Open Issues ===== ===== Open Issues =====
  
-  - Can //array_fill()//+//implode()// or //str_repeat()//+//substr()// pass through the "is_literal" flag for the "WHERE IN" case?
 +On [[https://github.com/craigfrancis/php-is-literal-rfc/issues|GitHub]]:
-  - Systems/Frameworks that define certain variables (e.g. table name prefixes) without the use of a literal (e.g. ini/json/yaml files), won't be able to use this check, as originally noted by [[https://news-web.php.net/php.internals/87667|Dennis Birkholz]].
 +
 +  - Name it something else? [[https://news-web.php.net/php.internals/109197|Jakob Givoni]] suggested //is_from_literal()//; or maybe //is_safe()//.
 +  - Would this cause performance issues?
 +  - Can //array_fill()//+//implode()// pass through the "is_literal" flag for the "WHERE IN" case?
 +  - Systems/Frameworks that define certain variables (e.g. table name prefixes) without the use of a literal (e.g. ini/json/yaml files) might need to make some changes to use this check, as originally noted by [[https://news-web.php.net/php.internals/87667|Dennis Birkholz]].
  
 ===== Unaffected PHP Functionality ===== ===== Unaffected PHP Functionality =====
Line 350: Line 151:
 ===== Future Scope ===== ===== Future Scope =====
  
-Certain functions (mysqli_query, preg_match, etc) might use this information to generate an error/warning/notice.
 +As noted by [[https://chat.stackoverflow.com/transcript/message/51573226#51573226|MarkR]], the biggest benefit will come when it can be used by PDO and similar functions (//mysqli_query//, //preg_match//, //exec//, etc). But the basic idea can be used immediately by frameworks and general abstraction libraries, and they can give feedback for future work.
 + 
 +**Phase 2** could introduce a way for programmers to specify that certain function arguments only accept safe literals, and/or specific value-objects their project trusts (this idea comes from [[https://web.dev/trusted-types/|Trusted Types]] in JavaScript). 
 + 
 +For example, a project could require that the second argument for //pg_query()// only accepts literals or their //query_builder// object (which provides a //__toString// method); and that any output (print, echo, readfile, etc) must use the //html_output// object that's returned by their trusted HTML Templating system (using //ob_start()// might be useful here).
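
A rough sketch of the Phase 2 idea in userland terms: a function that accepts either a literal or a value object the project trusts. ''TrustedQuery'', ''accept_query()'' and the stand-in checker are hypothetical names used only for illustration; nothing here is an existing PHP or //pg_query()// feature.

```php
<?php
// Hypothetical value object a project might choose to trust
// (for example, one produced by its own query builder).
final class TrustedQuery
{
    private $sql;

    public function __construct(string $sql)
    {
        $this->sql = $sql;
    }

    public function __toString(): string
    {
        return $this->sql;
    }
}

/**
 * Returns the SQL text if $query is a TrustedQuery or a literal, otherwise throws.
 * $isLiteral stands in for the proposed is_literal().
 */
function accept_query($query, callable $isLiteral): string
{
    if ($query instanceof TrustedQuery) {
        return (string) $query;
    }
    if (is_string($query) && $isLiteral($query)) {
        return $query;
    }
    throw new InvalidArgumentException('Query must be a literal or a TrustedQuery.');
}

// Stand-in checker: treats exactly one known string as a literal.
$isLiteral = function (string $value): bool {
    return $value === 'SELECT 1';
};

$fromLiteral = accept_query('SELECT 1', $isLiteral);
$fromObject  = accept_query(new TrustedQuery('SELECT 2'), $isLiteral);

$rejected = false;
try {
    accept_query('SELECT ' . '3', $isLiteral); // not known to the stand-in checker
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```

The value object is the escape hatch: code that genuinely needs dynamic SQL goes through a type the project has reviewed, rather than through an ''untaint()''-style flag on plain strings.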
 + 
 +**Phase 3** could set a default of 'only literals' for all of the relevant PHP function arguments, so developers are given a warning, and later prevented (via an exception), when they provide an unsafe value to those functions (they could still specify that unsafe values are allowed, e.g. phpMyAdmin). 
 + 
 +And, for a bit of silliness (Spaß ist verboten), there could be a //is_figurative()// function, which MarkR seems to [[https://chat.stackoverflow.com/transcript/message/48927770#48927770|really]], [[https://chat.stackoverflow.com/transcript/message/51573091#51573091|want]] :-)
  
 ===== Proposed Voting Choices ===== ===== Proposed Voting Choices =====
  
-Not sure
 +N/A
  
 ===== Patches and Tests ===== ===== Patches and Tests =====
  
-A volunteer is needed to help with implementation.
 +N/A
  
 ===== Implementation ===== ===== Implementation =====
  
-N/A
 +[[https://github.com/Danack/|Danack]] has [[https://github.com/php/php-src/compare/master...Danack:is_literal_attempt_two|started an implementation]].
  
 ===== References ===== ===== References =====
  
-- https://wiki.php.net/rfc/sql_injection_protection
 +N/A
  
 ===== Rejected Features ===== ===== Rejected Features =====
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis