rfc:is_literal

Differences

This shows you the differences between two versions of the page.

rfc:is_literal [2020/12/26 11:41] – New examples, with a focus on escaping craigfrancis
rfc:is_literal [2021/04/19 13:44] – Updated examples, and general tweaks craigfrancis
Line 1: Line 1:
 ====== PHP RFC: Is Literal Check ======
 
-  * Version: 0.2
+  * Version: 0.5
   * Date: 2020-03-21
-  * Updated: 2020-12-22
+  * Updated: 2021-04-19
   * Author: Craig Francis, craig#at#craigfrancis.co.uk
   * Status: Draft
Line 11: Line 11:
 ===== Introduction =====
 
-Add an //is_literal()// function, so developers/frameworks can check if a given variable is **safe**.
-
-As in, at runtime, being able to check if a variable has been created by literals, defined within a PHP script, by a trusted developer.
-
-This simple check can be used to warn or completely block SQL Injection, Command Line Injection, and many cases of HTML Injection (aka XSS).
-
-===== The Problem =====
-
-Escaping strings for SQL, HTML, Commands, etc is **very** error prone.
-
-The vast majority of programmers should never do this (mistakes will be made).
+This RFC proposes a new function, //is_literal(string $string)//, to help enforce separation of hard-coded values, from user-supplied data.
+
+This addresses some of the same use cases as "taint flags", but is both simpler and stricter. It does not address how user data is transmitted or escaped, only whether it has been passed to a particular library function separately from the programmer defined values.
+
+The clearest example is a database library which supports parametrised queries at the driver level, where the programmer could use either of these:
+
+<code php>
+$db->query('SELECT * FROM users WHERE id = ' . $_GET['id']); // INSECURE
+
+$db->query('SELECT * FROM users WHERE id = ?', [$_GET['id']]);
+</code>
+
+By rejecting the SQL that was not written as a literal (first example), the library can provide protection against this incorrect use.
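A minimal sketch of how a library might apply this check, assuming the proposed //is_literal()// function is available. The names //require_literal_sql()// and //run_query()// are hypothetical and not part of any existing library; the check is skipped when //is_literal()// is not defined:

<code php>
// Hypothetical guard: rejects SQL that was not written as a literal.
// is_literal() is only a proposal, so the check is skipped when it is missing.
function require_literal_sql(string $sql): string
{
    if (function_exists('is_literal') && !is_literal($sql)) {
        throw new InvalidArgumentException('SQL must be written as a literal; pass user data as parameters.');
    }
    return $sql;
}

// Hypothetical wrapper: prepares the checked SQL and binds the values separately.
function run_query(PDO $pdo, string $sql, array $parameters = []): array
{
    $statement = $pdo->prepare(require_literal_sql($sql));
    $statement->execute($parameters);
    return $statement->fetchAll(PDO::FETCH_ASSOC);
}
</code>

With this in place, a call such as //run_query($pdo, 'SELECT * FROM users WHERE id = ?', [$_GET['id']])// would pass the check, while the concatenated form would be rejected once //is_literal()// exists.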
  
-Unsafe values (often user supplied) **must** be kept separate (e.g. parameterised SQL), or be processed by something that understands the context (e.g. a HTML Templating Engine).
-
-This is primarily for security reasons, but it can also cause data to be damaged (e.g. ASCII/UTF-8 issues).
-
-Take these mistakes, where the value has come from the user:
-
-  echo "<img src=" . $url . " alt='' />";
-
-Flawed, and unfortunately very common, a classic XSS vulnerability.
+===== Examples =====
+
+The [[https://www.doctrine-project.org/projects/doctrine-orm/en/current/reference/query-builder.html#high-level-api-methods|Doctrine Query Builder]] allows custom WHERE clauses to be provided as strings. This is intended for use with literals and placeholders, but does not protect against this simple mistake:
+
+<code php>
+// INSECURE
+$qb->select('u')
+   ->from('User', 'u')
+   ->where('u.id = ' . $_GET['id']);
+</code>
+
+The definition of the //where()// method could check with //is_literal()// and throw an exception advising the programmer to replace it with a safer use of placeholders:
+
+<code php>
+$qb->select('u')
+   ->from('User', 'u')
+   ->where('u.id = :identifier')
+   ->setParameter('identifier', $_GET['id']);
+</code>
  
-  echo "<img src=" . htmlentities($url) . " alt='' />";
-
-Flawed because the attribute value is not quoted, e.g. //$url = '/ onerror=alert(1)'//
-
-  echo "<img src='" . htmlentities($url) . "' alt='' />";
-
-Flawed because //htmlentities()// does not encode single quotes by default, e.g. //$url = "/' onerror='alert(1)"//
+Similarly, Twig allows [[https://twig.symfony.com/doc/2.x/recipes.html#loading-a-template-from-a-string|loading a template from a string]], which could allow accidentally skipping the default escaping functionality:
+
+<code php>
+// INSECURE
+echo $twig->createTemplate('<p>Hi ' . $_GET['name'] . '</p>')->render();
+</code>
+
+If //createTemplate()// checked with //is_literal()//, the programmer could be advised to write this instead:
+
+<code php>
+echo $twig->createTemplate('<p>Hi {{ name }}</p>')->render(['name' => $_GET['name']]);
+</code>
  
-  echo '<a href="' . htmlentities($url) . '">Link</a>';
-
-Flawed because a link can include JavaScript, e.g. //$url = 'javascript:alert(1)'//
-
-  <script>
-    var url = "<?= addslashes($url) ?>";
-  </script>
-
-Flawed because //addslashes()// is not HTML context aware, e.g. //$url = '</script><script>alert(1)</script>'//
-
-  echo '<a href="/path/?name=' . htmlentities($name) . '">Link</a>';
-
-Flawed because //urlencode()// has not been used, e.g. //$name = 'A&B'//
-
-  <p><?= htmlentities($url) ?></p>
-
-Flawed because the encoding is not guaranteed to be UTF-8 (or ISO-8859-1 before PHP 5.4), so the value could be corrupted.
-
-Also flawed because some browsers (e.g. IE 11), if the charset isn't defined (header or meta tag), could guess the output as UTF-7, e.g. //$url = '+ADw-script+AD4-alert(1)+ADw-+AC8-script+AD4-'//
+===== Proposal =====
+
+A literal is defined as a value (string) which has been written by the programmer. The value may be passed between functions, as long as it is not modified in any way other than string concatenation.
+
+<code php>
+is_literal('Example'); // true
+
+$a = 'Example';
+is_literal($a); // true
+
+is_literal(4); // true
+is_literal(0.3); // true
+is_literal('a' . 'b'); // true, compiler can concatenate
+
+$a = 'A';
+$b = $a . ' B ' . 3;
+is_literal($b); // true, ideally (more details below)
+
+is_literal($_GET['id']); // false
+
+is_literal(rand(0, 10)); // false
+
+is_literal(sprintf('LIMIT %d', 3)); // false, should use parameters
+</code>
  
-  example.html:
-      <img src={{ url }} alt='' />
-  
-  $loader = new \Twig\Loader\FilesystemLoader('./templates/');
-  $twig = new \Twig\Environment($loader, ['autoescape' => 'name']);
-  
-  echo $twig->render('example.html', ['url' => $url]);
-
-Flawed because Twig is not context aware (in this case, an unquoted HTML attribute), e.g. //$url = 'onerror=alert(1)'//
+Note that there is no way to manually mark a string as a literal (i.e. no equivalent to //untaint()//); as soon as the value has been manipulated in any way, it is no longer marked as a literal.
+
+See the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification.md|justification page]] as to why it's done this way.
  
-  $sql = 'SELECT 1 FROM user WHERE id=' . $mysqli->escape_string($id);
-
-Flawed because the value has not been quoted, e.g. //$id = 'id'//, or //'1 OR 1=1'//
-
-  $sql = 'SELECT 1 FROM user WHERE id="' . $mysqli->escape_string($id) . '"';
-
-Flawed if 'sql_mode' includes //NO_BACKSLASH_ESCAPES//, e.g. //$id = '2" or "1"="1'//
+===== Comparison to Taint Tracking =====
+
+Some languages implement a "taint flag" which tracks whether values are considered "safe". There is a [[https://github.com/laruence/taint|Taint extension for PHP]] by Xinchen Hui, and [[https://wiki.php.net/rfc/taint|a previous RFC proposing it be added to the language]].
+
+These solutions rely on the assumption that the output of an escaping function is safe for a particular context. This sounds reasonable in theory, but the operation of escaping functions, and the context for which their output is safe, are very hard to define. This leads to a feature that is both complex and unreliable.
+
+This proposal avoids the complexity by addressing a different part of the problem: separating inputs supplied by the programmer, from inputs supplied by the user.
  
-  $sql = 'INSERT INTO user (name) VALUES ("' . $mysqli->escape_string($name) . '")';
-
-Flawed if 'SET NAMES latin1' has been used, and escape_string() uses 'utf8'.
-
-  $parameters = "-f$email";
-  
-  // $parameters = '-f' . escapeshellarg($email);
-  
-  mail('a@example.com', 'Subject', 'Message', NULL, $parameters);
-
-Flawed because it's not possible to safely escape values in //$additional_parameters// for //mail()//, e.g. //$email = 'b@example.com -X/www/example.php'//
-
-===== Previous Solutions =====
-
-[[https://github.com/laruence/taint|Taint extension]] by Xinchen Hui, but this approach explicitly allows escaping, which doesn't address the issues listed above.
-
-[[https://wiki.php.net/rfc/sql_injection_protection|Automatic SQL Injection Protection]] by Matt Tait, where it was noted:
+===== Previous Work =====
+
+Google uses a [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification.md#go-implementation|similar approach in Go]] to identify "compile time constants", [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification.md#perl-implementation|Perl has a Taint Mode]] (but uses regular expressions to un-taint data), and there are discussions about [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification.md#javascript-implementation|adding it to JavaScript]] to support Trusted Types.
+
+As noted by [[https://news-web.php.net/php.internals/109192|Tyson Andre]], it might be possible to use static analysis, for example [[https://psalm.dev/|psalm]]. But I can't find any which do these checks by default, [[https://github.com/vimeo/psalm/commit/2122e4a1756dac68a83ec3f5abfbc60331630781|can be incomplete]], they are likely to miss things (especially at runtime), and we can't expect all programmers to use static analysis (especially those who are new to programming, who need this more than developers who know the concepts and just make the odd mistake).
+
+And there is the [[https://wiki.php.net/rfc/sql_injection_protection|Automatic SQL Injection Protection]] RFC by Matt Tait, where this RFC uses a similar concept of the [[https://wiki.php.net/rfc/sql_injection_protection#safeconst|SafeConst]]. When Matt's RFC was being discussed, it was noted:
  
   * "unfiltered input can affect way more than only SQL" ([[https://news-web.php.net/php.internals/87355|Pierre Joye]]);
Line 102: Line 108:
   * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https://news-web.php.net/php.internals/87406|1]]/[[https://news-web.php.net/php.internals/87446|2]]).
  
-I also agree that "SQL injection is almost a solved problem [by using] prepared statements" ([[https://news-web.php.net/php.internals/87400|Scott Arciszewski]]), but we still need something to identify mistakes.
+I also agree that "SQL injection is almost a solved problem [by using] prepared statements" ([[https://news-web.php.net/php.internals/87400|Scott Arciszewski]]), and this is where //is_literal()// can be used to check that no mistakes are made.
- +
-===== Related JavaScript Implementation ===== +
- +
-This RFC is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, to support the introduction of Trusted Types. +
- +
-https://github.com/tc39/proposal-array-is-template-object\\ +
-https://github.com/mikewest/tc39-proposal-literals +
- +
-They are looking at "Distinguishing strings from a trusted developer, from strings that may be attacker controlled". +
- +
-===== Solution ===== +
- +
-Literals are safe values, defined within the PHP scripts, for example: +
- +
-  $a = 'Example'; +
-  is_literal($a); // true +
-   +
-  $a = 'Example ' . $a . ', ' . 5; +
-  is_literal($a); // true +
-   +
-  $a = 'Example ' . $_GET['id']; +
-  is_literal($a); // false +
-   +
-  $a = 'Example ' . time(); +
-  is_literal($a); // false +
-   +
-  $a = sprintf('LIMIT %d', 3); +
-  is_literal($a); // false +
-   +
-  $c = count($ids); +
-  $a = 'WHERE id IN (' . implode(',', array_fill(0, $c, '?')) . ')'; +
-  is_literal($a); // true, the odd one that involves functions. +
-   +
-  $limit = 10; +
-  $a = 'LIMIT ' . ($limit + 1); +
-  is_literal($a); // false, but might need some discussion. +
- +
-This uses a similar definition of [[https://wiki.php.net/rfc/sql_injection_protection#safeconst|SafeConst]] from Matt Tait's RFC, but it does not need to accept Integer or FloatingPoint variables as safe (unless it makes the implementation easier), nor should this proposal effect any existing functions. +
- +
-Thanks to [[https://news-web.php.net/php.internals/87396|Xinchen Hui]], we know the PHP5 Taint extension was complex, but "with PHP7's new zend_string, and string flags, the implementation will become easier". +
- +
-And thanks to [[https://chat.stackoverflow.com/transcript/message/48927813#48927813|Mark R]], it might be possible to use the fact that "interned strings in PHP have a flag", which is there because these "can't be freed". +
- +
-Commands can be checked to ensure they are a "programmer supplied constant/static/validated string", and all other unsafe variables are provided separately (as noted by [[https://news-web.php.net/php.internals/87725|Yasuo Ohgaki]]). +
- +
-This approach allows all systems/frameworks to decide if they want to **block**, **educate** (via a notice), or **ignore** these issues (to avoid the "don't nanny" concern raised by [[https://news-web.php.net/php.internals/87383|Lester Caine]]). +
- +
-Unlike the Taint extension, there must **not** be an equivalent //untaint()// function, or support any kind of escaping. +
- +
-==== Solution: SQL Injection ==== +
- +
-Database abstractions (e.g. ORMs) will be able to ensure they are provided with strings that are safe. +
- +
-[[https://www.doctrine-project.org/projects/doctrine-orm/en/2.7/reference/query-builder.html#high-level-api-methods|Doctrine]] could use this to ensure //->where($predicates)// is a literal: +
- +
-  $users = $queryBuilder +
-    ->select('u') +
-    ->from('User', 'u') +
-    ->where('u.id = ' . $_GET['id']) +
-    ->getQuery() +
-    ->getResult(); +
-   +
-  // example.php?id=u.id +
- +
-This mistake can be easily identified by: +
- +
-  public function where($predicates) +
-  { +
-      if (function_exists('is_literal') && !is_literal($predicates)) { +
-          throw new Exception('->where() can only accept a literal'); +
-      } +
-      ... +
-  } +
- +
-[[https://redbeanphp.com/index.php?p=/finding|RedBean]] could check //$sql// is a literal: +
- +
-  $users = R::find('user', 'id = ' . $_GET['id']); +
- +
-[[http://propelorm.org/Propel/reference/model-criteria.html#relational-api|PropelORM]] could check //$clause// is a literal: +
- +
-  $users = UserQuery::create()->where('id = ' . $_GET['id'])->find(); +
- +
-The //is_literal()// function could also be used internally by ORM developers, so they can be sure they have created their SQL strings out of literals. This would avoid mistakes such as the ORDER BY issues in the Zend framework [[https://framework.zend.com/security/advisory/ZF2014-04|1]]/[[https://framework.zend.com/security/advisory/ZF2016-03|2]]. +
- +
-==== Solution: SQL Injection, Basic ==== +
- +
-A simple example:+
  
-  $sql = 'SELECT * FROM table WHERE id = ?';
-  
-  $result = $db->exec($sql, [$id]);
-
-Checked in the framework by:
-
-  class db {
-  
-    public function exec($sql, $parameters = []) {
-  
-      if (!is_literal($sql)) {
-        throw new Exception('SQL must be a literal.');
-      }
-  
-      $statement = $this->pdo->prepare($sql);
-      $statement->execute($parameters);
-      return $statement->fetchAll();
-  
-    }
-  
-  }
-
-This also works with string concatenation:
-
-  define('TABLE', 'example');
-  
-  $sql = 'SELECT * FROM ' . TABLE . ' WHERE id = ?';
-  
-    is_literal($sql); // Returns true
-  
-  $sql .= ' AND id = ' . $mysqli->escape_string($_GET['id']);
-  
-    is_literal($sql); // Returns false
+===== Usage =====
+
+By libraries:
+
+<code php>
+function literal_check($var) {
+  if (function_exists('is_literal') && !is_literal($var)) {
+    $level = 2; // Get from config, defaults to 1
+    if ($level === 0) {
+      // Programmer aware, and is choosing to bypass this check.
+    } else if ($level === 1) {
+      trigger_error('Non-literal detected!', E_USER_NOTICE);
+    } else {
+      throw new Exception('Non-literal detected!');
+    }
+  }
+}
+
+function example($input) {
+  literal_check($input);
+  // ...
+}
+
+example('hello'); // OK
+example(strtoupper('hello')); // Exception thrown: the result of strtoupper is a new, non-literal string
+</code>
  
-==== Solution: SQL Injection, ORDER BY ====
-
-To ensure //ORDER BY// can be set via the user, but only use acceptable values:
-
-  $order_fields = [
-      'name',
-      'created',
-      'admin',
-    ];
-  
-  $order_id = array_search(($_GET['sort'] ?? NULL), $order_fields);
-  
-  $sql = ' ORDER BY ' . $order_fields[$order_id];
-
-==== Solution: SQL Injection, WHERE IN ====
+Table and Fields in SQL, which cannot use parameters; for example //ORDER BY//:
+
+<code php>
+$order_fields = [
+    'name',
+    'created',
+    'admin',
+  ];
+
+$order_id = array_search(($_GET['sort'] ?? NULL), $order_fields);
+
+$sql = ' ORDER BY ' . $order_fields[$order_id];
+</code>
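One detail of the //ORDER BY// example is easy to miss: //array_search()// returns //false// when the value is not found, and PHP casts //false// to //0// when it is used as an array key, so an unrecognised sort value silently selects the first field. A minimal sketch that makes this fallback explicit; the function name //order_by_sql()// and its //$default// parameter are hypothetical:

<code php>
// Hypothetical helper: maps a user-supplied sort key onto a fixed list of
// field names, so only strings written by the programmer reach the SQL.
function order_by_sql(?string $requested, array $allowed, int $default = 0): string
{
    // Strict search; false means the requested value is not in the list.
    $index = array_search($requested, $allowed, true);
    if ($index === false) {
        $index = $default;
    }
    return ' ORDER BY ' . $allowed[$index];
}

$order_fields = ['name', 'created', 'admin'];

$sql = order_by_sql($_GET['sort'] ?? null, $order_fields);
</code>

The user only chooses which entry is used; the field name itself always comes from the programmer's list, which is the property the proposed check is meant to confirm.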
  
-Most SQL strings can be a simple concatenations of literal values, but //WHERE IN (?,?,?)// needs to use a variable number of literal placeholders.
-
-There needs to be a special case for //array_fill()//+//implode()//, so the //is_literal// state can be preserved, allowing us to create the safe literal string '?,?,?':
-
-  $in_sql = implode(',', array_fill(0, count($ids), '?'));
-  
-  $sql = 'SELECT * FROM table WHERE id IN (' . $in_sql . ')';
+Undefined number of parameters; for example //WHERE IN//:
+
+<code php>
+function where_in_sql($count) { // Should check for 0
+  $sql = '?';
+  for ($k = 1; $k < $count; $k++) {
+    $sql .= ',?';
- +
-==== Solution: CLI Injection ==== +
- +
-Rather than using functions such as: +
- +
-  * //exec()// +
-  * //shell_exec()// +
-  * //system()// +
-  * //passthru()// +
- +
-Frameworks (or PHP) could introduce something similar to //pcntl_exec()//, where arguments are provided separately. +
- +
-Or, take a safe literal for the command, and use parameters for the arguments (like SQL does): +
- +
-  $output = parameterised_exec('grep ? /path/to/file | wc -l', [ +
-      'example', +
-    ]); +
- +
-Rough implementation: +
- +
-  function parameterised_exec($cmd, $args = []) { +
-   +
-    if (!is_literal($cmd)) { +
-      throw new Exception('The first argument must be a literal'); +
-    } +
-   +
-    $offset = 0; +
-    $k = 0; +
-    while (($pos = strpos($cmd, '?', $offset)) !== false) { +
-      if (!isset($args[$k])) { +
-        throw new Exception('Missing parameter "' . ($k + 1) . '"'); +
-        exit(); +
-      } +
-      $arg = escapeshellarg($args[$k]); +
-      $cmd = substr($cmd, 0, $pos) . $arg . substr($cmd, ($pos + 1)); +
-      $offset = ($pos + strlen($arg)); +
-      $k++; +
-    } +
-    if (isset($args[$k])) { +
-      throw new Exception('Unused parameter "' . ($k + 1) . '"'); +
-      exit(); +
-    } +
-   +
-    return exec($cmd); +
-   +
-  } +
- +
-==== Solution: HTML Injection ==== +
- +
-Template engines should receive variables separately from the raw HTML. +
- +
-Often the engine will get the HTML from static files (safe): +
- +
-  $html = file_get_contents('/path/to/template.html'); +
- +
-But small snippets of HTML are often easier to define as a literal within the PHP script: +
- +
-  $template_html = ' +
-    <p>Hello <span id="username"></span></p> +
-    <p><a>Website</a></p>'; +
- +
-Where the variables are supplied separately, in this example I'm using XPath: +
- +
-  $values = [ +
-      '//span[@id="username"]' => [ +
-          NULL      => 'Name', // The textContent +
-          'class'   => 'admin', +
-          'data-id' => '123', +
-        ], +
-      '//a' => [ +
-          'href' => 'https://example.com', +
-        ], +
-    ]; +
-   +
-  echo template_parse($template_html, $values); +
- +
-The templating engine can then accept and apply the supplied variables for the relevant context. +
- +
-As a simple example, this can be done with: +
- +
-  function template_parse($html, $values) { +
-   +
-    if (!is_literal($html)) { +
-      throw new Exception('Invalid Template HTML.'); +
-    } +
-   +
-    $dom = new DomDocument(); +
-    $dom->loadHTML('<?xml encoding="UTF-8">' . $html); +
-   +
-    $xpath = new DOMXPath($dom); +
-   +
-    foreach ($values as $query => $attributes) { +
-   +
-      if (!is_literal($query)) { +
-        throw new Exception('Invalid Template XPath.'); +
-      } +
-   +
-      foreach ($xpath->query($query) as $element) { +
-        foreach ($attributes as $attribute => $value) { +
-   +
-          if (!is_literal($attribute)) { +
-            throw new Exception('Invalid Template Attribute.'); +
-          } +
-   +
-          if ($attribute) { +
-            $safe = false; +
-            if ($attribute == 'href') { +
-              if (preg_match('/^https?:\/\//', $value)) { +
-                $safe = true; // Not "javascript:..." +
-              } +
-            } else if ($attribute == 'class') { +
-              if (in_array($value, ['admin', 'important'])) { +
-                $safe = true; // Only allow specific classes? +
-              } +
-            } else if (preg_match('/^data-[a-z]+$/', $attribute)) { +
-              if (preg_match('/^[a-z0-9 ]+$/i', $value)) { +
-                $safe = true; +
-              } +
-            } +
-            if ($safe) { +
-              $element->setAttribute($attribute, $value); +
-            } +
-          } else { +
-            $element->textContent = $value; +
-          } +
-   +
-        } +
-      } +
-   +
-    } +
-   +
-    $html = ''; +
-    $body = $dom->documentElement->firstChild; +
-    if ($body->hasChildNodes()) { +
-      foreach ($body->childNodes as $node) { +
-        $html .= $dom->saveXML($node); +
-      } +
-    } +
-   +
-    return $html; +
-  +
   }
+  return $sql;
+}
+$sql = 'WHERE id IN (' . where_in_sql(count($ids)) . ')';
+</code>
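The "Should check for 0" comment in //where_in_sql()// can be sketched as follows. This is a hypothetical variant, named //where_in_placeholders()// here to avoid confusion with the RFC's own function; it assumes the caller prefers an exception, since an empty //IN ()// list is rejected by some databases (MySQL, for example):

<code php>
// Hypothetical variant of where_in_sql() that rejects an empty list.
function where_in_placeholders(int $count): string
{
    if ($count < 1) {
        throw new InvalidArgumentException('At least one value is required.');
    }
    $sql = '?';
    for ($k = 1; $k < $count; $k++) {
        $sql .= ',?'; // Only literals are concatenated.
    }
    return $sql;
}

$ids = [10, 20, 30]; // Example values; in practice these may come from the user.

$sql = 'SELECT * FROM users WHERE id IN (' . where_in_placeholders(count($ids)) . ')';
// $sql is 'SELECT * FROM users WHERE id IN (?,?,?)'; $ids are passed separately as parameters.
</code>

Only the number of placeholders depends on the input; the values in //$ids// never become part of the SQL string.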
  
 ===== Backward Incompatible Changes =====
Line 417: Line 190:
 On [[https://github.com/craigfrancis/php-is-literal-rfc/issues|GitHub]]:
 
-  - Would this cause performance issues? Presumably not as bad as type checking.
-  - Can //array_fill()//+//implode()// pass though the "is_literal" flag for the "WHERE IN" case?
-  - Should the function be named //is_from_literal()//? (suggestion from [[https://news-web.php.net/php.internals/109197|Jakob Givoni]])
-  - Systems/Frameworks that define certain variables (e.g. table name prefixes) without the use of a literal (e.g. ini/json/yaml files), so they might need to make some changes to use this check, as originally noted by [[https://news-web.php.net/php.internals/87667|Dennis Birkholz]].
-
-===== Alternatives =====
-
-  - The current Taint Extension (notes above)
-  - Using static analysis (not at runtime), for example [[https://psalm.dev/|psalm]] (thanks [[https://news-web.php.net/php.internals/109192|Tyson Andre]]). But I can't find any which do these checks by default (if they even try), and we can't expect all programmers to use static analysis (especially those who have just stated).
+  - Name it something else? [[https://news-web.php.net/php.internals/109197|Jakob Givoni]] suggested //is_from_literal()//.
+  - Would this cause performance issues? A [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/001.phpt|basic string concat test]], just focusing on string concat (worst case scenario), shows a 1.3% increase in processing time (1.341s to 1.358s = +0.017s).
+  - Systems/Frameworks that define certain variables (e.g. table name prefixes) without the use of a literal (e.g. ini/json/yaml files), they might need to make some changes to use this check, as originally noted by [[https://news-web.php.net/php.internals/87667|Dennis Birkholz]].
  
 ===== Unaffected PHP Functionality =====
Line 433: Line 200:
 ===== Future Scope =====
 
-Certain functions (//mysqli_query//, //preg_match//, etc) could use this information to generate a error/warning/notice.
-
-PHP could also have a mode where output (e.g. //echo '<html>'//) is blocked, and this can be bypassed (maybe via //ini_set//) when the HTML Templating Engine has created the correctly encoded output.
+As noted by [[https://chat.stackoverflow.com/transcript/message/51573226#51573226|MarkR]], the biggest benefit will come when it can be used by PDO and similar functions (//mysqli_query//, //preg_match//, //exec//, etc). But the basic idea can be used immediately by frameworks and general abstraction libraries, and they can give feedback for future work.
+
+**Phase 2** could introduce a way for programmers to specify that certain function arguments only accept safe literals, and/or specific value-objects their project trusts (this idea comes from [[https://web.dev/trusted-types/|Trusted Types]] in JavaScript).
+
+For example, a project could require the second argument for //pg_query()// only accept literals or their //query_builder// object (which provides a //__toString// method); and that any output (print, echo, readfile, etc) must use the //html_output// object that's returned by their trusted HTML Templating system (using //ob_start()// might be useful here).
+
+**Phase 3** could set a default of 'only literals' for all of the relevant PHP function arguments, so developers are given a warning, and later prevented (via an exception), when they provide an unsafe value to those functions (they could still specify that unsafe values are allowed, e.g. phpMyAdmin).
+
+And, for a bit of silliness (Spaß ist verboten), there could be a //is_figurative()// function, which MarkR seems to [[https://chat.stackoverflow.com/transcript/message/48927770#48927770|really]], [[https://chat.stackoverflow.com/transcript/message/51573091#51573091|want]] :-)
  
 ===== Proposed Voting Choices =====
Line 443: Line 216:
 ===== Patches and Tests =====
 
-A volunteer is needed to help with implementation.
+N/A
 
 ===== Implementation =====
+
+Joe Watkins has [[https://github.com/php/php-src/compare/master...krakjoe:literals|created an implementation]] which includes string concat. While the performance impact needs to be considered, this would provide the easiest solution for projects already using string concat for their parameterised SQL.
+
+Dan Ackroyd also [[https://github.com/php/php-src/compare/master...Danack:is_literal_attempt_two|started an implementation]], which uses functions like [[https://github.com/php/php-src/compare/master...Danack:is_literal_attempt_two#diff-2b0486443df74cd919c949f33f895eacf97c34b8490e7554e032e770ab11e4d8R2761|literal_combine()]] to avoid performance concerns.
+
+===== References =====
 
 N/A
Line 452: Line 231:
 
 N/A
+
+===== Thanks =====
+
+  - **Dan Ackroyd**, DanAck, for surprising me with the first implementation, and getting the whole thing started.
+  - **Joe Watkins**, krakjoe, for finding how to set the literal flag, and creating the implementation that supports string concat.
+  - **Rowan Tommins**, IMSoP, for re-writing this RFC to focus on the key features, and putting it in context of how it can be used by libraries.
+  - **Nikita Popov**, NikiC, for suggesting where the literal flag could be stored. Initially this was going to be the [[https://chat.stackoverflow.com/transcript/message/51565346#51565346|GC_PROTECTED flag for strings]], which allowed Dan to start the first implementation.
+  - **MarkR**, for alternative ideas, and noting that "interned strings in PHP have a flag" [[https://chat.stackoverflow.com/transcript/message/48927813#48927813|source]], which started the conversation on how this could be implemented.
+  - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, and string flags, the implementation will become easier" [[https://news-web.php.net/php.internals/87396|source]].
  
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis