rfc:is_literal

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
Next revisionBoth sides next revision
rfc:is_literal [2020/03/23 20:23] – Add link to GitHub craigfrancisrfc:is_literal [2021/02/12 18:42] – Add notes about the implementation from Danack, GC Flags, and Future Scope craigfrancis
Line 1: Line 1:
 ====== PHP RFC: Is Literal Check ====== ====== PHP RFC: Is Literal Check ======
  
-  * Version: 0.1+  * Version: 0.2
   * Date: 2020-03-21   * Date: 2020-03-21
 +  * Updated: 2020-12-22
   * Author: Craig Francis, craig#at#craigfrancis.co.uk   * Author: Craig Francis, craig#at#craigfrancis.co.uk
   * Status: Draft   * Status: Draft
Line 10: Line 11:
 ===== Introduction ===== ===== Introduction =====
  
-Add an //is_literal()// function, so developers/frameworks can be sure they are working with a safe value - one created from one or more literals, defined within PHP scripts.+Add an //is_literal()// function, so developers/frameworks can check if given variable is **safe**.
  
-This function would allows developers/frameworks, at runtime, to warn or block SQL InjectionCommand Line Injectionand many cases of HTML Injection.+As in, at runtime, being able to check if a variable has been created by literalsdefined within a PHP scriptby a trusted developer.
  
-Commands can then be tested to ensure they are a "programmer supplied constant/static/validated string", and all other unsafe variables are provided separately (as noted by [[https://news-web.php.net/php.internals/87725|Yasuo Ohgaki]]).+This simple check can be used to warn or completely block SQL Injection, Command Line Injection, and many cases of HTML Injection (aka XSS).
  
-This will also allow systems/frameworks to decide if they want to **block**, **educate** (via a notice), or **ignore** these issues (to avoid the "don't nanny" concern raised by [[https://news-web.php.net/php.internals/87383|Lester Caine]]).+===== The Problem =====
  
-Literals are values defined within the PHP scripts, for example:+Escaping strings for SQL, HTML, Commands, etc is **very** error prone.
  
-    $a = 'Hi'; +The vast majority of programmers should never do this (mistakes will be made). 
-    $= 'Example ' . $a; + 
-    is_literal($b); // Returns true +Unsafe values (often user supplied) **must** be kept separate (e.g. parameterised SQL), or be processed by something that understands the context (e.g. a HTML Templating Engine). 
-     + 
-    $= 'Example ' . $_GET['id']; +This is primarily for security reasons, but it can also cause data to be damaged (e.g. ASCII/UTF-8 issues). 
-    is_literal($c); // Returns false+ 
 +Take these mistakes, where the value has come from the user: 
 + 
 +<code php> 
 +echo "<img src=" . $url . " alt='' />"; 
 +</code> 
 + 
 +Flawed, and unfortunately very common, classic XSS vulnerability. 
 + 
 +<code php> 
 +echo "<img src=" . htmlentities($url) . " alt='' />"
 +</code> 
 + 
 +Flawed because the attribute value is not quoted, e.g. //$url = '/ onerror=alert(1)'// 
 + 
 +<code php> 
 +echo "<img src='" htmlentities($url) . "' alt='' />"; 
 +</code> 
 + 
 +Flawed because //htmlentities()// does not encode single quotes by default, e.g. //$url = "/' onerror='alert(1)"// 
 + 
 +<code php> 
 +echo '<href="' . htmlentities($url) . '">Link</a>'
 +</code> 
 + 
 +Flawed because a link can include JavaScript, e.g. //$url = 'javascript:alert(1)'// 
 + 
 +<code html> 
 +<script> 
 +  var url = "<?= addslashes($url?>"; 
 +</script> 
 +</code> 
 + 
 +Flawed because //addslashes()// is not HTML context aware, e.g. //$url = '</script><script>alert(1)</script>'// 
 + 
 +<code php> 
 +echo '<a href="/path/?name=' . htmlentities($name) . '">Link</a>'; 
 +</code> 
 + 
 +Flawed because //urlencode()// has not been used, e.g. //$name = 'A&B'// 
 + 
 +<code html> 
 +<p><?= htmlentities($url) ?></p> 
 +</code> 
 + 
 +Flawed because the encoding is not guaranteed to be UTF-8 (or ISO-8859-1 before PHP 5.4), so the value could be corrupted. 
 + 
 +Also flawed because some browsers (e.g. IE 11), if the charset isn't defined (header or meta tag), could guess the output as UTF-7, e.g. //$url = '+ADw-script+AD4-alert(1)+ADw-+AC8-script+AD4-'// 
 + 
 +<code php> 
 +example.html: 
 +    <img src={{ url }} alt='' /> 
 + 
 +$loader = new \Twig\Loader\FilesystemLoader('./templates/'); 
 +$twig = new \Twig\Environment($loader, ['autoescape' => 'name']); 
 + 
 +echo $twig->render('example.html', ['url' => $url]); 
 +</code> 
 + 
 +Flawed because Twig is not context aware (in this case, an unquoted HTML attribute), e.g. //$url = '/ onerror=alert(1)'// 
 + 
 +<code php> 
 +$sql = 'SELECT 1 FROM user WHERE id=. $mysqli->escape_string($id); 
 +</code> 
 + 
 +Flawed because the value has not been quoted, e.g. //$id = 'id', or '1 OR 1=1'// 
 + 
 +<code php> 
 +$sql = 'SELECT 1 FROM user WHERE id="' . $mysqli->escape_string($id) . '"'; 
 +</code> 
 + 
 +Flawed if 'sql_mode' includes //NO_BACKSLASH_ESCAPES//, e.g. //$id = '2" or "1"="1'// 
 + 
 +<code php> 
 +$sql = 'INSERT INTO user (name) VALUES ("' . $mysqli->escape_string($name) . '")'; 
 +</code> 
 + 
 +Flawed if 'SET NAMES latin1' has been used, and escape_string() uses 'utf8'
 + 
 +<code php> 
 +$parameters = "-f$email"; 
 + 
 +// $parameters = '-f' . escapeshellarg($email); 
 + 
 +mail('a@example.com', 'Subject', 'Message', NULL, $parameters); 
 +</code> 
 + 
 +Flawed because it's not possible to safely escape values in //$additional_parameters// for //mail()//, e.g. //$email = 'b@example.com -X/www/example.php'// 
 + 
 +===== Previous Solutions ===== 
 + 
 +[[https://github.com/laruence/taint|Taint extension]] by Xinchen Hui, but this approach explicitly allows escaping, which doesn't address the issues listed above. 
 + 
 +[[https://wiki.php.net/rfc/sql_injection_protection|Automatic SQL Injection Protection]] by Matt Tait, where it was noted: 
 + 
 +  * "unfiltered input can affect way more than only SQL" ([[https://news-web.php.net/php.internals/87355|Pierre Joye]])
 +  * this amount of work isn't ideal for "just for one use case" ([[https://news-web.php.net/php.internals/87647|Julien Pauli]]); 
 +  * It would have effected every SQL function, such as //mysqli_query()//, //$pdo->query()//, //odbc_exec()//, etc (concerns raised by [[https://news-web.php.net/php.internals/87436|Lester Caine]] and [[https://news-web.php.net/php.internals/87650|Anthony Ferrara]]); 
 +  * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https://news-web.php.net/php.internals/87406|1]]/[[https://news-web.php.net/php.internals/87446|2]]). 
 + 
 +I also agree that "SQL injection is almost a solved problem [by using] prepared statements" ([[https://news-web.php.net/php.internals/87400|Scott Arciszewski]]), but we still need something to identify mistakes. 
 + 
 +===== Related Go Implementation ===== 
 + 
 +As discussed by [[https://www.usenix.org/conference/usenixsecurity15/symposium-program/presentation/kern|Christoph Kern (Google) in 2015]] and in [[https://www.youtube.com/watch?v=ccfEu-Jj0as|2016]], this approach works. 
 + 
 +The Go language can do this by checking for "compile time constants", which isn't as good as a run time solution (e.g. the //WHERE IN// issue), but it does work. 
 + 
 +Google currently avoids most issues by using [[https://blogtitle.github.io/go-safe-html/|go-safe-html]] and [[https://github.com/google/go-safeweb/tree/master/safesql|safesql]].
  
 ===== Related JavaScript Implementation ===== ===== Related JavaScript Implementation =====
  
-This proposal is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, to support the introduction of Trusted Types.+This RFC is taking some ideas from TC39, where a similar idea is being discussed for JavaScript, to support the introduction of Trusted Types.
  
 https://github.com/tc39/proposal-array-is-template-object\\ https://github.com/tc39/proposal-array-is-template-object\\
Line 36: Line 145:
 They are looking at "Distinguishing strings from a trusted developer, from strings that may be attacker controlled". They are looking at "Distinguishing strings from a trusted developer, from strings that may be attacker controlled".
  
-===== Taint Checking =====+===== Solution =====
  
-Xinchen Hui has done some amazing work with the Taint extension:+Literals are safe values, defined within the PHP scripts, for example:
  
-https://github.com/laruence/taint+<code php> 
 +$a = 'Example'; 
 +is_literal($a); // true
  
-Unfortunately this approach does not address all issues, mainly because it still allows string escaping, which is only "[[https://www.php.net/manual/en/pdo.quote.php|Theoretically Safe]]" (typically due to character encoding issues), nor does it address issues such as missing quotes:+$a = 'Example ' $a ', ' 5; 
 +is_literal($a); // true
  
-  $sql = 'DELETE FROM table WHERE id = ' . mysqli_real_escape_string($db, $_GET['id']); +$= 'Example ' . $_GET['id']
-   +is_literal($a); // false
-  // delete.php?id=id +
-   +
-  // DELETE FROM table WHERE id = id+
  
-  $html = '<img src=' . htmlentities($_GET['url']. ' />'+$= 'Example ' . time(); 
-   +is_literal($a)// false
-  // example.php?url=x%20onerror=alert(cookie) +
-   +
-  // <img src=x onerror=alert(cookie) />+
  
-The Taint extension also [[https://github.com/laruence/taint/blob/4a6c4cb2613e27f5604d2021802c144a954caff8/taint.c#L63|conflicts with XDebug]] (sorry Derick),+$a = sprintf('LIMIT %d', 3); 
 +is_literal($a); // false
  
-===== Previous RFC =====+$c count($ids); 
 +$a 'WHERE id IN (' . implode(',', array_fill(0, $c, '?')) . ')'; 
 +is_literal($a); // true, the odd one that involves functions.
  
-Matt Tait suggested [[https://wiki.php.net/rfc/sql_injection_protection||Automatic SQL Injection Protection]].+$limit = 10; 
 +$a = 'LIMIT ' . ($limit + 1); 
 +is_literal($a); // false, but might need some discussion. 
 +</code>
  
-It was noted that "unfiltered input can affect way more than only SQL" ([[https://news-web.php.net/php.internals/87355|Pierre Joye]])and this amount of work isn'ideal for "just for one use case" ([[https://news-web.php.net/php.internals/87647|Julien Pauli]]).+This uses a similar definition of [[https://wiki.php.net/rfc/sql_injection_protection#safeconst|SafeConst]] from Matt Tait's RFCbut it doesn'need to accept Integer or FloatingPoint variables as safe (unless it makes the implementation easier), nor should this proposal effect any existing functions.
  
-Where it would have effected every SQL function, such as //mysqli_query()//, //$pdo->query()//, //odbc_exec()//, etc (concerns raised by [[https://news-web.php.net/php.internals/87436|Lester Caine]] and [[https://news-web.php.net/php.internals/87650|Anthony Ferrara]]).+Thanks to [[https://chat.stackoverflow.com/transcript/message/51565346#51565346|NikiC]], it looks like we can reuse the GC_PROTECTED flag for strings.
  
-And each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass rawuser submitted, SQL" (Ronald Chmara [[https://news-web.php.net/php.internals/87406|1]]/[[https://news-web.php.net/php.internals/87446|2]]).+As an aside, [[https://news-web.php.net/php.internals/87396|Xinchen Hui]] found the Taint extension was complex in PHP5, but "with PHP7's new zend_string, and string flags, the implementation will become easier". Also, [[https://chat.stackoverflow.com/transcript/message/48927813#48927813|MarkR]] suggested that it might be possible to use the fact that "interned strings in PHP have a flag", which is there because these "can't be freed".
  
-I also agree that "SQL injection is almost solved problem [by using] prepared statements" ([[https://news-web.php.net/php.internals/87400|Scott Arciszewski]]), but we do need something that can identify mistakes, ideally at runtime.+Commands can be checked to ensure they are a "programmer supplied constant/static/validated string", and all other unsafe variables are provided separately (as noted by [[https://news-web.php.net/php.internals/87725|Yasuo Ohgaki]]).
  
-===== Proposal =====+This approach allows all systems/frameworks to decide if they want to **block**, **educate** (via a notice), or **ignore** these issues (to avoid the "don't nanny" concern raised by [[https://news-web.php.net/php.internals/87383|Lester Caine]]).
  
-Add an //is_literal()// function to check if a given variable has only been created by Literal(s).+Unlike the Taint extension, there must **not** be an equivalent //untaint()// function, or support any kind of escaping.
  
-This uses a similar definition as the [[https://wiki.php.net/rfc/sql_injection_protection#safeconst|SafeConst]] by Matt Tait, but it does not need to accept Integer or FloatingPoint variables as safe (unless it makes the implementation easier), nor should it effect any existing functions.+==== SolutionSQL Injection ====
  
-Thanks to [[https://news-web.php.net/php.internals/87396|Xinchen Hui]], we know the PHP5 Taint extension was complex, but "with PHP7's new zend_string, and string flags, the implementation will become easier".+Database abstractions (e.gORMs) will be able to ensure they are provided with strings that are safe.
  
-And thanks to [[https://chat.stackoverflow.com/transcript/message/48927813#48927813|Mark R]], it might be possible to use the fact that "interned strings in PHP have a flag", which is there because these "can't be freed".+[[https://www.doctrine-project.org/projects/doctrine-orm/en/2.7/reference/query-builder.html#high-level-api-methods|Doctrine]] could use this to ensure //->where($predicates)// is a literal:
  
-Unlike the Taint extension, there is no need to provide an equivalent //untaint()// function.+<code php> 
 +$users = $queryBuilder 
 +  ->select('u') 
 +  ->from('User', 'u'
 +  ->where('u.id = ' . $_GET['id']) 
 +  ->getQuery() 
 +  ->getResult();
  
-===== Examples =====+// example.php?id=u.id 
 +</code>
  
-==== SQL Injection, Basic ====+This mistake can be easily identified by:
  
-A simple example:+<code php> 
 +public function where($predicates) 
 +
 +    if (function_exists('is_literal') && !is_literal($predicates)) { 
 +        throw new Exception('->where() can only accept a literal'); 
 +    } 
 +    ... 
 +
 +</code>
  
-  $sql = 'SELECT * FROM table WHERE id = ?'; +[[https://redbeanphp.com/index.php?p=/finding|RedBean]] could check //$sql// is a literal:
-   +
-  $result $db->exec($sql, [$id]);+
  
-Checked in the framework by:+<code php> 
 +$users = R::find('user', 'id = ' . $_GET['id']); 
 +</code>
  
-  class db { +[[http://propelorm.org/Propel/reference/model-criteria.html#relational-api|PropelORM]] could check //$clause// is a literal:
-   +
-    public function exec($sql, $parameters = []) { +
-   +
-      if (!is_literal($sql)) { +
-        throw new Exception('SQL must be a literal.'); +
-      } +
-   +
-      $statement = $this->pdo->prepare($sql); +
-      $statement->execute($parameters); +
-      return $statement->fetchAll(); +
-   +
-    } +
-   +
-  }+
  
-It will also work with string concatenation:+<code php> 
 +$users = UserQuery::create()->where('id = ' . $_GET['id'])->find(); 
 +</code>
  
-  define('TABLE', 'example'); +The //is_literal()// function could also be used internally by ORM developersso they can be sure they have created their SQL strings out of literals. This would avoid mistakes such as the ORDER BY issues in the Zend framework [[https://framework.zend.com/security/advisory/ZF2014-04|1]]/[[https://framework.zend.com/security/advisory/ZF2016-03|2]].
-   +
-  $sql = 'SELECT * FROM ' . TABLE . ' WHERE id = ?'; +
-   +
-    is_literal($sql)// Returns true +
-   +
-  $sql .= ' AND id = ' . mysqli_real_escape_string($db$_GET['id']); +
-   +
-    is_literal($sql); // Returns false+
  
-==== SQL Injection, ORDER BY ====+==== Solution: SQL Injection, Basic ====
  
-To ensure //ORDER BY// can be set via the user, but only use acceptable values:+A simple example:
  
-  $order_fields +<code php> 
-      'name', +$sql = 'SELECT * FROM table WHERE id = ?';
-      'created', +
-      'admin', +
-    ]; +
-   +
-  $order_id array_search(($_GET['sort'?? NULL), $order_fields); +
-   +
-  $sql = ORDER BY ' . $order_fields[$order_id];+
  
-==== SQL InjectionWHERE IN ====+$result $db->exec($sql[$id]); 
 +</code>
  
-Most SQL strings can be a concatenations of literal values, but //WHERE x IN (?,?,?)// need to use a variable number of literal placeholders.+Checked in the framework by:
  
-So there //might// need to be a special case for //array_fill()//+//implode()// or //str_repeat()//+//substr()// to create something like '?,?,?'+<code php> 
 +class db {
  
-  $in_sql = implode(',', array_fill(0, count($ids)'?')); +  public function exec($sql, $parameters []{
-   +
-  // or +
-   +
-  $in_sql substr(str_repeat('?,', count($ids)), 0, -1);+
  
-To be used with:+    if (!is_literal($sql)) { 
 +      throw new Exception('SQL must be a literal.'); 
 +    }
  
-  $sql 'SELECT * FROM table WHERE id IN (' . $in_sql . ')';+    $statement $this->pdo->prepare($sql); 
 +    $statement->execute($parameters); 
 +    return $statement->fetchAll();
  
-==== SQL Injection, ORM Usage ====+  }
  
-[[https://www.doctrine-project.org/projects/doctrine-orm/en/2.7/reference/query-builder.html#high-level-api-methods|Doctrine]] could use this to ensure //$predicates// is a literal:+
 +</code>
  
-  $users = $queryBuilder +This also works with string concatenation:
-    ->select('u'+
-    ->from('User', 'u'+
-    ->where('u.id = ' . $_GET['id']) +
-    ->getQuery() +
-    ->getResult(); +
-   +
-  // example.php?id=u.id+
  
-Where this mistake could be identified by:+<code php> 
 +define('TABLE', 'example');
  
-  public function where($predicates) +$sql = 'SELECT * FROM ' . TABLE ' WHERE id = ?';
-  { +
-      if (!is_literal($predicates)) { +
-          throw new Exception('Can only accept a literal'); +
-      } +
-      ..+
-  }+
  
-[[https://redbeanphp.com/index.php?p=/finding|RedBean]] could check //$sql// is a literal:+  is_literal($sql); // Returns true
  
-  $users R::find('user', 'id = ' . $_GET['id']);+$sql .= ' AND id = ' . $mysqli->escape_string($_GET['id']);
  
-[[http://propelorm.org/Propel/reference/model-criteria.html#relational-api|PropelORM]] could check //$clause/is a literal:+  is_literal($sql); // Returns false 
 +</code> 
 + 
 +==== Solution: SQL Injection, ORDER BY ==== 
 + 
 +To ensure //ORDER BY// can be set via the user, but only use acceptable values: 
 + 
 +<code php> 
 +$order_fields = [ 
 +    'name', 
 +    'created', 
 +    'admin', 
 +  ]; 
 + 
 +$order_id = array_search(($_GET['sort'] ?? NULL), $order_fields); 
 + 
 +$sql = ' ORDER BY ' . $order_fields[$order_id]; 
 +</code> 
 + 
 +==== SolutionSQL Injection, WHERE IN ====
  
-  $users = UserQuery::create()->where('id = ' $_GET['id'])->find();+Most SQL strings can be a simple concatenations of literal values, but //WHERE x IN (?,?,?)// needs to use a variable number of literal placeholders.
  
-==== SQL InjectionORM Internal ====+There needs to be a special case for //array_fill()//+//implode()//so the //is_literal// state can be preserved, allowing us to create the safe literal string '?,?,?':
  
-The //is_literal()// function could be used by ORM developersso they can be sure they have created an SQL string out of literals.+<code php> 
 +$in_sql = implode(',', array_fill(0, count($ids), '?'));
  
-This would avoid mistakes such as the ORDER BY issues in the Zend framework [[https://framework.zend.com/security/advisory/ZF2014-04|1]]/[[https://framework.zend.com/security/advisory/ZF2016-03|2]].+$sql = 'SELECT * FROM table WHERE id IN (' $in_sql ')'; 
 +</code>
  
-==== CLI Injection ====+==== Solution: CLI Injection ====
  
 Rather than using functions such as: Rather than using functions such as:
Line 200: Line 312:
 Frameworks (or PHP) could introduce something similar to //pcntl_exec()//, where arguments are provided separately. Frameworks (or PHP) could introduce something similar to //pcntl_exec()//, where arguments are provided separately.
  
-Or, take a verified literal for the command, and use parameters for the arguments (like SQL):+Or, take a safe literal for the command, and use parameters for the arguments (like SQL does):
  
-  $output = parameterised_exec('grep ? /path/to/file | wc -l', [ +<code php> 
-      'example', +$output = parameterised_exec('grep ? /path/to/file | wc -l', [ 
-    ]);+    'example', 
 +  ]); 
 +</code>
  
 Rough implementation: Rough implementation:
  
-  function parameterised_exec($cmd, $args = []) { +<code php> 
-   +function parameterised_exec($cmd, $args = []) { 
-    if (!is_literal($cmd)) { + 
-      throw new Exception('The first argument must be a literal'); +  if (!is_literal($cmd)) { 
-    +    throw new Exception('The first argument must be a literal'); 
-   +  
-    $offset = 0; + 
-    $k = 0; +  $offset = 0; 
-    while (($pos = strpos($cmd, '?', $offset)) !== false) { +  $k = 0; 
-      if (!isset($args[$k])) { +  while (($pos = strpos($cmd, '?', $offset)) !== false) { 
-        throw new Exception('Missing parameter "' . ($k + 1) . '"'); +    if (!isset($args[$k])) { 
-        exit(); +      throw new Exception('Missing parameter "' . ($k + 1) . '"');
-      } +
-      $arg = escapeshellarg($args[$k]); +
-      $cmd = substr($cmd, 0, $pos) . $arg . substr($cmd, ($pos + 1)); +
-      $offset = ($pos + strlen($arg)); +
-      $k++; +
-    } +
-    if (isset($args[$k])) { +
-      throw new Exception('Unused parameter "' . ($k + 1) . '"');+
       exit();       exit();
     }     }
-   +    $arg = escapeshellarg($args[$k]); 
-    return exec($cmd); +    $cmd = substr($cmd, 0, $pos) . $arg . substr($cmd, ($pos + 1))
-  +    $offset = ($pos + strlen($arg)); 
 +    $k++
 +  
 +  if (isset($args[$k])) { 
 +    throw new Exception('Unused parameter "' . ($k + 1) . '"'); 
 +    exit();
   }   }
  
-==== HTML Injection ====+  return exec($cmd); 
 + 
 +
 +</code> 
 + 
 +==== Solution: HTML Injection ====
  
 Template engines should receive variables separately from the raw HTML. Template engines should receive variables separately from the raw HTML.
  
-Often the engine will get the HTML from static files:+Often the engine will get the HTML from static files (safe):
  
-  $html = file_get_contents('/path/to/template.html');+<code php> 
 +$html = file_get_contents('/path/to/template.html'); 
 +</code>
  
 But small snippets of HTML are often easier to define as a literal within the PHP script: But small snippets of HTML are often easier to define as a literal within the PHP script:
  
-  $template_html = ' +<code php> 
-    <p>Hello <span id="username"></span></p> +$template_html = ' 
-    <p><a>Website</a></p>';+  <p>Hello <span id="username"></span></p> 
 +  <p><a>Website</a></p>'; 
 +</code> 
 + 
 +Where the variables are supplied separately, in this example I'm using XPath: 
 + 
 +<code php> 
 +$values = [ 
 +    '//span[@id="username"]' => [ 
 +        NULL      => 'Name', // The textContent 
 +        'class'   => 'admin', 
 +        'data-id' => '123', 
 +      ], 
 +    '//a' => [ 
 +        'href' => 'https://example.com', 
 +      ], 
 +  ]; 
 + 
 +echo template_parse($template_html, $values); 
 +</code> 
 + 
 +The templating engine can then accept and apply the supplied variables for the relevant context. 
 + 
 +As a simple example, this can be done with: 
 + 
 +<code php> 
 +function template_parse($html, $values) { 
 + 
 +  if (!is_literal($html)) { 
 +    throw new Exception('Invalid Template HTML.'); 
 +  }
  
-Where the variables are supplied separately, in this example I'm using XPaths:+  $dom = new DomDocument(); 
 +  $dom->loadHTML('<' . '?xml encoding="UTF-8">' . $html);
  
-  $values +  $xpath new DOMXPath($dom);
-      '//span[@id="username"]' => [ +
-          NULL      => 'Name', // The textContent +
-          'class'   => 'admin', +
-          'data-id' => '123', +
-        ], +
-      '//a' => [ +
-          'href' => 'https://example.com', +
-        ], +
-    ]; +
-   +
-  echo template_parse($template_html, $values);+
  
-Being sure the HTML does not contain unsafe variables, the templating engine can accept and apply the supplied variables for the relevant context, for example:+  foreach ($values as $query => $attributes) {
  
-  function template_parse($html, $values) { +    if (!is_literal($query)) { 
-   +      throw new Exception('Invalid Template XPath.');
-    if (!is_literal($html)) { +
-      throw new Exception('Invalid Template HTML.');+
     }     }
-   + 
-    $dom = new DomDocument(); +    foreach ($xpath->query($query) as $element) { 
-    $dom->loadHTML('<?xml encoding="UTF-8">' . $html); +      foreach ($attributes as $attribute => $value) { 
-   + 
-    $xpath = new DOMXPath($dom); +        if (!is_literal($attribute)) { 
-   +          throw new Exception('Invalid Template Attribute.'); 
-    foreach ($values as $query => $attributes) { +        
-   + 
-      if (!is_literal($query)) { +        if ($attribute) { 
-        throw new Exception('Invalid Template XPath.'); +          $safe = false; 
-      } +          if ($attribute == 'href') { 
-   +            if (preg_match('/^https?:\/\//', $value)) { 
-      foreach ($xpath->query($query) as $element) { +              $safe = true; // Not "javascript:..."
-        foreach ($attributes as $attribute => $value) { +
-   +
-          if (!is_literal($attribute)) { +
-            throw new Exception('Invalid Template Attribute.'); +
-          +
-   +
-          if ($attribute) { +
-            $safe = false; +
-            if ($attribute == 'href') { +
-              if (preg_match('/^https?:\/\//', $value)) { +
-                $safe = true; // Not "javascript:..." +
-              } +
-            } else if ($attribute == 'class') { +
-              if (in_array($value, ['admin', 'important'])) { +
-                $safe = true; // Only allow specific classes? +
-              } +
-            } else if (preg_match('/^data-[a-z]+$/', $attribute)) { +
-              if (preg_match('/^[a-z0-9 ]+$/i', $value)) { +
-                $safe = true; +
-              }+
             }             }
-            if ($safe) { +          } else if ($attribute == 'class') { 
-              $element->setAttribute($attribute, $value);+            if (in_array($value, ['admin', 'important'])) { 
 +              $safe = true// Only allow specific classes?
             }             }
-          } else { +          } else if (preg_match('/^data-[a-z]+$/', $attribute)) { 
-            $element->textContent = $value;+            if (preg_match('/^[a-z0-9 ]+$/i', $value)) { 
 +              $safe = true; 
 +            } 
 +          } 
 +          if ($safe) 
 +            $element->setAttribute($attribute, $value);
           }           }
-  +        } else { 
 +          $element->textContent = $value;
         }         }
 +
       }       }
-   
     }     }
-   + 
-    $html = ''; +  } 
-    $body = $dom->documentElement->firstChild; + 
-    if ($body->hasChildNodes()) { +  $html = ''; 
-      foreach ($body->childNodes as $node) { +  $body = $dom->documentElement->firstChild; 
-        $html .= $dom->saveXML($node); +  if ($body->hasChildNodes()) { 
-      }+    foreach ($body->childNodes as $node) { 
 +      $html .= $dom->saveXML($node);
     }     }
-   
-    return $html; 
-   
   }   }
 +
 +  return $html;
 +
 +}
 +</code>
  
 ===== Backward Incompatible Changes ===== ===== Backward Incompatible Changes =====
  
-Not sure+None
  
 ===== Proposed PHP Version(s) ===== ===== Proposed PHP Version(s) =====
  
-PHP 8?+PHP 8.1?
  
 ===== RFC Impact ===== ===== RFC Impact =====
Line 353: Line 479:
 ===== Open Issues ===== ===== Open Issues =====
  
-  - Would this cause performance issues? +On [[https://github.com/craigfrancis/php-is-literal-rfc/issues|GitHub]]: 
-  - Can //array_fill()//+//implode()// or //str_repeat()//+//substr()// pass though the "is_literal" flag for the "WHERE IN" case?+ 
 +  - Would this cause performance issues? Presumably not as bad a type checking. 
 +  - Can //array_fill()//+//implode()// pass though the "is_literal" flag for the "WHERE IN" case?
   - Should the function be named //is_from_literal()//? (suggestion from [[https://news-web.php.net/php.internals/109197|Jakob Givoni]])   - Should the function be named //is_from_literal()//? (suggestion from [[https://news-web.php.net/php.internals/109197|Jakob Givoni]])
-  - Systems/Frameworks that define certain variables (e.g. table name prefixes) without the use of a literal (e.g. ini/json/yaml files), won't be able to use this check, as originally noted by [[https://news-web.php.net/php.internals/87667|Dennis Birkholz]].+  - Systems/Frameworks that define certain variables (e.g. table name prefixes) without the use of a literal (e.g. ini/json/yaml files), so they might need to make some changes to use this check, as originally noted by [[https://news-web.php.net/php.internals/87667|Dennis Birkholz]].
  
 ===== Alternatives ===== ===== Alternatives =====
  
   - The current Taint Extension (notes above)   - The current Taint Extension (notes above)
-  - Using static analysis (not runtime), for example [[https://psalm.dev/|psalm]] (thanks [[https://news-web.php.net/php.internals/109192|Tyson Andre]])+  - Using static analysis (not at runtime), for example [[https://psalm.dev/|psalm]] (thanks [[https://news-web.php.net/php.internals/109192|Tyson Andre]]). But I can't find any which do these checks by default (if they even try), and we can't expect all programmers to use static analysis (especially those who have just stated).
  
 ===== Unaffected PHP Functionality ===== ===== Unaffected PHP Functionality =====
Line 369: Line 497:
 ===== Future Scope ===== ===== Future Scope =====
  
-Certain functions (mysqli_query, preg_match, etc) might use this information to generate a error/warning/notice.+As noted by [[https://chat.stackoverflow.com/transcript/message/51573226#51573226|MarkR]], the benefit will come when it can be used by PDO and similar functions (//mysqli_query////preg_match//, etc).
  
-===== Proposed Voting Choices =====+This check could be used to throw an exception, or generate an error/warning/notice, providing a way for PHP to teach new programmers, and/or completely block unsafe values in SQL, HTML, CLI, etc.
  
-Not sure+PHP could also have a mode where output (e.g. //echo '<html>'//) is blocked, and this can be bypassed (maybe via //ini_set//) when the HTML Templating Engine has created the correctly encoded output.
  
-===== Patches and Tests =====+And maybe there could be a [[https://chat.stackoverflow.com/transcript/message/51573091#51573091|is_figurative()]] function, which [[https://chat.stackoverflow.com/transcript/message/48927770#48927770|MarkR]] seems to really want :-)
  
-A volunteer is needed to help with implementation.+===== Proposed Voting Choices =====
  
-===== Implementation =====+N/A 
 + 
 +===== Patches and Tests =====
  
 N/A N/A
  
-===== References =====+===== Implementation =====
  
-https://wiki.php.net/rfc/sql_injection_protection+[[https://github.com/Danack/|Danack]] has [[https://github.com/php/php-src/compare/master...Danack:is_literal_attempt_two|started an implementation]].
  
 ===== Rejected Features ===== ===== Rejected Features =====
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis