rfc:is_literal

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
Next revisionBoth sides next revision
rfc:is_literal [2020/03/23 20:23] – Add link to GitHub craigfrancisrfc:is_literal [2021/05/02 16:38] – Add performance stats, and notes from Dan craigfrancis
Line 1: Line 1:
 ====== PHP RFC: Is Literal Check ====== ====== PHP RFC: Is Literal Check ======
  
-  * Version: 0.1+  * Version: 0.6
   * Date: 2020-03-21   * Date: 2020-03-21
 +  * Updated: 2021-04-30
   * Author: Craig Francis, craig#at#craigfrancis.co.uk   * Author: Craig Francis, craig#at#craigfrancis.co.uk
   * Status: Draft   * Status: Draft
Line 10: Line 11:
 ===== Introduction ===== ===== Introduction =====
  
-Add an //is_literal()// functionso developers/frameworks can be sure they are working with a safe value - one created from one or more literals, defined within PHP scripts.+A new function, //is_literal(string $string)//, to identify variables that have been created from a programmer defined string.
  
-This function would allows developers/frameworks, at runtime, to warn or block SQL Injection, Command Line Injection, and many cases of HTML Injection.+This takes the concept of "taint checking" and makes it simpler and stricter.
  
-Commands can then be tested to ensure they are "programmer supplied constant/static/validated string", and all other unsafe variables are provided separately (as noted by [[https://news-web.php.net/php.internals/87725|Yasuo Ohgaki]]).+It does not allow variable to be marked as untainted, and it does not allowing escaping (important).
  
-This will also allow systems/frameworks to decide if they want to **block****educate** (via notice)or **ignore** these issues (to avoid the "don't nanny" concern raised by [[https://news-web.php.net/php.internals/87383|Lester Caine]]).+For example, a database library that supports parametrised queries at the driver leveltoday a programmer could use either of these:
  
-Literals are values defined within the PHP scripts, for example:+<code php> 
 +$db->query('SELECT * FROM users WHERE id = ' . $_GET['id']); // INSECURE
  
-    $a = 'Hi'; +$db->query('SELECT * FROM users WHERE id ?', [$_GET['id']])
-    $b = 'Example ' . $a; +</code>
-    is_literal($b); // Returns true +
-     +
-    $c = 'Example ' . $_GET['id']; +
-    is_literal($c); /Returns false+
  
-===== Related JavaScript Implementation =====+By rejecting the SQL that was not written as a literal (first example), and only accepting a literal string (written by the programmer), the library can provide an "inherently safe API".
  
-This proposal is taking some ideas from TC39where similar idea is being discussed for JavaScript, to support the introduction of Trusted Types.+This definition of an "inherently safe API" comes from Christoph Kernwho did talk in 2016 about [[https://www.youtube.com/watch?v=ccfEu-Jj0as|Preventing Security Bugs through Software Design]] (also at [[https://www.usenix.org/conference/usenixsecurity15/symposium-program/presentation/kern|USENIX Security 2015]]), which covers how this is used at Google. The idea is that we "Don't Blame the DeveloperBlame the API"; where we need to put the burden on libraries (written once, used by many) to ensure that it's impossible for the developer to make these mistakes.
  
-https://github.com/tc39/proposal-array-is-template-object\\ +By adding a way for libraries to check if the strings they receive came from the developer (from trusted PHP source code), it allows the library to check they are being used in a safe way.
-https://github.com/mikewest/tc39-proposal-literals+
  
-They are looking at "Distinguishing strings from a trusted developer, from strings that may be attacker controlled".+===== Why =====
  
-===== Taint Checking =====+The [[https://owasp.org/www-project-top-ten/|OWASP Top 10]] lists common vulnerabilities sorted by prevalence, exploitability, detectability, and impact.
  
-Xinchen Hui has done some amazing work with the Taint extension:+**Injection vulnerabilities** remain at the top of the list (common prevalence, easy for attackers to detect/exploit, severe impact); and **XSS** comes in at 7 (widespread prevalence, easy for attackers to detect/exploit, moderate impact).
  
-https://github.com/laruence/taint+This is because injection mistakes are very easy to make, and hard to identify - is_literal() addresses this.
  
-Unfortunately this approach does not address all issues, mainly because it still allows string escaping, which is only "[[https://www.php.net/manual/en/pdo.quote.php|Theoretically Safe]]" (typically due to character encoding issues), nor does it address issues such as missing quotes:+===== Examples =====
  
-  $sql = 'DELETE FROM table WHERE id = ' . mysqli_real_escape_string($db, $_GET['id']); +The [[https://www.doctrine-project.org/projects/doctrine-orm/en/current/reference/query-builder.html#high-level-api-methods|Doctrine Query Builder]] allows a custom WHERE clause to be provided as a string. This is intended for use with literals and placeholders, but does not protect against this simple mistake:
-   +
-  // delete.php?id=id +
-   +
-  // DELETE FROM table WHERE id = id+
  
-  $html = '<img src=' . htmlentities($_GET['url']) . ' />'; +<code php> 
-   +// INSECURE 
-  // example.php?url=x%20onerror=alert(cookie) +$qb->select('u') 
-   +   ->from('User', 'u'
-  // <img src=x onerror=alert(cookie) />+   ->where('u.id = ' . $_GET['id']) 
 +</code>
  
-The Taint extension also [[https://github.com/laruence/taint/blob/4a6c4cb2613e27f5604d2021802c144a954caff8/taint.c#L63|conflicts with XDebug]] (sorry Derick),+The definition of the //where()// method could check with //is_literal()// and throw an exceptionadvising the programmer to replace it with a safer use of placeholders:
  
-===== Previous RFC =====+<code php> 
 +$qb->select('u'
 +   ->from('User', 'u'
 +   ->where('u.id :identifier'
 +   ->setParameter('identifier', $_GET['id']); 
 +</code>
  
-Matt Tait suggested [[https://wiki.php.net/rfc/sql_injection_protection||Automatic SQL Injection Protection]].+Similarly, Twig allows [[https://twig.symfony.com/doc/2.x/recipes.html#loading-a-template-from-a-string|loading a template from a string]], which could allow accidentally skipping the default escaping functionality:
  
-It was noted that "unfiltered input can affect way more than only SQL" ([[https://news-web.php.net/php.internals/87355|Pierre Joye]]), and this amount of work isn't ideal for "just for one use case" ([[https://news-web.php.net/php.internals/87647|Julien Pauli]]).+<code php> 
 +// INSECURE 
 +echo $twig->createTemplate('<p>Hi ' $_GET['name''</p>')->render(); 
 +</code>
  
-Where it would have effected every SQL function, such as //mysqli_query()////$pdo->query()//, //odbc_exec()//, etc (concerns raised by [[https://news-web.php.net/php.internals/87436|Lester Caine]] and [[https://news-web.php.net/php.internals/87650|Anthony Ferrara]]).+If //createTemplate()// checked with //is_literal()//, the programmer could be advised to write this instead:
  
-And each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST databecause some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https://news-web.php.net/php.internals/87406|1]]/[[https://news-web.php.net/php.internals/87446|2]]).+<code php> 
 +echo $twig->createTemplate('<p>Hi {{ name }}</p>')->render(['name' => $_GET['name']]); 
 +</code>
  
-I also agree that "SQL injection is almost a solved problem [by using] prepared statements" ([[https://news-web.php.net/php.internals/87400|Scott Arciszewski]]), but we do need something that can identify mistakes, ideally at runtime.+===== Failed Solutions =====
  
-===== Proposal =====+==== Education ====
  
-Add an //is_literal()// function to check if a given variable has only been created by Literal(s).+Developer training has not worked, it simply does not scale (people start programming every day), and learning about every single issue is difficult.
  
-This uses a similar definition as the [[https://wiki.php.net/rfc/sql_injection_protection#safeconst|SafeConst]] by Matt Taitbut it does not need to accept Integer or FloatingPoint variables as safe (unless it makes the implementation easier), nor should it effect any existing functions.+Keeping in mind that programmers will frequently do just enough to complete their task (busy)where they often copy/paste a solution to their problem they find online (risky), modify it for their needs (risky), then move on.
  
-Thanks to [[https://news-web.php.net/php.internals/87396|Xinchen Hui]], we know the PHP5 Taint extension was complex, but "with PHP7's new zend_string, and string flags, the implementation will become easier".+We cannot keep saying they 'need to be careful', and relying on them to never make a mistake.
  
-And thanks to [[https://chat.stackoverflow.com/transcript/message/48927813#48927813|Mark R]], it might be possible to use the fact that "interned strings in PHP have a flag", which is there because these "can't be freed".+==== Escaping ====
  
-Unlike the Taint extension, there is no need to provide an equivalent //untaint()// function.+Escaping is hard, and error prone.
  
-===== Examples =====+We have a list of common [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification.md#common-mistakes|escaping mistakes]].
  
-==== SQL InjectionBasic ====+Developers should use parameterised queries (e.g. SQL)or a well tested library that knows how to escape values based on their context (e.g. HTML).
  
-A simple example:+==== Taint Checking ====
  
-  $sql = 'SELECT * FROM table WHERE id = ?'; +Some languages implement a "taint flag" which tracks whether values are considered "safe".
-   +
-  $result = $db->exec($sql, [$id]);+
  
-Checked in the framework by:+There is a [[https://github.com/laruence/taint|Taint extension for PHP]] by Xinchen Hui, and [[https://wiki.php.net/rfc/taint|a previous RFC proposing it be added to the language]].
  
-  class db { +These solutions rely on the assumption that the output of an escaping function is safe for a particular context. This sounds reasonable in theorybut the operation of escaping functions, and the context for which their output is safe, are very hard to define. This leads to feature that is both complex and unreliable.
-   +
-    public function exec($sql$parameters = []) { +
-   +
-      if (!is_literal($sql)) { +
-        throw new Exception('SQL must be literal.'); +
-      } +
-   +
-      $statement = $this->pdo->prepare($sql); +
-      $statement->execute($parameters); +
-      return $statement->fetchAll(); +
-   +
-    } +
-   +
-  }+
  
-It will also work with string concatenation:+This proposal avoids the complexity by addressing a different part of the problemseparating inputs supplied by the programmer, from inputs supplied by the user.
  
-  define('TABLE', 'example'); +==== Static Analysis ====
-   +
-  $sql 'SELECT * FROM ' . TABLE . ' WHERE id ?'; +
-   +
-    is_literal($sql); // Returns true +
-   +
-  $sql .' AND id ' . mysqli_real_escape_string($db, $_GET['id']); +
-   +
-    is_literal($sql); // Returns false+
  
-==== SQL InjectionORDER BY ====+While I agree with [[https://news-web.php.net/php.internals/109192|Tyson Andre]]it is highly recommended to use Static Analysis.
  
-To ensure //ORDER BY// can be set via the userbut only use acceptable values:+But they nearly always focus on other issues (type checkingbasic logic flaws, code formatting, etc).
  
-  $order_fields = [ +Those that attempt to address injection vulnerabilitiesdo so via Taint Checking (see above), and are [[https://github.com/vimeo/psalm/commit/2122e4a1756dac68a83ec3f5abfbc60331630781|often incomplete]].
-      'name', +
-      'created', +
-      'admin', +
-    ]; +
-   +
-  $order_id = array_search(($_GET['sort'] ?? NULL), $order_fields); +
-   +
-  $sql = ' ORDER BY ' $order_fields[$order_id];+
  
-==== SQL InjectionWHERE IN ====+For a quick example, psalm, even in its strictest errorLevel (1), and/or running //--taint-analysis//, will not notice the missing quote marks in this SQL, and will incorrectly assume this is perfectly safe:
  
-Most SQL strings can be a concatenations of literal values, but //WHERE x IN (?,?,?)// need to use a variable number of literal placeholders.+<code php> 
 +$db = new mysqli('...');
  
-So there //might// need to be a special case for //array_fill()//+//implode()// or //str_repeat()//+//substr()// to create something like '?,?,?'+$id = (string) ($_GET['id'?? 'id'); // Keep the type checker happy.
  
-  $in_sql = implode(',', array_fill(0, count($ids), '?')); +$db->prepare('SELECT * FROM users WHERE id = . $db->real_escape_string($id)); 
-   +</code>
-  // or +
-   +
-  $in_sql = substr(str_repeat('?,', count($ids)), 0, -1);+
  
-To be used with:+When psalm comes to taint checking the usage of a library (like Doctrine), it assumes all methods are safe, because none of them note the sinks (and even if they did, you're back to escaping being an issue).
  
-  $sql = 'SELECT * FROM table WHERE id IN ($in_sql . ')';+But the biggest problem is that Static Analysis is simply not used by most developers, especially those who are new to programming (usage tends to be higher by those writing well tested libraries).
  
-==== SQL Injection, ORM Usage ====+===== Proposal =====
  
-[[https://www.doctrine-project.org/projects/doctrine-orm/en/2.7/reference/query-builder.html#high-level-api-methods|Doctrine]] could use this to ensure //$predicates// is a literal:+This RFC proposes adding three functions:
  
-  $users = $queryBuilder +  * //is_literal(string $string): bool// to check if a variable represents a value written into the source code or not. 
-    ->select('u'+  * //literal_combine(string $piecestring $pieces): string// to allow concatenating literal strings. 
-    ->from('User''u'+  * //literal_implode(string $glue, array $pieces): string// to implode an array of literals, with a literal.
-    ->where('u.id = ' . $_GET['id']) +
-    ->getQuery() +
-    ->getResult(); +
-   +
-  // example.php?id=u.id+
  
-Where this mistake could be identified by:+A literal is defined as a value (string) which has been written by the programmer. The value may be passed between functions, as long as it is not modified in any way.
  
-  public function where($predicates) +<code php> 
-  { +is_literal('Example'); // true
-      if (!is_literal($predicates)) { +
-          throw new Exception('Can only accept a literal'); +
-      } +
-      ... +
-  }+
  
-[[https://redbeanphp.com/index.php?p=/finding|RedBean]] could check //$sql// is literal:+$a = 'Hello'; 
 +$b = 'World';
  
-  $users = R::find('user', 'id = ' . $_GET['id']);+is_literal($a); // true 
 +is_literal($a . $b); // false, no concat at run time (details below).
  
-[[http://propelorm.org/Propel/reference/model-criteria.html#relational-api|PropelORM]] could check //$clause// is a literal:+is_literal(literal_combine($a, $b)); // true
  
-  $users = UserQuery::create()->where('id = ' . $_GET['id'])->find();+is_literal($_GET['id']); // false 
 +is_literal('WHERE id = ' . intval($_GET['id'])); // false 
 +is_literal(rand(0, 10)); // false 
 +is_literal(sprintf('LIMIT %d', 3)); // false, should use parameters 
 +</code>
  
-==== SQL InjectionORM Internal ====+There is no way to manually mark a string as a literal (i.e. no equivalent to //untaint()//); as soon as the value has been manipulated in any wayit is no longer marked as a literal.
  
-The //is_literal()// function could be used by ORM developers, so they can be sure they have created an SQL string out of literals.+*Technical detail: Strings that are concatenated in place at compile time are treated as a literal.*
  
-This would avoid mistakes such as the ORDER BY issues in the Zend framework [[https://framework.zend.com/security/advisory/ZF2014-04|1]]/[[https://framework.zend.com/security/advisory/ZF2016-03|2]].+===== Previous Work =====
  
-==== CLI Injection ====+Google uses "compile time constants" in Go, which isn't as good as a run time solution (e.g. the //WHERE IN// issue), but it works, and is used by [[https://blogtitle.github.io/go-safe-html/|go-safe-html]] and [[https://github.com/google/go-safeweb/tree/master/safesql|go-safesql]].
  
-Rather than using functions such as:+Google also uses [[https://errorprone.info/|Error Prone]] in Java to augment the compiler's type analysis, where [[https://errorprone.info/bugpattern/CompileTimeConstant|@CompileTimeConstant]] ensures method parameters can only use "compile-time constant expressions" (this isn't a complete solution either).
  
-  * //exec()/+Perl has a [[https://perldoc.perl.org/perlsec#Taint-mode|Taint Mode]], via the -T flag, where all input is marked as "tainted", and cannot be used by some methods (like commands that modify files), unless you use a regular expression to match and return known-good values (where regular expressions are easy to get wrong).
-  * //shell_exec()// +
-  * //system()// +
-  * //passthru()//+
  
-Frameworks (or PHP) could introduce something similar to //pcntl_exec()//, where arguments are provided separately.+[[https://github.com/tc39/proposal-array-is-template-object|JavaScript might get isTemplateObject]] to "Distinguishing strings from a trusted developer from strings that may be attacker controlled" (intended to be [[https://github.com/mikewest/tc39-proposal-literals|used with Trusted Types]]).
  
-Ortake a verified literal for the command, and use parameters for the arguments (like SQL):+As noted abovethere is the [[https://github.com/laruence/taint|Taint extension for PHP]] by Xinchen Hui.
  
-  $output = parameterised_exec('grep ? /path/to/file wc -l', [ +And there is the [[https://wiki.php.net/rfc/sql_injection_protection|Automatic SQL Injection Protection]] RFC by Matt Taitwhere this RFC uses a similar concept of the [[https://wiki.php.net/rfc/sql_injection_protection#safeconst|SafeConst]]. When Matt's RFC was being discussedit was noted:
-      'example', +
-    ]);+
  
-Rough implementation:+  * "unfiltered input can affect way more than only SQL" ([[https://news-web.php.net/php.internals/87355|Pierre Joye]]); 
 +  * this amount of work isn't ideal for "just for one use case" ([[https://news-web.php.net/php.internals/87647|Julien Pauli]]); 
 +  * It would have effected every SQL function, such as //mysqli_query()//, //$pdo->query()//, //odbc_exec()//, etc (concerns raised by [[https://news-web.php.net/php.internals/87436|Lester Caine]] and [[https://news-web.php.net/php.internals/87650|Anthony Ferrara]]); 
 +  * Each of those functions would need a bypass for cases where unsafe SQL was intentionally being used (e.g. phpMyAdmin taking SQL from POST data) because some applications intentionally "pass raw, user submitted, SQL" (Ronald Chmara [[https://news-web.php.net/php.internals/87406|1]]/[[https://news-web.php.net/php.internals/87446|2]]).
  
-  function parameterised_exec($cmd, $args = []) +I also agree that "SQL injection is almost a solved problem [by using] prepared statements" ([[https://news-web.php.net/php.internals/87400|Scott Arciszewski]]), and this is where //is_literal()// can be used to check that no mistakes are made when using prepared statements. 
-   + 
-    if (!is_literal($cmd)) { +===== Usage ===== 
-      throw new Exception('The first argument must be a literal'); + 
-    } +By libraries: 
-   + 
-    $offset 0+<code php> 
-    $k = 0; +class db { 
-    while (($pos = strpos($cmd, '?', $offset)) !== false) { +  protected $level 2// Probably should default to 1 at first. 
-      if (!isset($args[$k])) { +  function literal_check($var) { 
-        throw new Exception('Missing parameter "' . ($k + 1) '"'); +    if (function_exists('is_literal'&& !is_literal($var)) { 
-        exit();+      if ($this->level === 0) { 
 +        // Programmer aware, and is choosing to bypass this check. 
 +      } else if ($this->level === 1) 
 +        trigger_error('Non-literal detected!', E_USER_WARNING); 
 +      } else { 
 +        throw new Exception('Non-literal detected!');
       }       }
-      $arg = escapeshellarg($args[$k]); 
-      $cmd = substr($cmd, 0, $pos) . $arg . substr($cmd, ($pos + 1)); 
-      $offset = ($pos + strlen($arg)); 
-      $k++; 
     }     }
-    if (isset($args[$k])) { 
-      throw new Exception('Unused parameter "' . ($k + 1) . '"'); 
-      exit(); 
-    } 
-   
-    return exec($cmd); 
-   
   }   }
 +  function unsafe_disable_injection_protection() {
 +    $this->level = 0;
 +  }
 +  function where($sql, $parameters = []) {
 +    $this->literal_check($sql);
 +    // ...
 +  }
 +}
  
-==== HTML Injection ====+$db->where('id ?'); // OK 
 +$db->where('id ' . $_GET['id']); // Exception thrown 
 +</code>
  
-Template engines should receive variables separately from the raw HTML.+Table and Fields in SQL, which cannot use parameters; for example //ORDER BY//:
  
-Often the engine will get the HTML from static files:+<code php> 
 +$order_fields = [ 
 +    'name', 
 +    'created', 
 +    'admin', 
 +  ];
  
-  $html file_get_contents('/path/to/template.html');+$order_id array_search(($_GET['sort'] ?? NULL), $order_fields);
  
-But small snippets of HTML are often easier to define as a literal within the PHP script:+$sql = literal_combine(' ORDER BY ', $order_fields[$order_id]); 
 +</code>
  
-  $template_html = ' +Undefined number of parameters; for example //WHERE IN//:
-    <p>Hello <span id="username"></span></p> +
-    <p><a>Website</a></p>';+
  
-Where the variables are supplied separatelyin this example I'm using XPaths:+<code php> 
 +function where_in_sql($count) { // Should check for 0 
 +  $sql = []; 
 +  for ($k = 0; $k < $count; $k++) { 
 +    $sql[] = '?'; 
 +  } 
 +  return literal_implode(',', $sql); 
 +
 +$sql = literal_combine('WHERE id IN (', where_in_sql(count($ids)), ')'); 
 +</code>
  
-  $values +===== Considerations =====
-      '//span[@id="username"]' => [ +
-          NULL      => 'Name', // The textContent +
-          'class'   => 'admin', +
-          'data-id' => '123', +
-        ], +
-      '//a' => [ +
-          'href' => 'https://example.com', +
-        ], +
-    ]; +
-   +
-  echo template_parse($template_html, $values);+
  
-Being sure the HTML does not contain unsafe variables, the templating engine can accept and apply the supplied variables for the relevant context, for example:+==== Naming ====
  
-  function template_parse($html, $values) { +Literal string is the standard name for strings in source code. See https://www.google.com/search?q=what+is+literal+string+in+php 
-   + 
-    if (!is_literal($html)+> A string literal is the notation for representing a string value within the text of a computer program. In PHP, strings can be created with single quotes, double quotes or using the heredoc or the nowdoc syntax. ... The heredoc preserves the line breaks and other whitespace (including indentationin the text
-      throw new Exception('Invalid Template HTML.'); + 
-    } +Alternatives suggestions have included //is_from_literal()// from [[https://news-web.php.net/php.internals/109197|Jakob Givoni]]. I think //is_safe_string()// might be asking for trouble. Other terms have included "compile time constants" and "code string". 
-   + 
-    $dom = new DomDocument()+==== Supporting Int/Float/Boolean values. ===
-    $dom->loadHTML('<?xml encoding="UTF-8">' $html); + 
-   +When converting to string, they aren't guaranteed (and often don'thave the exact same value they have in source code. 
-    $xpath new DOMXPath($dom); + 
-   +For example, //TRUE// and //true// when cast to string give "1"
-    foreach ($values as $query => $attributes) { + 
-   +It's also a very low value feature, where there might not be space for a flag to be added. 
-      if (!is_literal($query)) { + 
-        throw new Exception('Invalid Template XPath.'); +==== Supporting Concatenation ==== 
-      } + 
-   +Unfortunately early testing suggests there will be too much of a performance impact, and is why //literal_combine()// or //literal_implode()// exists. 
-      foreach ($xpath->query($query) as $element) { + 
-        foreach ($attributes as $attribute => $value) { +It isn't needed for most libraries, like an ORM or Query Builder, where their methods nearly always take a small literal string
-   + 
-          if (!is_literal($attribute)) { +It was considered because it would have made it easier for existing projects currently using string concatenation to adopt. 
-            throw new Exception('Invalid Template Attribute.'); + 
-          } +Joe Watkins has created a version that does support string concatenation, which we might re-consider at a later date (Joe's implementation also sets the literal flag, and was used as the basis for this implementation). 
-   + 
-          if ($attribute) { +Máté Kocsis did the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/results/with-concat/kocsismate.pdf|primary testing on the string concat version]]and found a 0.124% performance hit for the Laravel Demo app, 0.161% for Symfony, and a more severe -3.719% when running this [[https://github.com/kocsismate/php-version-benchmarks/blob/main/app/zend/concat.php#L25|concat test]]. 
-            $safe = false; + 
-            if ($attribute == 'href'{ +In my own [[https://github.com/craigfrancis/php-is-literal-rfc/tree/main/tests|simplistic testing]], the [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/results/with-concat/local.pdf|results]] found a 0.42% performance hit for the Laravel Demo app, 0.40% for Symfony, and 1.81% when running my [[https://github.com/craigfrancis/php-is-literal-rfc/blob/main/tests/001.phpt|concat test]] via //./cli/php//
-              if (preg_match('/^https?:\/\//'$value)) { + 
-                $safe = true; // Not "javascript:..." +Dan Ackroyd also notes that the use of //literal_combine()// or //literal_implode()// will make it easier to identify exactly where mistakes are maderather than it being picked up at the end of a potentially long scriptafter multiple string concatenations, e.g. 
-              } + 
-            } else if ($attribute == 'class'+<code php> 
-              if (in_array($value['admin''important'])) { +$sortOrder 'ASC'; 
-                $safe true; // Only allow specific classes? + 
-              } +// 300 lines of code, or multiple function calls 
-            } else if (preg_match('/^data-[a-z]+$/', $attribute)) { + 
-              if (preg_match('/^[a-z0-9 ]+$/i', $value)) { +$sql .= ORDER BY name ' . $sortOrder; 
-                $safe = true; + 
-              } +// 300 lines of codeor multiple function calls 
-            } + 
-            if ($safe) { +$db->query($sql); 
-              $element->setAttribute($attribute, $value); +</code> 
-            } + 
-          } else { +If a developer changed the literal //'ASC'// to //$_GET['order']//the error raised by //$db->query()// would not be clear where the mistake was made. Whereas using //literal_combine()// highlights exactly where the issue happened: 
-            $element->textContent $value; + 
-          } +<code php> 
-   +$sql = literal_combine($sql, ' ORDER BY name ', $sortOrder); 
-        } +</code> 
-      } + 
-   +==== Performance ==== 
-    } + 
-   +The proposed implementation has: 
-    $html ''; + 
-    $body $dom->documentElement->firstChild; +    [TODO] 
-    if ($body->hasChildNodes()) { + 
-      foreach ($body->childNodes as $node) { +Also, see the section above, where string concatenation support would have introduced a ~1% performance hit. 
-        $html .= $dom->saveXML($node); + 
-      } +==== Values from INI/JSON/YAML ==== 
-    } + 
-   +As noted by [[https://news-web.php.net/php.internals/87667|Dennis Birkholz]], Systems/Frameworks that define certain variables (e.g. table name prefixeswithout the use of a literal (e.g. ini/json/yaml files), might need to make some changes to use this feature (depending on where they use the is_literal check). 
-    return $html; + 
-   +==== Existing String Functions ==== 
-  }+ 
 +Trying to determine if the //is_literal// flag should be passed through functions like //str_repeat()//, or //substr()// etc is difficult. Having a security feature be difficult to reason about, gives a much higher chance of making a mistake. 
 + 
 +For any use-case where dynamic strings are required, it would be better to build those strings with an appropriate query builder, or by using //literal_combine()/////literal_implode()//.
  
 ===== Backward Incompatible Changes ===== ===== Backward Incompatible Changes =====
  
-Not sure+No known BC breaks, except for code-bases that already contain userland functions //is_literal()//, //literal_implode()// or //literal_combine()//.
  
 ===== Proposed PHP Version(s) ===== ===== Proposed PHP Version(s) =====
  
-PHP 8?+PHP 8.1
  
 ===== RFC Impact ===== ===== RFC Impact =====
Line 341: Line 314:
 ==== To SAPIs ==== ==== To SAPIs ====
  
-Not sure+None known
  
 ==== To Existing Extensions ==== ==== To Existing Extensions ====
Line 353: Line 326:
 ===== Open Issues ===== ===== Open Issues =====
  
-  - Would this cause performance issues? +None
-  - Can //array_fill()//+//implode()// or //str_repeat()//+//substr()// pass though the "is_literal" flag for the "WHERE IN" case? +
-  - Should the function be named //is_from_literal()//? (suggestion from [[https://news-web.php.net/php.internals/109197|Jakob Givoni]]) +
-  - Systems/Frameworks that define certain variables (e.g. table name prefixes) without the use of a literal (e.g. ini/json/yaml files), won't be able to use this check, as originally noted by [[https://news-web.php.net/php.internals/87667|Dennis Birkholz]]. +
- +
-===== Alternatives ===== +
- +
-  - The current Taint Extension (notes above) +
-  - Using static analysis (not runtime), for example [[https://psalm.dev/|psalm]] (thanks [[https://news-web.php.net/php.internals/109192|Tyson Andre]])+
  
 ===== Unaffected PHP Functionality ===== ===== Unaffected PHP Functionality =====
  
-Not sure+None known
  
 ===== Future Scope ===== ===== Future Scope =====
  
-Certain functions (mysqli_query, preg_match, etc) might use this information to generate error/warning/notice.+As noted by MarkR, the biggest benefit will come when it can be used by PDO and similar functions (//mysqli_query////preg_match//, //exec//, etc). But the basic idea can be used immediately by frameworks and general abstraction libraries, and they can give feedback for future work. 
 + 
 +**Phase 2** could introduce a way for programmers to specify certain PHP function/method arguments can only accept literals, and/or specific value-objects their project trusts (this idea comes from [[https://web.dev/trusted-types/|Trusted Types]] in JavaScript). 
 + 
 +For example, project could require the second argument for //pg_query()// only accept literals or their //query_builder// object (which provides a //__toString// method); and that any output (print, echo, readfile, etc) must use the //html_output// object that's returned by their trusted HTML Templating system (using //ob_start()// might be useful here). 
 + 
 +**Phase 3** could set a default of 'only literals' for all of the relevant PHP function arguments, so developers are given a warning, and later prevented (via an exception), when they provide an unsafe value to those functions (they could still specify that unsafe values are allowed, e.g. phpMyAdmin). 
 + 
 +And, for a bit of silliness (Spaß ist verboten), MarkR would like a //is_figurative()// function (functionality to be confirmed).
  
 ===== Proposed Voting Choices ===== ===== Proposed Voting Choices =====
  
-Not sure+Accept the RFC. Yes/No
  
 ===== Patches and Tests ===== ===== Patches and Tests =====
  
-volunteer is needed to help with implementation.+N/A
  
 ===== Implementation ===== ===== Implementation =====
  
-N/A+Dan Ackroyd has [[https://github.com/php/php-src/compare/master...Danack:is_literal_attempt_two|started an implementation]], which uses functions like [[https://github.com/php/php-src/compare/master...Danack:is_literal_attempt_two#diff-2b0486443df74cd919c949f33f895eacf97c34b8490e7554e032e770ab11e4d8R2761|literal_combine()]] to avoid performance concerns. 
 + 
 +Joe Watkins has [[https://github.com/php/php-src/compare/master...krakjoe:literals|created an implementation]] which includes string concat. As noted above, the performance impact is probably too much for it to be accepted.
  
 ===== References ===== ===== References =====
  
-- https://wiki.php.net/rfc/sql_injection_protection+N/A
  
 ===== Rejected Features ===== ===== Rejected Features =====
  
 N/A N/A
 +
 +===== Thanks =====
 +
 +  - **Dan Ackroyd**, DanAck, for starting the first implementation (which made this a reality), and followup on the version that uses functions instead of string concat.
 +  - **Joe Watkins**, krakjoe, for finding how to set the literal flag (tricky), and creating the implementation that does support string concat.
 +  - **Rowan Tommins**, IMSoP, for re-writing this RFC to focus on the key features, and putting it in context of how it can be used by libraries.
 +  - **Nikita Popov**, NikiC, for suggesting where the literal flag could be stored. Initially this was going to be the "GC_PROTECTED flag for strings", which allowed Dan to start the first implementation.
 +  - **Mark Randall**, MarkR, for alternative ideas, and noting that "interned strings in PHP have a flag", which started the conversation on how this could be implemented.
 +  - **Xinchen Hui**, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but "with PHP7's new zend_string, and string flags, the implementation will become easier" [[https://news-web.php.net/php.internals/87396|source]].
  
rfc/is_literal.txt · Last modified: 2022/02/14 00:36 by craigfrancis