PHP RFC: Is Literal Check
- Version: 0.3
- Date: 2020-03-21
- Updated: 2021-02-19
- Author: Craig Francis, craig#at#craigfrancis.co.uk
- Status: Draft
- First Published at: https://wiki.php.net/rfc/is_literal
- GitHub Repo: https://github.com/craigfrancis/php-is-literal-rfc
Add an is_literal() function, so developers/frameworks can check if a given variable is safe.
As in, at runtime, being able to check if a variable has been created by literals, defined within a PHP script, by a trusted developer.
This check can be used to warn or completely block (by default) many, if not all, injection based vulnerabilities.
$users = $queryBuilder ->select('u') ->from('User', 'u') ->where('u.id = ' . $_GET['id']) ->getQuery() ->getResult();
Or Twig could project against HTML Injection vulnerabilities (aka XSS), like this flawed HTML Template:
echo $twig->createTemplate('<p>Hi ' . $_GET['name'] . '</p>')->render();
Literals are safe values, defined within the PHP script, for example:
is_literal('Example'); // true $a = 'Example'; is_literal($a); // true is_literal(4); // true is_literal(0.3); // true is_literal('a' . 'b'); // true, compiler can concatenate $a = 'A'; $b = $a . ' B ' . 3; is_literal($b); // true, ideally (more details below) is_literal($_GET['id']); // false is_literal(rand(0, 10)); // false is_literal(sprintf('LIMIT %d', 3)); // false $c = count($ids); $a = 'WHERE id IN (' . implode(',', array_fill(0, $c, '?')) . ')'; is_literal($a); // true, the one exception that involves functions.
Ideally string concatenation would be allowed, but Danack suggested this might raise performance concerns, and an array implode like function could be used instead (or a query builder).
Thanks to NikiC, it looks like we can reuse the GC_PROTECTED flag for strings.
As an aside, Xinchen Hui found the Taint extension was complex in PHP5, but “with PHP7's new zend_string, and string flags, the implementation will become easier”. Also, MarkR suggested that it might be possible to use the fact that “interned strings in PHP have a flag”, which is there because these “can't be freed”.
Unlike the Taint extension, there must not be an equivalent untaint() function, or support any kind of escaping.
As noted be Tyson Andre, it might be possible to use static analysis, for example psalm. But I can't find any which do these checks by default, can be incomplete, they are likely to miss things (especially at runtime), and we can't expect all programmers to use static analysis (especially those who are new to programming, who need this more than developers who know the concepts and just make the odd mistake).
- “unfiltered input can affect way more than only SQL” (Pierre Joye);
- this amount of work isn't ideal for “just for one use case” (Julien Pauli);
I also agree that “SQL injection is almost a solved problem [by using] prepared statements” (Scott Arciszewski), but we still is_literal() to identify mistakes.
Backward Incompatible Changes
Proposed PHP Version(s)
To Existing Extensions
- Name it something else? Jakob Givoni suggested is_from_literal(); or maybe is_safe().
- Would this cause performance issues?
- Can array_fill()+implode() pass though the “is_literal” flag for the “WHERE IN” case?
- Systems/Frameworks that define certain variables (e.g. table name prefixes) without the use of a literal (e.g. ini/json/yaml files), they might need to make some changes to use this check, as originally noted by Dennis Birkholz.
Unaffected PHP Functionality
As noted by MarkR, the biggest benefit will come when it can be used by PDO and similar functions (mysqli_query, preg_match, exec, etc). But the basic idea can be used immediately by frameworks and general abstraction libraries, and they can give feedback for future work.
For example, a project could require the second argument for pg_query() only accept literals or their query_builder object (which provides a __toString method); and that any output (print, echo, readfile, etc) must use the html_output object that's returned by their trusted HTML Templating system (using ob_start() might be useful here).
Phase 3 could set a default of 'only literals' for all of the relevant PHP function arguments, so developers are given a warning, and later prevented (via an exception), when they provide an unsafe value to those functions (they could still specify that unsafe values are allowed, e.g. phpMyAdmin).
Proposed Voting Choices
Patches and Tests