PHP RFC: Is Literal Check


This RFC proposes a new function, is_literal(string $string), which identifies whether a variable was created from a programmer-defined string.

This takes the concept of “taint checking” and makes it simpler and stricter.

It does not allow a variable to be marked as untainted and, importantly, it does not allow escaping.

For example, take a database library that supports parameterised queries at the driver level. Today, a programmer could use either of these:

$db->query('SELECT * FROM users WHERE id = ?', [$_GET['id']]);
$db->query('SELECT * FROM users WHERE id = ' . $_GET['id']); // INSECURE

If the library only accepted a literal SQL string (written by the programmer), and simply rejected the second example (not written as a literal), the library could provide an “inherently safe API”.

This definition of an “inherently safe API” comes from Christoph Kern, who gave a talk in 2016 about Preventing Security Bugs through Software Design (also given at USENIX Security 2015), which covers how this is used at Google. The idea is “Don't Blame the Developer, Blame the API”: we need to put the burden on libraries (written once, used by many) to ensure that it's impossible for the developer to make these mistakes.

By adding a way for libraries to check if the strings they receive came from the developer (from trusted PHP source code), it allows the library to check they are being used in a safe way.


The OWASP Top 10 lists common vulnerabilities sorted by prevalence, exploitability, detectability, and impact, each ranked out of 3.

A1: Injection - common prevalence (2), easy for attackers to detect/exploit (3), severe impact (3).

A7: XSS - widespread prevalence (3), easy for attackers to detect/exploit (3), moderate impact (2).

And these two have always been listed: 2003 (A6/A4), 2004 (A6/A4), 2007 (A2/A1), 2010 (A1/A2), 2013 (A1/A3), 2017 (A1/A7).

This is because these mistakes are very easy to make and hard to identify; is_literal() directly addresses this problem.


The Doctrine Query Builder allows a custom WHERE clause to be provided as a string. This is intended for use with literals and placeholders, but does not protect against this simple mistake:

   ->from('User', 'u')
   ->where('u.id = ' . $_GET['id'])

The definition of the where() method could check with is_literal() and throw an exception, advising the programmer to replace it with a safer use of placeholders:

   ->from('User', 'u')
   ->where('u.id = :identifier')
   ->setParameter('identifier', $_GET['id']);

Similarly, Twig allows loading a template from a string, which could allow accidentally skipping the default escaping functionality:

echo $twig->createTemplate('<p>Hi ' . $_GET['name'] . '</p>')->render();

If createTemplate() checked with is_literal(), the programmer could be advised to write this instead:

echo $twig->createTemplate('<p>Hi {{ name }}</p>')->render(['name' => $_GET['name']]);

Failed Solutions


Developer training has not worked. It simply does not scale (people start programming every day), and learning about every single issue is difficult.

Keep in mind that programmers will frequently do just enough to complete their task (busy): they often copy/paste a solution they find online (risky), modify it for their needs (risky), then move on.

We cannot keep saying they 'need to be careful', and rely on them to never make a mistake.


Escaping is hard, and error prone.

We have a list of common escaping mistakes.

Developers should use parameterised queries (e.g. SQL), or a well tested library that knows how to escape values based on their context (e.g. HTML).

Taint Checking

Some languages implement a “taint flag” which tracks whether values are considered “safe”.

There is a Taint extension for PHP by Xinchen Hui, and a previous RFC proposing it be added to the language by Wietse Venema.

These solutions rely on the assumption that the output of an escaping function is safe for a particular context. This sounds reasonable in theory, but the operation of escaping functions, and the context for which their output is safe, are very hard to define. This leads to a feature that is both complex and unreliable.

This proposal avoids the complexity by addressing a different part of the problem: separating inputs supplied by the programmer, from inputs supplied by the user.

Static Analysis

I agree with Tyson Andre that using Static Analysis is highly recommended.

But these tools nearly always focus on other issues (type checking, basic logic flaws, code formatting, etc).

Those that attempt to address injection vulnerabilities do so via Taint Checking (see above), and are often incomplete.

For a quick example, psalm, even in its strictest errorLevel (1), and/or running --taint-analysis, will not notice the missing quote marks in this SQL, and will incorrectly assume this is perfectly safe:

$db = new mysqli('...');
$id = (string) ($_GET['id'] ?? 'id'); // Keep the type checker happy.
$db->prepare('SELECT * FROM users WHERE id = ' . $db->real_escape_string($id));
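To see why escaping does not help when the quote marks are missing, the sketch below builds the same unquoted SQL with a hostile value. It uses addslashes() as a stand-in for $db->real_escape_string() (an assumption, made so the example runs without a database connection), and the value '1 OR 1=1' is an illustrative input.

```php
<?php
// Stand-in for $db->real_escape_string(), which needs a live connection.
// Assumption: for this input both functions return the value unchanged,
// because it contains no quotes, backslashes, or NUL bytes.
function escape_stand_in(string $value): string
{
    return addslashes($value);
}

$id = '1 OR 1=1'; // Illustrative value an attacker could send as ?id=...

$sql = 'SELECT * FROM users WHERE id = ' . escape_stand_in($id);

echo $sql, "\n"; // SELECT * FROM users WHERE id = 1 OR 1=1
```

The escaped value is identical to the input, so the attacker's condition becomes part of the query even though an escaping function was called.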

When psalm comes to taint checking the usage of a library (like Doctrine), it assumes all methods are safe, because none of them note the sinks (and even if they did, you're back to escaping being an issue).

But the biggest problem is that Static Analysis is simply not used by most developers, especially those who are new to programming (usage tends to be higher among those writing well tested libraries).


This RFC proposes adding four functions:

  • is_literal(string $string): bool to check if a variable represents a value written into the source code or not.
  • literal_implode(string $glue, array $pieces): string - implode an array of literals, with a literal.
  • literal_combine(string $piece, string ...$pieces): string - allow concatenating literal strings.
  • literal_sprintf(string $format, string ...$values): string - a version of sprintf that uses literals.

A literal is defined as a value (string) which has been written by the programmer. The value may be passed between functions, as long as it is not modified in any way.

is_literal('Example'); // true
$a = 'Hello';
$b = 'World';
is_literal($a); // true
is_literal($a . $b); // TBC, details below.
$c = literal_combine($a, $b);
is_literal($c); // true
is_literal($_GET['id']); // false
is_literal('WHERE id = ' . intval($_GET['id'])); // false
is_literal(rand(0, 10)); // false
is_literal(sprintf('LIMIT %d', 3)); // false

There is no way to manually mark a string as a literal (i.e. no equivalent to untaint()); as soon as the value has been manipulated in any way, it is no longer marked as a literal.
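The flag behaviour described above can be modelled in userland to make the rules easier to follow. This is only a model: in the proposal the flag lives inside the engine and cannot be set by PHP code. The class FlaggedString and the helpers from_source(), from_user(), model_combine() and model_modify() are hypothetical names, and the combine rule shown (the result is a literal only if every piece is) is one possible rule, not a confirmed part of the RFC.

```php
<?php
// Userland model only; the real flag would be set by the engine.
final class FlaggedString
{
    /** @var string */
    public $value;
    /** @var bool */
    public $literal;

    public function __construct(string $value, bool $literal)
    {
        $this->value = $value;
        $this->literal = $literal;
    }
}

// Hypothetical helper: models a string written in the source code.
function from_source(string $value): FlaggedString
{
    return new FlaggedString($value, true);
}

// Hypothetical helper: models a value arriving from outside (e.g. $_GET).
function from_user(string $value): FlaggedString
{
    return new FlaggedString($value, false);
}

// Assumed combine rule: the result is a literal only if every piece is.
function model_combine(FlaggedString ...$pieces): FlaggedString
{
    $value = '';
    $literal = true;
    foreach ($pieces as $piece) {
        $value .= $piece->value;
        $literal = $literal && $piece->literal;
    }
    return new FlaggedString($value, $literal);
}

// Models any other string manipulation: the flag is always dropped.
function model_modify(FlaggedString $input, callable $fn): FlaggedString
{
    return new FlaggedString($fn($input->value), false);
}

$a = from_source('Hello');
$b = from_source('World');
$c = model_combine($a, $b);             // both pieces are literals
$d = model_combine($a, from_user('x')); // one piece came from the user
$e = model_modify($a, 'strtoupper');    // manipulated, so no longer literal

var_dump($c->literal); // bool(true)
var_dump($d->literal); // bool(false)
var_dump($e->literal); // bool(false)
```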

Previous Work

Google uses “compile time constants” in Go, which isn't as good as a run time solution (e.g. the WHERE IN issue), but it works, and is used by go-safe-html and go-safesql.

Google also uses Error Prone in Java to augment the compiler's type analysis, where @CompileTimeConstant ensures method parameters can only use “compile-time constant expressions” (this isn't a complete solution either).

Perl has a Taint Mode, via the -T flag, where all input is marked as “tainted”, and cannot be used by some methods (like commands that modify files), unless you use a regular expression to match and return known-good values (where regular expressions are easy to get wrong).

JavaScript might get isTemplateObject for “Distinguishing strings from a trusted developer from strings that may be attacker controlled” (intended to be used with Trusted Types).

As noted above, there is the Taint extension for PHP by Xinchen Hui.

And there is the Automatic SQL Injection Protection RFC by Matt Tait; this RFC uses a concept similar to its SafeConst. When Matt's RFC was being discussed, it was noted:

  • “unfiltered input can affect way more than only SQL” (Pierre Joye);
  • this amount of work isn't ideal for “just for one use case” (Julien Pauli);
  • It would have affected every SQL function, such as mysqli_query(), $pdo->query(), odbc_exec(), etc (concerns raised by Lester Caine and Anthony Ferrara);
  • Each of those functions would need a bypass for cases where unsafe SQL is used on purpose (e.g. phpMyAdmin taking SQL from POST data), because some applications intentionally “pass raw, user submitted, SQL” (Ronald Chmara 1/2).

I also agree that “SQL injection is almost a solved problem [by using] prepared statements” (Scott Arciszewski), and this is where is_literal() can be used to check that no mistakes are made when using prepared statements.


By libraries:

class db {
  protected $level = 2; // Probably should default to 1 at first.
  function literal_check($var) {
    if (function_exists('is_literal') && !is_literal($var)) {
      if ($this->level === 0) {
        // Programmer aware, and is choosing to bypass this check.
      } else if ($this->level === 1) {
        trigger_error('Non-literal detected!', E_USER_WARNING);
      } else {
        throw new Exception('Non-literal detected!');
      }
    }
  }
  function unsafe_disable_injection_protection() {
    $this->level = 0;
  }
  function where($sql, $parameters = []) {
    $this->literal_check($sql); // Reject SQL that was not written as a literal.
    // ...
  }
}
$db = new db();
$db->where('id = ?'); // OK
$db->where('id = ' . $_GET['id']); // Exception thrown

Table and field names in SQL cannot use parameters; for example, ORDER BY:

$order_fields = [
  'name', // Illustrative field names; each one is a literal in the source.
  'created',
  'admin',
];
$order_id = array_search(($_GET['sort'] ?? NULL), $order_fields);
$sql = literal_combine(' ORDER BY ', $order_fields[$order_id]);
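array_search() returns false when the requested value is not in the list, so a variant that makes the fallback explicit might look like the sketch below. The field names, the helper name order_by_sql(), and the shim for literal_combine() are illustrative assumptions. The shim only exists so the example runs on PHP versions without the proposed function, and it performs no literal check.

```php
<?php
// Shim (assumption): plain concatenation, no literal check. Only defined
// when the proposed function is not available.
if (!function_exists('literal_combine')) {
    function literal_combine(string $piece, string ...$pieces): string
    {
        return $piece . implode('', $pieces);
    }
}

// Hypothetical helper. $requested is the untrusted sort field (e.g. from
// $_GET['sort']); only strings written in $allowed ever reach the SQL.
function order_by_sql(?string $requested): string
{
    $allowed = ['name', 'created', 'admin']; // illustrative field names

    // Strict search, so null or unknown values never match by type juggling.
    $index = array_search($requested, $allowed, true);
    if ($index === false) {
        $index = 0; // fall back to the first field
    }

    return literal_combine(' ORDER BY ', $allowed[$index]);
}

echo order_by_sql('created'), "\n";            // " ORDER BY created"
echo order_by_sql('name; DROP TABLE x'), "\n"; // " ORDER BY name"
```

The user's value is only ever used to pick an index; the text that ends up in the SQL always comes from the list written by the programmer.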

Undefined number of parameters; for example WHERE IN:

function where_in_sql($count) { // Should check for 0
  $sql = [];
  for ($k = 0; $k < $count; $k++) {
    $sql[] = '?';
  }
  return literal_implode(',', $sql);
}
$sql = literal_combine('WHERE id IN (', where_in_sql(count($ids)), ')');
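A self-contained variant of the WHERE IN helper, including the zero check mentioned in the comment, could look like the following sketch. The shims for literal_implode() and literal_combine() are assumptions so the code runs without the proposed functions; they do no literal checking. The $ids values are illustrative.

```php
<?php
// Shims (assumption): no literal checking, only so the sketch runs without
// the proposed functions.
if (!function_exists('literal_implode')) {
    function literal_implode(string $glue, array $pieces): string
    {
        return implode($glue, $pieces);
    }
}
if (!function_exists('literal_combine')) {
    function literal_combine(string $piece, string ...$pieces): string
    {
        return $piece . implode('', $pieces);
    }
}

function where_in_sql(int $count): string
{
    if ($count < 1) {
        // "IN ()" is a syntax error in MySQL and PostgreSQL.
        throw new InvalidArgumentException('where_in_sql() needs at least one value');
    }
    $sql = [];
    for ($k = 0; $k < $count; $k++) {
        $sql[] = '?'; // each placeholder is a literal written in the source
    }
    return literal_implode(',', $sql);
}

$ids = [4, 8, 15]; // illustrative values, e.g. taken from user input

$sql = literal_combine('WHERE id IN (', where_in_sql(count($ids)), ')');

echo $sql, "\n"; // WHERE id IN (?,?,?)
// $ids would then be passed separately as the query parameters.
```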



Literal string is the standard name for strings in source code. See Google.

A string literal is the notation for representing a string value within the text of a computer program. In PHP, strings can be created with single quotes, double quotes or using the heredoc or the nowdoc syntax...

Alternative suggestions have included is_from_literal() from Jakob Givoni. I think is_safe_string() might be asking for trouble. Other terms have included “compile time constants” and “code string”.

Supporting Int/Float/Boolean values

When converting to string, they aren't guaranteed to have (and often don't have) the exact same value they have in source code.

For example, TRUE and true when cast to string give “1”.
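A few more cases show how the string form can differ from what was typed in the source. The particular values are chosen only for illustration.

```php
<?php
// Each value is written one way in the source but produces a different string.
$examples = [
    'TRUE'  => (string) TRUE,  // "1"
    'false' => (string) false, // ""
    '0x1A'  => (string) 0x1A,  // "26"
    '1.50'  => (string) 1.50,  // "1.5"
    '1e3'   => (string) 1e3,   // "1000"
];

foreach ($examples as $source => $string) {
    printf("%-5s => \"%s\"\n", $source, $string);
}
```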

It's also a very low value feature, where there might not be space for a flag to be added.

Supporting Concatenation

This is the big question.

Máté Kocsis has done some preliminary testing on supporting string concat, and found a 0.124% performance hit for the Laravel Demo app, 0.161% for Symfony, and a more severe -3.719% when running this concat test.

In my own simplistic testing, I also included a basic version that did not support string concat. The results were:

  Laravel Demo App: +0.30% with, vs +0.18% without concat.
  Symfony Demo App: +0.06% with, vs +0.06% without concat.
  My Concat Test:   +4.36% with, vs +2.23% without concat.

In my basic test, I used a RAM Disk, and disabled the processor's Turbo Boost. With the Demo Apps, I used /sapi/cgi/php-cgi “-T10” to get the timings (so they would include the compilation), and /sapi/cli/php for My Concat Test.

There is still a small impact without concat because the concat_function() in “zend_operators.c” uses zend_string_extend() (where the literal flag needs to be removed). “zend_vm_def.h” has a similar version, which also supports a quick concat with an empty string; that path doesn't create a new variable (x2), so it would need its flag removed as well.

Technically string concat isn't needed for most libraries, like an ORM or Query Builder, where their methods nearly always take a small literal string. But it would make adoption of is_literal() easier for existing projects that are currently using string concat for their SQL, HTML Templates, etc.

And supporting runtime concat would make the literal check easier to understand, as it would be consistent (e.g. compiler vs runtime concat, where the compiler can concat two strings to create a single literal that has the literal flag set).

The non-concat version would use literal_combine() or literal_implode() as special functions to avoid most of the work during runtime concat. Dan Ackroyd notes that these functions would make it easier to identify exactly where mistakes are made, rather than it being picked up at the end of a potentially long script, after multiple string concatenations, e.g.

$sortOrder = 'ASC';
// 300 lines of code, or multiple function calls
$sql .= ' ORDER BY name ' . $sortOrder;
// 300 lines of code, or multiple function calls

If a developer changed the literal 'ASC' to $_GET['order'], the error raised by $db->query() would not make it clear where the mistake was made. Whereas using literal_combine() highlights exactly where the issue happened:

$sql = literal_combine($sql, ' ORDER BY name ', $sortOrder);
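The fail-fast behaviour can be sketched as follows, assuming literal_combine() would raise an error as soon as it is given a non-literal (the RFC text does not spell out the failure mode). checked_combine() is a hypothetical helper, and $isLiteral is a stand-in for the proposed is_literal(), injected so the sketch can run on current PHP versions.

```php
<?php
// Hypothetical helper modelling a fail-fast literal_combine(). $isLiteral is
// a stand-in for the proposed is_literal(), injected so the sketch can run.
function checked_combine(callable $isLiteral, string ...$pieces): string
{
    foreach ($pieces as $position => $piece) {
        if (!$isLiteral($piece)) {
            throw new InvalidArgumentException(
                'Non-literal value at piece ' . $position
            );
        }
    }
    return implode('', $pieces);
}

// Stand-in check (assumption): treat only these known strings as literals.
$known = ['SELECT * FROM users', ' ORDER BY name ', 'ASC'];
$isLiteral = function (string $value) use ($known): bool {
    return in_array($value, $known, true);
};

$sql = 'SELECT * FROM users';
$sortOrder = 'ASC; DROP TABLE users'; // imagine this came from $_GET['order']
$message = null;

try {
    $sql = checked_combine($isLiteral, $sql, ' ORDER BY name ', $sortOrder);
} catch (InvalidArgumentException $e) {
    // The exception is raised at the combine call, not later at $db->query().
    $message = $e->getMessage();
    echo $message, "\n"; // Non-literal value at piece 2
}
```

Because the check runs at the point of combination, the stack trace points at the line that introduced the non-literal value rather than at the eventual query call.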



See the section above.

Values from INI/JSON/YAML

As noted by Dennis Birkholz, Systems/Frameworks that define certain variables (e.g. table name prefixes) without the use of a literal (e.g. ini/json/yaml files), might need to make some changes to use this feature (depending on where they use the is_literal check).

Existing String Functions

Trying to determine if the is_literal flag should be passed through functions like str_repeat() or substr() is difficult. Having a security feature that is difficult to reason about gives a much higher chance of mistakes.

For any use-case where dynamic strings are required, it would be better to build those strings with an appropriate query builder, or by using literal_combine()/literal_implode().

Backward Incompatible Changes

No known BC breaks, except for code-bases that already contain userland functions named is_literal(), literal_implode(), literal_combine() or literal_sprintf().

Proposed PHP Version(s)

PHP 8.1

RFC Impact

To SAPIs

None known

To Existing Extensions

Not sure

To Opcache

Not sure

Open Issues


Unaffected PHP Functionality

None known

Future Scope

As noted by MarkR, the biggest benefit will come when it can be used by PDO and similar functions (mysqli_query, preg_match, exec, etc). But the basic idea can be used immediately by frameworks and general abstraction libraries, and they can give feedback for future work.

Phase 2 could introduce a way for programmers to specify that certain PHP function/method arguments can only accept literals, and/or specific value-objects their project trusts (this idea comes from Trusted Types in JavaScript).

For example, a project could require the second argument for pg_query() only accept literals or their query_builder object (which provides a __toString method); and that any output (print, echo, readfile, etc) must use the html_output object that's returned by their trusted HTML Templating system (using ob_start() might be useful here).

Phase 3 could set a default of 'only literals' for all of the relevant PHP function arguments, so developers are given a warning, and later prevented (via an exception), when they provide an unsafe value to those functions (they could still specify that unsafe values are allowed, e.g. phpMyAdmin).

And, for a bit of silliness (Spaß ist verboten), MarkR would like an is_figurative() function (functionality to be confirmed).

Proposed Voting Choices

Accept the RFC. Yes/No

Patches and Tests



Dan Ackroyd has started an implementation, which uses functions like literal_combine() to avoid performance concerns.

Joe Watkins has created an implementation which supports string concat at runtime.



Rejected Features


Thanks

  1. Dan Ackroyd, DanAck, for starting the first implementation (which made this a reality), and following up on the version that uses functions instead of string concat.
  2. Joe Watkins, krakjoe, for finding how to set the literal flag (tricky), and creating the implementation that supports string concat.
  3. Máté Kocsis, mate-kocsis, for setting up and doing the performance testing.
  4. Rowan Tommins, IMSoP, for re-writing this RFC to focus on the key features, and putting it in context of how it can be used by libraries.
  5. Nikita Popov, NikiC, for suggesting where the literal flag could be stored. Initially this was going to be the “GC_PROTECTED flag for strings”, which allowed Dan to start the first implementation.
  6. Mark Randall, MarkR, for alternative ideas, and noting that “interned strings in PHP have a flag”, which started the conversation on how this could be implemented.
  7. Xinchen Hui, who created the Taint Extension, allowing me to test the idea; and noting how Taint in PHP5 was complex, but “with PHP7's new zend_string, and string flags, the implementation will become easier”.
