Table of Contents

Automatic SQL Injection Protection

Background

SQL Injection vulnerabilities are a scourge on Internet security. They remain the Number 1 exploited vulnerability category online according to OWASP, and as many as two-thirds of US company data-breaches are ultimately due to SQL injection attacks.

Given the severity and harm of SQL-injections, it may be surprising to hear that defending against SQL injection in the general case is a completely solved problem: if the website ensures that the dynamic parts of every single SQL query are sent as arguments to a constant prepared statement (aka “parameterized queries”), the website is immediately and provably invulnerable to SQL injection via the application.

But despite how easy this sounds in theory, hackers have one enormous advantage over developers: hackers need only find a single SQL-injection vulnerability in a website to fully compromise the server. Developers, in contrast, must successfully and systematically defend every single SQL query against an injection attack. That includes SQL in old code that nobody has looked at in years, SQL in third-party libraries and extensions, and SQL in that debug page that was never supposed to be uploaded to production websites, but somehow managed to get there nonetheless.

And unlike hackers, developers are often given lots of bad information about how to defend against SQL injection. Many online guides will tell you to sanitize your queries with frighteningly broken tools - from regexes, to firewalls, to “addslashes”. Even using the correct "SQL escape" function can be totally vulnerable to SQL injection. It is really easy for SQL-injections to hide in code that feels safe, but turns out not to be if you're not using parameterized queries for every dynamic SQL query.

It's easy to underestimate how attacker-controllable various strings in complex web-applications are; and how readily exploitable hidden SQL injections are with modern tools. And with some regularity, “unexploitable” SQL injections now become exploitable later, often when code is refactored and attacker-controllable strings find new routes to SQL statements buried deep inside the application.

Every big website that gets breached via a SQL injection attack thought that they were cleverer than hackers, and had invented a new way to defend against SQL injection attacks so they “didn't need” to use parameterized SQL queries systematically. And the damage caused to their company and users when they inevitably get breached is totally disproportionate to the amount of effort in simply making sure that they are systematically secure right from the start.

This has to end.

This proposal is to add a new capability to PHP so that it can track vulnerable calls to SQL-based functions. This allows PHP to distinguish strings that might be “tainted” by remote data (such as a $_REQUEST or database query result) from strings that are not (such as strings constructed from configuration files and string literals), and detect when SQL-queries are unsafely constructed with remote data, or even when parameterized SQL queries use a non-constant query string.

When PHP sees a vulnerable query, it can then ignore, log, or take evasive action based on how the website administrator has configured their PHP.INI settings. Companies can then block all SQL injections across their entire website - even in extensions they haven't reviewed and files they've forgotten about - with a one-line change in PHP.INI. Alternatively, they can opt-out of SQL-injection protection entirely, and run their existing code with no modifications or notices.

I've hosted a proof of concept showing a vulnerable application here. The code you see is the code that's running. If you find the SQL injection vulnerability (spoiler), you'll see that the custom build of PHP blocks it from being exploited.

Proposal (overview)

This proposal is to modify the “zend_string” type in Zend Core to “track” whether a string is statically constructed from string literals (like “SELECT VERSION()”) or concatenations of string literals (like $config['table']='foo'; “SELECT * from {$config['table']}”), or is dynamically generated in a way that might allow an attacker to control its value. In this proposal we refer to these statically constructed strings as “safeconsts”.

When a SQL builtin function (e.g. “mysql_query”) runs, it can then “ask” the SQL query string whether it is a “safeconst” string. If it is, we know that the SQL query can be safely sent to the SQL database without fear that the SQL query has not been injected with malicious SQL code from a hacker. If it is *not* a safeconst, we either log, emit an E_WARNING and continue, or throw a “SecurityError exception” and abort the request, depending on the content of PHP.INI.

SafeConst

The logic for determining “SafeConst”-ness is as follows:

By design, SafeConstness is *not applied to the output of SQL-escape functions. Consider the following query:

mysqli_query("SELECT * from USER where ID=" . mysqli_escape_query($_GET["userid"])) 

This can be injected via page.php?userid=1%20INJECT_HERE--. Even using correct SQL-escapes within quotes can can be injected in a way PHP cannot verify at runtime

mysqli_query("SELECT * from USER where ID='" . mysqli_escape_query($_GET["userid"]) . "'")

It is a feature, not a bug, of this proposal that we are deliberately encouraging developers to parameterize, rather than escape their SQL queries.

Example SQL-queries that are recognized as safe

A lot of the complexity in this proposal is in making sure that we avoid over-reporting potential SQL-injections when developers write legitimate and non-injectable code.

The following SQL queries are all correctly detected as “safe”:

Unparameterized queries are fine, so long as the query is constant:

mysqli_query($link, "SELECT * FROM TABLE");

Building SQL strings is fine, so long as the components are constant:

define("TABLE_NAME", "USERTABLE");
...
mysqli_query($link, "SELECT * FROM " . TABLE_NAME);

We also don't mind if remote data influences control-flow that constructs the query, so long as it doesn't influence the string data:

if($_GET["usertype"] == "admin")
  $usertype = ADMIN";
else
  $usertype = user";
..
mysqli_query($link, "SELECT * FROM users WHERE USERTYPE='$usertype'");

This is also fine:

$config = array("name" => "tablename");
mysqli_query($link, "SELECT * FROM ". $config["name"] ." WHERE usertype=admin");

Even this is fine:

if($_GET["isdebug"])
  $config = include("debug-settings.php");
else
  $config = include("release-settings.php");
mysqli_query($link, "SELECT * from ". $config["tablename"]);

PHP will even keep track of “SafeConst”ness even in quite complex scenarios, and only block the unsafe uses:

$option = array( "0", $_GET["id"]);
$id = ($_GET["opt"] == "1")? $option[1] : $option[0];
mysqli_query($link, "SELECT * from table where id=".$id);
// page.php?opt=0 <- fine
// page.php?opt=1&id=1 <- throws SecurityError
// page.php?opt=1&id=1%20INJECT <- SecurityError blocks SQL injection attack

Even SQL-queries that would normally involve complex inter-function analysis to prove are safe are (correctly) identified as safe when the control-flow guarantees the resulting query is constant:

function query_car_by_color($color) {
  return mysqli_query($link, "SELECT * FROM cars WHERE color='".$color."'");
}
...
query_car_color("red"); // fine, no warnings.

Examples of queries detected as "unsafe"

mysqli_query("SELECT * From Users Where NAME='". $_GET["name"] . "'");

mysqli_query("SELECT * From Users Where ID=". mysqli_escape_string($_GET["id"]) . "");

We detect when code-refactoring might expose a new bug. For example, if the old code

function query_car_by_color($color) {
  return mysqli_query($link, "SELECT * FROM cars WHERE color='".$color."'");
}
// Fix: Old version only showed red cars. Let's let the user decide
// - query_car_color("red");
query_car_color($_GET["color"]); // now flags as dangerous even when ?color=red, because "red" is a safe-const but $_GET["color"] is not.

Behavior when discovering unsafe use of SQL

The behavior of the feature when it detects that the query string in an unparameterized query, or the parameter-string in a parameterized query is not constant will be configurable via PHP.INI:

“Legacy vulnerable” mode is designed for companies and users who upgrade to PHP7 and need their application to work exactly as before. This will not be the default option, and website administrators will need to explicitly acknowledge that their website may be vulnerable to SQL-injections to enable this mode.

“Hardened mode” is designed for production servers of companies and users who want to keep their server secure. If PHP discovers a potentially vulnerable use of a SQL query, it conservatively blocks the query, implicitly prioritizing user data safety and server integrity over the possibility that rarely visited code-branches might now throw an Error.

“Warning mode” is designed as an interim measure for developers who are working towards implementing “hardened mode” on their production servers, but whose code includes too many SQL statements to be upgraded in one go. In this mode, developers are alerted to where vulnerable SQL statements might be in their codebase, but does not prevent the website from working in the mean-time.

Edge-case: SQL-injection as a feature

Some applications explicitly and deliberately expose SQL-management functionality to a user (for example: PHP-MyAdmin allows logged-in administrators to run SQL queries directly against the website).

The SQL-injection detection logic will (correctly) notice that such queries are allowing a remote user to run queries against the database. In order to support this edge-case feature, we will introduce an optional “allow-dangerous” flag to some SQL-builtin functions; for example:

if(is_adminstrator() && check_form_csrf_token())
  mysql_query($_POST["sql_query"], SECURITY_DISABLE_SQL_INJECTION_PROTECTION);

This design makes developers think twice about disabling the protection, and explicitly acknowledging and accepting the security consequences of doing so. It is also by-design an easy string to search for during a security audit.

Important consideration: behavior of string-interning

SafeConsts and non-SafeConsts can still be string-interned, but it is a security bug if the “SafeConst”-ness of a string can “bleed” over to another string of the same value. This proposal also involves adding code to make sure this never happens, and unit-tests to verify that it does not happen.

Proposal (internals plan)

Implementing this proposal is comprised of the following concrete steps:

Significant Changes

Non-goals of this proposal

For the avoidance of doubt, this proposal does not, and does not attempt to:

Proposed PHP Version(s)

Hopefully introduced in PHP 7, or soon after.

RFC Impact

To SAPIs

Describe the impact to CLI, Development web server, embedded PHP etc.

Requires all SAPIs to be recompiled because of an ABI change. Otherwise, no real difference.

To Existing Extensions

Requires all existing extensions to be recompiled because of an ABI change. Functions that expose SQL-query builtin functions to the user will be altered to call the zend security function if their query parameter does not have the “sql-safe-constant” flag set on their string. Otherwise no changes.

To Opcache

I've not yet verified how the change impacts opcache.

php.ini Defaults

There is one new change to PHP.INI that allows configuring to behavior of the SQL-injection-protection code when it detects a possible injection.

Open Issues

Make sure there are no open issues when the vote starts!

Unaffected PHP Functionality

Websites that already adopt security-best-practice of only issuing dynamic queries to their SQL database via parameterized SQL statements with constant parameter-strings will see no change when this proposal is adopted.

Future Scope

This sections details areas where the feature might be improved in future, but that are not currently proposed in this RFC.

Proposed Voting Choices

This does not introduce any syntax changes to the PHP language, and therefore requires a 50%+1 vote to be adopted.

Patches and Tests

Links to any external patches and tests go here.

If there is no patch, make it clear who will create a patch, or whether a volunteer to help with implementation is needed.

Make it clear if the patch is intended to be the final patch, or is just a prototype.

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged to
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature

References

Links to external references, discussions or RFCs

Rejected Features

Keep this updated with features that were discussed on the mail lists.