Automatic SQL Injection Protection

Version: 0.9
Date: 2015-07-22
Author: Matt Tait, matttait#at#google.com
Status: Under discussion
First Published at: http://wiki.php.net/rfc/sql_injection_protection
You can |try it online (spoiler)

Background

SQL Injection vulnerabilities are a scourge on Internet security. They remain the Number 1 exploited vulnerability category online according to OWASP, and as many as two-thirds of US company data-breaches are ultimately due to SQL injection attacks.

Given the severity and harm of SQL-injections, it may be surprising to hear that defending against SQL injection in the general case is a completely solved problem: if the website ensures that the dynamic parts of every single SQL query are sent as arguments to a constant prepared statement (aka “parameterized queries”), the website is immediately and provably invulnerable to SQL injection via the application.

But despite how easy this sounds in theory, hackers have one enormous advantage over developers: hackers need only find a single SQL-injection vulnerability in a website to fully compromise the server. Developers, in contrast, must successfully and systematically defend every single SQL query against an injection attack. That includes SQL in old code that nobody has looked at in years, SQL in third-party libraries and extensions, and SQL in that debug page that was never supposed to be uploaded to production websites, but somehow managed to get there nonetheless.

And unlike hackers, developers are often given lots of bad information about how to defend against SQL injection. Many online guides will tell you to sanitize your queries with frighteningly broken tools - from regexes, to firewalls, to “addslashes”. Even using the correct "SQL escape" function can be totally vulnerable to SQL injection. It is really easy for SQL-injections to hide in code that feels safe, but turns out not to be if you're not using parameterized queries for every dynamic SQL query.

It's easy to underestimate how attacker-controllable various strings in complex web-applications are; and how readily exploitable hidden SQL injections are with modern tools. And with some regularity, “unexploitable” SQL injections now become exploitable later, often when code is refactored and attacker-controllable strings find new routes to SQL statements buried deep inside the application.

Every big website that gets breached via a SQL injection attack thought that they were cleverer than hackers, and had invented a new way to defend against SQL injection attacks so they “didn't need” to use parameterized SQL queries systematically. And the damage caused to their company and users when they inevitably get breached is totally disproportionate to the amount of effort in simply making sure that they are systematically secure right from the start.

This has to end.

This proposal is to add a new capability to PHP so that it can track vulnerable calls to SQL-based functions. This allows PHP to distinguish strings that might be “tainted” by remote data (such as a $_REQUEST or database query result) from strings that are not (such as strings constructed from configuration files and string literals), and detect when SQL-queries are unsafely constructed with remote data, or even when parameterized SQL queries use a non-constant query string.

When PHP sees a vulnerable query, it can then ignore, log, or take evasive action based on how the website administrator has configured their PHP.INI settings. Companies can then block all SQL injections across their entire website - even in extensions they haven't reviewed and files they've forgotten about - with a one-line change in PHP.INI. Alternatively, they can opt-out of SQL-injection protection entirely, and run their existing code with no modifications or notices.

I've hosted a proof of concept showing a vulnerable application here. The code you see is the code that's running. If you find the SQL injection vulnerability (spoiler), you'll see that the custom build of PHP blocks it from being exploited.

Proposal (overview)

This proposal is to modify the “zend_string” type in Zend Core to “track” whether a string is statically constructed from string literals (like “SELECT VERSION()”) or concatenations of string literals (like $config['table']='foo'; “SELECT * from {$config['table']}”), or is dynamically generated in a way that might allow an attacker to control its value. In this proposal we refer to these statically constructed strings as “safeconsts”.

When a SQL builtin function (e.g. “mysql_query”) runs, it can then “ask” the SQL query string whether it is a “safeconst” string. If it is, we know that the SQL query can be safely sent to the SQL database without fear that the SQL query has not been injected with malicious SQL code from a hacker. If it is *not* a safeconst, we either log, emit an E_WARNING and continue, or throw a “SecurityError exception” and abort the request, depending on the content of PHP.INI.

SafeConst

The logic for determining “SafeConst”-ness is as follows:

Any string literal (i.e. a T_STRING) is a safeconst string if it does not include any auto-eval parts (i.e. “Hello World” is a safeconst).
A string literal with auto-eval parts is a safeconst if every auto-eval part evaluates to a safeconst (i.e. $name=“World”, “Hello $name” is a safeconst, but “Hello {$_GET['q']}” is not a safeconst).
If $i is an Integer or FloatingPoint type, its string promotion is a safeconst string.
If $i is an object type, its string promotion is a safeconst string if $i->__toString() returns a safeconst string.
$i.$j is a safeconst only if $i and $j are both string types that are safeconsts, or are promoted to string types that are safeconsts.
Defined variables are safeconsts if the value they were constructed from was a safeconst. I.e define(“TABLE_NAME”, “mytable”); $i=TABLE_NAME yields a safeconst in $i, but define(“TABLE_NAME”, $_GET[“table”]); $i=TABLE_NAME does not yield a safeconst.
Safeconstness follows a string instance around, i.e. assigning a value into an array (by index or name), object, function parameter, return slot, local variable or global variable does not affect the SafeConst-ness of that variable. For example, mysqli_query(“SELECT * FROM ” . $config[“tablename”]) is fine, so long as $config[“tablename”] was populated with a SafeConst.
Safeconstness does NOT follow string instance by value. For example $i = “Hello”; $i = $_GET[“q”] leaves $i as *not* a safeconst, even if $_GET[“q”] holds the value “Hello”.

By design, SafeConstness is *not applied to the output of SQL-escape functions. Consider the following query:

mysqli_query("SELECT * from USER where ID=" . mysqli_escape_query($_GET["userid"]))

This can be injected via page.php?userid=1%20INJECT_HERE--. Even using correct SQL-escapes within quotes can can be injected in a way PHP cannot verify at runtime

mysqli_query("SELECT * from USER where ID='" . mysqli_escape_query($_GET["userid"]) . "'")

It is a feature, not a bug, of this proposal that we are deliberately encouraging developers to parameterize, rather than escape their SQL queries.

Example SQL-queries that are recognized as safe

A lot of the complexity in this proposal is in making sure that we avoid over-reporting potential SQL-injections when developers write legitimate and non-injectable code.

The following SQL queries are all correctly detected as “safe”:

Unparameterized queries are fine, so long as the query is constant:

mysqli_query($link, "SELECT * FROM TABLE");

Building SQL strings is fine, so long as the components are constant:

define("TABLE_NAME", "USERTABLE");
...
mysqli_query($link, "SELECT * FROM " . TABLE_NAME);

We also don't mind if remote data influences control-flow that constructs the query, so long as it doesn't influence the string data:

if($_GET["usertype"] == "admin")
  $usertype = ADMIN";
else
  $usertype = user";
..
mysqli_query($link, "SELECT * FROM users WHERE USERTYPE='$usertype'");

This is also fine:

$config = array("name" => "tablename");
mysqli_query($link, "SELECT * FROM ". $config["name"] ." WHERE usertype=admin");

Even this is fine:

if($_GET["isdebug"])
  $config = include("debug-settings.php");
else
  $config = include("release-settings.php");
mysqli_query($link, "SELECT * from ". $config["tablename"]);

PHP will even keep track of “SafeConst”ness even in quite complex scenarios, and only block the unsafe uses:

$option = array( "0", $_GET["id"]);
$id = ($_GET["opt"] == "1")? $option[1] : $option[0];
mysqli_query($link, "SELECT * from table where id=".$id);
// page.php?opt=0 <- fine
// page.php?opt=1&id=1 <- throws SecurityError
// page.php?opt=1&id=1%20INJECT <- SecurityError blocks SQL injection attack

Even SQL-queries that would normally involve complex inter-function analysis to prove are safe are (correctly) identified as safe when the control-flow guarantees the resulting query is constant:

function query_car_by_color($color) {
  return mysqli_query($link, "SELECT * FROM cars WHERE color='".$color."'");
}
...
query_car_color("red"); // fine, no warnings.

Examples of queries detected as "unsafe"

mysqli_query("SELECT * From Users Where NAME='". $_GET["name"] . "'");

mysqli_query("SELECT * From Users Where ID=". mysqli_escape_string($_GET["id"]) . "");

We detect when code-refactoring might expose a new bug. For example, if the old code

function query_car_by_color($color) {
  return mysqli_query($link, "SELECT * FROM cars WHERE color='".$color."'");
}
// Fix: Old version only showed red cars. Let's let the user decide
// - query_car_color("red");
query_car_color($_GET["color"]); // now flags as dangerous even when ?color=red, because "red" is a safe-const but $_GET["color"] is not.

Behavior when discovering unsafe use of SQL

The behavior of the feature when it detects that the query string in an unparameterized query, or the parameter-string in a parameterized query is not constant will be configurable via PHP.INI:

“Legacy vulnerable” mode is designed for companies and users who upgrade to PHP7 and need their application to work exactly as before. This will not be the default option, and website administrators will need to explicitly acknowledge that their website may be vulnerable to SQL-injections to enable this mode.

“Hardened mode” is designed for production servers of companies and users who want to keep their server secure. If PHP discovers a potentially vulnerable use of a SQL query, it conservatively blocks the query, implicitly prioritizing user data safety and server integrity over the possibility that rarely visited code-branches might now throw an Error.

“Warning mode” is designed as an interim measure for developers who are working towards implementing “hardened mode” on their production servers, but whose code includes too many SQL statements to be upgraded in one go. In this mode, developers are alerted to where vulnerable SQL statements might be in their codebase, but does not prevent the website from working in the mean-time.

Edge-case: SQL-injection as a feature

Some applications explicitly and deliberately expose SQL-management functionality to a user (for example: PHP-MyAdmin allows logged-in administrators to run SQL queries directly against the website).

The SQL-injection detection logic will (correctly) notice that such queries are allowing a remote user to run queries against the database. In order to support this edge-case feature, we will introduce an optional “allow-dangerous” flag to some SQL-builtin functions; for example:

if(is_adminstrator() && check_form_csrf_token())
  mysql_query($_POST["sql_query"], SECURITY_DISABLE_SQL_INJECTION_PROTECTION);

This design makes developers think twice about disabling the protection, and explicitly acknowledging and accepting the security consequences of doing so. It is also by-design an easy string to search for during a security audit.

Important consideration: behavior of string-interning

SafeConsts and non-SafeConsts can still be string-interned, but it is a security bug if the “SafeConst”-ness of a string can “bleed” over to another string of the same value. This proposal also involves adding code to make sure this never happens, and unit-tests to verify that it does not happen.

Proposal (internals plan)

Implementing this proposal is comprised of the following concrete steps:

It changes the definition of zend_string to include a field for tracking whether the string is a “sql-safe constant”. The fact that a string is a “sql-safe constant” follows a value, so if a “sql-safe constant” is stored into a variable, and then loaded back, the variable can still be safely passed to SQL-query functions.
The compiler is edited to automatically mark string literals (i.e T_STRING) as “sql-safe constants”.
The various internal string-concatenation functions are altered so that if two “sql-safe constants” are concatenated, the result is itself a “sql-safe constant”. This occurs regardless of whether the strings are concatenated during the “compile pass” of the compiler, or via the runtime implementation of the concatenation opcode in the compiled page.
The string duplication function so that duplicating a “sql-safe constant” yields another “sql-safe constant”. This allows “sql-safe constants” to be put in arrays, objects, locals, fields, and DEFINE() calls without losing their “sql-safeness”.
A new Error type “SecurityError” is added to the PHP built-in types.
A new PHP.INI entry is introduced (tbd) to allow developers to configure how this feature should behave when a potentially SQL-injectable query is performed.
A new C function is added to ZendCore to allow plugins and extensions to report a potential security alert. This allows all of the configuration of what to do with security warnings to be centralized to a single place.
The various SQL-extensions that provide built-in SQL functions are changed to validate that their query parameter is safe, and call the “security alert” function in ZendCore if the parameter is not constant.

Significant Changes

The SecurityError and E_WARNING message will contain a link to explain to PHP developers why they are seeing the error, and how to secure themselves against SQL-injection using prepared-statements. Realistically this page should be hosted somewhere on PHP.NET.

This change implicitly makes all of the various “SQL-escape” functions obsolete. This is a good thing, because the “SQL-escape” give a false sense of security to users, as SQL queries that use them can still be SQL-injected in some circumstances. Users who are using SQL-escape functions should use prepared statements instead.

This change necessarily alters the zend_string structure, and hence requires extensions to be recompiled.

In order to support the special case of administrator SQL administration portals that by-design expose SQL-injection to the user as a feature (e.g. phpmyadmins' “Run SQL Statement” feature), some extensions may need to add a new flag that allows users to explicitly disable SQL-injection protection for specific queries, for example:

mysqli_query($link, $_POST[“sql_request”], UNSAFE_ALLOW_SQL_INJECTION);

Non-goals of this proposal

For the avoidance of doubt, this proposal does not, and does not attempt to:

Detect, or prevent developers from making deliberately insecure websites. The purpose is to quickly alert developers to vulnerabilities and encourage them to fix them “the right way” - i.e. via prepared statements. It does not stop, or attempt to stop deliberately inserted back-doors or malware on the server from accessing user data.
Detect, or prevent developers from SQL vulnerabilities unrelated to injection attacks. For instance, detecting and preventing destructive queries like 'mysql_query($link, “DROP TABLES”)' is a non-goal of this feature.
Detect or prevent compromise of the server through other categories of non-SQL based injection vulnerabilities, such as command-line injection, eval-injection, LDAP-injection or path-injection vulnerabilities.
Prevent malicious developers or hackers who have already compromised the server and can run arbitrary crafted PHP scripts from “gaming” the SQL detection logic to trick the SQL-injection detection logic into running malicious queries.
Prevent SQL-injection attacks that occur due to injection vulnerabilities present in other applications on the server (such as in a SQL Stored Procedure) that PHP cannot see.
Make websites magically secure from hackers using non-SQL-injection based techniques.

Proposed PHP Version(s)

Hopefully introduced in PHP 7, or soon after.

RFC Impact

To SAPIs

Describe the impact to CLI, Development web server, embedded PHP etc.

Requires all SAPIs to be recompiled because of an ABI change. Otherwise, no real difference.

To Existing Extensions

Requires all existing extensions to be recompiled because of an ABI change. Functions that expose SQL-query builtin functions to the user will be altered to call the zend security function if their query parameter does not have the “sql-safe-constant” flag set on their string. Otherwise no changes.

To Opcache

I've not yet verified how the change impacts opcache.

php.ini Defaults

There is one new change to PHP.INI that allows configuring to behavior of the SQL-injection-protection code when it detects a possible injection.

Open Issues

Make sure there are no open issues when the vote starts!

Unaffected PHP Functionality

Websites that already adopt security-best-practice of only issuing dynamic queries to their SQL database via parameterized SQL statements with constant parameter-strings will see no change when this proposal is adopted.

Future Scope

This sections details areas where the feature might be improved in future, but that are not currently proposed in this RFC.

Proposed Voting Choices

This does not introduce any syntax changes to the PHP language, and therefore requires a 50%+1 vote to be adopted.

Patches and Tests

Links to any external patches and tests go here.

If there is no patch, make it clear who will create a patch, or whether a volunteer to help with implementation is needed.

Make it clear if the patch is intended to be the final patch, or is just a prototype.

Implementation

After the project is implemented, this section should contain

the version(s) it was merged to
a link to the git commit(s)
a link to the PHP manual entry for the feature

References

Links to external references, discussions or RFCs

Rejected Features

Keep this updated with features that were discussed on the mail lists.