This RFC aims at deprecating and subsequently removing the /e
modifier (PREG_REPLACE_EVAL) that
preg_replace
provides.
Quoting the PHP manual:
If this modifier is set, preg_replace() does normal substitution of backreferences in the replacement string,
evaluates it as PHP code, and uses the result for replacing the search string. Single quotes, double quotes,
backslashes (\) and NULL chars will be escaped by backslashes in substituted backreferences.
Example:
$code = preg_replace('(<h([1-6])>(.*?)</h\1>)e', '"<h$1>" . strtoupper("$2") . "</h$1>"', $code); // uppercases headings
There are several serious issues with the /e
modifier:
As /e
evaluates arbitrary PHP code it can easily be exploited if user input is not carefully validated
or sanitized.
For example the above example can be used to execute arbitrary PHP code by passing the string
<h1>{${eval($_GET[php_code])}}</h1>
. The evaluted code in this case would be
"<h1>" . strtoupper("{${eval($_GET[php_code])}}") . "</h1>"
and as such execute any PHP code passed in the
php_code
GET variable.
An example of a larger project which suffered from such a code injection vulnerability is RoundCube (see this changeset).
A both more secure and cleaner approach is the use of
preg_replace_callback
:
preg_replace_callback( '(<h([1-6])>(.*?)</h\1>)', function ($m) { return "<h$m[1]>" . strtoupper($m[2]) . "</h$m[1]>"; }, $code );
When passing the above exploit code (<h1>{${eval($_GET[php_code])}}</h1>
) into this function the result will
just be <h1>{${EVAL($_GET[PHP_CODE])}}</h1>
and no code will be executed.
The use of the /e
modifier is not comparable to the use of eval()
. *Any* use of /e
can be replaced
with a safer and cleaner use of preg_replace_callback
. eval()
on the other hand has *some* valid use cases
which can not be implemented in any other (sane) way.
Additionally it is much easier to create a vunerability with /e
than it is with eval
. Firstly many people
don't understand PCRE well enough to distinguish whether a use of /e
is safe or not. But more importantly the
fact that addslashes() is applied to all substituted backreferences creates a false feeling of security (as seen
above it can be easily circumvented.)
The application of addslashes
on all substituted backreferences not only isn't helping security, but additionally
also results in unexpected behavior when the input contains quotes:
addslashes
always escapes both quote types, but only one of them needs escaping (e.g. in double quoted strings
only "
should be escaped and in single quoted strings only ' should be escaped). This will result in one of
the quote types to be overescaped. E.g. if
<h1>Hallo 'World'</h1>
is passed into the above function the result
would be <h1>HALLO \'WORLD\'</h1>
(note the additional backslashes).
This behavior makes /e
unusable in many cases (or people just use it anyways, without knowing that it is broken).
Various exploit scripts use the /e
modifier to obfuscate their intention. Some examples from a quick SO search:
This obfuscation hides scripts from grep
-like searches (as preg_replace
is usually considered a “good”
function). Additionally - as you can see in the second link - it is possible to obfuscate the use of the /e
modifier itself, making it even harder to find.
Obviously the use by exploit scripts is not bad per se (most exploit scripts also use echo
and we better keep
that), so don't put too much weight on this argument ;)
The /e
modifier has little to no valid uses but imposes a rather big security risk. As it can in any case be
replaced by a callback there would be no loss in functionality.
The vote ended with 23 in favor and 4 against the proposal.
The /e
modifier has been deprecated in trunk in http://svn.php.net/viewvc?view=revision&revision=323862. It will
be removed at some later point in time.