This is an old revision of the document!

RFC: Remove preg_replace /e modifier

Version: 1.0
Date: 2012-02-04
Author: Nikita Popov nikic@php.net
Status: In Draft

Summary

This RFC aims at deprecating and subsequently removing the /e modifier (PREG_REPLACE_EVAL) that preg_replace provides.

What does the /e modifier do?

Quoting the PHP manual:

If this modifier is set, preg_replace() does normal substitution of backreferences in the replacement string,
evaluates it as PHP code, and uses the result for replacing the search string. Single quotes, double quotes,
backslashes (\) and NULL chars will be escaped by backslashes in substituted backreferences.

Example:

$code = preg_replace('(<h([1-6])>(.*?)</h\1>)e', '"<h$1>" . strtoupper("$2") . "</h$1>"', $code); // uppercases headings

Problems

There are several serious issues with the /e modifier:

Security issues

As /e evaluates arbitrary PHP code it can easily be exploited if user input is not carefully validated or sanitized.

For example the above example can be used to execute arbitrary PHP code by passing the string <h1>{${eval($_GET[php_code])}}</h1>. The evaluted code in this case would be “<h1>” . strtoupper(“{${eval($_GET[php_code])}}”) . “</h1>” and as such execute any PHP code passed in the php_code GET variable.

An example of a larger project which suffered from such a code injection vulnerability is RoundCube (see http://trac.roundcube.net/changeset/2148).

Alternative

A both more secure and cleaner approach is the use of preg_replace_callback:

preg_replace_callback(
    '(<h([1-6])>(.*?)</h\1>)',
    function ($m) {
        return "<h$m[1]>" . strtoupper($m[2]) . "</h$m[1]>";
    },
    $code
);

When passing the above exploit code (<h1>{${eval($_GET[php_code])}}</h1>) into this function the result will just be <h1>{${EVAL($_GET[PHP_CODE])}}</h1> and no code will be executed.

But we don't remove eval() either, do we?

The use of the /e modifier is not comparable to the use of eval(). *Any* use of /e can be replaced with a safer and cleaner use of preg_replace_callback. eval() on the other hand has *some* valid use cases which can not be implemented in any other (sane) way.

Additionally it is much easier to create a vunerability with /e than it is with eval. Firstly many people don't understand PCRE well enough to distinguish whether a use of /e is safe or not. But more importantly the fact that addslashes() is applied to all substituted backreferences creates a false feeling of security (as seen above it can be easily circumvented.)

Overescaping of quotes

The application of addslashes on all substituted backreferences not only isn't helping security, but additionally also results in unexpected behavior when the input contains quotes.

As always both quote types are escaped, but only one of them needs escaping, one of the quote types will always be overescaped. E.g. if <h1>Hallo 'World'</h1> is passed into the above function the result would be <h1>HALLO \'WORLD\'</h1> (note the additional backslash).

This behavior makes /e unusable for many applications (or people just use it anyways, without knowing that it is broken).

Use as obfuscation in exploit scripts

Various exploit scripts use the /e modifier to obfuscate their intention. Some examples from a quick SO search:

This obfuscation hides scripts from grep like searches (as preg_replace is usually considered a “good” function). Additionally - as you can see in the second link - it is possible to obfuscate the use of the /e modifier itself, making it even harder to find it's usage.

Obviously the use by exploit scripts is not bad per se (most exploit scripts also use echo and we better keep that), so don't put too much weight on this argument ;)

Conclusion

The /e modifier has little to no valid uses but imposes a rather big security risk. As it can in any case be replaced by a callback there would be no loss in functionality.

The time line for deprecation and removal is subject to discussion.