PHP RFC: PCRE2 migration

Introduction

PCRE underlies many core functions in PHP. PHP currently uses the 8.x series, which is now a legacy version of the library. It is still maintained upstream, but it receives bug fixes only; no new features are being added to it.

Because ext/pcre is core functionality for PHP, it is essential to keep it up to date. The current version of the library is called PCRE2. It is actively supported, and it is where new features are implemented. However, its API has certain differences. As the original release announcement states, PCRE2 should be regarded as a new project. Nevertheless, it is a library with the same purpose that inherits a lot from the original PCRE.

More than two years have passed since PCRE2 was released. The API is different, but large parts of code written against PCRE can be reused. PCRE2 already contains features with no counterpart in the 8.x series. The PCRE2 JIT also lists wider platform support than the PCRE JIT.

Proposal

Migrate the PHP core to PCRE2, with a focus on maximum backward compatibility. The main goal is to bring PCRE2 into the core and have it stable; no new features are targeted by this RFC. If it is accepted, some new features may still find their way in before the targeted PHP version becomes stable.

Backward Incompatible Changes

Proposed PHP Version(s)

PHP 7.3

Impact to SAPIs, existing extensions and Opcache

Code that makes use of PCRE needs to be rewritten to match the PCRE2 API. Otherwise there is no impact. The current patch takes care of all the core items that depend on PCRE.

As PCRE2 is bundled with PHP, PHP can also be compiled on systems where libpcre2 is not available. An external libpcre2 can be provided by the system's package manager or compiled on the given system, if desired.

As the ext/pcre code can change significantly, cross patching between different ext/pcre versions will certainly be affected. However, it is to be expected that issues in these versions are in most cases unrelated to each other.

Performance

So far, no negative performance impact has been observed, at least with the linked patch. Performance is of course pattern and input specific, but the tests show PCRE2 performing at least as well as PCRE. Some test suite runs with phpunit are even faster with PCRE2 when preg_* functions are involved.
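Because performance depends on the pattern and the input, a comparison is only meaningful for one's own workload. The following is a minimal timing sketch, not the benchmark used for this RFC; the helper name time_preg_match(), the pattern and the subject string are assumptions chosen purely for illustration. Running the same script under a PCRE-based and a PCRE2-based build gives a rough comparison.

```php
<?php
// Hypothetical helper for illustration; not part of the RFC or its patch.
// Returns the wall-clock seconds spent running preg_match() $iterations times.
function time_preg_match(string $pattern, string $subject, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        preg_match($pattern, $subject);
    }
    return microtime(true) - $start;
}

// Arbitrary sample input; replace with patterns and data from a real workload.
$subject = str_repeat('lorem ipsum 12345 ', 1000);
$elapsed = time_preg_match('/\d+\s+$/', $subject, 200);

// PCRE_VERSION reports the library version the running PHP build uses.
printf("PCRE %s: %.4f s\n", PCRE_VERSION, $elapsed);
```

A single run like this is noisy; repeating it several times and comparing the best or median values gives a fairer picture.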

New Regex Syntax

These and more are available with the upgrade to PCRE2 10.x, with almost nothing to be done on the PHP side.

See the PCRE2 syntax and PCRE syntax pages for more. In general, PCRE2 seems to have a stricter pattern interpreter, so invalid patterns are checked more aggressively.

New Constants

Open Issues

None.

Unaffected PHP Functionality

Userland code is unaffected, although pattern checking is more precise in PCRE2, so invalid patterns are more likely to fail compilation. The patch keeps the behavior of the 'X' modifier the same, even though PCRE2 has 'X' on by default. Also, as mentioned in the impact section, any C code not using PCRE is unaffected. The 'S' modifier can still be used, but it has no effect.

The current test suite passes with PCRE2 with almost no changes to the tests. One test, ext/pcre/tests/bug75207.phpt, had to be adjusted because of the newer Unicode engine. There can of course be behavior differences that the current tests don't catch, so it is all the more important to start QA as early as possible.
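Since stricter checking means a pattern may stop compiling after the upgrade, userland code can verify its patterns up front. The sketch below is one possible way to do that; pattern_compiles() is a hypothetical helper, not something introduced by this RFC. It relies only on the documented behavior that preg_match() returns false on failure.

```php
<?php
// Hypothetical helper for illustration; not part of the RFC or its patch.
// Returns true if the pattern compiles, false otherwise.
function pattern_compiles(string $pattern): bool
{
    // preg_match() returns false on failure. The warning emitted for a
    // compile error is suppressed with @ to keep the check quiet.
    return @preg_match($pattern, '') !== false;
}

var_dump(pattern_compiles('/^[a-z]+$/'));  // valid pattern
var_dump(pattern_compiles('/(unclosed/')); // missing closing parenthesis
```

Running such a check over the patterns an application uses, on both the old and the new build, is a cheap way to spot patterns that only the stricter engine rejects.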

Future Scope

PCRE2 has quite a few things to offer. Please check the compiled overview of the API changes at http://www.rexegg.com/pcre-documentation.html. The following are specifically worth mentioning:

PCRE2 also has better Unicode support and a new error reporting API, so we might check whether our current UTF-8 sanity checks are still required. Features arriving in future PCRE2 versions should also be taken into account.

Vote

Migrate the PHP core to the most current PCRE2 version.

PCRE2 migration
Real name Yes No
ashnazg (ashnazg)  
bukka (bukka)  
bwoebi (bwoebi)  
cmb (cmb)  
daverandom (daverandom)  
derick (derick)  
dm (dm)  
dmitry (dmitry)  
emir (emir)  
galvao (galvao)  
jhdxr (jhdxr)  
kalle (kalle)  
kelunik (kelunik)  
malukenho (malukenho)  
narf (narf)  
nikic (nikic)  
ocramius (ocramius)  
peehaa (peehaa)  
philip (philip)  
pollita (pollita)  
ramsey (ramsey)  
remi (remi)  
sammyk (sammyk)  
stas (stas)  
tpunt (tpunt)  
zeev (zeev)  
Final result: 26 0
This poll has been closed.

2/3 majority required. Voting starts on 2017-10-30 and closes on 2017-11-13.

Patches and Tests

https://github.com/php/php-src/pull/2857

Implementation

Merged into 7.3 http://git.php.net/?p=php-src.git;a=commitdiff;h=a5bc5aed71f7a15f14f33bb31b8e17bf5f327e2d

References