rfc:pdo_driver_specific_parsers

This is an old revision of the document!


PHP RFC: PDO Driver specific SQL parser

Introduction

The PDO extension contains a SQL parser, whose main purpose is to recognise parameter placeholders inside queries (i.e. “?” and “::paramName”), so that it knows how many and what parameters to expect for a query, and pass the information to the PDO driver in use.

This parser had historically being modelled to work with what was the de-facto SQL standard in the PHP ecosystem at that time: MySQL. However, the SQL dialect used by MySQL is different when handling string literals from standard SQL, followed by other database vendors, such as PostgreSQL and SQLite.

Specifically, MySQL treats the backslash character as an escape character:

'This \'word\' is a single quoted'

Whereas the SQL standard uses double single quotes:

'This ''word'' is a single quoted'

When using databases other than MySQL with PDO, valid queries having string literals ending with a backslash character are throwing off the parser and causing apparently inexplicable bugs.

For example:

SELECT 'foo\' AS a, '?' AS b

will make PDO consider “'foo\' AS a, '” to be a string literal and will parse the following “?” as a positional parameter placeholder. In fact, if you pay close attention, even the DocuWiki SQL code formatter is thrown off by this very example.

We have several reports of similar bugs [1], [2], [3], and possibly other duplicates.

Likewise, having a single SQL parser in PDO meant that my previous RFC to support the PostgreSQL-only "?" operator[4] had to apply the parser change to all database types. The change had no side-effects for other databases, but I feel that can be improved too in this RFC.

Proposal

The proposal is to:

  1. change the default PDO scanner to expect standard SQL only;
  2. allow drivers to optionally provide a custom scanner function to handle their specific SQL dialect;
  3. re-use the current version of the scanner in pdo_mysql, cleaning it up of the PostgreSQL-specific handling of the “?” operator;
  4. add a pdo_pgsql specific scanner, derived from the new default scanner, but capable of dealing with the “?” operator.

In order to do that, a new member will be appended to:

struct pdo_dbh_methods {
	pdo_dbh_close_func		closer;
	pdo_dbh_prepare_func		preparer;
	pdo_dbh_do_func			doer;
	pdo_dbh_quote_func		quoter;
	// ...
	pdo_dbh_sql_scanner		scanner;
}

Each PDO driver defines already their own struct. Leaving the new member to NULL will make the driver use the default PDO scanner function. Otherwise a pointer to a custom scanner function will override the default when parsing queries. It's really as simple as that.

A few years back I attempted to fix a bug and came up with with a pull request that could be considered a proof of concept for this RFC. The same topic was also brought up by others on internals, but no one had time to go with a proper RFC.

Backward Incompatible Changes

No expected BC breaks in userland.

Proposed PHP Version(s)

Next PHP 8.x, hopefully 8.4.

RFC Impact

To SAPIs

No impact

To Existing Extensions

The PDO_DRIVER_API constant will need to be bumped and external PDO driver extension will need some work to ensure compatibility. That has historically been allowed/expected in minor versions (the last time was for PHP 7.2).

To Opcache

No impact to opcache.

New Constants

No new constant.

php.ini Defaults

No php.ini changes

Open Issues

No open issues ATM.

Unaffected PHP Functionality

Anything not PDO?

Future Scope

Perhaps other database drivers could benefit from a custom scanner.

Proposed Voting Choices

As per the voting RFC a yes/no vote with a 2/3 majority is needed for this proposal to be accepted.

Patches and Tests

PoC pull request. Once discussed, I will start with the actual implementation of the proposal.

References

rfc/pdo_driver_specific_parsers.1712933797.txt.gz · Last modified: 2024/04/12 14:56 by mbeccati