rfc:pdo_driver_specific_parsers

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
rfc:pdo_driver_specific_parsers [2024/04/23 08:32] mbeccatirfc:pdo_driver_specific_parsers [2024/05/20 17:46] (current) – Deprecation notice for "escaped question marks inside dollar quoted string" mbeccati
Line 1: Line 1:
 ====== PHP RFC: PDO Driver specific SQL parsers ====== ====== PHP RFC: PDO Driver specific SQL parsers ======
-  * Version: 0.3+  * Version: 0.7
   * Date: 2024-04-11   * Date: 2024-04-11
   * Author: Matteo Beccati, mbeccati@php.net   * Author: Matteo Beccati, mbeccati@php.net
   * Status: Under Discussion   * Status: Under Discussion
-  * Discussion: https://externals.io/message/123141 +  * Discussion: [[https://externals.io/message/123141]] 
-  * First Published at: http://wiki.php.net/rfc/pdo_driver_specific_parsers+  * First Published at: [[https://wiki.php.net/rfc/pdo_driver_specific_parsers]] 
 +  * Implementation: [[https://github.com/php/php-src/pull/14035]]
  
 ===== Introduction ===== ===== Introduction =====
Line 33: Line 34:
  
 ===== Proposal ===== ===== Proposal =====
-The proposal is to:+Following a detailed research (see below) for each of the databases currently supported by PDO in core, the proposal is to allow drivers to optionally provide a custom scanner function to handle their specific SQL dialect and:
  
-  - change the [[https://github.com/php/php-src/blob/ab589e4481f0cf35c8773e0c64dccc35b8870ae1/ext/pdo/pdo_sql_parser.re#L42|default PDO scanner]] to expect standard SQL only; +  - Change the [[https://github.com/php/php-src/blob/ab589e4481f0cf35c8773e0c64dccc35b8870ae1/ext/pdo/pdo_sql_parser.re#L42|default PDO scanner]] to expect standard SQL only: 
-  allow drivers to optionally provide custom scanner function to handle their specific SQL dialect; +    single and double quoted literals, with doubling as escaping mechanism 
-  re-use the current version of the scanner in pdo_mysqlcleaning it up of the PostgreSQL-specific handling of the "?" operator; +    - two-dashes and C-style comments (non-nested, as that seems to be the most common format, already implemented in PDO) 
-  - add pdo_pgsql specific scanner, derived from the new default scannerbut capable of dealing with the "?" operator.+  - Add MySQL specific scanner function: 
 +    single and double quoted literals with both doubling and backslash as escaping mechanisms (MySQL default) 
 +    backtick literals with doubling as escaping mechanism (I also tested and it seems MySQL doesn't accept backslashes as escapes) 
 +    - two-dashes (if followed by 1 whitespace), C-style comments, and Hash-comments 
 +    - Tests, as necessary 
 +  - Add a PgSQL specific scanner function: 
 +    - single and double quoted literalswith doubling as escaping mechanism 
 +    - C-style "escape" string literals 
 +    - Dollar-quoted string literals 
 +    - two-dashes and C-style comments (non-nested, as nesting would require unwanted changes to the common parser functionality) 
 +    Support for "??" as escape sequence for the "?" operator 
 +    - Tests, as necessary 
 +  - Add SqLite specific scanner function: 
 +    - singledouble quoted, and backtick literals, with doubling as escaping mechanism 
 +    - square brackets quoting for identifiers 
 +    - two-dashes and C-style comments (non-nested) 
 +    - Tests, as necessary 
 + 
 +In order to keep the change as simple as possible, the proposal tries to cover the default SQL syntax for each database as closely as possiblewithout heavy changes to the common parser code.
  
 One important thing to mention is that the proposed changes are only targeting the part of PDO that scans the SQL query for parameter placeholders. Quoting literals or using parameters in queries is **not going to be affected**. One important thing to mention is that the proposed changes are only targeting the part of PDO that scans the SQL query for parameter placeholders. Quoting literals or using parameters in queries is **not going to be affected**.
Line 57: Line 76:
  
 Each PDO driver defines already [[https://github.com/search?q=repo%3Aphp/php-src%20pdo_dbh_methods&type=code|their own struct]]. Leaving the new member to NULL will make the driver use the default PDO scanner function. Otherwise a pointer to a custom scanner function will override the default when parsing queries. It's really as simple as that. Each PDO driver defines already [[https://github.com/search?q=repo%3Aphp/php-src%20pdo_dbh_methods&type=code|their own struct]]. Leaving the new member to NULL will make the driver use the default PDO scanner function. Otherwise a pointer to a custom scanner function will override the default when parsing queries. It's really as simple as that.
 +
 +The rest of the implementation is the actual re2c scanner code, config.*, Makefile changes, etc. required to incorporate the driver specific scanner into the build.
 +
 +In order to support dollar-quoted strings on Postgres, the functionality of custom quoting has been added to the common PDO parser function. The change has no side effects for other database drivers.
 +
 +One minor potential BC-break was reported while researching bug [[https://github.com/php/php-src/issues/14244|#14244]], which basically describes lack of support for dollar quoting in pdo_pgsql. One of the currently viable workarounds is to use escaped question marks inside dollar quoted strings to avoid unexpected placeholder detection. The last version of the implementation still allows that, while raising the following deprecation notice:
 +
 +''Escaping question marks inside dollar quoted strings is not required anymore and is deprecated''.
 +
 +Such BC-compatibility can be removed in the next major version.
 +
  
 ===== Research on String Literals, Identifiers, and Comments =====  ===== Research on String Literals, Identifiers, and Comments ===== 
Line 68: Line 98:
  
 Several [[https://dev.mysql.com/doc/refman/8.0/en/comments.html|comment types]] supported: ''-- '', ''#'', and ''/* */'' (not nested). Several [[https://dev.mysql.com/doc/refman/8.0/en/comments.html|comment types]] supported: ''-- '', ''#'', and ''/* */'' (not nested).
 +
 +The RFC aims to support all the above kinds of string literals with string-affecting configuration variables set to their defaults. All comment types will be supported.
  
 ==== PostgreSQL ==== ==== PostgreSQL ====
Line 73: Line 105:
 Escaping has evolved during the years. Historically accepted "\'", but started gradually transitioning to the SQL standard around 2005, going from memory. Since 9.1 (2011+) it accepts only single quoted string literals by default according to the SQL standard. See [[https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS|the documentation]]. Escaping has evolved during the years. Historically accepted "\'", but started gradually transitioning to the SQL standard around 2005, going from memory. Since 9.1 (2011+) it accepts only single quoted string literals by default according to the SQL standard. See [[https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS|the documentation]].
  
-It also accepts [[https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-ESCAPE|escape string literals]], e.g.+Postgres also supports [[https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE|String Constants with Unicode Escapes ]], which follow the same conventions as standard strings and are parsed by PDO as regular strings. 
 + 
 +It also accepts [[https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-ESCAPE|escape” string constants]], e.g.
  
 <code php>E'This \'word\' is a single quoted'</code> <code php>E'This \'word\' is a single quoted'</code>
  
-I had seen them being used when PostgreSQL started emitting warnings about backslashes in string literalsduring the transition phase when ''magic_quotes_qpc'' was a thing, but I really see no place for this syntax in modern PHPIn fact this is the only syntax that is compatible with the current PDO SQL scanneralthough I wouldn't recommend anybody to change their SQL string literals to this variant.+Lastly[[https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING|Dollar-Quoted String Constants]] are very commonespecially when defining functions.
  
 The behaviour of strings can also be manipulated in multiple ways through configuration variables, such as: [[https://www.postgresql.org/docs/16/runtime-config-compatible.html#GUC-STANDARD-CONFORMING-STRINGS|standard_conforming_strings]], [[https://www.postgresql.org/docs/16/runtime-config-compatible.html#GUC-BACKSLASH-QUOTE|backslash_quote]], and [[https://www.postgresql.org/docs/16/runtime-config-compatible.html#GUC-ESCAPE-STRING-WARNING|escape_string_warning]]. The behaviour of strings can also be manipulated in multiple ways through configuration variables, such as: [[https://www.postgresql.org/docs/16/runtime-config-compatible.html#GUC-STANDARD-CONFORMING-STRINGS|standard_conforming_strings]], [[https://www.postgresql.org/docs/16/runtime-config-compatible.html#GUC-BACKSLASH-QUOTE|backslash_quote]], and [[https://www.postgresql.org/docs/16/runtime-config-compatible.html#GUC-ESCAPE-STRING-WARNING|escape_string_warning]].
- 
-The RFC will ensure regular SQL standard string literals are supported on a Postgres instance with default configuration, while the rarer escape string literals will not be supported. 
- 
-Support for escape string literal is out of scope of this RFC and should be documented as incompatible in the UPGRADING file, as weirdly enough it previously was the only fully-compatible format. 
  
 About [[https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS|comments]], it follows the standard with ''--'' and ''/* */'' (w/ nested comments allowed) About [[https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS|comments]], it follows the standard with ''--'' and ''/* */'' (w/ nested comments allowed)
 +
 +The RFC aims to support all the above kinds of string literals with string-affecting configuration variables set to their defaults. Dollar-quoting support requires minimal changes to the common PDO parser function. All comment types are already supported, albeit support for nested comments will not be introduced.
  
 ==== SQLite ==== ==== SQLite ====
Line 97: Line 129:
 Almost SQL standard [[https://www.sqlite.org/lang_comment.html|comments]]: ''--'' and ''/* */'' (not nested). Almost SQL standard [[https://www.sqlite.org/lang_comment.html|comments]]: ''--'' and ''/* */'' (not nested).
  
-==== Oracle ==== +The RFC aims to support single-quoteddouble-quoted, backtick, and square-bracketed literalsAll comment types are already supported
- +
-SQL standard string literals, according to the [[https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Literals.html|documentation]]. +
-It also supports alternative quotinge.g. q'<literal>' and many other variantswhich is out of scope for this RFC. +
- +
-[[https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Database-Object-Names-and-Qualifiers.html#SQLRF-GUID-75337742-67FD-4EC0-985F-741C93D918DA|Double quoted identifiers]]. +
- +
-Almost SQL standard [[https://www.sqlite.org/lang_comment.html|comments]]: ''--'' and ''/* */'' (not nested).+
  
 ==== SQL Server ==== ==== SQL Server ====
Line 113: Line 138:
  
 Almost SQL standard [[https://learn.microsoft.com/en-us/sql/t-sql/language-elements/comment-transact-sql?view=sql-server-ver16|comments]]: ''--'' and ''/* */'' (non nested). Almost SQL standard [[https://learn.microsoft.com/en-us/sql/t-sql/language-elements/comment-transact-sql?view=sql-server-ver16|comments]]: ''--'' and ''/* */'' (non nested).
 +
 +No custom parser is planned in this RFC: the default scanner will be used by default, bringing compatibility for SQL standard string literals, identifiers, and comments.
  
 ==== Firebird ==== ==== Firebird ====
Line 120: Line 147:
  
 Almost SQL standard [[https://firebirdsql.org/file/documentation/chunk/en/refdocs/fblangref25/fblangref25-structure-comments.html|comments]]: ''--'' and ''/* */'' (non nested). Almost SQL standard [[https://firebirdsql.org/file/documentation/chunk/en/refdocs/fblangref25/fblangref25-structure-comments.html|comments]]: ''--'' and ''/* */'' (non nested).
 +
 +No custom parser is planned in this RFC: the default scanner will be used by default, bringing compatibility for SQL standard string literals, identifiers, and comments.
 +
  
 ==== ODBC ==== ==== ODBC ====
  
 Since ODBC can connect to various types of databases, the SQL standard parser hopefully will suffice. Since ODBC can connect to various types of databases, the SQL standard parser hopefully will suffice.
 +
 +==== Oracle ====
 +
 +SQL standard string literals, according to the [[https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Literals.html|documentation]].
 +It also supports alternative quoting, e.g. q'<literal>' and many other variants, which is out of scope for this RFC.
 +
 +[[https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Database-Object-Names-and-Qualifiers.html#SQLRF-GUID-75337742-67FD-4EC0-985F-741C93D918DA|Double quoted identifiers]].
 +
 +Almost SQL standard [[https://www.sqlite.org/lang_comment.html|comments]]: ''--'' and ''/* */'' (not nested).
 +
 +The OCI driver lives in PECL: the default scanner will be used by default, bringing compatibility for SQL standard string literals, identifiers, and comments.
  
  
Line 130: Line 171:
  
 ===== Backward Incompatible Changes ===== ===== Backward Incompatible Changes =====
-No expected BC breaks.+No BC breaks, but a deprecation notice will be raised when using the "escaped question marks inside dollar quoted string" workaround described before.
  
 Users having applications that can work with multiple database engines should still be very careful and write portable queries, possibly using the ''PDO::quote()'' method when necessary instead of hardcoding strings containing escape characters. Users having applications that can work with multiple database engines should still be very careful and write portable queries, possibly using the ''PDO::quote()'' method when necessary instead of hardcoding strings containing escape characters.
Line 160: Line 201:
 ===== Unaffected PHP Functionality ===== ===== Unaffected PHP Functionality =====
 Anything not related to PDO scanning the SQL query for parameter placeholders. Anything not related to PDO scanning the SQL query for parameter placeholders.
 +
 +===== Out Of Scope =====
 +
 +==== Dynamic changes to the scanner ====
 +The scanners are generated when PHP is compiled and, currently, cannot be modified at runtime. However, some databases allow configuration directives or ''SET'' queries to change the accepted syntax for literals, identifiers, etc.
 +
 +Being able to understand all possible combinations would require tracking what directives are different from the expected default and having a number of scanners inside each driver for each possible permutation of such configuration directives.
  
 ===== Future Scope ===== ===== Future Scope =====
-If necessary, support "exotic" syntaxes in custom scanners for other database drivers.+Evaluate supporting "exotic" syntaxes in the existing scanners and/or add other custom scanner functionality.
  
 ===== Proposed Voting Choices ===== ===== Proposed Voting Choices =====
Line 169: Line 217:
 ===== Patches and Tests ===== ===== Patches and Tests =====
  
-[[https://github.com/php/php-src/pull/6852|PoC pull request]]. Once discussed, I will start with the actual implementation of the proposal.+[[https://github.com/php/php-src/pull/14035|Implementation Pull Request]]
  
 ===== References ===== ===== References =====
rfc/pdo_driver_specific_parsers.1713861154.txt.gz · Last modified: 2024/04/23 08:32 by mbeccati