rfc:pdo_driver_specific_parsers

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
rfc:pdo_driver_specific_parsers [2024/05/02 08:45] – Updated PR: support for more pgsql literals mbeccatirfc:pdo_driver_specific_parsers [2024/06/10 13:28] (current) – Minor fixes adiel
Line 1: Line 1:
 ====== PHP RFC: PDO Driver specific SQL parsers ====== ====== PHP RFC: PDO Driver specific SQL parsers ======
-  * Version: 0.5+  * Version: 1.0
   * Date: 2024-04-11   * Date: 2024-04-11
   * Author: Matteo Beccati, mbeccati@php.net   * Author: Matteo Beccati, mbeccati@php.net
-  * Status: Under Discussion+  * Status: Voting
   * Discussion: [[https://externals.io/message/123141]]   * Discussion: [[https://externals.io/message/123141]]
   * First Published at: [[https://wiki.php.net/rfc/pdo_driver_specific_parsers]]   * First Published at: [[https://wiki.php.net/rfc/pdo_driver_specific_parsers]]
Line 11: Line 11:
 The PDO extension contains a SQL parser, whose main purpose is to recognise parameter placeholders inside queries (i.e. ''"?"'' and ''":paramName"''), so that it knows how many and what parameters to expect for a query, and pass the information to the PDO driver in use. The PDO extension contains a SQL parser, whose main purpose is to recognise parameter placeholders inside queries (i.e. ''"?"'' and ''":paramName"''), so that it knows how many and what parameters to expect for a query, and pass the information to the PDO driver in use.
  
-This parser had historically been modelled to work with what was the de-facto SQL standard in the PHP ecosystem at that time: MySQL. However, the SQL dialect used by MySQL is different when handling string literals from standard SQL, followed by other database vendors, such as PostgreSQL and SQLite.+This parser had historically been modelled to work with the de-facto SQL standard in the PHP ecosystem at that time: MySQL. However, the SQL dialect used by MySQL is different when handling string literals from standard SQL, followed by other database vendors, such as PostgreSQL and SQLite.
  
 Specifically, MySQL treats the backslash character as an escape character: <code php>'This \'word\' is a single quoted'</code> Specifically, MySQL treats the backslash character as an escape character: <code php>'This \'word\' is a single quoted'</code>
Line 26: Line 26:
 will make PDO consider ''"'foo\' AS a, '"'' to be a string literal and will parse the following "?" as a positional parameter placeholder. In fact, if you pay close attention, even the DocuWiki SQL code formatter is thrown off by this very example. will make PDO consider ''"'foo\' AS a, '"'' to be a string literal and will parse the following "?" as a positional parameter placeholder. In fact, if you pay close attention, even the DocuWiki SQL code formatter is thrown off by this very example.
  
-We have several reports of similar bugs [[https://bugs.php.net/bug.php?id=78534|[1]]], [[https://bugs.php.net/bug.php?id=79276|[2]]], [[https://bugs.php.net/bug.php?id=80340|[3]]], and possibly other duplicates[7].+We have several reports of similar bugs [[https://bugs.php.net/bug.php?id=78534|[1]]], [[https://bugs.php.net/bug.php?id=79276|[2]]], [[https://bugs.php.net/bug.php?id=80340|[3]]], and possibly other duplicates[[https://github.com/php/php-src/issues/13958|[7]]].
  
 In a nutshell, the PDO SQL scanner function doesn't need to completely parse the SQL dialect in use. **It is sufficient to properly recognise quoted string literals, quoted identifier literals, and comments to skip placeholder detection inside them**. In a nutshell, the PDO SQL scanner function doesn't need to completely parse the SQL dialect in use. **It is sufficient to properly recognise quoted string literals, quoted identifier literals, and comments to skip placeholder detection inside them**.
  
 ==== Bonus improvement ==== ==== Bonus improvement ====
-The limitation of a global SQL parser in PDO meant that my previous [[rfc:pdo_escape_placeholders|RFC to support the PostgreSQL-only "?" operator]][4] had to apply the parser change to all database types. The change had no known side-effects, but I feel this RFC could improve by keeping this peculiarity inside the pdo_pgsql scanner only.+The limitation of a global SQL parser in PDO meant that my previous [[rfc:pdo_escape_placeholders|RFC to support the PostgreSQL-only "?" operator]][4] had to apply the parser change to all database types. The change had no known side effects, but I feel this RFC could improve by keeping this peculiarity inside the pdo_pgsql scanner only.
  
 ===== Proposal ===== ===== Proposal =====
Line 39: Line 39:
     - single and double quoted literals, with doubling as escaping mechanism     - single and double quoted literals, with doubling as escaping mechanism
     - two-dashes and C-style comments (non-nested, as that seems to be the most common format, already implemented in PDO)     - two-dashes and C-style comments (non-nested, as that seems to be the most common format, already implemented in PDO)
-  - Add a MySQL specific scanner function:+  - Add a MySQL-specific scanner function:
     - single and double quoted literals with both doubling and backslash as escaping mechanisms (MySQL default)     - single and double quoted literals with both doubling and backslash as escaping mechanisms (MySQL default)
     - backtick literals with doubling as escaping mechanism (I also tested and it seems MySQL doesn't accept backslashes as escapes)     - backtick literals with doubling as escaping mechanism (I also tested and it seems MySQL doesn't accept backslashes as escapes)
-    - two-dashes, C-style comments, and Hash-comments, albeit I couldn't force the requirement of a space after "--"+    - two-dashes (if followed by 1 whitespace), C-style comments, and Hash-comments
     - Tests, as necessary     - Tests, as necessary
-  - Add a PgSQL specific scanner function:+  - Add a PgSQL-specific scanner function:
     - single and double quoted literals, with doubling as escaping mechanism     - single and double quoted literals, with doubling as escaping mechanism
     - C-style "escape" string literals     - C-style "escape" string literals
Line 51: Line 51:
     - Support for "??" as escape sequence for the "?" operator     - Support for "??" as escape sequence for the "?" operator
     - Tests, as necessary     - Tests, as necessary
-  - Add a SqLite specific scanner function:+  - Add a SqLite-specific scanner function:
     - single, double quoted, and backtick literals, with doubling as escaping mechanism     - single, double quoted, and backtick literals, with doubling as escaping mechanism
 +    - square brackets quoting for identifiers
     - two-dashes and C-style comments (non-nested)     - two-dashes and C-style comments (non-nested)
     - Tests, as necessary     - Tests, as necessary
  
-In order to keep the change as simple as possible, the proposal tries to cover the default SQL syntax for each database as closely as possible, without heavy changes to the common parser code.+To keep the change as simple as possible, the proposal tries to cover the default SQL syntax for each database as closely as possible, without heavy changes to the common parser code.
  
 One important thing to mention is that the proposed changes are only targeting the part of PDO that scans the SQL query for parameter placeholders. Quoting literals or using parameters in queries is **not going to be affected**. One important thing to mention is that the proposed changes are only targeting the part of PDO that scans the SQL query for parameter placeholders. Quoting literals or using parameters in queries is **not going to be affected**.
  
 ===== Detailed Proposal ===== ===== Detailed Proposal =====
-In order to execute the plan, a new member will be appended to:+To execute the plan, a new member will be appended to:
  
 <code c> <code c>
Line 74: Line 75:
 </code> </code>
  
-Each PDO driver defines already [[https://github.com/search?q=repo%3Aphp/php-src%20pdo_dbh_methods&type=code|their own struct]]. Leaving the new member to NULL will make the driver use the default PDO scanner function. Otherwise a pointer to a custom scanner function will override the default when parsing queries. It's really as simple as that.+Each PDO driver defines already [[https://github.com/search?q=repo%3Aphp/php-src%20pdo_dbh_methods&type=code|their own struct]]. Leaving the new member to NULL will make the driver use the default PDO scanner function. Otherwisea pointer to a custom scanner function will override the default when parsing queries. It's really as simple as that.
  
-The rest of the implementation is the actual re2c scanner code, config.*, Makefile changes, etc. required to incorporate the driver specific scanner into the build.+The rest of the implementation is the actual re2c scanner code, config.*, Makefile changes, etc. required to incorporate the driver-specific scanner into the build
 + 
 +To support dollar-quoted strings on Postgres, the functionality of custom quoting has been added to the common PDO parser function. The change has no side effects for other database drivers. 
 + 
 +One minor potential BC-break was reported while researching bug [[https://github.com/php/php-src/issues/14244|#14244]], which basically describes lack of support for dollar quoting in pdo_pgsql. One of the currently viable workarounds is to use escaped question marks inside dollar-quoted strings to avoid unexpected placeholder detection. The last version of the implementation still allows that, while raising the following deprecation notice: 
 + 
 +''Escaping question marks inside dollar quoted strings is not required anymore and is deprecated''
 + 
 +Such BC-compatibility can be removed in the next major version.
  
-In order to support dollar-quoted strings on Postgres, the functionality of custom quoting has been added to the common PDO parser function. The change has no side effects for other database drivers. 
  
 ===== Research on String Literals, Identifiers, and Comments =====  ===== Research on String Literals, Identifiers, and Comments ===== 
  
 ==== MySQL ==== ==== MySQL ====
-MySQL by default accepts both backslash escaped quotes and SQL standard. String literals can use single or double quotes. See [[https://dev.mysql.com/doc/refman/8.0/en/string-literals.html|the documentation]] (8.0 current is linked here, but 5.7 and 8.3 bahave the same).+MySQL by default accepts both backslash escaped quotes and SQL standard. String literals can use single or double quotes. See [[https://dev.mysql.com/doc/refman/8.0/en/string-literals.html|the documentation]] (8.0 current is linked here, but 5.7 and 8.3 behave the same).
  
 The [[https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html#sqlmode_no_backslash_escapes|NO_BACKSLASH_ESCAPE]] SQL mode will disable recognition of the backslash as escape character. If set it will break SQL scanning regardless of this RFC. The [[https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html#sqlmode_no_backslash_escapes|NO_BACKSLASH_ESCAPE]] SQL mode will disable recognition of the backslash as escape character. If set it will break SQL scanning regardless of this RFC.
Line 91: Line 99:
 Several [[https://dev.mysql.com/doc/refman/8.0/en/comments.html|comment types]] supported: ''-- '', ''#'', and ''/* */'' (not nested). Several [[https://dev.mysql.com/doc/refman/8.0/en/comments.html|comment types]] supported: ''-- '', ''#'', and ''/* */'' (not nested).
  
-The RFC aims to support all the above kinds of string literals with string-affecting configuration variables set to their defaults. All comment types will be supported, however "--" is recognised as comment start following the SQL standard, even if MySQL requires a space character after it.+The RFC aims to support all the above kinds of string literals with string-affecting configuration variables set to their defaults. All comment types will be supported.
  
 ==== PostgreSQL ==== ==== PostgreSQL ====
  
-Escaping has evolved during the years. Historically accepted "\'", but started gradually transitioning to the SQL standard around 2005, going from memory. Since 9.1 (2011+) it accepts only single quoted string literals by default according to the SQL standard. See [[https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS|the documentation]].+Escaping has evolved over the years. Historically accepted "\'", but started gradually transitioning to the SQL standard around 2005, going from memory. Since 9.1 (2011+) it accepts only single-quoted string literals by default according to the SQL standard. See [[https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS|the documentation]].
  
 Postgres also supports [[https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE|String Constants with Unicode Escapes ]], which follow the same conventions as standard strings and are parsed by PDO as regular strings. Postgres also supports [[https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE|String Constants with Unicode Escapes ]], which follow the same conventions as standard strings and are parsed by PDO as regular strings.
Line 115: Line 123:
 Follows the SQL standard, and requires double single quotes to represent the single quote in a string literal. See [[https://www.sqlite.org/lang_expr.html#literal_values_constants_|documentation]]. Follows the SQL standard, and requires double single quotes to represent the single quote in a string literal. See [[https://www.sqlite.org/lang_expr.html#literal_values_constants_|documentation]].
  
-It will however accept double quoted strings as string literals under [[https://www.sqlite.org/quirks.html#double_quoted_string_literals_are_accepted|some circumstances]].+It will however accept double-quoted strings as string literals under [[https://www.sqlite.org/quirks.html#double_quoted_string_literals_are_accepted|some circumstances]].
  
 Double quoted identifiers, but also [[https://sqlite.org/lang_keywords.html|backticks and square brackets]]. Double quoted identifiers, but also [[https://sqlite.org/lang_keywords.html|backticks and square brackets]].
Line 121: Line 129:
 Almost SQL standard [[https://www.sqlite.org/lang_comment.html|comments]]: ''--'' and ''/* */'' (not nested). Almost SQL standard [[https://www.sqlite.org/lang_comment.html|comments]]: ''--'' and ''/* */'' (not nested).
  
-The RFC aims to support single-quoted, double-quoted, and backtick literals. All comment types are already supported. +The RFC aims to support single-quoted, double-quoted, backtick, and square-bracketed literals. All comment types are already supported. 
  
 ==== SQL Server ==== ==== SQL Server ====
Line 154: Line 162:
 [[https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Database-Object-Names-and-Qualifiers.html#SQLRF-GUID-75337742-67FD-4EC0-985F-741C93D918DA|Double quoted identifiers]]. [[https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Database-Object-Names-and-Qualifiers.html#SQLRF-GUID-75337742-67FD-4EC0-985F-741C93D918DA|Double quoted identifiers]].
  
-Almost SQL standard [[https://www.sqlite.org/lang_comment.html|comments]]: ''--'' and ''/* */'' (not nested).+Almost SQL standard [[https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/Comments.html#SQLRF-GUID-5C84C344-CEB3-4DBF-B748-337DE11CCE2A|comments]]: ''--'' and ''/* */'' (not nested).
  
 The OCI driver lives in PECL: the default scanner will be used by default, bringing compatibility for SQL standard string literals, identifiers, and comments. The OCI driver lives in PECL: the default scanner will be used by default, bringing compatibility for SQL standard string literals, identifiers, and comments.
Line 160: Line 168:
  
 ===== Historical Background ===== ===== Historical Background =====
-A few years back I attempted to fix a bug and came up with with a [[https://github.com/php/php-src/pull/6852|pull request]] that could be considered a  proof of concept for this RFC. The same topic was also brought up [[https://externals.io/message/114016|by others on internals]], but no one had time to follow up with a proper RFC.+A few years back I attempted to fix a bug and came up with a [[https://github.com/php/php-src/pull/6852|pull request]] that could be considered a proof of concept for this RFC. The same topic was also brought up [[https://externals.io/message/114016|by others on internals]], but no one had time to follow up with a proper RFC.
  
 ===== Backward Incompatible Changes ===== ===== Backward Incompatible Changes =====
-No expected BC breaks.+No BC breaks, but a deprecation notice will be raised when using the "escaped question marks inside dollar quoted string" workaround described before.
  
 Users having applications that can work with multiple database engines should still be very careful and write portable queries, possibly using the ''PDO::quote()'' method when necessary instead of hardcoding strings containing escape characters. Users having applications that can work with multiple database engines should still be very careful and write portable queries, possibly using the ''PDO::quote()'' method when necessary instead of hardcoding strings containing escape characters.
  
 ===== Proposed PHP Version(s) ===== ===== Proposed PHP Version(s) =====
-Next PHP 8.x, hopefully 8.4.+Next PHP 8.x, hopefully8.4.
  
 ===== RFC Impact ===== ===== RFC Impact =====
Line 175: Line 183:
  
 ==== To Existing Extensions ==== ==== To Existing Extensions ====
-Drivers outside of php-src might have to be modified if they make assumptions about the structure of enum pdo_param_type. They would have to be rebuilt since the PDO_DRIVER_API macro would be updated. +Drivers outside of php-src might have to be modified if they make assumptions about the structure of the enum pdo_param_type. They would have to be rebuilt since the PDO_DRIVER_API macro would be updated. 
  
 That has historically been allowed/expected in minor versions. The last time it happened was for PHP 7.2 with [[rfc:extended-string-types-for-pdo|PHP RFC: Extended String Types For PDO]]. That has historically been allowed/expected in minor versions. The last time it happened was for PHP 7.2 with [[rfc:extended-string-types-for-pdo|PHP RFC: Extended String Types For PDO]].
Line 199: Line 207:
 The scanners are generated when PHP is compiled and, currently, cannot be modified at runtime. However, some databases allow configuration directives or ''SET'' queries to change the accepted syntax for literals, identifiers, etc. The scanners are generated when PHP is compiled and, currently, cannot be modified at runtime. However, some databases allow configuration directives or ''SET'' queries to change the accepted syntax for literals, identifiers, etc.
  
-Being able to understand all possible combinations would require tracking what directives are different from the expected default and having a number of scanners inside each driver for each possible permutation of such configuration directives.+Being able to understand all possible combinations would require tracking what directives are different from the expected default and having several scanners inside each driver for each possible permutation of such configuration directives.
  
 ===== Future Scope ===== ===== Future Scope =====
 Evaluate supporting "exotic" syntaxes in the existing scanners and/or add other custom scanner functionality. Evaluate supporting "exotic" syntaxes in the existing scanners and/or add other custom scanner functionality.
  
-===== Proposed Voting Choices ===== +===== Voting ===== 
-As per the voting RFC a yes/no vote with a 2/3 majority is needed for this proposal to be accepted. +Voting will be open until Monday, 17 June 2024 at 15:00:00 UTC. As usual, 2/3 majority is needed for this proposal to be accepted. 
 + 
 +<doodle title="Implement PDO Driver specific SQL parsers?" auth="mbeccati" voteType="single" closed="false" closeon="2024-06-17T15:00:00Z"> 
 +   * Yes 
 +   * No 
 +</doodle>
  
 ===== Patches and Tests ===== ===== Patches and Tests =====
  
-[[https://github.com/php/php-src/pull/14035|Draft Pull Request]]+[[https://github.com/php/php-src/pull/14035|Implementation Pull Request]]
  
 ===== References ===== ===== References =====
rfc/pdo_driver_specific_parsers.1714639540.txt.gz · Last modified: 2024/05/02 08:45 by mbeccati