====== PHP RFC: Kill proprietary CSV escaping mechanism ======

  * Version: 0.1
  * Date: 2018-09-27
  * Author: Christoph M. Becker
  * Status: Withdrawn
  * First Published at: http://wiki.php.net/rfc/kill-csv-escaping

===== Introduction =====

For many years, we have received bug reports regarding the strange behavior of the ''$escape'' parameter of our CSV writing and reading functions (''fputcsv'', ''fgetcsv'' etc.); the latest has been [[https://bugs.php.net/bug.php?id=76940|reported today]]. Apparently, this escaping mechanism causes more harm than good.

Although CSV is still a widespread data exchange format, it has never been officially standardized. There exists, however, the “informational” [[https://tools.ietf.org/html/rfc4180|RFC 4180]], which has no notion of escape characters, but rather defines ''escaped'' as strings enclosed in double-quotes where contained double-quotes have to be doubled. While this concept is supported by PHP's implementation (''$enclosure''), the ''$escape'' parameter sometimes interferes, so that ''fgetcsv()'' may be unable to correctly parse externally generated CSV, and ''fputcsv()'' sometimes generates non-compliant CSV. Even a roundtrip (''fgetcsv(fputcsv(…))'') may fail.

While in many cases passing ''"\0"'' as the ''$escape'' parameter will yield the desired results, this won't work if someone is writing/reading binary CSV files, may cause issues with some non-ASCII-compatible encodings, and is generally to be regarded as a hack.

===== Proposal =====

Since some may rely on the current behavior (and maybe explicitly work around it), we cannot simply drop support for the ''$escape'' parameter.
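The roundtrip problem mentioned in the Introduction can be sketched as follows. This is only an illustration, not part of the proposed patch: ''csv_roundtrip()'' is a hypothetical helper introduced here, and the second call assumes a PHP version that accepts an empty string as ''$escape'' (which is what this RFC proposes for PHP 7.4).

```php
<?php
// Hypothetical helper (not part of PHP): writes one row to an in-memory
// stream with fputcsv(), rewinds, and reads it back with fgetcsv(),
// using the same delimiter, enclosure and escape character both times.
function csv_roundtrip(array $row, string $escape)
{
    $stream = fopen('php://memory', 'w+');
    fputcsv($stream, $row, ',', '"', $escape);
    rewind($stream);
    $result = fgetcsv($stream, 0, ',', '"', $escape);
    fclose($stream);
    return $result;
}

// The first field ends with a backslash, i.e. with the default escape character.
$row = ['a\\', 'b'];

// Backslash as escape character: fputcsv() writes the first field as "a\",
// and fgetcsv() then treats the closing quote as escaped, so the original
// row is not expected to come back.
var_dump(csv_roundtrip($row, '\\') === $row);

// Escaping disabled via an empty string (assumed to be supported):
// the row is expected to survive the roundtrip unchanged.
var_dump(csv_roundtrip($row, '') === $row);
```

Passing all control characters explicitly keeps the sketch independent of whatever default value ''$escape'' has in a given PHP version.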
Instead, the author proposes a stepwise process to keep BC as well as possible:

  - PHP 7.4: allow passing an empty string as ''$escape'' argument, which serves to deactivate the escaping
  - ?: deprecate passing a non-empty string as ''$escape'' argument
  - PHP 8: change the default value of ''$escape'' to an empty string
  - ?: deprecate passing an explicit ''$escape'' argument at all
  - PHP 9: remove the ''$escape'' parameter altogether

The affected functions are ''fputcsv()'', ''fgetcsv()'' and ''str_getcsv()'', and also the ''::setCsvControl()'', ''::getCsvControl()'', ''::fputcsv()'', and ''::fgetcsv()'' methods of ''SplFileObject'', as well as any related functionality that might be introduced during the stepwise process.

To facilitate this, the internal APIs ''php_fgetcsv()'' and ''php_fputcsv()'' will be adapted accordingly, i.e. the type of their ''escape_char'' parameter will be changed from ''char'' to ''int'', where ''-1'' will disable the escaping mechanism, and finally this parameter will be removed.

Besides bringing our CSV support more in line with other CSV processors, we also reduce the rather lengthy parameter lists of the respective functions.

===== Backward Incompatible Changes =====

See above.

===== Proposed PHP Version(s) =====

See above.

===== New Constants =====

Temporarily, the *internal* macro ''PHP_CSV_NO_ESCAPE'' (which expands to ''-1'') will be introduced in ''file.h''.

===== Open Issues =====

None, yet.

===== Future Scope =====

The CSV reading and writing functionality might be extended to support arbitrary character encodings, or respective alternatives might be introduced in the MBString extension. This is not the subject of this RFC, though.

===== Proposed Voting Choices =====

Whether we follow the proposed stepwise process as outlined above, or not. To be accepted, the vote requires a 2/3 majority.
===== Patches and Tests =====

A preliminary [[https://github.com/php/php-src/pull/3515|pull request]] implementing support for the empty ''$escape'' parameter is available.

===== Implementation =====

After the project is implemented, this section should contain
  - the version(s) it was merged into
  - a link to the git commit(s)
  - a link to the PHP manual entry for the feature
  - a link to the language specification section (if any)

===== References =====

  * [[https://externals.io/message/103268|RFC discussion]]
  * [[https://externals.io/message/100729|former discussion]] started by the author.
  * [[https://externals.io/message/78990|former discussion]] started by Tjerk.

===== Rejected Features =====

None, yet.