PHP RFC: Kill proprietary CSV escaping mechanism
- Version: 0.1
- Date: 2018-09-27
- Author: Christoph M. Becker, cmb@php.net
- Status: Withdrawn
- First Published at: http://wiki.php.net/rfc/kill-csv-escaping
Introduction
For many years, we receive bug reports regarding the strange behavior of the $escape parameter of our CSV writing and reading functions (fputcsv, fgetcsv etc.); the latest has been reported today. Apparently, this escaping mechanism causes more harm than good.
Albeit CSV is still a widespread data exchange format, it has never been officially standardized. There exists, however, the “informational” RFC 4180 which has no notion of escape characters, but rather defines escaped as strings enclosed in double-quotes where contained double-quotes have to be doubled. While this concept is supported by PHP's implementation ($enclosure), the $escape sometimes interferes, so that fgetcsv() may be unable to correctly parse externally generated CSV, and fputcsv() is sometimes generating non-compliant CSV. Even a rountrip (fgetcsv(fputcsv(…)) may fail.
While in many cases passing “\0” as $escape parameter will yield the desired results, this won't work if someone is writing/reading binary CSV files, may have issues with some non ASCII compatible encodings, and is generally to be regarded as a hack.
Proposal
Since some may rely on the current behavior (and maybe explicitly work around it), we cannot simply drop support for the $escape parameter. Instead, the author proposes a stepwise process to keep BC as well as in any way possible:
- PHP 7.4: allow to pass an empty string as
$escapeargument, which serves to deactivate the escaping - ?: deprecate passing an non-empty string as
$escapeargument - PHP 8: change the default value of
$escapeto an empty string - ?: deprecate passing an explicit
$escapeargument at all - PHP 9: remove the
$escapeparameter altogether
The affected functions are fputcsv(), fgetcsv() and str_getcsv(), and also the ::setCsvControl(), ::getCsvControl(), ::fputcsv(), and ::fgetcsv() methods of SplFileObject, as well as any related functionality that might be introduced during the stepwise process.
To facilitate this, the internal APIs php_fgetcsv() and php_fputcsv() will be adapted accordingly, i.e. their escape_char parameter type will be changed from char to int where -1 will disable the escaping mechanism, and finally this parameter will be removed.
Besides bringing our CSV support more inline with other CSV processors, we also reduce the rather lengthy parameter lists of the respective functions.
Backward Incompatible Changes
See above.
Proposed PHP Version(s)
See above.
New Constants
Temporarily the *internal* macro PHP_CSV_NO_ESCAPE (which expands to -1) will be introduced in file.h.
Open Issues
None, yet.
Future Scope
The CSV reading and writing functionality might be extended to support arbitrary character encodings, or respective alternatives might be introduced in the MBString extension. This is not subject of this RFC, though.
Proposed Voting Choices
Whether we follow the proposed stepwise process as outlined above, or not. To be accepted the vote requires a 2/3 majority.
Patches and Tests
A preliminary pull request implementing support for the empty $escape parameter is available.
Implementation
After the project is implemented, this section should contain
- the version(s) it was merged into
- a link to the git commit(s)
- a link to the PHP manual entry for the feature
- a link to the language specification section (if any)
References
- former discussion started by the author.
- former discussion started by Tjerk.
Rejected Features
None, yet.