For many years, we have been receiving bug reports regarding the strange behavior of the $escape parameter of our CSV writing and reading functions (fputcsv(), fgetcsv(), etc.); the latest was reported today. Apparently, this escaping mechanism causes more harm than good.
Although CSV is still a widespread data exchange format, it has never been officially standardized. There is, however, the “informational” RFC 4180, which has no notion of escape characters. Instead, it defines escaped fields as strings enclosed in double quotes, in which any contained double quote has to be doubled. While this concept is supported by PHP's implementation ($enclosure), the $escape character sometimes interferes, so that fgetcsv() may be unable to correctly parse externally generated CSV, and fputcsv() sometimes generates non-compliant CSV. Even a round trip (fgetcsv(fputcsv(…))) may fail.
While in many cases passing "\0" as the $escape argument yields the desired results, this does not work when reading or writing binary CSV files, may cause issues with some non-ASCII-compatible encodings, and is generally to be regarded as a hack.
Since some users may rely on the current behavior (and may even explicitly work around it), we cannot simply drop support for the $escape parameter. Instead, the author proposes a stepwise process that preserves BC as far as possible:
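1. Allow an empty string as $escape argument, which serves to deactivate the escaping
2. Deprecate passing a non-empty $escape argument
3. Change the default value of $escape to an empty string
4. Deprecate passing the $escape argument at all
5. Remove the $escape parameter altogether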
The affected functions are fputcsv(), fgetcsv() and str_getcsv(), as well as the ::setCsvControl(), ::getCsvControl(), ::fputcsv() and ::fgetcsv() methods of SplFileObject, and any related functionality that might be introduced during the stepwise process.
To facilitate this, the internal APIs php_fgetcsv() and php_fputcsv() will be adapted accordingly: the type of their escape_char parameter will be changed from char to int, where -1 disables the escaping mechanism, and finally this parameter will be removed.
Besides bringing our CSV support more in line with other CSV processors, this also shortens the rather lengthy parameter lists of the respective functions.
Backward incompatible changes: see above.
Proposed PHP version(s): see above.
Temporarily, the *internal* macro PHP_CSV_NO_ESCAPE (which expands to -1) will be introduced in file.h.
None yet.
The CSV reading and writing functionality might be extended to support arbitrary character encodings, or corresponding alternatives might be introduced in the MBString extension. This is not the subject of this RFC, though.
The vote is on whether or not to follow the stepwise process outlined above. Acceptance requires a 2/3 majority.
A preliminary pull request implementing support for an empty $escape argument is available.
None yet.