PHP RFC: Loosening heredoc/nowdoc scanner
- Version: 0.9
- Date: 2014-08-29
- Author: Tjerk Meesters, datibbaw@php.net
- Status: Obsolete
- First Published at: http://wiki.php.net/rfc/heredoc-scanner-loosening
Introduction
Currently the rules for ending a heredoc or nowdoc are quite restrictive: the closing identifier must be followed by a newline, with at most a semicolon in between. This makes it awkward to combine multiple quotations, such as in array declarations or with other operators. The manual entry for heredocs has a big pink box to explain those intricate details.
For instance, this is how you would declare an array comprising a heredoc and a regular string:
$strings = [<<<EOS
hello
EOS
, ' world'];
The comma that you would normally put on the same line as the previous element must now be put on the next line.
Currently, this restriction also causes a parse error with code such as this:
return <<<EOS
Foo!
EOS; // <-- file ends here
Proposal
This proposal aims to lift the current newline restriction and make it less awkward to use heredocs and nowdocs within constructs, such as array declarations:
$strings = [<<<EOS
a
EOS, <<<EOS
b
EOS];
Or with other operators (e.g. concatenation):
class Test
{
    const A = <<<EOS
ab
EOS . <<<EOS
cd
EOS;
}
The proposal suggests two distinct ways in which this can be achieved:
Loosened restrictions
Ends a quotation when the closing identifier is followed by something that can't be part of an identifier.
Removed restrictions
Ends a quotation as soon as the closing identifier is encountered.
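The difference between the current rule and the two proposed variants can be sketched as a small line-based matcher. This is only an illustration in userland PHP, not the scanner patch: the function name closesHeredoc and the mode names are hypothetical, and it assumes the closing identifier must still start at the beginning of a line. It models only the one-sentence descriptions above, so it may differ from an actual implementation in edge cases.

```php
<?php
// Hypothetical sketch: decides whether $line (without its trailing newline)
// terminates a heredoc/nowdoc labelled $label under each rule.
// This is NOT the Zend scanner logic, only a model of the rules as described.
function closesHeredoc(string $line, string $label, string $mode): bool
{
    $q = preg_quote($label, '/');
    switch ($mode) {
        case 'current':
            // Label, an optional semicolon, then end of line.
            return preg_match('/^' . $q . ';?$/', $line) === 1;
        case 'loosened':
            // Label followed by end of line or by any character that
            // cannot continue an identifier.
            return preg_match('/^' . $q . '(?![a-zA-Z0-9_\x80-\xff])/', $line) === 1;
        case 'removed':
            // Label at the start of the line, whatever follows it.
            return strncmp($line, $label, strlen($label)) === 0;
    }
    throw new InvalidArgumentException("Unknown mode: $mode");
}

var_dump(closesHeredoc('EOS, <<<EOS', 'EOS', 'current'));  // bool(false)
var_dump(closesHeredoc('EOS, <<<EOS', 'EOS', 'loosened')); // bool(true)
var_dump(closesHeredoc('EOSBLA', 'EOS', 'loosened'));      // bool(false)
var_dump(closesHeredoc('EOSBLA', 'EOS', 'removed'));       // bool(true)
```

Under this model, the only input on which the two proposed variants disagree is a line where the label is immediately followed by further identifier characters.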
Backward Incompatible Changes
Depending on whether we choose to loosen the restrictions (as defined above) or remove them completely, the following behaviour will change:
Loosened restrictions
The following test code (taken from the aptly called “torture the T_END_HEREDOC rules”) will not work as expected:
print <<<ENDOFHEREDOC
ENDOFHEREDOC    ;
    ENDOFHEREDOC;
ENDOFHEREDOC    
    ENDOFHEREDOC
$ENDOFHEREDOC;
ENDOFHEREDOC;
It emits “ENDOFHEREDOC ;\nENDOFHEREDOC;” and then stops scanning, leading to a parse error on the next line.
This is a rather extreme example of trying to break the scanner; while not entirely impossible, it's most likely not based on anything one would encounter in the wild.
Removed restrictions
Removing the restrictions altogether will cause issues in code such as this:
$s = <<<EOS
Foo bar
EOSBLA
EOS;
It emits “Foo bar” and then stops scanning, leading to a parse error at “BLA”.
Although this may seem undesirable behaviour, it should be noted that the developer is in complete control of choosing the name for their enclosures; it's important to choose an enclosure that doesn't occur naturally inside the quotation.
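A developer who wants to verify that a chosen enclosure is safe for a given text could use a check along these lines. It is a minimal sketch: the helper name labelIsSafe is hypothetical, and it assumes that the closing identifier is only recognised at the start of a line.

```php
<?php
// Hypothetical helper: returns true when no line of $body begins with
// $label, i.e. the label would not end the quotation early if the
// newline restriction were removed altogether (assuming the label is
// only recognised at the start of a line).
function labelIsSafe(string $body, string $label): bool
{
    foreach (preg_split('/\R/', $body) as $line) {
        if (strncmp($line, $label, strlen($label)) === 0) {
            return false;
        }
    }
    return true;
}

var_dump(labelIsSafe("Foo bar\nEOSBLA", 'EOS')); // bool(false)
var_dump(labelIsSafe("Foo bar\nEOSBLA", 'DOC')); // bool(true)
```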
Proposed PHP Version(s)
PHP 7
Unaffected PHP Functionality
This proposal doesn't affect the rules that govern the contents of a heredoc or nowdoc.
Proposed Voting Choices
Should the heredoc and nowdoc scanner be changed?
Voting choices will be:
- No, leave the scanner as it is.
- Yes, loosen the newline restriction to accept any character that can't be part of an identifier.
- Yes, remove the newline restriction altogether.
This proposal requires a 2/3 majority as it affects the language.
Note: Both “Yes” options count towards changing the current behaviour; if a single majority for the last option can't be reached, the “loosened restrictions” will be applied.
Patches and Tests
The RFC author will provide the patches.