PHP RFC: Loosening heredoc/nowdoc scanner


Currently the rules for ending a heredoc or nowdoc are quite restrictive, requiring a newline after the closing identifier; this makes it more awkward to combine multiple quotations, such as in array declarations or with other operators. The manual entry for heredocs has a big pink box to explain those intricate details.

For instance, this is how you would declare an array comprising a heredoc and regular string:

$strings = [<<<EOS
hello
EOS
, ' world'];

The comma that you would normally put on the same line as the previous element has to go on the line after the closing identifier.

Currently, this restriction also causes a parse error with code such as this:

return <<<EOS
EOS; // <-- file ends here


This proposal aims to lift the current newline restriction and make it less awkward to use heredocs and nowdocs within constructs, such as array declarations:

$strings = [<<<EOS
hello
EOS, ' world'];

Or with other operators (e.g. concatenation):

class Test
{
    const A = <<<EOS
hello
EOS . <<<EOS
 world
EOS;
}

The proposal suggests two distinct ways in which this can be achieved:

Loosened restrictions

The quotation ends when the closing identifier is followed by any character that can't be part of an identifier.

Removed restrictions

The quotation ends as soon as the closing identifier is encountered, regardless of what follows it.
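
The difference between the current behaviour and the two variants can be sketched with a small model of the termination check. This is only an illustration under stated assumptions, not the actual scanner patch: heredocBody() and its rule names are hypothetical, the input is the text that follows the opening <<<EOS line, empty bodies are not handled, and the identifier character class is assumed to be [a-zA-Z0-9_\x80-\xff].

```php
<?php
// Illustrative model only; heredocBody() is a hypothetical helper, not part of
// PHP or of the proposed patch. $afterOpener is the source text following the
// "<<<LABEL" line. Returns the quotation body, or null if it never terminates.
function heredocBody(string $afterOpener, string $label, string $rule): ?string
{
    $quoted = preg_quote($label, '/');
    $patterns = [
        // Current behaviour: label, optional semicolon, then a newline.
        'current'  => '/\n' . $quoted . ';?\r?\n/',
        // Loosened: label followed by a byte that cannot continue an
        // identifier; end of input also counts here (an assumption).
        'loosened' => '/\n' . $quoted . '(?![a-zA-Z0-9_\x80-\xff])/',
        // Removed: the label alone ends the quotation.
        'removed'  => '/\n' . $quoted . '/',
    ];
    if (!isset($patterns[$rule])) {
        throw new InvalidArgumentException("Unknown rule: $rule");
    }
    if (preg_match($patterns[$rule], $afterOpener, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return null;
    }
    // Everything before the newline that precedes the closing label.
    return substr($afterOpener, 0, $m[0][1]);
}

$source = "hello\nEOS, ' world'];\n";
var_dump(heredocBody($source, 'EOS', 'current'));  // NULL
var_dump(heredocBody($source, 'EOS', 'loosened')); // string(5) "hello"
var_dump(heredocBody($source, 'EOS', 'removed'));  // string(5) "hello"
```

In this model the two variants only disagree when the closing identifier is immediately followed by a character that could continue an identifier, which is the case the backward compatibility discussion turns on.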

Backward Incompatible Changes

Depending on whether we choose to loosen the restrictions (as defined above) or remove them completely, the following behaviour changes:

Loosened restrictions

The following test code (taken from the aptly named “torture the T_END_HEREDOC rules” test) will no longer work as expected:


It emits “ENDOFHEREDOC ;\nENDOFHEREDOC;” and then stops scanning the quotation, leading to a parse error on the next line.

This is a rather extreme example of trying to break the scanner; while not entirely impossible, it's most likely not based on anything one would encounter in the wild.

Removed restrictions

Removing the restrictions altogether will cause issues in code such as this:

$s = <<<EOS
Foo bar
EOSBLA
EOS;

It emits “Foo bar” and then stops scanning, leading to a parse error at “BLA”.

Although this may seem undesirable behaviour, it should be noted that the developer is in complete control of choosing the name for their enclosures; it's important to choose an enclosure that doesn't occur naturally inside the quotation.
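
One way to check a candidate closing identifier against a body of text, under the “removed restrictions” rule, is sketched below. labelIsSafe() is a hypothetical helper, and it assumes the identifier is only recognised at the start of a line; it treats the first line of the body conservatively, like any other line.

```php
<?php
// Hypothetical helper, not part of PHP: reports whether $label could be used
// as the closing identifier for $body if the quotation ended as soon as the
// identifier appears at the start of a line.
function labelIsSafe(string $body, string $label): bool
{
    // The /m modifier makes ^ match at the start of every line.
    return preg_match('/^' . preg_quote($label, '/') . '/m', $body) !== 1;
}

$body = "Foo bar\nEOSBLA";
var_dump(labelIsSafe($body, 'EOS'));           // bool(false)
var_dump(labelIsSafe($body, 'END_OF_STRING')); // bool(true)
```

A longer, more specific identifier is less likely to collide with the quoted text, which is the trade-off the paragraph above leaves to the developer.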

Proposed PHP Version(s)


Unaffected PHP Functionality

This proposal does not affect the rules that govern the contents of a heredoc or nowdoc.

Proposed Voting Choices

Should the heredoc and nowdoc scanner be changed?

Voting choices will be:

  1. No, leave the scanner as it is.
  2. Yes, loosen the newline restriction to any character that can't be part of an identifier.
  3. Yes, remove the newline restriction altogether.

This proposal requires a 2/3 majority as it affects the language.

Note: Both “Yes” options count towards changing the current behaviour; if a single majority for the last option can't be reached, the “loosened restrictions” will be applied.

Patches and Tests

The RFC author will provide the patches.

rfc/heredoc-scanner-loosening.txt · Last modified: 2018/06/10 10:26 by cmb