
PHP RFC: Loosening heredoc/nowdoc scanner

Introduction

Currently, the rules for ending a heredoc or nowdoc are quite restrictive: the closing identifier must be followed by a newline. This makes it awkward to combine multiple quotations, such as in array declarations or with other operators. The manual entry for heredocs has a big pink box to explain these intricate details.

For instance, this is how you would declare an array comprising a heredoc and a regular string:

$strings = [<<<EOS
hello
EOS
, ' world'];

The comma that you would normally put on the same line as the previous element must instead be put on the next line.

This restriction also causes a parse error when the closing identifier sits on the last line of a file that has no trailing newline, as in this code:

return <<<EOS
Foo!
EOS; // <-- file ends here
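
To make the current restriction concrete, the following is a minimal sketch that models it with a regular expression. It is not the scanner's actual code: the helper name `closesUnderCurrentRule` is hypothetical, and the model assumes plain "\n" line endings and a closing line that consists of the identifier, an optional semicolon, and a mandatory newline.

```php
<?php
// Simplified model of the current rule, NOT the real scanner implementation.
// Assumption: line endings are plain "\n".

/**
 * Returns true if $body (the text following the "<<<LABEL\n" line)
 * contains a line that closes the heredoc under the current rule.
 */
function closesUnderCurrentRule(string $body, string $label): bool
{
    // Identifier at the start of a line, optional semicolon, mandatory newline.
    $pattern = '/^' . preg_quote($label, '/') . ';?\n/m';
    return preg_match($pattern, $body) === 1;
}

// Comma moved to the next line: the closing line is valid.
var_dump(closesUnderCurrentRule("hello\nEOS\n, ' world'];", 'EOS')); // bool(true)

// Comma on the same line as the identifier: never closes.
var_dump(closesUnderCurrentRule("a\nEOS, <<<EOS\nb\nEOS];", 'EOS')); // bool(false)

// File ends right after "EOS;" with no newline: never closes.
var_dump(closesUnderCurrentRule("Foo!\nEOS;", 'EOS'));               // bool(false)
```

In this model, the last two inputs never reach a valid closing line, which corresponds to the parse errors described above.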

Proposal

This proposal aims to lift the current newline restriction and make it less awkward to use heredocs and nowdocs within constructs, such as array declarations:

$strings = [<<<EOS
a
EOS, <<<EOS
b
EOS];

Or with other operators (e.g. concatenation):

class Test
{
    const A = <<<EOS
ab
EOS . <<<EOS
cd
EOS;
}

The proposal suggests two distinct ways in which this can be achieved:

Loosened restrictions

Ends a quotation when the closing identifier is followed by a character that can't be part of an identifier.

Removed restrictions

Ends a quotation as soon as the closing identifier is encountered.
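
The difference between the two variants can be sketched with a small model. This is only an illustration under stated assumptions, not the RFC's patch: the function name `heredocBody` and the rule names `'loosened'` and `'removed'` are hypothetical, the closing identifier is assumed to be recognised only at the start of a line in both variants, and line endings are assumed to be plain "\n". The real scanner may differ in details such as whitespace handling.

```php
<?php
// Simplified model of the two proposed rules, NOT the RFC's scanner patch.
// Assumptions: the closing identifier is only looked for at the start of
// a line, and line endings are plain "\n".

/**
 * Returns the heredoc body, or null if no terminator is found.
 *
 * $source is the text following the opening "<<<LABEL\n" line.
 * $rule is 'loosened' or 'removed' (hypothetical names).
 */
function heredocBody(string $source, string $label, string $rule): ?string
{
    // Identifier characters: letters, digits, underscore, bytes 0x80-0xff.
    // 'loosened' requires that the next character is NOT one of these;
    // 'removed' accepts the identifier no matter what follows.
    $lookahead = $rule === 'loosened' ? '(?![a-zA-Z0-9_\x80-\xff])' : '';
    $pattern = '/^' . preg_quote($label, '/') . $lookahead . '/m';

    if (!preg_match($pattern, $source, $match, PREG_OFFSET_CAPTURE)) {
        return null;
    }

    $end = $match[0][1];
    // Drop the newline that precedes the closing identifier.
    return $end === 0 ? '' : substr($source, 0, $end - 1);
}

$source = "Foo bar\nEOSBLA\nEOS;";

// Loosened: "EOSBLA" is skipped because "B" can be part of an identifier.
var_dump(heredocBody($source, 'EOS', 'loosened') === "Foo bar\nEOSBLA"); // bool(true)

// Removed: the quotation ends at the first "EOS", leaving "BLA" behind.
var_dump(heredocBody($source, 'EOS', 'removed') === "Foo bar");          // bool(true)
```

The sample input is the `EOSBLA` case from the backward-compatibility discussion, chosen because it is the simplest input on which the two rules disagree.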

Backward Incompatible Changes

Depending on whether we choose to loosen the restrictions (as defined above) or remove them completely, the following behaviour will change:

Loosened restrictions

The following test code (taken from the aptly named “torture the T_END_HEREDOC rules” test) will no longer work as expected:

print <<<ENDOFHEREDOC
ENDOFHEREDOC    ;
    ENDOFHEREDOC;
ENDOFHEREDOC   
    ENDOFHEREDOC
$ENDOFHEREDOC;
 
ENDOFHEREDOC;

It emits “ENDOFHEREDOC ;\nENDOFHEREDOC;” and then stops scanning, leading to a parse error on the next line.

This is a rather extreme example designed to break the scanner; while not impossible, such code is unlikely to be encountered in the wild.

Removed restrictions

Removing the restrictions altogether will cause issues in code such as this:

$s = <<<EOS
Foo bar
EOSBLA
EOS;

It emits “Foo bar” and then stops scanning, leading to a parse error at “BLA”.

Although this may seem like undesirable behaviour, it should be noted that the developer is in complete control of the name chosen for their enclosures; it's important to choose an enclosure that doesn't occur naturally inside the quotation.

Proposed PHP Version(s)

PHP 7

Unaffected PHP Functionality

This proposal doesn't affect the rules that govern the contents of a heredoc or nowdoc.

Proposed Voting Choices

Should the heredoc and nowdoc scanner be changed?

Voting choices will be:

  1. No, leave the scanner as it is.
  2. Yes, loosen the newline restriction so that the closing identifier may be followed by any character that can't be part of an identifier.
  3. Yes, remove the newline restriction altogether.

This proposal requires a 2/3 majority as it affects the language.

Note: Both “Yes” options count towards changing the current behaviour; if a single majority for the last option can't be reached, the “loosened restrictions” will be applied.

Patches and Tests

The RFC author will provide the patches.

rfc/heredoc-scanner-loosening.txt · Last modified: 2018/06/10 10:26 by cmb