====== PHP RFC: Shorter Attribute Syntax Change ====== * Version: 0.7 * Date: 2020-08-04 * Author: Derick Rethans, Benjamin Eberlei * Status: Implemented * First Published at: https://wiki.php.net/rfc/shorter_attribute_syntax_change ===== Introduction ===== With the continued discussion over the currently selected attribute syntax ''@@'' from the [[rfc:shorter_attribute_syntax|Shorter Attribute Syntax RFC]] we want to revisit the syntax choice once and for all, so that we can be as sure as possible the right choice was picked. There are multiple reasons why we believe the previous vote should be revisited: * At the point of the vote for ''@@'', it was not clear that the syntax required the namespace token RFC to be viable. While this is not a problem anymore, the ''@@'' syntax might not have come out on top if this information was known beforehand. A revote could dispel doubts whether ''@@'' is still favoured by a majority. * The shorter attributes RFC removed the support for grouped syntaxes, which was voted on favourably in Attributes Amendments. Due to the missing namespace token RFC at that point ''@@'' did not support grouping at that point. With the namespace token RFC it would support grouping along the lines of ''@@[]''. As ''@@'' still does not support grouping, voters that both favoured grouping and ''@@'' should given a chance to decide again with full information. * The ''#[]'' syntax provides the benefit of forward compatibility, but this also introduces some potential problems for PHP 7 code. An alternative syntax ''@[]'' was suggested to alleviate these problems which was not previously voted on. Another new syntax proposal is ''@{}'' which has the benefit of not causing a BC break. * We argue why we should strongly favour a syntax with closing delimiter to keep consistency with other parts of the language and propose to use ''#[]'', ''@[]'', ''@{}'', or the original ''<< … >>'' instead. While reasons for or against a syntax obviously include subjective opinions, please keep in mind that we want the best syntax, and not necessarily the best **looking** syntax. ===== Proposal ===== Pick the **best** syntax from the following options, taking into account the different pros and cons: ^ Syntax ^ ''@@Attr'' ^ ''#[Attr]'' ^ ''@[Attr]'' ^ ''<>'' ^ ''@:Attr'' ^ ''@{Attr}'' ^ | Number of required Characters | 2 | 3 | 3 | 4 | 2 | 3 | | Has End Delimiter | //No// | //Yes// | //Yes// | //Yes// | //No// | //Yes// | | Allows Grouping (Accepted in previous [[rfc:attribute_amendments|RFC]]¹) | //No// | //Yes// | //Yes// | //Yes// | //No// | //Yes// | | Forward Compatibilty in PHP 7 | //No// | //Yes// | //No// | //No// | //No// | //No// | | Breaks BC of valid PHP 7 code | //Yes// | //Yes// | //Yes// | //No// | //No// | //No// | | Example of BC break code | //@@foo();// | //#[todo] comment // | //@["foo" => foo()];// | - | - | - | | Used by other language | //No// | //Yes// | //No// | //Yes// | //No// | //No// | | Using familiar symbols from Doc Comments | //Yes// | //No// | //Yes// | //No// | //Yes// | //Yes// | | Tokens used | //New T_ATTRIBUTE "@@"// | //New T_ATTRIBUTE "#["// | //New T_ATTRIBUTE "@["// | Existing T_SL, T_SR | //New T_ATTRIBUTE "@:"// | //New T_ATTRIBUTE "@{"// | | Changes the lexing of **remaining** tokens | //No// | //Yes// | //No// | //No// | //No// | //No// | | Does syntax prevent nested attributes in future? | //No// | //No// | //No// | //No// | //No// | //No// | | Target | 8.0 | 8.0 | 8.0 | 8.0 | 8.0 | 8.0 | | Patch | - | [[https://github.com/php/php-src/pull/5989|patch]] | [[https://github.com/php/php-src/pull/5928|patch]] | | [[https://github.com/theodorejb/php-src/pull/1|patch]] | [[https://github.com/php/php-src/pull/6012|patch]] | ¹ If the chosen syntax allows grouping, it will be reintroduced. Explanations: **Has End Delimiter** - An attribute syntax with ending delimiter means that the declaration of attributes is "always" enclosed in a start and an ending symbol, to more clearly separate them from other parts of the code. [[#discussion_on_ending_delimiterenclosing_delimiters|More Details Below]] **Allows Grouping** - Grouping syntax means that you can declare multiple attributes using one syntax construct. [[##discussion_on_grouping_procons|More Details Below]] **Forward Compatibility in PHP 7** means that you can use at least a subset of the new syntax in PHP 7 without the code causing parsing errors. It does not mean that you can already use attributes in PHP 7 already. [[#discussion_of_forwards_compatibility_procons|More Details Below]] **Breaks BC of valid PHP 7 code** means that the syntax chosen for attribute is already valid code with different meaning in PHP 7. As such existing code would need to change. [[#discussion_of_backwards_compatibility_breaks|More Details Below]] **Used by other language** means that this exact syntax is or was used by at least one other programming language for the same feature (Annotations, Attributes, Metadata). **Familiar with Docblock Usage** means that it resembles syntax that is already used at the moment in annotations made using PHP docblock comments. **Tokens used** explains if the syntax introduces a new token into the language or re-uses existing ones. **Changes lexing of remaining tokens** is related to forward compatibility. Specific usages of Attributes in PHP 8 can lead to code that compiles very differently on PHP 7 but still runs. [[#discussion_of_forwards_compatibility_procons|More Details Below]] **Does syntax prevent nested attributes in future?** This was added to make clear that none of the proposed syntaxes prevents the same symbols being used from potentially introducing nested attributes in the future. Even with grouped syntax the parser can be trivially made to distinguish between "top-level" attribute declarations that allow grouped syntax and nested attribute declarations that don't. ===== Syntax Side by Side ===== All syntaxes side by side in order of the table above. Syntaxes that would allow the use of a group syntax demonstrate single-line grouped use on the id property and ungrouped use on the email property and multi-line on the class declaration. /** * @psalm-suppress foo */ @@ORM\Entity @@ORM\Table("user") class User { @@ORM\Id @@ORM\Column("integer") @@ORM\GeneratedValue private $id; @@ORM\Column("string", ORM\Column::UNIQUE) @@Assert\Email(["message" => "The email '{{ value }}' is not a valid email."]) private $email; } /** * @psalm-suppress foo */ #[ ORM\Entity, ORM\Table("user") ] class User { #[ORM\Id, ORM\Column("integer"), ORM\GeneratedValue] private $id; #[ORM\Column("string", ORM\Column::UNIQUE)] #[Assert\Email(["message" => "The email '{{ value }}' is not a valid email."])] private $email; } /** * @psalm-suppress foo */ @[ ORM\Entity, ORM\Table("user") ] class User { @[ORM\Id, ORM\Column("integer"), ORM\GeneratedValue] private $id; @[ORM\Column("string", ORM\Column::UNIQUE)] @[Assert\Email(["message" => "The email '{{ value }}' is not a valid email."])] private $email; } /** * @psalm-suppress foo */ << ORM\Entity, ORM\Table("user") >> class User { <> private $id; <> < "The email '{{ value }}' is not a valid email."])>> private $email; } /** * @psalm-suppress foo */ @:ORM\Entity @:ORM\Table("user") class User { @:ORM\Id @:ORM\Column("integer") @:ORM\GeneratedValue private $id; @:ORM\Column("string", ORM\Column::UNIQUE) @:Assert\Email(["message" => "The email '{{ value }}' is not a valid email."]) private $email; } /** * @psalm-suppress foo */ @{ ORM\Entity, ORM\Table("user") } class User { @{ORM\Id, ORM\Column("integer"), ORM\GeneratedValue} private $id; @{ORM\Column("string", ORM\Column::UNIQUE)} @{Assert\Email(["message" => "The email '{{ value }}' is not a valid email."])} private $email; } ===== Discussion on Ending Delimiter / Enclosing Delimiters ===== The current syntax ''@@'' and the alternative ''@:'' both do not have an ending delimiter to mark the end of an attribute declaration. While this keeps the syntax extremely short and concise, it has a few downsides. ==== Complexity of Attribute Declaration ==== Many complex syntax constructs in PHP have an ending delimiter or are enclosed in a starting and corresponding ending delimiter pair. * Classes and function body are enclosed in ''{}'' * Blocks are enclosed in ''{}'' (with the exception of single line blocks with only one statement, but coding styles discourage them for that reason) * Argument/Parameter lists are enclosed in ''()'' * Statements end in '';'' * Doc Comments are enclosed in /** and */ * Arrays are enclosed in ''[]'' or ''array()''. Attributes are complex syntax as well, because they are built upon a large set of pre-existing complex parser rules, such as namespace parts, list of arguments, declaration of variables and constant expressions. An Attribute can be declared over multiple lines: @@\Doctrine\ORM\ManyToMany( targetEntity: User::class, joinColumn: "group_id", inverseJoinColumn: "user_id", cascade: array("persist", "remove") ) @@Assert\Valid @@JMSSerializer\XmlList(inline: true, entry: "user") public $users; A consistent ending delimiter would be helpful for screening attributes, both for humans and for machines. For humans a starting delimiter followed by a corresponding ending delimiter activates pattern recognition of the brain. @[ \Doctrine\ORM\ManyToMany( targetEntity: User::class, joinColumn: "group_id", inverseJoinColumn: "user_id", cascade: array("persist", "remove") ), Assert\Valid, JMSSerializer\XmlList(inline: true, entry: "user") ] public $users; Or without grouped: #[\Doctrine\ORM\ManyToMany( targetEntity: User::class, joinColumn: "group_id", inverseJoinColumn: "user_id", cascade: array("persist", "remove") )] #[Assert\Valid] #[JMSSerializer\XmlList(inline: true, entry: "user")] public $users; ==== Attributes are not like Modifiers ==== A counter argument why attributes should not need an ending delimiter is that attributes are modifiers to declarations, similar to the existing keywords "public", "protected", "private", "final" and so on, which also do not have an ending symbol. But this compares simple with complex syntax declarations and therefore falls short, because * these modifier keywords all have only exactly one token that can immediately follow them, T_WHITESPACE * they are all non-complex and are only made up of a handful ascii letters, not of arbitrary length argument lists. * these keywords are always on a single line and attributes can be declared over multiple lines * visibility keywords are only boolean or bitflags in Reflection, but Attributes are a full fledged ''ReflectionAttribute'' representing their own distinct language concept. Furthermore, a closing delimiter for a complex syntax feature has benefits for IDEs and editors: * Consistent colouring for being an end of the attribute syntax and the keywords in between can use different colors. * Implement regions to open/close the grouped declaration of one or multiple attributes. * For VIM users, the % operation to jump between opening and closing part of declaration that would automatically work with ''['' and '']''. When we compare Attributes to metadata, a consistent argument is to compare attribute declarations to another complex metadata declaration: docblock comments. They are also required to be enclosed by start and end symbols when defined on a single or multiple lines, notably // cannot be used to declare a docblock comment. /** * A comment describing things. * * @psalm-suppress SomeRule */ #[ ORM\Entity(), ORM\Table("baz") ] final class Something { } This groups docblock comment and attributes into two similarly shaped syntax blocks that prefix the declaration increasing familiarity. This might be more useful when attributes and docblock comment are declared the other way around, or mixed: #[ ORM\Entity(), ORM\Table("baz") ] /** * A comment describing things. * * @psalm-suppress SomeRule */ #[Another\Attribute] final class Something { } ==== Attributes are not like Type Declarations ==== A more complex part of a declaration that is not enclosed in syntax are types. Up until PHP 8 this was a simple scalar type or a class name. With PHP 8 union types added more complexity and could lead to multiline declarations even with reasonable coding standards. However a type declaration itself is not as complex as an attribute, which also includes arguments with constant expressions. There are two viewpoints to consider: * Union Types were added onto an existing syntax construct that was not enclosed as such inherited this decision. Other languages have the same union type syntax as PHP (TypeScript) but other languages require more explicit and complex "typedef" declarations. The union type RFC itself hints at a future extension with typedefs to reduce the complexity in declaration. * Union type declarations show that more complex syntax in declarations works without enclosing. ==== Forcing @@ Attributes to end with parenthesis does not solve issues ==== One suggestion around the missing end delimiter for ''@@'' was to always force the end with parenthesis. But this would not solve the attribute class and argument declaration potentially being detached from each other by arbitrary whitespace characters. @@Foo () function bar() {} In addition it would again provide an inconsistency, as attribute declarations are modelled after object instantiation, which are allowed without parenthesis (example ''new stdClass;'') @Foo() @Bar ==== Potential Future Benefits of Enclosed Delimiter Syntax ==== For any enclosed delimiter syntax such as ''@[]'', ''#[]'' or ''<<>>'' the attribute name and its arguments can be thought of as item in a list that is of type Attribute/object. In the future, the attributes concept can potentially be extended to other types in support of other styles of meta programming such as Aspect Oriented Design, Design By Contract, or even to allow simpler "attributes" than objects such as strings: @["foo", fn ($x) => $x*4] function foo($x) { return $x * 2;} While not impossible with ''@@'' it introduces more readability concerns on when an attribute ends: @@"foo" @@fn ($x) => $x*4 // a closure as attribute is likely not going to work with @@ function foo() {} Another approach to solve this could be wrapping more complex syntax in an attribute: @:Before(fn ($x) => $x*4) ===== Discussion on Grouping Pro/Cons ===== The optional grouping syntax was accepted as part of the [[rfc:attribute_amendments|Attribute Amendments RFC]], but removed as part of the vote for ''@@''. Since a choice for syntax affects the inclusion of grouping syntax a short discussion of the pros and cons should help the decision process: Pro Grouping: * Increased consistency between attribute declaration blocks and doc-comment blocks * Allow potentially to group attributes of different libraries together and separate from each other when they are put on a single declaration. * Can be implemented with just ~30 lines of new code and is not increasing complexity of maintenance Con Grouping: * Can introduce unnecessary noise in diffs when adding new attributes to a grouped attribute list depending on the choice of coding standards. * Adds a second syntax style to do the same thing * '@@' syntax is short and simple that it does not need grouping It should be noted that for both @@ and @: grouping can be added in the future as a secondary syntax, if we so choose to vote on an RFC on this topic for a future version. This might lead us with a syntax ''@@[Attr]'' though, when we could have a shorter unified syntax ''@[Attr]'' or ''#[Attr]'' now. ===== Discussion of Backwards Compatibility Breaks ===== Three of the proposed syntaxes break backwards compatibility at varying degrees. This section attempts to give a full overview of the BC break potential of each choice. Overall all three BC breaks are clean cut and immediately lead to compile errors when running old code using it on PHP 8. They all have simple, mechanical ways to address them with a single workaround that applies to every occurrence. ==== @@ Syntax and BC Breaks ==== The following code snippets work in PHP 7 and would break if ''@@'' becomes attribute syntax: In Words: Using the error suppress operator twice after each other will cause this BC break. Realistically this BC break is going to be extremely unlikely to happen and is the least critical from all three potential BC breaks. This BC break can be fixed mechanically with a project-wide search and replace from "@@" to "@ @" or "@" only even. ==== #[] Syntax and BC Breaks ==== The following code snippets work in PHP 7 and would break if ''#[]'' becomes attribute syntax: In Words: Commenting out a line or expression that starts with [ using # instead of // would trigger this BC break. Using grep.app a few occurrences in open source code of this BC break have been found, but overall it looks like it is used in old exploit scripts and not in code that is still used or widespread. The impact is higher than for ''@@''. https://grep.app/search?q=%23%5B&filter[lang][0]=PHP This BC break can be fixed mechanically with a project-wide search and replace from ''#['' to ''# ['', with some care as ''#['' sometimes appears in regular expressions. ==== @[] Syntax and BC Breaks ==== The following code snippets work in PHP 7 and would break if ''@[]'' becomes attribute syntax: In Words: Suppressing errors on a line that immediately follows with the short array syntax will not be possible with ''@[]'' for attributes anymore. Using grep.app occurrences of these code patterns cannot be found, only in regexp strings that would not be affected. Yet at least the explode case with short list syntax seems to be something that could be used widely, but is not at the moment. https://grep.app/search?q=%40%5B&filter[lang][0]=PHP This BC break can be fixed mechanically with a project-wide search and replace from ''@['' to ''@ ['', with some care as ''@['' often appears in regular expressions. As the short syntax for list expressions was only added in 7.1, there is unlikely going to be a very large body of code out there using this breaking pattern, as recent code is often run avoiding error suppression of this kind. ===== Discussion of Forwards Compatibility Pro/Cons ===== The ''#[]'' Syntax has the unique property of being forwards compatible, meaning specifically that the syntax can be used in PHP 7 code without leading to a compile fatal error, but is instead interpreted as a comment. This is different to ''@@'', ''@[]'' and ''<<>>'' which would lead to a fatal error during compilation when used with any PHP 7 version. The primary benefit of this forward compatibility is for libraries that want to use a class for an Attribute in PHP 8 but use it with doc-comment based Annotations libraries in PHP 7. This code compiles fine on PHP 7, interpreting #[Attribute] as a comment. While #[] would allow forward compatibility, it is important to mention that it would not work for 100% of all attribute syntax uses with #[] and in the cases it does not work, it might break code on PHP 7 in subtle ways. The forward compatibility does lead to a few theoretically problematic cases where working code in PHP 7 and PHP 8 behaves very different: As PHP doesn't warn about calling a user-defined function too many parameters, making this harder to detect. Especially since IDEs and PHPs own linter will accept the syntax as valid PHP. Code would need to be written a very specific way to benefit from forwards compatibility: Another example where code would be interpreted differently on PHP 7: $f1 = #[ExampleAttribute] function () {}; $f2 = #[ExampleAttribute] fn() => 1; $object = new #[ExampleAttribute] class () {}; foo(); // On PHP 7 this is interpreted as $f1 = $f2 = $object = new foo(); This example echoes the rest of the source code in php 7 and echoes "Test" in php 8. ')] function main() {} const APP_SECRET = 'app-secret'; echo "Test\n"; These examples are artificially crafted and would only be problematic on new attribute code that runs on PHP 7. Developers writing code running on multiple versions need to handle versions differences already, so these problems might not be a problem at all in the end. This is especially true, because these edge cases will not happen for existing PHP 7 code running on PHP 8, but only when new code primarily written for PHP 8 is then also run on PHP 7. So ultimately the fact that the ''#[]'' syntax is only forward compatible in a very narrow scope might not cause that big a problem. Credit here goes to Tyson who thoroughly documented the potential problems in https://externals.io/message/111416#111508 ===== Discussion on grep'ability ===== One argument made on the discussion thread was that ''@@'' (and also ''@:'') are easier to grep for than the other syntaxes that allow start symbols to be on another line than the attribute name. But since attribute names are imported class names, you cannot rely on just a grep, because the attribute could be renamed during import: use MyProject\Attributes\FooAttr as BarAttr; use MyProject\Attributes as Attr; @@FooAttr @@MyProject\FooAttr @@Attr\FooAttr @@BarAttr function foo() { } These declarations all refer to the same attribute. As such we did not include "Better Grep'ability" as a Yes/No argument in the Proposals overview table, since it depends on coding style and in every syntax a coding style can be found that has good grepability. ===== Proposed PHP Version(s) ===== PHP 8.0 ===== Voting ===== A first vote (⅔rds) to allow the vote to change the syntax An STV vote among all the qualifying syntaxes. With STV you SHOULD rank **all** the choices in order. Don't pick the same option more than once, as that invalidates your vote. Voting started at August 19th, 2020 10:45 UTC and will run until September 2nd, 12:00 UTC. **Wondering where your vote cast between August 10th to August 15th went? We stopped and restarted the vote because the minimum discussion period was not over yet. Sorry for the inconvenience.** ==== Primary vote ==== * Yes * No ==== Secondary vote ==== This is a ranked-choice poll (following [[https://en.wikipedia.org/wiki/Single_transferable_vote#Example|STV]]) between the syntax alternatives. We use the Droop quota, and on a tie, the syntax with the highest first preferential votes wins. You can vote **six** times, but make sure you select each syntax only once. === First preference === * @@Attr * #[Attr] * @[Attr] * <> * @:Attr * @{Attr} === Second preference === * @@Attr * #[Attr] * @[Attr] * <> * @:Attr * @{Attr} === Third preference === * @@Attr * #[Attr] * @[Attr] * <> * @:Attr * @{Attr} === Fourth preference === * @@Attr * #[Attr] * @[Attr] * <> * @:Attr * @{Attr} === Fifth preference === * @@Attr * #[Attr] * @[Attr] * <> * @:Attr * @{Attr} === Sixth preference === * @@Attr * #[Attr] * @[Attr] * <> * @:Attr * @{Attr} ===== Implementation ===== Implemented into PHP 8.0 via http://git.php.net/?p=php-src.git;a=commit;h=8b37c1e9. ===== References ===== Links to external references, discussions or RFCs * Original RFC: https://wiki.php.net/rfc/shorter_attribute_syntax * https://externals.io/message/111101 * https://www.reddit.com/r/PHP/comments/hjpu79/it_is/ * An RFC that by coincidence fixes the original parser conflict: https://wiki.php.net/rfc/namespaced_names_as_token ===== Updates ====== * v0.3: Removed "Difficulties for Userland Parsers" as its a subjective opinion and boils down to the fact that a new token T_ATTRIBUTE is introduced in some syntaxes that would include tokens that were parsed differently in previous PHP versions. Added "Tokens used" and "Changes the lexing of remaining tokens". * v0.4 Added more details about BC breaks and forward compatibility issues. * v0.5 add new sections summarizing different discussions from the mailing list * v0.6 added section on why attributes not compare against even more complex type declarations, removed section on machine parsing as too narrow and ultimately not important * v0.7 added last minute syntax entry ''@{}''