rfc:uniform_variable_syntax
Differences
This shows you the differences between two versions of the page.
rfc:uniform_variable_syntax [2014/06/02 19:39] – nikic | rfc:uniform_variable_syntax [2014/07/30 00:03] – Whatever we'd both like it to have been, this is for 7, not 6 ;) ajf | ||
---|---|---|---|
Line 2: | Line 2: | ||
* Date: 2014-05-31 | * Date: 2014-05-31 | ||
* Author: Nikita Popov < | * Author: Nikita Popov < | ||
- | * Status: | + | * Status: |
- | * Proposed for: PHP 6 | + | * Discussion: http:// |
- | ===== Overview | + | ===== Introduction |
This RFC proposes the introduction of an internally consistent and complete variable syntax. To achieve this goal the | This RFC proposes the introduction of an internally consistent and complete variable syntax. To achieve this goal the | ||
semantics of some rarely used variable-variable constructions need to be changed. | semantics of some rarely used variable-variable constructions need to be changed. | ||
- | TODO | + | Examples of expressions that were previously invalid, but will be valid with the uniform variable syntax: |
+ | |||
+ | <code php> | ||
+ | // support missing combinations of operations | ||
+ | $foo()[' | ||
+ | [$obj1, $obj2][0]-> | ||
+ | getStr(){0} | ||
+ | |||
+ | // support nested :: | ||
+ | $foo[' | ||
+ | $foo:: | ||
+ | $foo-> | ||
+ | |||
+ | // support nested () | ||
+ | foo()() | ||
+ | $foo-> | ||
+ | Foo:: | ||
+ | $foo()() | ||
+ | |||
+ | // support operations on arbitrary (...) expressions | ||
+ | (...)[' | ||
+ | (...)-> | ||
+ | (...)-> | ||
+ | (...):: | ||
+ | (...):: | ||
+ | (...)() | ||
+ | |||
+ | // two more practical examples for the last point | ||
+ | (function() { ... })() | ||
+ | ($obj-> | ||
+ | |||
+ | // support all operations on dereferencable scalars (not very useful) | ||
+ | " | ||
+ | [$obj, ' | ||
+ | ' | ||
+ | </ | ||
+ | |||
+ | Examples of expressions whose meaning changes: | ||
+ | |||
+ | <code php> | ||
+ | // old meaning | ||
+ | $$foo[' | ||
+ | $foo-> | ||
+ | $foo-> | ||
+ | Foo:: | ||
+ | </ | ||
+ | |||
+ | Examples of statements which are no longer supported: | ||
+ | |||
+ | <code php> | ||
+ | global $$foo-> | ||
+ | // instead use: | ||
+ | global ${$foo-> | ||
+ | </ | ||
===== Issues with the current syntax ===== | ===== Issues with the current syntax ===== | ||
Line 17: | Line 70: | ||
The root cause of most issues in PHP's current variable syntax is the semantics of the variable-variable syntax | The root cause of most issues in PHP's current variable syntax is the semantics of the variable-variable syntax | ||
- | '' | + | '' |
- | '' | + | '' |
Why this choice of semantics is problematic to our parser design is explained in the next section. Before getting to | Why this choice of semantics is problematic to our parser design is explained in the next section. Before getting to | ||
Line 24: | Line 77: | ||
Normally variable accesses are interpreted from left to right. '' | Normally variable accesses are interpreted from left to right. '' | ||
- | named '' | + | named '' |
- | '' | + | The '' |
- | offset, it will first fetch '' | + | its '' |
+ | with the name of the result. | ||
- | This combination of an indirect reference and an offset is the **only** case where interpretation is inverted. For | + | This combination of an indirect reference and an offset is the **only** case where the interpretation is inverted. For |
example the very similar '' | example the very similar '' | ||
- | It follows normal left-to-right semantics. Similarly '' | + | It follows normal left-to-right semantics. Similarly '' |
- | '' | + | and not as '' |
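The two groupings discussed above can be spelled out with explicit braces, which read the same way whichever interpretation the parser applies to the unbraced form. This is a minimal sketch; the variable names `$name`, `$target`, `$lookup` and `$inner` are made up for illustration and do not come from the RFC.

```php
<?php
// Hypothetical data, chosen so that both groupings resolve to something.
$name   = 'target';
$target = ['bar' => 'left-to-right'];

$lookup = ['bar' => 'inner'];
$inner  = 'inverted';

// Left-to-right grouping: resolve the variable-variable first,
// then apply the offset to the variable it names.
$leftToRight = ${$name}['bar'];

// Inverted grouping: apply the offset first,
// then use the resulting string as a variable name.
$inverted = ${$lookup['bar']};

echo $leftToRight, "\n";
echo $inverted, "\n";
```

Writing the braces explicitly is one way to keep such code independent of the default grouping.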
To ensure maximum possible inconsistency there exists an exception to this rule. Namely '' | To ensure maximum possible inconsistency there exists an exception to this rule. Namely '' | ||
Line 37: | Line 91: | ||
This issue applies not only to simple indirect references, but also to indirected property and method names. For example | This issue applies not only to simple indirect references, but also to indirected property and method names. For example | ||
- | '' | + | '' |
- | and 3 offsets. On the other hand '' | + | the 1, 2 and 3 offsets. On the other hand '' |
- | different interpretation. This does not call the function stored at '' | + | entirely |
- | method of class '' | + | calls the static method of class '' |
The last issue implies that PHP's variable syntax is non-local. It is not possible to parse a PHP variable access with | The last issue implies that PHP's variable syntax is non-local. It is not possible to parse a PHP variable access with | ||
- | a fixed finite lookahead | + | a fixed finite lookahead, without transplanting the generated syntax tree or instructions after the fact. |
- | or instructions after the fact. | + | |
==== Impact on parser definition ==== | ==== Impact on parser definition ==== | ||
- | In addition to the problems described above the semantics for indirect references also has far-reaching consequences on | + | In addition to the problems described above the semantics for indirect references also have far-reaching consequences on |
- | how the variable syntax is defined in our parser. In the following I will outline the kind of issue it causes, for | + | how the variable syntax is defined in our parser. In the following I will outline the kind of issues |
readers not familiar with parser construction: | readers not familiar with parser construction: | ||
- | The " | + | The " |
- | which could look roughly as follows (somewhat simplified): | + | rule, which could look roughly as follows (somewhat simplified): |
< | < | ||
Line 70: | Line 123: | ||
left-to-right, | left-to-right, | ||
- | What happens to this scheme if we add a (right-associative) '' | + | What happens to this scheme if we add a (right-associative) '' |
defined as follows: | defined as follows: | ||
Line 88: | Line 141: | ||
</ | </ | ||
- | However, this is not possible because it makes the grammer | + | However, this is not possible because it makes the grammar |
- | could either interpret it using the ''' | + | could either interpret it using the '' |
- | '' | + | the '' |
- | conflict" | + | " |
- | How can this issue be resolved? By removing the '' | + | How can this issue be resolved? By removing the '' |
- | can no longer write '' | + | you can no longer write '' |
- | other dereferencing types: You need to implement it for '' | + | other dereferencing types: You need to implement it for '' |
- | Furthermore you need to ensure that you can continue to nest arbitrary types of dereferences after that. | + | '' |
+ | that. | ||
- | This is both extremely complicated and fragile. This is the reason why PHP only introduced the '' | + | This is both extremely complicated and fragile. This is the reason why PHP only introduced the '' |
- | in PHP 5.4 and even then the support is not perfect. | + | syntax |
==== Incomplete dereferencing support ==== | ==== Incomplete dereferencing support ==== | ||
Because of the implementational hurdles described in the previous section, we do not support all combinations of | Because of the implementational hurdles described in the previous section, we do not support all combinations of | ||
- | dereferencing operations to an arbitrary | + | dereferencing operations to an arbitrary |
'' | '' | ||
Basically, there are two classes of issues. The first one is that we do not always properly support nesting of different | Basically, there are two classes of issues. The first one is that we do not always properly support nesting of different | ||
- | dereferencing types. For example, while it is possible to write both '' | + | dereferencing types. For example, while it is possible to write both '' |
- | combination '' | + | the combination '' |
- | implemented in PHP 5.5 allows you to write '' | + | syntax |
- | Yet another example is that the alternative array syntax '' | + | possible. Yet another example is that the alternative array syntax '' |
- | '' | + | i.e. '' |
- | The second class of issues is that some nesting types aren't supported altogether. For example ''::'' | + | The second class of issues is that some nesting types aren't supported altogether. For example '' |
- | simple reference variables on the left hand side. Writing something like '' | + | simple reference variables on the left hand side. Writing something like '' |
- | possible. Writing '' | + | not possible. Writing '' |
familiar from JavaScript is not allowed either. | familiar from JavaScript is not allowed either. | ||
Lack of support for dereferencing parenthesis-expressions also prevents you from properly disambiguating some | Lack of support for dereferencing parenthesis-expressions also prevents you from properly disambiguating some | ||
- | expressions. For example, it is a common problem that '' | + | expressions. For example, it is a common problem that '' |
- | rather than calling the closure stored in '' | + | method, rather than calling the closure stored in '' |
- | need to use a temporary variable. Another example is the case of '' | + | possible and you need to use a temporary variable. Another example is the case of '' |
- | interpreted as '' | + | above (which is interpreted as '' |
- | '' | + | behavior |
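The closure-in-a-property case can be sketched as follows, assuming a PHP version in which parenthesis-expressions can be dereferenced (PHP 7 or later). The class `Greeter` and its members are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical class: a property and a method share the name "greet".
class Greeter
{
    public $greet;

    public function __construct()
    {
        $this->greet = function () {
            return 'from closure';
        };
    }

    public function greet()
    {
        return 'from method';
    }
}

$obj = new Greeter();

// Plain call syntax always picks the method.
$viaMethod = $obj->greet();

// Workaround without parenthesis-dereferencing: a temporary variable.
$tmp     = $obj->greet;
$viaTemp = $tmp();

// With parenthesis-dereferencing: fetch the property, then call the result.
$viaParens = ($obj->greet)();

echo $viaMethod, "\n", $viaTemp, "\n", $viaParens, "\n";
```

The parenthesised form removes the need for the temporary variable while making the intended grouping visible at the call site.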
==== Miscellaneous other issues ==== | ==== Miscellaneous other issues ==== | ||
- | === Behavior in write/ | + | === Behavior in non-read |
- | The new '' | + | The new '' |
- | " | + | " |
- | if it would be technically possible. E.g. there is nothing inherently problematic with writing '' | + | |
- | (it's no different | + | |
- | Furthermore this causes inconsistent behavior in '' | + | For example |
- | generate an " | + | |
- | === Superfluous CVs on static property access === | + | This also means that assignments to dereferences of parenthesis-expressions are never allowed, even when they would be technically possible. E.g. it's not possible to write '' |
- | Upon encountering a static property access | + | === Superfluous CVs on static property access |
- | even though it is not necessary and never used. This is once again related to the way static member access needs to be | + | |
- | implemented to support our weird indirect reference semantics. | + | |
- | === TODO === | + | Upon encountering a static property access '' |
+ | '' | ||
+ | needs to be implemented to support our weird indirect reference semantics. | ||
===== Proposal ===== | ===== Proposal ===== | ||
Line 151: | Line 202: | ||
A formal definition of the new variable syntax is provided in Bison syntax. This is a slightly simplified version of the | A formal definition of the new variable syntax is provided in Bison syntax. This is a slightly simplified version of the | ||
- | grammar used in the actual implementation. Furthermore definitions for '' | + | grammar used in the actual implementation. Furthermore definitions for '' | ||
- | '' | + | '' |
< | < | ||
Line 213: | Line 264: | ||
</ | </ | ||
- | ==== Differences | + | ==== Semantic differences |
The main difference from the existing variable syntax is that indirect variable, property and method references are now | The main difference from the existing variable syntax is that indirect variable, property and method references are now | ||
Line 219: | Line 270: | ||
< | < | ||
- | $$foo[' | + | $$foo[' |
- | Foo::$bar[' | + | $foo->$bar[' |
- | $foo-> | + | $foo-> |
+ | Foo:: | ||
</ | </ | ||
Line 227: | Line 279: | ||
However it is always possible to recreate the old behavior by explicitly using braces: | However it is always possible to recreate the old behavior by explicitly using braces: | ||
- | < | + | < |
${$foo[' | ${$foo[' | ||
Foo:: | Foo:: | ||
Line 237: | Line 289: | ||
==== Newly added and generalized syntax ==== | ==== Newly added and generalized syntax ==== | ||
- | * There are no longer any restrictions on nesting of dereferencing operations. In particular the examples '' | + | * There are no longer any restrictions on nesting of dereferencing operations. In particular the examples '' |
- | * Static property fetches and method calls can now be applied to any dereferencable expression. E.g. '' | + | * Static property fetches and method calls can now be applied to any dereferencable expression. E.g. '' |
- | * The result of a call can now be directly called again, i.e. all of '' | + | * The result of a call can now be directly called again, i.e. all of '' |
* All dereferencing operations can now be applied to arbitrary parenthesis-expressions. I.e. all of '' | * All dereferencing operations can now be applied to arbitrary parenthesis-expressions. I.e. all of '' | ||
- | * All dereferencing operations can now be applied to dereferencable scalars (array and string literals as of PHP 5.5). E.g. it is possible to write '' | + | * All dereferencing operations can now be applied to dereferencable scalars (array and string literals as of PHP 5.5). E.g. it is possible to write '' |
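The kinds of nesting listed above can be tried out with a short script, assuming PHP 7 or later. The class `Registry` and the function `makeAdder` are hypothetical helpers invented for this sketch; they are not part of the RFC.

```php
<?php
// Hypothetical helpers used only for this illustration.
class Registry
{
    public static $items = ['first' => 'a'];

    public static function name()
    {
        return 'registry';
    }

    public function upper($s)
    {
        return strtoupper($s);
    }
}

function makeAdder($n)
{
    return function ($x) use ($n) {
        return $x + $n;
    };
}

// The result of a call is called again.
$sum = makeAdder(2)(3);

// A parenthesised closure is invoked immediately.
$immediate = (function () { return 'called'; })();

// Static access on a dereferencable expression (here: a string literal).
$static = 'Registry'::name();
$item   = 'Registry'::$items['first'];

// An array literal is used directly as a callable.
$callable = [new Registry(), 'upper']('abc');

// An array literal is indexed and the element is dereferenced further.
$chained = [new Registry(), new Registry()][0]->upper('x');

var_dump($sum, $immediate, $static, $item, $callable, $chained);
```

Each statement combines two dereferencing operations that could previously only be written with an intermediate variable.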
+ | ==== Global keyword takes only simple variables ==== | ||
+ | |||
+ | Previously the '' | ||
+ | variable could follow after the '' | ||
+ | |||
+ | The '' | ||
+ | '' | ||
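A braced name after `global` can be sketched like this. The names `$counter`, `$config`, `varName` and `bump` are assumptions made for the example only.

```php
<?php
// Hypothetical setup: the name of the global to import is stored on an object.
$counter = 41;

$config = new stdClass();
$config->varName = 'counter';

function bump($config)
{
    // The braces turn the property fetch into the name of a simple variable,
    // so the global named by $config->varName is imported into this scope.
    global ${$config->varName};

    return ++$counter;
}

$result = bump($config);

var_dump($result, $counter);
```

The braces make explicit that the whole property fetch supplies the variable name, which is the form the text above recommends.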
+ | |||
+ | ==== Behavior in write context ==== | ||
+ | |||
+ | Expressions of type '' | ||
+ | applied to a non-variable) are now parsed as a '' | ||
+ | previously). | ||
+ | |||
+ | This means that the expression will behave correctly in '' | ||
+ | throw an undefined offset notice. Furthermore it is now possible to assign to expressions of this kind, for example | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | However assignment is not allowed if the left hand expression yields an '' | ||
+ | For example the expression '' | ||
+ | compile error. For '' | ||
+ | |||
+ | ==== Class name variable for new expression ==== | ||
+ | |||
+ | It has always been possible to instantiate classes using a dynamically specified class name by writing '' | ||
+ | However the supported variables are more limited in this case: They may not include calls anywhere, as this would cause | ||
+ | ambiguity with the constructor parameter list. New variables are now defined as follows: | ||
+ | |||
+ | < | ||
+ | new_variable: | ||
+ | simple_variable | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | ; | ||
+ | </ | ||
+ | |||
+ | This matches the previously allowed variable expressions, | ||
+ | For example '' | ||
+ | | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | TODO | + | The changes described in [[# |
+ | [[# | ||
+ | breaks. | ||
+ | |||
+ | The former is a change in the behavior of currently existing syntax. Examples: | ||
+ | |||
+ | <code php> | ||
+ | // old meaning | ||
+ | $$foo[' | ||
+ | $foo-> | ||
+ | $foo-> | ||
+ | Foo:: | ||
+ | </ | ||
+ | |||
+ | An analysis of the Zend Framework and Symfony projects (including standard dependencies) showed that only a single | ||
+ | occurrence of '' | ||
+ | This occurrence must be replaced with '' | ||
+ | PHP 5 and PHP 7. | ||
+ | |||
+ | The latter change turns currently valid syntax into a parse error. Expressions like '' | ||
+ | longer valid and '' | ||
+ | |||
+ | As these changes only apply to some very rarely used syntax, the breakage seems acceptable for PHP 7. | ||
+ | |||
+ | ===== Open issues ===== | ||
+ | |||
+ | The current patch introduces a new "write context" | ||
+ | was as '' | ||
+ | not exist, whereas the latter does not throw a notice. | ||
+ | |||
+ | The reason for this is that '' | ||
+ | the part in parentheses will always be compiled in "read context", | ||
+ | |||
+ | This issue already exists currently, in a different context: The expression '' | ||
+ | standards notice, if '' | ||
+ | reference. On the other hand '' | ||
+ | as an expression instead of a variable. | ||
===== Patch ===== | ===== Patch ===== | ||
- | https:// | + | An implementation of this proposal against the phpng branch is available at https:// |
+ | |||
+ | The main changes are limited to the language parser and compiler. Furthermore some opcode handlers had to be modified | ||
+ | to support '' | ||
+ | |||
+ | ===== Vote ===== | ||
+ | |||
+ | As this is a language change, a 2/3 majority is required for acceptance. The vote started on 2014-07-07 and ended on 2014-07-14. | ||
+ | |||
+ | <doodle title=" | ||
+ | * Yes | ||
+ | * No | ||
+ | </ |
rfc/uniform_variable_syntax.txt · Last modified: 2017/09/22 13:28 by 127.0.0.1