rfc:uniform_variable_syntax
Next revision | Previous revision | ||
rfc:uniform_variable_syntax [2014/05/31 19:14] – created nikic | rfc:uniform_variable_syntax [2017/09/22 13:28] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 2: | Line 2: | ||
* Date: 2014-05-31 | * Date: 2014-05-31 | ||
* Author: Nikita Popov < | * Author: Nikita Popov < | ||
- | * Status: | + | * Status: |
- | * Proposed for: PHP 6 | + | * Discussion: http:// |
===== Introduction ===== | ===== Introduction ===== | ||
+ | This RFC proposes the introduction of an internally consistent and complete variable syntax. To achieve this goal the | ||
+ | semantics of some rarely used variable-variable constructions need to be changed. | ||
+ | |||
+ | Examples of expressions that were previously invalid, but will be valid with the uniform variable syntax: | ||
+ | |||
+ | <code php> | ||
+ | // support missing combinations of operations | ||
+ | $foo()[' | ||
+ | [$obj1, $obj2][0]-> | ||
+ | getStr(){0} | ||
+ | |||
+ | // support nested :: | ||
+ | $foo[' | ||
+ | $foo:: | ||
+ | $foo-> | ||
+ | |||
+ | // support nested () | ||
+ | foo()() | ||
+ | $foo-> | ||
+ | Foo:: | ||
+ | $foo()() | ||
+ | |||
+ | // support operations on arbitrary (...) expressions | ||
+ | (...)[' | ||
+ | (...)-> | ||
+ | (...)-> | ||
+ | (...)::$foo | ||
+ | (...):: | ||
+ | (...)() | ||
+ | |||
+ | // two more practical examples for the last point | ||
+ | (function() { ... })() | ||
+ | ($obj-> | ||
+ | |||
+ | // support all operations on dereferencable scalars (not very useful) | ||
+ | " | ||
+ | [$obj, ' | ||
+ | ' | ||
+ | </ | ||
+ | |||
+ | Examples of expressions whose meaning changes: | ||
+ | |||
+ | <code php> | ||
+ | // old meaning | ||
+ | $$foo[' | ||
+ | $foo-> | ||
+ | $foo-> | ||
+ | Foo:: | ||
+ | </ | ||
+ | |||
+ | Examples of statements which are no longer supported: | ||
+ | |||
+ | <code php> | ||
+ | global $$foo-> | ||
+ | // instead use: | ||
+ | global ${$foo-> | ||
+ | </ | ||
+ | |||
+ | ===== Issues with the current syntax ===== | ||
+ | |||
+ | ==== Root cause ==== | ||
+ | |||
+ | The root cause of most issues in PHP's current variable syntax is the semantics of the variable-variable syntax | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | Why this choice of semantics is problematic for our parser design is explained in the next section. Before getting to | ||
+ | that, I will discuss why this choice is also bad from a language design perspective: | ||
+ | |||
+ | Normally variable accesses are interpreted from left to right. '' | ||
+ | named '' | ||
+ | The '' | ||
+ | its '' | ||
+ | with the name of the result. | ||
+ | |||
+ | This combination of an indirect reference and an offset is the **only** case where the interpretation is inverted. For | ||
+ | example the very similar '' | ||
+ | It follows normal left-to-right semantics. Similarly '' | ||
+ | and not as '' | ||
+ | |||
+ | To ensure maximum possible inconsistency there exists an exception to this rule. Namely '' | ||
+ | **will** be interpreted as '' | ||
+ | |||
+ | This issue applies not only to simple indirect references, but also to indirected property and method names. For example | ||
+ | '' | ||
+ | the 1, 2 and 3 offsets. On the other hand '' | ||
+ | entirely different interpretation: | ||
+ | calls the static method of class '' | ||
+ | |||
+ | The last issue implies that PHP's variable syntax is non-local. It is not possible to parse a PHP variable access with | ||
+ | a fixed finite lookahead, without transplanting the generated syntax tree or instructions after the fact. | ||
+ | |||
+ | ==== Impact on parser definition ==== | ||
+ | |||
+ | In addition to the problems described above the semantics for indirect references also have far-reaching consequences on | ||
+ | how the variable syntax is defined in our parser. In the following I will outline the kind of issues it causes, for | ||
+ | readers not familiar with parser construction: | ||
+ | |||
+ | The " | ||
+ | rule, which could look roughly as follows (somewhat simplified): | ||
+ | |||
+ | < | ||
+ | variable: | ||
+ | T_VARIABLE | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | ... | ||
+ | ; | ||
+ | </ | ||
+ | |||
+ | This approach ensures that we can arbitrarily nest different access types (in the following called " | ||
+ | example the above definition allows you to write '' | ||
+ | left-to-right, | ||
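To make the left-to-right nesting concrete, the following is a minimal sketch (not the actual parser) of how a left-recursive rule wraps each new access around everything parsed so far. The function name `buildAccessTree` and the node shapes are hypothetical and exist only for illustration.

```php
<?php
// Hypothetical helper: fold a chain of access operations onto a base
// variable from left to right, mirroring how a left-recursive grammar
// rule nests each new access around everything parsed so far.
function buildAccessTree(string $base, array $accesses): array
{
    $tree = ['var', $base];
    foreach ($accesses as [$kind, $operand]) {
        // The tree built so far becomes the left operand of the next access.
        $tree = [$kind, $tree, $operand];
    }
    return $tree;
}

// A chain of one array offset followed by one property fetch on variable "a".
$tree = buildAccessTree('a', [['dim', 'b'], ['prop', 'c']]);
```

The base variable ends up innermost and the last access outermost, which is what "interpreted from left to right" means for the resulting tree.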
+ | |||
+ | What happens to this scheme if we add a (right-associative) '' | ||
+ | defined as follows: | ||
+ | |||
+ | < | ||
+ | reference_variable: | ||
+ | T_VARIABLE | ||
+ | | | ||
+ | ; | ||
+ | |||
+ | variable: | ||
+ | T_VARIABLE | ||
+ | | ' | ||
+ | | | ||
+ | | | ||
+ | | ... | ||
+ | ; | ||
+ | </ | ||
+ | |||
+ | However, this is not possible because it makes the grammar ambiguous. When the parser encounters '' | ||
+ | could either interpret it using the '' | ||
+ | the '' | ||
+ | " | ||
+ | |||
+ | How can this issue be resolved? By removing the '' | ||
+ | you can no longer write '' | ||
+ | other dereferencing types: You need to implement it for '' | ||
+ | '' | ||
+ | that. | ||
+ | |||
+ | This is both extremely complicated and fragile. This is the reason why PHP only introduced the '' | ||
+ | syntax in PHP 5.4 and even then the support is not perfect. | ||
+ | |||
+ | ==== Incomplete dereferencing support ==== | ||
+ | |||
+ | Because of the implementation hurdles described in the previous section, we do not support all combinations of | ||
+ | dereferencing operations to an arbitrary depth. While PHP 5.4 fixed the most glaring issue (support for | ||
+ | '' | ||
+ | |||
+ | Basically, there are two classes of issues. The first one is that we do not always properly support nesting of different | ||
+ | dereferencing types. For example, while it is possible to write both '' | ||
+ | the combination '' | ||
+ | syntax implemented in PHP 5.5 allows you to write '' | ||
+ | possible. Yet another example is that the alternative array syntax '' | ||
+ | i.e. '' | ||
+ | |||
+ | The second class of issues is that some nesting types aren't supported altogether. For example '' | ||
+ | simple reference variables on the left hand side. Writing something like '' | ||
+ | not possible. Writing '' | ||
+ | familiar from JavaScript is not allowed either. | ||
+ | |||
+ | Lack of support for dereferencing parenthesis-expressions also prevents you from properly disambiguating some | ||
+ | expressions. For example, it is a common problem that '' | ||
+ | method, rather than calling the closure stored in '' | ||
+ | possible and you need to use a temporary variable. Another example is the case of '' | ||
+ | above (which is interpreted as '' | ||
+ | behavior '' | ||
+ | |||
+ | ==== Miscellaneous other issues ==== | ||
+ | |||
+ | === Behavior in non-read context === | ||
+ | |||
+ | The new '' | ||
+ | " | ||
+ | |||
+ | For example '' | ||
+ | |||
+ | This also means that assignments to dereferences of parenthesis-expressions are never allowed, even when they would be technically possible. E.g. it's not possible to write '' | ||
+ | |||
+ | === Superfluous CVs on static property access === | ||
+ | |||
+ | Upon encountering a static property access '' | ||
+ | '' | ||
+ | needs to be implemented to support our weird indirect reference semantics. | ||
===== Proposal ===== | ===== Proposal ===== | ||
+ | ==== Formal definition ==== | ||
+ | A formal definition of the new variable syntax is provided in Bison syntax. This is a slightly simplified version of the | ||
+ | grammar used in the actual implementation. Furthermore definitions for '' | ||
+ | '' | ||
+ | |||
+ | < | ||
+ | variable: | ||
+ | callable_variable | ||
+ | | class_name_or_dereferencable T_PAAMAYIM_NEKUDOTAYIM simple_variable | ||
+ | | dereferencable T_OBJECT_OPERATOR member_name | ||
+ | ; | ||
+ | |||
+ | callable_variable: | ||
+ | simple_variable | ||
+ | | dereferencable ' | ||
+ | | dereferencable ' | ||
+ | | ||
+ | | function_name function_call_parameter_list | ||
+ | | dereferencable T_OBJECT_OPERATOR member_name function_call_parameter_list | ||
+ | | class_name_or_dereferencable T_PAAMAYIM_NEKUDOTAYIM member_name function_call_parameter_list | ||
+ | | callable_expr function_call_parameter_list | ||
+ | ; | ||
+ | |||
+ | simple_variable: | ||
+ | T_VARIABLE | ||
+ | | ' | ||
+ | | ' | ||
+ | ; | ||
+ | |||
+ | dereferencable: | ||
+ | variable | ||
+ | | ' | ||
+ | | dereferencable_scalar | ||
+ | ; | ||
+ | |||
+ | dereferencable_scalar: | ||
+ | T_ARRAY ' | ||
+ | | ' | ||
+ | | T_CONSTANT_ENCAPSED_STRING | ||
+ | ; | ||
+ | |||
+ | class_name_or_dereferencable: | ||
+ | class_name | ||
+ | | | ||
+ | ; | ||
+ | |||
+ | member_name: | ||
+ | T_STRING | ||
+ | | ' | ||
+ | | simple_variable | ||
+ | ; | ||
+ | |||
+ | dim_offset: | ||
+ | /* empty */ | ||
+ | | expr | ||
+ | ; | ||
+ | |||
+ | callable_expr: | ||
+ | callable_variable | ||
+ | | ' | ||
+ | | dereferencable_scalar | ||
+ | ; | ||
+ | </ | ||
+ | |||
+ | ==== Semantic differences in existing syntax ==== | ||
+ | |||
+ | The main difference from the existing variable syntax is that indirect variable, property and method references are now | ||
+ | interpreted with left-to-right semantics. Examples: | ||
+ | |||
+ | < | ||
+ | $$foo[' | ||
+ | $foo-> | ||
+ | $foo-> | ||
+ | Foo:: | ||
+ | </ | ||
+ | |||
+ | This change is **backwards incompatible** (with low practical impact), which is the reason why this RFC targets PHP 7. | ||
+ | However it is always possible to recreate the old behavior by explicitly using braces: | ||
+ | |||
+ | <code php> | ||
+ | ${$foo[' | ||
+ | Foo:: | ||
+ | $foo-> | ||
+ | </ | ||
+ | |||
+ | This syntax is guaranteed to behave the same in both PHP 5 and PHP 7. | ||
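As a hedged illustration of the left-to-right rule and of the explicit-braces form, the sketch below assumes PHP 7 or later. All variable names and values (`$name`, `$data`, `$lookup`) are made up for this example and do not come from the RFC.

```php
<?php
// Illustrative data only: $name holds the name of another variable.
$name = 'data';
$data = ['key' => 'left-to-right'];

// PHP 7: the indirect reference is resolved first, then the offset is applied.
$viaIndirect = $$name['key'];

// Explicit braces spell out the same grouping.
$viaBraces = ${$name}['key'];

// The old (PHP 5) grouping has to be requested explicitly:
// take the offset of $lookup first, then use that value as a variable name.
$lookup = ['key' => 'data'];
$oldStyle = ${$lookup['key']};
```

Writing the braces explicitly is the portable choice, since the braced forms do not depend on which grouping the parser picks by default.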
+ | |||
+ | ==== Newly added and generalized syntax ==== | ||
+ | |||
+ | * There are no longer any restrictions on nesting of dereferencing operations. In particular the examples '' | ||
+ | * Static property fetches and method calls can now be applied to any dereferencable expression. E.g. '' | ||
+ | * The result of a call can now be directly called again, i.e. all of '' | ||
+ | * All dereferencing operations can now be applied to arbitrary parenthesis-expressions. I.e. all of '' | ||
+ | * All dereferencing operations can now be applied to dereferencable scalars (array and string literals as of PHP 5.5). E.g. it is possible to write '' | ||
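The generalized forms listed above can be exercised with a small sketch such as the following, assuming PHP 7 or later. The class `Counter` and its members are hypothetical and serve only to give the expressions something to operate on.

```php
<?php
// Hypothetical class used only to exercise the generalized syntax.
class Counter
{
    public static $label = 'counter';
    public $step;

    public function __construct()
    {
        // A closure stored in a property, not a method.
        $this->step = function (int $n): int {
            return $n + 1;
        };
    }

    public function makeAdder(int $by): Closure
    {
        return function (int $n) use ($by): int {
            return $n + $by;
        };
    }
}

$counter = new Counter();

// Call the closure stored in the property; without the parentheses
// this would be parsed as a call to a method named "step".
$viaProperty = ($counter->step)(1);

// Call the result of a method call directly.
$viaChainedCall = $counter->makeAdder(10)(5);

// Immediately invoke an anonymous function.
$viaIife = (function (): string {
    return 'called';
})();

// Call an array-style callable literal, then call its result.
$viaArrayCallable = [$counter, 'makeAdder'](2)(3);

// Static property access on a string literal holding a class name.
$viaStringClass = 'Counter'::$label;
```

Under PHP 5 each of these would need a temporary variable or a helper such as `call_user_func()`.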
+ | |||
+ | ==== Global keyword takes only simple variables ==== | ||
+ | |||
+ | Previously the '' | ||
+ | variable could follow after the '' | ||
+ | |||
+ | The '' | ||
+ | '' | ||
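A minimal sketch of the braces form with `global`, assuming PHP 7 or later. The function `readGlobalNamedBy`, the `name` property and the global `appVersion` are hypothetical names chosen for this example.

```php
<?php
// Register an illustrative global variable. Going through $GLOBALS keeps
// the example working even if this snippet is not run at the top level.
$GLOBALS['appVersion'] = '1.0';

function readGlobalNamedBy(stdClass $config): string
{
    // The braces state explicitly which expression yields the variable name.
    global ${$config->name};

    return ${$config->name};
}

$config = new stdClass();
$config->name = 'appVersion';

$version = readGlobalNamedBy($config);
```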
+ | |||
+ | ==== Behavior in write context ==== | ||
+ | |||
+ | Expressions of type '' | ||
+ | applied to a non-variable) are now parsed as a '' | ||
+ | previously). | ||
+ | |||
+ | This means that the expression will behave correctly in '' | ||
+ | throw an undefined offset notice. Furthermore it is now possible to assign to expressions of this kind, for example | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | However assignment is not allowed if the left hand expression yields an '' | ||
+ | For example the expression '' | ||
+ | compile error. For '' | ||
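The following sketch shows one way to observe the changed behavior for `empty()` on an offset of an array literal, assuming PHP 7 or later. The error handler and the variable names are assumptions of this example; it only records whether any diagnostic was raised.

```php
<?php
// Collect any notices or warnings raised while evaluating the expression.
$diagnostics = [];
set_error_handler(function (int $errno, string $errstr) use (&$diagnostics): bool {
    $diagnostics[] = $errstr;
    return true;
});

// Offset 5 does not exist in the literal array.
$index = 5;
$isEmpty = empty([1, 2, 3][$index]);

restore_error_handler();
```

If the offset is compiled as a variable fetch, as described above, `$diagnostics` stays empty because `empty()` checks for existence instead of performing a plain read.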
+ | |||
+ | ==== Class name variable for new expression ==== | ||
+ | |||
+ | It has always been possible to instantiate classes using a dynamically specified class name by writing '' | ||
+ | However the supported variables are more limited in this case: They may not include calls anywhere, as this would cause | ||
+ | ambiguity with the constructor parameter list. New variables are now defined as follows: | ||
+ | |||
+ | < | ||
+ | new_variable: | ||
+ | simple_variable | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | ; | ||
+ | </ | ||
+ | |||
+ | This matches the previously allowed variable expressions, | ||
+ | For example '' | ||
+ | | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
+ | The changes described in [[# | ||
+ | [[# | ||
+ | breaks. | ||
+ | |||
+ | The former is a change in the behavior of currently existing syntax. Examples: | ||
+ | |||
+ | <code php> | ||
+ | // old meaning | ||
+ | $$foo[' | ||
+ | $foo-> | ||
+ | $foo-> | ||
+ | Foo:: | ||
+ | </ | ||
+ | |||
+ | An analysis of the Zend Framework and Symfony projects (including standard dependencies) showed that only a single | ||
+ | occurrence of '' | ||
+ | This occurrence must be replaced with '' | ||
+ | PHP 5 and PHP 7. | ||
+ | |||
+ | The latter change turns currently valid syntax into a parse error. Expressions like '' | ||
+ | longer valid and '' | ||
+ | |||
+ | As these changes only apply to some very rarely used syntax, the breakage seems acceptable for PHP 7. | ||
+ | |||
+ | ===== Open issues ===== | ||
+ | |||
+ | The current patch introduces a new "write context" | ||
+ | was as '' | ||
+ | not exist, whereas the latter does not throw a notice. | ||
+ | |||
+ | The reason for this is that '' | ||
+ | the part in parentheses will always be compiled in "read context", | ||
+ | |||
+ | This issue already exists currently, in a different context: The expression '' | ||
+ | standards notice, if '' | ||
+ | reference. On the other hand '' | ||
+ | as an expression instead of a variable. | ||
===== Patch ===== | ===== Patch ===== | ||
- | https:// | + | An implementation of this proposal against the phpng branch is available at https:// |
+ | |||
+ | The main changes are limited to the language parser and compiler. Furthermore some opcode handlers had to be modified | ||
+ | to support '' | ||
+ | |||
+ | ===== Vote ===== | ||
+ | |||
+ | As this is a language change, a 2/3 majority is required for acceptance. The vote started on 2014-07-07 and ended on 2014-07-14. | ||
+ | |||
+ | <doodle title=" | ||
+ | * Yes | ||
+ | * No | ||
+ | </ |
rfc/uniform_variable_syntax.1401563662.txt.gz · Last modified: 2017/09/22 13:28 (external edit)