rfc:uniform_variable_syntax
rfc:uniform_variable_syntax [2014/07/07 14:13] nikic start vote |
rfc:uniform_variable_syntax [2017/09/22 13:28] |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== PHP RFC: Uniform Variable Syntax ====== | ||
- | * Date: 2014-05-31 | ||
- | * Author: Nikita Popov < | ||
- | * Status: In Voting | ||
- | * Proposed for: PHP 6 | ||
- | * Discussion: http:// | ||
- | |||
- | ===== Introduction ===== | ||
- | |||
- | This RFC proposes the introduction of an internally consistent and complete variable syntax. To achieve this goal the | ||
- | semantics of some rarely used variable-variable constructions need to be changed. | ||
- | |||
- | Examples of expressions that were previously invalid, but will be valid with the uniform variable syntax: | ||
- | |||
- | <code php> | ||
- | // support missing combinations of operations | ||
- | $foo()[' | ||
- | [$obj1, $obj2][0]-> | ||
- | getStr(){0} | ||
- | |||
- | // support nested :: | ||
- | $foo[' | ||
- | $foo:: | ||
- | $foo-> | ||
- | |||
- | // support nested () | ||
- | foo()() | ||
- | $foo-> | ||
- | Foo:: | ||
- | $foo()() | ||
- | |||
- | // support operations on arbitrary (...) expressions | ||
- | (...)[' | ||
- | (...)-> | ||
- | (...)-> | ||
- | (...)::$foo | ||
- | (...):: | ||
- | (...)() | ||
- | |||
- | // two more practical examples for the last point | ||
- | (function() { ... })() | ||
- | ($obj-> | ||
- | |||
- | // support all operations on dereferencable scalars (not very useful) | ||
- | " | ||
- | [$obj, ' | ||
- | ' | ||
- | </ | ||
- | |||
- | Examples of expressions whose meaning changes: | ||
- | |||
- | <code php> | ||
- | // old meaning | ||
- | $$foo[' | ||
- | $foo-> | ||
- | $foo-> | ||
- | Foo:: | ||
- | </ | ||
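The change in meaning can be illustrated with a minimal sketch, assuming an interpreter with this proposal applied (the syntax is available in PHP 7.0 and later). The variable names and data below are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical data: $foo holds the name of another variable.
$foo  = 'data';
$data = ['bar' => ['baz' => 'left-to-right']];

// With left-to-right semantics this reads as ($$foo)['bar']['baz']:
// first resolve $$foo to $data, then apply the two offsets.
$viaIndirect = $$foo['bar']['baz'];

// The old reading has to be requested explicitly with braces:
// first evaluate $name['bar']['baz'], then look up a variable with that name.
$name   = ['bar' => ['baz' => 'target']];
$target = 'old meaning';
$viaBraces = ${$name['bar']['baz']};
```

The braced form is unambiguous, so it is the safer spelling for code that has to run both with and without this proposal.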
- | |||
- | Examples of statements which are no longer supported: | ||
- | |||
- | <code php> | ||
- | global $$foo-> | ||
- | // instead use: | ||
- | global ${$foo-> | ||
- | </ | ||
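A short sketch of the braced form of the ''global'' statement, assuming an interpreter with this proposal applied. The function ''bumpGlobal'', the property ''varName'' and the global ''counter'' are hypothetical names used only for this example.

```php
<?php
// Hypothetical global variable, set through $GLOBALS so the example
// does not depend on the scope this snippet is included from.
$GLOBALS['counter'] = 41;

$config = new stdClass;
$config->varName = 'counter'; // name of the global to import

function bumpGlobal(stdClass $config): int
{
    // Braces make the intent explicit: import the global whose name
    // is stored in $config->varName.
    global ${$config->varName};

    return ++$counter;
}

$result = bumpGlobal($config);
```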
- | |||
- | ===== Issues with the current syntax ===== | ||
- | |||
- | ==== Root cause ==== | ||
- | |||
- | The root cause of most issues in PHP's current variable syntax is the semantics of the variable-variable syntax | ||
- | '' | ||
- | '' | ||
- | |||
- | Why this choice of semantics is problematic to our parser design is explained in the next section. Before getting to | ||
- | that, I will discuss why this choice is also bad from a language design perspective: | ||
- | |||
- | Normally variable accesses are interpreted from left to right. '' | ||
- | named '' | ||
- | The '' | ||
- | its '' | ||
- | with the name of the result. | ||
- | |||
- | This combination of an indirect reference and an offset is the **only** case where the interpretation is inverted. For | ||
- | example the very similar '' | ||
- | It follows normal left-to-right semantics. Similarly '' | ||
- | and not as '' | ||
- | |||
- | To ensure maximum possible inconsistency there exists an exception to this rule. Namely '' | ||
- | **will** be interpreted as '' | ||
- | |||
- | This issue applies not only to simple indirect references, but also to indirected property and method names. For example | ||
- | '' | ||
- | the 1, 2 and 3 offsets. On the other hand '' | ||
- | entirely different interpretation: | ||
- | calls the static method of class '' | ||
- | |||
- | The last issue implies that PHP's variable syntax is non-local. It is not possible to parse a PHP variable access with | ||
- | a fixed finite lookahead, without transplanting the generated syntax tree or instructions after the fact. | ||
- | |||
- | ==== Impact on parser definition ==== | ||
- | |||
- | In addition to the problems described above the semantics for indirect references also have far-reaching consequences on | ||
- | how the variable syntax is defined in our parser. In the following I will outline the kind of issues it causes, for | ||
- | readers not familiar with parser construction: | ||
- | |||
- | The " | ||
- | rule, which could look roughly as follows (somewhat simplified): | ||
- | |||
- | < | ||
- | variable: | ||
- | T_VARIABLE | ||
- | | | ||
- | | | ||
- | | | ||
- | | | ||
- | | ... | ||
- | ; | ||
- | </ | ||
- | |||
- | This approach ensures that we can arbitrarily nest different access types (in the following called " | ||
- | example the above definition allows you to write '' | ||
- | left-to-right, | ||
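To make the effect of a left-recursive rule concrete, here is a rough sketch (not the actual parser) of how such a rule groups accesses: every suffix wraps everything parsed so far, so the tree nests to the left. The function ''buildLeftToRight'' and the node shapes are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical, simplified input: a base variable plus a list of suffixes.
// Each suffix is [kind, operand], e.g. ['dim', 'a'] for ['a'] or
// ['prop', 'b'] for ->b.
function buildLeftToRight(string $base, array $suffixes): array
{
    $node = ['var', $base];
    foreach ($suffixes as [$kind, $operand]) {
        // The node built so far becomes the left operand of the next access,
        // which mirrors what a left-recursive grammar rule produces.
        $node = [$kind, $node, $operand];
    }
    return $node;
}

// Models an access chain of the form: base, offset, property, offset.
$tree = buildLeftToRight('$foo', [['dim', 'a'], ['prop', 'b'], ['dim', 'c']]);
```

The outermost node is the last access in the chain, and the innermost node is the base variable, which is exactly the left-to-right interpretation.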
- | |||
- | What happens to this scheme if we add a (right-associative) '' | ||
- | defined as follows: | ||
- | |||
- | < | ||
- | reference_variable: | ||
- | T_VARIABLE | ||
- | | | ||
- | ; | ||
- | |||
- | variable: | ||
- | T_VARIABLE | ||
- | | ' | ||
- | | | ||
- | | | ||
- | | ... | ||
- | ; | ||
- | </ | ||
- | |||
- | However, this is not possible because it makes the grammar ambiguous. When the parser encounters '' | ||
- | could either interpret it using the '' | ||
- | the '' | ||
- | " | ||
- | |||
- | How can this issue be resolved? By removing the '' | ||
- | you can no longer write '' | ||
- | other dereferencing types: You need to implement it for '' | ||
- | '' | ||
- | that. | ||
- | |||
- | This is both extremely complicated and fragile. This is the reason why PHP only introduced the '' | ||
- | syntax in PHP 5.4 and even then the support is not perfect. | ||
- | |||
- | ==== Incomplete dereferencing support ==== | ||
- | |||
- | Because of the implementation hurdles described in the previous section, we do not support all combinations of | ||
- | dereferencing operations to an arbitrary depth. While PHP 5.4 fixed the most glaring issue (support for | ||
- | '' | ||
- | |||
- | Basically, there are two classes of issues. The first one is that we do not always properly support nesting of different | ||
- | dereferencing types. For example, while it is possible to write both '' | ||
- | the combination '' | ||
- | syntax implemented in PHP 5.5 allows you to write '' | ||
- | possible. Yet another example is that the alternative array syntax '' | ||
- | i.e. '' | ||
- | |||
- | The second class of issues is that some nesting types aren't supported altogether. For example '' | ||
- | simple reference variables on the left hand side. Writing something like '' | ||
- | not possible. Writing '' | ||
- | familiar from JavaScript is not allowed either. | ||
- | |||
- | Lack of support for dereferencing parenthesis-expressions also prevents you from properly disambiguating some | ||
- | expressions. For example, it is a common problem that '' | ||
- | method, rather than calling the closure stored in '' | ||
- | possible and you need to use a temporary variable. Another example is the case of '' | ||
- | above (which is interpreted as '' | ||
- | behavior '' | ||
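The closure-in-property ambiguity can be sketched as follows, assuming an interpreter with this proposal applied. The class ''Button'' and its members are hypothetical names chosen for the example.

```php
<?php
// Hypothetical class that has both a method and a property named onClick.
class Button
{
    public $onClick;

    public function onClick(): string
    {
        return 'method';
    }
}

$button = new Button;
$button->onClick = function (): string {
    return 'closure';
};

// Method-call syntax always resolves to the method.
$fromMethod = $button->onClick();

// Parentheses select the property first; the result is then called.
$fromClosure = ($button->onClick)();

// The same mechanism allows immediately invoked function expressions.
$immediate = (function (): int {
    return 6 * 7;
})();
```

Without support for dereferencing parenthesis-expressions, the second call would need a temporary variable holding the closure.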
- | |||
- | ==== Miscellaneous other issues ==== | ||
- | |||
- | === Behavior in non-read context === | ||
- | |||
- | The new '' | ||
- | " | ||
- | |||
- | For example '' | ||
- | |||
- | This also means that assignments to dereferences of parenthesis-expressions are never allowed, even when they would be technically possible. E.g. it's not possible to write '' | ||
- | |||
- | === Superfluous CVs on static property access === | ||
- | |||
- | Upon encountering a static property access '' | ||
- | '' | ||
- | needs to be implemented to support our weird indirect reference semantics. | ||
- | |||
- | === TODO === | ||
- | |||
- | ===== Proposal ===== | ||
- | |||
- | ==== Formal definition ==== | ||
- | |||
- | A formal definition of the new variable syntax is provided in Bison syntax. This is a slightly simplified version of the | ||
- | grammar used in the actual implementation. Furthermore definitions for '' | ||
- | '' | ||
- | |||
- | < | ||
- | variable: | ||
- | callable_variable | ||
- | | class_name_or_dereferencable T_PAAMAYIM_NEKUDOTAYIM simple_variable | ||
- | | dereferencable T_OBJECT_OPERATOR member_name | ||
- | ; | ||
- | |||
- | callable_variable: | ||
- | simple_variable | ||
- | | dereferencable ' | ||
- | | dereferencable ' | ||
- | | ||
- | | function_name function_call_parameter_list | ||
- | | dereferencable T_OBJECT_OPERATOR member_name function_call_parameter_list | ||
- | | class_name_or_dereferencable T_PAAMAYIM_NEKUDOTAYIM member_name function_call_parameter_list | ||
- | | callable_expr function_call_parameter_list | ||
- | ; | ||
- | |||
- | simple_variable: | ||
- | T_VARIABLE | ||
- | | ' | ||
- | | ' | ||
- | ; | ||
- | |||
- | dereferencable: | ||
- | variable | ||
- | | ' | ||
- | | dereferencable_scalar | ||
- | ; | ||
- | |||
- | dereferencable_scalar: | ||
- | T_ARRAY ' | ||
- | | ' | ||
- | | T_CONSTANT_ENCAPSED_STRING | ||
- | ; | ||
- | |||
- | class_name_or_dereferencable: | ||
- | class_name | ||
- | | | ||
- | ; | ||
- | |||
- | member_name: | ||
- | T_STRING | ||
- | | ' | ||
- | | simple_variable | ||
- | ; | ||
- | |||
- | dim_offset: | ||
- | /* empty */ | ||
- | | expr | ||
- | ; | ||
- | |||
- | callable_expr: | ||
- | callable_variable | ||
- | | ' | ||
- | | dereferencable_scalar | ||
- | ; | ||
- | </ | ||
- | |||
- | ==== Semantic differences in existing syntax ==== | ||
- | |||
- | The main difference from the existing variable syntax is that indirect variable, property and method references are now | ||
- | interpreted with left-to-right semantics. Examples: | ||
- | |||
- | < | ||
- | $$foo[' | ||
- | $foo-> | ||
- | $foo-> | ||
- | Foo:: | ||
- | </ | ||
- | |||
- | This change is **backwards incompatible** (with low practical impact), which is the reason why this RFC targets PHP 6. | ||
- | However it is always possible to recreate the old behavior by explicitly using braces: | ||
- | |||
- | <code php> | ||
- | ${$foo[' | ||
- | Foo:: | ||
- | $foo-> | ||
- | </ | ||
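For property names the same distinction can be sketched like this, assuming an interpreter with this proposal applied. The object layout and variable names are hypothetical.

```php
<?php
// Hypothetical object with a scalar property and an array property.
$obj = new stdClass;
$obj->title  = 'RFC';
$obj->fields = ['name' => 'value-from-array'];

// Left-to-right: fetch the property named by $prop, then apply the offset,
// i.e. ($obj->$prop)['name'].
$prop = 'fields';
$leftToRight = $obj->$prop['name'];

// Explicit braces: evaluate $lookup['name'] first and use the result
// as the property name.
$lookup = ['name' => 'title'];
$braced = $obj->{$lookup['name']};
```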
- | |||
- | This syntax is guaranteed to behave the same way in both PHP 5 and PHP 6. | ||
- | |||
- | ==== Newly added and generalized syntax ==== | ||
- | |||
- | * There are no longer any restrictions on nesting of dereferencing operations. In particular the examples '' | ||
- | * Static property fetches and method calls can now be applied to any dereferencable expression. E.g. '' | ||
- | * The result of a call can now be directly called again, i.e. all of '' | ||
- | * All dereferencing operations can now be applied to arbitrary parenthesis-expressions. I.e. all of '' | ||
- | * All dereferencing operations can now be applied to dereferencable scalars (array and string literals as of PHP 5.5). E.g. it is possible to write '' | ||
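A few of the combinations listed above, as a sketch that assumes an interpreter with this proposal applied. The class ''Greeter'' and the function ''adder'' are hypothetical and only serve as something to dereference.

```php
<?php
// Hypothetical helper class.
class Greeter
{
    public $name = 'world';

    public static function make(): self
    {
        return new self;
    }

    public function greet(): string
    {
        return 'Hello, ' . $this->name;
    }
}

// Hypothetical function that returns a closure.
function adder(int $a): Closure
{
    return function (int $b) use ($a): int {
        return $a + $b;
    };
}

// The result of a call is called again.
$sum = adder(2)(3);

// An array literal is dereferenced, followed by a property fetch.
$first = [new Greeter, new Greeter][0]->name;

// A static method call is applied to a string literal.
$greeting = 'Greeter'::make()->greet();

// Call, offset and call combined in one expression.
$getHandlers = function (): array {
    return ['run' => function (): string {
        return 'ran';
    }];
};
$ran = $getHandlers()['run']();
```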
- | |||
- | ==== Global keyword takes only simple variables ==== | ||
- | |||
- | Previously the '' | ||
- | variable could follow after the '' | ||
- | |||
- | The '' | ||
- | '' | ||
- | |||
- | ==== Behavior in write context ==== | ||
- | |||
- | Expressions of type '' | ||
- | applied to a non-variable) are now parsed as a '' | ||
- | previously). | ||
- | |||
- | This means that the expression will behave correctly in '' | ||
- | throw an undefined offset notice. Furthermore it is now possible to assign to expressions of this kind, for example | ||
- | '' | ||
- | '' | ||
- | |||
- | However assignment is not allowed if the left hand expression yields an '' | ||
- | For example the expression '' | ||
- | compile error. For '' | ||
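The behavior in ''isset()'' and ''empty()'' can be checked with a small sketch, assuming an interpreter with this proposal applied. The function ''getSettings'' is hypothetical. The error handler is installed only to record any notice or warning that might be raised.

```php
<?php
// Hypothetical function returning an array without a 'missing' key.
function getSettings(): array
{
    return ['debug' => true];
}

// Collect every notice or warning raised while the checks run.
$raised = [];
set_error_handler(function (int $level, string $message) use (&$raised): bool {
    $raised[] = $message;
    return true;
});

// An offset applied to a call result, used inside empty() and isset().
$missingIsEmpty = empty(getSettings()['missing']);
$hasDebug       = isset(getSettings()['debug']);

restore_error_handler();
```

If the offset were compiled in plain read context, the ''empty()'' check on the missing key would add an entry to ''$raised''.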
- | |||
- | ==== Class name variable for new expression ==== | ||
- | |||
- | It has always been possible to create classes using a dynamically specified class name by writing '' | ||
- | However the supported variables are more limited in this case: They may not include calls anywhere, as this would cause | ||
- | ambiguity with the constructor parameter list. New variables are now defined as follows: | ||
- | |||
- | < | ||
- | new_variable: | ||
- | simple_variable | ||
- | | | ||
- | | | ||
- | | | ||
- | | | ||
- | | | ||
- | ; | ||
- | </ | ||
- | |||
- | This matches the previously allowed variable expressions, | ||
- | For example '' | ||
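A minimal sketch of dynamic class names in ''new'' expressions, using the built-in ''ArrayObject'' class. The array ''$classes'' and the object ''$registry'' are hypothetical containers for the class name. Only offsets and property fetches are used here, since calls are not allowed inside the class name part.

```php
<?php
// Hypothetical lookup structures holding a class name.
$classes  = ['list' => 'ArrayObject'];
$registry = new stdClass;
$registry->listClass = 'ArrayObject';

// The class name is taken from an array offset; the parenthesized part
// that follows is the constructor argument list.
$fromArray = new $classes['list']([1, 2, 3]);

// The class name is taken from a property.
$fromProperty = new $registry->listClass([4, 5]);
```

A class name produced by a function call has to be stored in a variable first, because a call in this position would be ambiguous with the constructor argument list.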
- | | ||
- | ===== Backward Incompatible Changes ===== | ||
- | |||
- | The changes described in [[# | ||
- | [[# | ||
- | breaks. | ||
- | |||
- | The former is a change in the behavior of currently existing syntax. Examples: | ||
- | |||
- | <code php> | ||
- | // old meaning | ||
- | $$foo[' | ||
- | $foo-> | ||
- | $foo-> | ||
- | Foo:: | ||
- | </ | ||
- | |||
- | An analysis of the Zend Framework and Symfony projects (including standard dependencies) showed that only a single | ||
- | occurrence of '' | ||
- | This occurrence must be replaced with '' | ||
- | PHP 5 and PHP 6. | ||
- | |||
- | The latter change turns currently valid syntax into a parse error. Expressions like '' | ||
- | longer valid and '' | ||
- | |||
- | As these changes only apply to some very rarely used syntax, the breakage seems acceptable for PHP 6. | ||
- | |||
- | ===== Open issues ===== | ||
- | |||
- | The current patch introduces a new "write context" | ||
- | was as '' | ||
- | not exist, whereas the latter does not throw a notice. | ||
- | |||
- | The reason for this is that '' | ||
- | the part in parentheses will always be compiled in "read context", | ||
- | |||
- | This issue already exists currently, in a different context: The expression '' | ||
- | standards notice, if '' | ||
- | reference. On the other hand '' | ||
- | as an expression instead of a variable. | ||
- | |||
- | ===== Patch ===== | ||
- | |||
- | An implementation of this proposal against the phpng branch is available at https:// | ||
- | |||
- | The main changes are limited to the language parser and compiler. Furthermore some opcode handlers had to be modified | ||
- | to support '' | ||
- | |||
- | ===== Vote ===== | ||
- | |||
- | As this is a language change, a 2/3 majority is required for acceptance. The vote started on 2014-07-07 and ends on 2014-07-14. | ||
- | |||
- | <doodle title=" | ||
- | * Yes | ||
- | * No | ||
- | </ | ||
rfc/uniform_variable_syntax.txt · Last modified: 2017/09/22 13:28 (external edit)