rfc:uniform_variable_syntax
rfc:uniform_variable_syntax [2014/06/02 19:06] nikic |
rfc:uniform_variable_syntax [2017/09/22 13:28] |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== PHP RFC: Uniform Variable Syntax ====== | ||
- | * Date: 2014-05-31 | ||
- | * Author: Nikita Popov < | ||
- | * Status: Draft | ||
- | * Proposed for: PHP 6 | ||
- | ===== Overview ===== | ||
- | |||
- | This RFC proposes the introduction of an internally consistent and complete variable syntax. To achieve this goal, the | |
- | semantics of some rarely used variable-variable constructions need to be changed. | ||
- | |||
- | TODO | ||
- | |||
- | ===== Issues with the current syntax ===== | ||
- | |||
- | ==== Root cause ==== | ||
- | |||
- | The root cause of most issues in PHP's current variable syntax is the semantics of the variable-variable syntax | |
- | '' | ||
- | '' | ||
- | |||
- | Why this choice of semantics is problematic for our parser design is explained in the next section. Before getting to | |
- | that, I will discuss why this choice is also bad from a language design perspective: | ||
- | |||
- | Normally variable accesses are interpreted from left to right. '' | ||
- | named '' | ||
- | '' | ||
- | offset, it will first fetch '' | ||
- | |||
- | This combination of an indirect reference and an offset is the **only** case where interpretation is inverted. For | ||
- | example the very similar '' | ||
- | It follows normal left-to-right semantics. Similarly '' | ||
- | '' | ||
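The two reading orders discussed above can be made explicit with braces, which parse the same way regardless of how the bare, brace-less form is interpreted. The following is only a sketch: the variable names (`$foo`, `$target`, `$name`, `$list`) and their values are hypothetical and chosen purely for illustration.

```php
<?php
// Hypothetical variables, chosen only for this illustration.
$foo = ['bar' => 'target'];
$target = 'found via the name stored in $foo["bar"]';
$name = 'list';
$list = ['bar' => 'found via an offset into $list'];

// Right-to-left reading: evaluate $foo['bar'] first, then use the result
// ('target') as a variable name.
$rightToLeft = ${$foo['bar']};

// Left-to-right reading: resolve the variable named by $name first ($list),
// then apply the ['bar'] offset to it.
$leftToRight = ${$name}['bar'];

echo $rightToLeft, "\n", $leftToRight, "\n";
```

Writing the braces is a way to state the intended order explicitly, so the code does not depend on which of the two interpretations the parser picks for the unbraced form.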
- | |||
- | To ensure maximum possible inconsistency there exists an exception to this rule. Namely '' | ||
- | **will** be interpreted as '' | ||
- | |||
- | This issue applies not only to simple indirect references, but also to indirected property and method names. For example | ||
- | '' | ||
- | and 3 offsets. On the other hand '' | ||
- | different interpretation. This does not call the function stored at '' | ||
- | method of class '' | ||
- | |||
- | The last issue implies that PHP's variable syntax is non-local. It is not possible to parse a PHP variable access with | ||
- | a fixed finite lookahead (most parsers have only one token lookahead), without transplanting the generated syntax tree | ||
- | or instructions after the fact. | ||
- | |||
- | ==== Impact on parser definition ==== | ||
- | |||
- | In addition to the problems described above, the semantics for indirect references also have far-reaching consequences for | |
- | how the variable syntax is defined in our parser. In the following I will outline the kind of issue it causes, for | |
- | readers not familiar with parser construction: | ||
- | |||
- | The " | ||
- | which could look roughly as follows (somewhat simplified): | ||
- | |||
- | < | ||
- | variable: | ||
- | T_VARIABLE | ||
- | | | ||
- | | | ||
- | | | ||
- | | | ||
- | | ... | ||
- | ; | ||
- | </ | ||
- | |||
- | This approach ensures that we can arbitrarily nest different access types (in the following called " | ||
- | example the above definition allows you to write '' | ||
- | left-to-right, | ||
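As a small illustration of what left-to-right nesting means in practice, the sketch below chains an array offset, a property fetch, another offset and another property fetch, then repeats the same access one step at a time. The data structure and all names in it (`$data`, `profile`, `emails`, `primary`) are hypothetical.

```php
<?php
// Hypothetical nested structure, used only to exercise the chain.
$emails = new stdClass();
$emails->primary = 'a@example.org';
$user = new stdClass();
$user->profile = ['emails' => $emails];
$data = ['user' => $user];

// One expression, read left to right.
$chained = $data['user']->profile['emails']->primary;

// The same access, applying one dereference at a time.
$step1 = $data['user'];      // array offset
$step2 = $step1->profile;    // property fetch
$step3 = $step2['emails'];   // array offset
$stepwise = $step3->primary; // property fetch

var_dump($chained === $stepwise);
```

Each dereference applies to the result of everything to its left, which is exactly what a left-recursive grammar rule expresses.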
- | |||
- | What happens to this scheme if we add a (right-associative) '' | ||
- | defined as follows: | ||
- | |||
- | < | ||
- | reference_variable: | ||
- | T_VARIABLE | ||
- | | | ||
- | ; | ||
- | |||
- | variable: | ||
- | T_VARIABLE | ||
- | | ' | ||
- | | | ||
- | | | ||
- | | ... | ||
- | ; | ||
- | </ | ||
- | |||
- | However, this is not possible because it makes the grammar ambiguous. When the parser encounters '' | |
- | could either interpret it using the ''' | ||
- | '' | ||
- | conflict" | ||
- | |||
- | How can this issue be resolved? By removing the '' | ||
- | can no longer write '' | ||
- | other dereferencing types: You need to implement it for '' | ||
- | Furthermore you need to ensure that you can continue to nest arbitrary types of dereferences after that. | ||
- | |||
- | This is both extremely complicated and fragile. This is the reason why PHP only introduced the '' | ||
- | in PHP 5.4 and even then the support is not perfect. | ||
- | |||
- | ==== Incomplete dereferencing support ==== | ||
- | |||
- | Because of the implementation hurdles described in the previous section, we do not support all combinations of | |
- | dereferencing operations to an arbitrary depth. While PHP 5.4 fixed the most glaring issue (support for | |
- | '' | ||
- | |||
- | Basically, there are two classes of issues. The first one is that we do not always properly support nesting of different | ||
- | dereferencing types. For example, while it is possible to write both '' | ||
- | combination '' | ||
- | implemented in PHP 5.5 allows you to write '' | ||
- | Yet another example is that the alternative array syntax '' | ||
- | '' | ||
- | |||
- | The second class of issues is that some nesting types aren't supported at all. For example ''::'' | |
- | simple reference variables on the left hand side. Writing something like '' | ||
- | possible. Writing '' | ||
- | familiar from JavaScript is not allowed either. | |
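The JavaScript-style pattern referred to here is an immediately invoked closure. The sketch below contrasts it with a helper-based workaround that avoids the parenthesised call. The variable names are hypothetical, and the second form assumes a PHP version that implements this proposal.

```php
<?php
// Workaround that avoids dereferencing a parenthesised expression:
// hand the closure to call_user_func() instead of calling it directly.
$viaHelper = call_user_func(function () {
    return 6 * 7;
});

// Immediately invoked closure. This line assumes a PHP version in which
// parenthesis-expressions can be called, as proposed in this RFC.
$direct = (function () {
    return 6 * 7;
})();

var_dump($viaHelper === $direct);
```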
- | |||
- | Lack of support for dereferencing parenthesis-expressions also prevents you from properly disambiguating some | ||
- | expressions. For example, it is a common problem that '' | ||
- | rather than calling the closure stored in '' | ||
- | need to use a temporary variable. Another example is the case of '' | ||
- | interpreted as '' | ||
- | '' | ||
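The closure-in-property problem described above can be reproduced with a few lines. This is a sketch under stated assumptions: the class `Holder` and its `handler` member are hypothetical, and the final parenthesised call assumes a PHP version that implements this proposal.

```php
<?php
// Hypothetical class: a property and a method share the name "handler".
class Holder
{
    public $handler;

    public function handler()
    {
        return 'method';
    }
}

$holder = new Holder();
$holder->handler = function () {
    return 'closure';
};

// Plain call syntax resolves to the method, not the stored closure.
$viaMethod = $holder->handler();

// Workaround: copy the closure into a temporary variable first.
$tmp = $holder->handler;
$viaTemp = $tmp();

// With parenthesis-expressions dereferencable, the property fetch can be
// grouped explicitly. Assumes a PHP version implementing this proposal.
$viaParens = ($holder->handler)();

echo $viaMethod, ' ', $viaTemp, ' ', $viaParens, "\n";
```

The temporary variable and the parenthesised form both reach the closure. The parentheses simply remove the need for the extra statement.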
- | |||
- | ==== Miscellaneous other issues ==== | ||
- | |||
- | === Behavior in write/isset context === | ||
- | |||
- | The new '' | ||
- | " | ||
- | if it would be technically possible. E.g. there is nothing inherently problematic with writing '' | ||
- | (it's no different than writing '' | ||
- | |||
- | Furthermore this causes inconsistent behavior in '' | ||
- | generate an " | ||
- | |||
- | === Superfluous CVs on static property access === | ||
- | |||
- | Upon encountering a static property access '' | ||
- | even though it is not necessary and never used. This is once again related to the way static member access needs to be | ||
- | implemented to support our weird indirect reference semantics. | ||
- | |||
- | === TODO === | ||
- | |||
- | ===== Proposal ===== | ||
- | |||
- | ==== Formal definition ==== | ||
- | |||
- | A formal definition of the new variable syntax is provided in Bison syntax. This is a slightly simplified version of the | ||
- | grammar used in the proposed implementation. Definitions for '' | |
- | '' | ||
- | |||
- | < | ||
- | variable: | ||
- | callable_variable | ||
- | | class_name_or_dereferencable T_PAAMAYIM_NEKUDOTAYIM simple_variable | ||
- | | dereferencable T_OBJECT_OPERATOR member_name | ||
- | ; | ||
- | |||
- | callable_variable: | ||
- | simple_variable | ||
- | | dereferencable ' | ||
- | | dereferencable ' | ||
- | | ||
- | | function_name function_call_parameter_list | ||
- | | dereferencable T_OBJECT_OPERATOR member_name function_call_parameter_list | ||
- | | class_name_or_dereferencable T_PAAMAYIM_NEKUDOTAYIM member_name function_call_parameter_list | ||
- | | callable_expr function_call_parameter_list | ||
- | ; | ||
- | |||
- | simple_variable: | ||
- | T_VARIABLE | ||
- | | ' | ||
- | | ' | ||
- | ; | ||
- | |||
- | dereferencable: | ||
- | variable | ||
- | | ' | ||
- | | dereferencable_scalar | ||
- | ; | ||
- | |||
- | dereferencable_scalar: | ||
- | T_ARRAY ' | ||
- | | ' | ||
- | | T_CONSTANT_ENCAPSED_STRING | ||
- | ; | ||
- | |||
- | class_name_or_dereferencable: | ||
- | class_name | ||
- | | | ||
- | ; | ||
- | |||
- | member_name: | ||
- | T_STRING | ||
- | | ' | ||
- | | simple_variable | ||
- | ; | ||
- | |||
- | dim_offset: | ||
- | /* empty */ | ||
- | | expr | ||
- | ; | ||
- | |||
- | callable_expr: | ||
- | callable_variable | ||
- | | ' | ||
- | | dereferencable_scalar | ||
- | ; | ||
- | </ | ||
- | |||
- | ===== Backward Incompatible Changes ===== | ||
- | |||
- | TODO | ||
- | |||
- | ===== Patch ===== | ||
- | |||
- | https:// |
rfc/uniform_variable_syntax.txt · Last modified: 2017/09/22 13:28 (external edit)