rfc:abstract_syntax_tree
Differences
This shows you the differences between two versions of the page.
rfc:abstract_syntax_tree [2014/07/31 15:20] – nikic | rfc:abstract_syntax_tree [2017/09/22 13:28] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 2: | Line 2: | ||
* Date: 2014-07-28 | * Date: 2014-07-28 | ||
* Author: Nikita Popov < | * Author: Nikita Popov < | ||
- | * Status: | + | * Status: |
- | * Targeting: PHP.next | + | * Discussion: http:// |
===== Introduction ===== | ===== Introduction ===== | ||
Line 17: | Line 17: | ||
==== More maintainable parser and compiler ==== | ==== More maintainable parser and compiler ==== | ||
- | In the new AST-based implementation the compiler is fully decoupled from the parser, which leads to a code quality and maintainability improvement. In the following some examples of how this improves our code are discussed: | + | In the new AST-based implementation the compiler is fully decoupled from the parser, which leads to a code quality and maintainability improvement. In the following some examples of such improvements |
* The parser no longer needs to define separate productions in cases where the same syntax requires different compilation. For example static scalar expressions no longer need to redefine all basic operations and can reuse the normal '' | * The parser no longer needs to define separate productions in cases where the same syntax requires different compilation. For example static scalar expressions no longer need to redefine all basic operations and can reuse the normal '' | ||
- | * The parser needs far fewer mid-rule semantic actions. Now mid-rule reduction is only used to back up doc comments, whereas previously the use was ubiquitous. Apart from code quality concerns this is beneficial because mid-rule actions force the parser to reduce earlier, i.e. the parser is allowed to inspect a smaller number of tokens in order to decide which rule should be reduced. This limits the syntax we can implement. | + | * The parser needs far fewer mid-rule semantic actions. Now mid-rule reduction is only used to back up doc comments, whereas previously the use was ubiquitous. Apart from code quality concerns, this is beneficial because mid-rule actions force the parser to reduce earlier, i.e. the parser is allowed to inspect a smaller number of tokens in order to decide which rule should be reduced. This limits the syntax we can implement. |
- | * Implementations of control flow structures were usually spread across multiple functions called as mid-rule actions. Jump instruction opnums were backed up into arbitrary znodes, which usually results in very hard to follow code (like '' | + | * Implementations of control flow structures were usually spread across multiple functions called as mid-rule actions. Jump instruction opnums were backed up into arbitrary znodes |
- | * Variables were implemented through a backpatch list (and stack), into which oplines necessary for '' | + | * Variables were previously |
- | * We also no longer need to backpatch in a number of other places, e.g. during list compilation. | + | * We also no longer need to backpatch in a number of other places, e.g. during |
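To make the control flow point above more concrete, here is a minimal sketch of how a loop can be compiled from a tree inside one function, with the jump target kept in a local variable instead of being backed up across parser actions. Everything in it is an assumption made for illustration: the ''toy_*'' types, the opcode names and the fixed-size op array are hypothetical and are not the Zend Engine's real structures or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types for illustration only, not Zend Engine structures. */
#define TOY_MAX_OPS 16

typedef enum { OP_COND, OP_BODY, OP_JMPZ, OP_JMP } toy_opcode;

typedef struct {
    toy_opcode code;
    int target;              /* jump target opnum, -1 if unused */
} toy_op;

typedef struct {
    toy_op ops[TOY_MAX_OPS];
    int count;
} toy_op_array;

typedef enum { AST_WHILE, AST_COND, AST_BODY } toy_ast_kind;

typedef struct toy_ast {
    toy_ast_kind kind;
    struct toy_ast *child[2];    /* AST_WHILE: [0] = condition, [1] = body */
} toy_ast;

/* Append one instruction and return its opnum. */
static int toy_emit(toy_op_array *oa, toy_opcode code, int target) {
    assert(oa->count < TOY_MAX_OPS);
    oa->ops[oa->count].code = code;
    oa->ops[oa->count].target = target;
    return oa->count++;
}

static void toy_compile(toy_op_array *oa, const toy_ast *ast);

/* The whole loop is handled here: the opnums that need patching live in
 * local variables for the duration of this one call. */
static void toy_compile_while(toy_op_array *oa, const toy_ast *ast) {
    int loop_start = oa->count;                 /* where the condition begins */
    toy_compile(oa, ast->child[0]);             /* condition */
    int jmpz = toy_emit(oa, OP_JMPZ, -1);       /* exit jump, target unknown yet */
    toy_compile(oa, ast->child[1]);             /* body */
    toy_emit(oa, OP_JMP, loop_start);           /* jump back to the condition */
    oa->ops[jmpz].target = oa->count;           /* patch exit to the op after the loop */
}

static void toy_compile(toy_op_array *oa, const toy_ast *ast) {
    switch (ast->kind) {
        case AST_WHILE: toy_compile_while(oa, ast); break;
        case AST_COND:  toy_emit(oa, OP_COND, -1);  break;
        case AST_BODY:  toy_emit(oa, OP_BODY, -1);  break;
    }
}
```

Compiling a single ''AST_WHILE'' node with one condition and one body child yields four instructions in this toy model: the condition, a conditional exit jump, the body and a jump back to the start. The only point of the sketch is that no state has to survive between separate parser callbacks.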
==== Decoupling syntax decisions from technical issues ==== | ==== Decoupling syntax decisions from technical issues ==== | ||
Line 36: | Line 36: | ||
This choice was made purely due to technical restrictions and is removed in the AST implementation. | This choice was made purely due to technical restrictions and is removed in the AST implementation. | ||
- | Additionally the current compiler architecture prevents | + | Additionally the current compiler architecture prevents |
* Array destructuring using '' | * Array destructuring using '' | ||
Line 48: | Line 48: | ||
The abstract syntax tree has little impact on runtime performance or memory usage. While the AST does allow us to generate slightly better and smaller instruction sequences in some cases (e.g. for '' | The abstract syntax tree has little impact on runtime performance or memory usage. While the AST does allow us to generate slightly better and smaller instruction sequences in some cases (e.g. for '' | ||
- | The introduction of an AST does however impact the performance and memory usage of the compilation process itself. It should be emphasized that this difference is only relevant when no opcode cache is in use. If an opcode cache is used then each file is only compiled once, as such any difference does not have a practical impact. | + | The introduction of an AST does however impact the performance and memory usage of the compilation process itself. It should be emphasized that this difference is //only relevant when no opcode cache is in use//. If an opcode cache is used, each file is compiled only once, so any difference has no practical impact. |
The script used to measure the following numbers is available as a [[https:// | The script used to measure the following numbers is available as a [[https:// | ||
Line 78: | Line 78: | ||
The introduction of the AST comes with minor changes to syntax and behavior, which are listed in the following: | The introduction of the AST comes with minor changes to syntax and behavior, which are listed in the following: | ||
- | ==== '' | + | ==== yield does not require parentheses ==== |
This is the only syntax related change and was already mentioned previously. '' | This is the only syntax related change and was already mentioned previously. '' | ||
Line 94: | Line 94: | ||
Similarly '' | Similarly '' | ||
- | ==== Order of '' | + | ==== Changes to list() ==== |
+ | |||
+ | > **Note**: The behavior | ||
'' | '' | ||
Line 106: | Line 108: | ||
</ | </ | ||
- | Furthermore '' | + | Another example where the assignment order is relevant is if both the left and right side of the list assignment use the same variable: |
- | ==== Order of evaluation of assignments ==== | + | <code php> |
+ | $a = [1, 2]; | ||
+ | list($a, $b) = $a; | ||
- | For assignments the right-hand side is now evaluated before the left-hand side: | + | // OLD: $a = 1, $b = 2 |
+ | // NEW: $a = 1, $b = null + " | ||
+ | |||
+ | $b = [1, 2]; | ||
+ | list($a, $b) = $b; | ||
+ | // OLD: $a = null + " | ||
+ | // NEW: $a = 1, $b = 2 | ||
+ | </ | ||
+ | |||
+ | '' | ||
<code php> | <code php> | ||
- | $i = 1; | + | list(list($a, $b)) = $array; |
- | $array[$i++] | + | |
- | // OLD: [1 => 2] | + | // OLD: |
- | // NEW: [2 => 1] | + | $b = $array[0][1]; |
+ | $a = $array[0][0]; | ||
+ | |||
+ | // NEW: | ||
+ | $_tmp = $array[0]; | ||
+ | $a = $_tmp[0]; | ||
+ | $b = $_tmp[1]; | ||
</ | </ | ||
- | Note that order of evaluation | + | The only visible change this has for most purposes is that an " |
+ | |||
+ | Empty '' | ||
<code php> | <code php> | ||
- | $i = 1; | + | list() |
- | $array[$i] | + | list($b, list()) |
- | // OLD + NEW: [2 => 1] | + | foreach ($a as list()) |
</ | </ | ||
- | ===== Patch ===== | + | ==== Auto-vivification order for by-reference assignments |
- | The AST implementation can be found on GitHub: https:// | + | > **Note**: |
- | The branch already includes the Uniform Variable Syntax RFC, as it was a necessary prerequisite for the implementation. | + | While by-reference assignments are (CVs notwithstanding) evaluated left-to-right, auto-vivification currently occurs right-to-left. In the AST implementation |
- | The implementation has everything ported, but probably still has some bugs and needs some cleanup :) | + | <code php> |
+ | $obj = new stdClass; | ||
+ | $obj->a = & | ||
+ | $obj->b = 1; | ||
+ | var_dump($obj); | ||
+ | |||
+ | // OLD: | ||
+ | object(stdClass)# | ||
+ | [" | ||
+ | & | ||
+ | [" | ||
+ | & | ||
+ | } | ||
+ | |||
+ | // NEW: | ||
+ | object(stdClass)# | ||
+ | [" | ||
+ | & | ||
+ | [" | ||
+ | & | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Note: The order can easily be changed, but the old behavior looks like a bug to me, so I decided to keep the new behavior. | ||
+ | |||
+ | ==== Directly calling __clone is allowed ==== | ||
+ | |||
+ | Doing calls like '' | ||
===== Implementation ===== | ===== Implementation ===== | ||
Line 441: | Line 489: | ||
The '' | The '' | ||
+ | |||
+ | ===== Additional possibilities (not implemented) ===== | ||
+ | |||
+ | The generated AST can be exposed to userland via an extension, for use by static analysers. This should be relatively easy to implement and we might even want to provide this as a bundled extension (like ext/ | ||
+ | |||
+ | More interestingly, | ||
+ | |||
+ | As an example, this is roughly how an implementation of the [[rfc: | ||
+ | |||
+ | <code c> | ||
+ | /* Works by rewriting ifsetor($foo, | ||
+ | void ext_ifsetor_hook(zend_ast **ast_ptr TSRMLS_DC) { | ||
+ | zend_ast *ast = *ast_ptr; | ||
+ | | ||
+ | if (ast-> | ||
+ | zend_string *name = zval_get_string(zend_ast_get_zval(ast-> | ||
+ | zend_ast_list *args = zend_ast_get_list(ast-> | ||
+ | | ||
+ | if (zend_str_equals_literal_ci(name, | ||
+ | && args-> | ||
+ | ) { | ||
+ | if (!zend_is_variable(args-> | ||
+ | zend_error_noreturn(E_COMPILE_ERROR, | ||
+ | "must be a variable" | ||
+ | } | ||
+ | | ||
+ | /* Note: One would need a function for adding refs to args-> | ||
+ | * as it is used two times - as written here it won't work correctly. */ | ||
+ | *ast_ptr = zend_ast_create(ZEND_AST_CONDITIONAL, | ||
+ | zend_ast_create(ZEND_AST_ISSET, | ||
+ | args-> | ||
+ | args-> | ||
+ | ); | ||
+ | } | ||
+ | | ||
+ | STR_RELEASE(name); | ||
+ | } | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | I don't know how useful this is and how many things can be implemented in this way, but I think it's worth considering. | ||
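The rewriting idea can also be tried out in isolation. The following is a self-contained toy model of such a hook, assuming the rewrite is "call to ''ifsetor'' with two arguments becomes ''isset(arg0) ? arg0 : arg1''". All ''toy_*'' names and the node layout are hypothetical and do not correspond to the real ''zend_ast'' API. It also makes two simplifying assumptions: the function name is compared case-sensitively, and the first argument is deep-copied rather than shared, as a stand-in for the reference handling that the note in the code above says would be needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy AST, not the real zend_ast structures. */
typedef enum { N_VAR, N_CONST, N_CALL, N_ISSET, N_CONDITIONAL } toy_kind;

typedef struct toy_node {
    toy_kind kind;
    const char *name;             /* variable/function/constant name, not owned */
    struct toy_node *child[3];    /* unused slots are NULL */
} toy_node;

static toy_node *toy_new(toy_kind kind, const char *name,
                         toy_node *c0, toy_node *c1, toy_node *c2) {
    toy_node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->name = name;
    n->child[0] = c0;
    n->child[1] = c1;
    n->child[2] = c2;
    return n;
}

/* Deep copy, so that no subtree ends up with two parents. */
static toy_node *toy_copy(const toy_node *n) {
    if (n == NULL) return NULL;
    return toy_new(n->kind, n->name,
                   toy_copy(n->child[0]), toy_copy(n->child[1]), toy_copy(n->child[2]));
}

static void toy_free(toy_node *n) {
    if (n == NULL) return;
    for (int i = 0; i < 3; i++) toy_free(n->child[i]);
    free(n);
}

/* Returns 1 if *ast_ptr was rewritten, 0 if it was left alone,
 * and -1 if the first argument of ifsetor() is not a variable. */
static int toy_ifsetor_hook(toy_node **ast_ptr) {
    toy_node *ast = *ast_ptr;

    if (ast->kind != N_CALL || ast->name == NULL
        || strcmp(ast->name, "ifsetor") != 0) {
        return 0;
    }
    /* Exactly two arguments are required. */
    if (ast->child[0] == NULL || ast->child[1] == NULL || ast->child[2] != NULL) {
        return 0;
    }
    if (ast->child[0]->kind != N_VAR) {
        return -1;
    }

    toy_node *var = ast->child[0];
    toy_node *fallback = ast->child[1];

    /* isset(copy of var) ? var : fallback */
    *ast_ptr = toy_new(N_CONDITIONAL, NULL,
                       toy_new(N_ISSET, NULL, toy_copy(var), NULL, NULL),
                       var, fallback);

    free(ast);    /* only the call node itself; its children were moved */
    return 1;
}
```

The copy keeps ownership simple: every node has exactly one parent, so ''toy_free'' can release the rewritten tree without any reference counting. A real implementation would more likely share the node and count references instead.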
+ | |||
+ | An additional possibility is to drop the keywords for '' | ||
+ | |||
+ | ===== Patch ===== | ||
+ | |||
+ | The AST implementation can be found at https:// | ||
+ | |||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | |||
+ | //The branch already includes the Uniform Variable Syntax RFC//, as it was a necessary prerequisite for the implementation. | ||
+ | |||
+ | The implementation has everything ported, but probably still has some bugs and needs some cleanup :) | ||
+ | |||
+ | ===== Vote ===== | ||
+ | The vote started on 2014-08-18 and ended on 2014-08-25. The necessary 2/3 majority was reached, so the RFC is accepted. | |
+ | |||
+ | <doodle title=" | ||
+ | * Yes | ||
+ | * No | ||
+ | </ |
rfc/abstract_syntax_tree.1406820048.txt.gz · Last modified: 2017/09/22 13:28 (external edit)