
PHP RFC: Parser Extension API

Introduction

This RFC proposes a userland parser extension API that provides access to the engine's low-level Abstract Syntax Tree (AST) parser.

Parsing API proposal

Previous versions of PHP provide no API for accessing the Abstract Syntax Tree of a given piece of code, because the engine itself had no AST. The only related facility is the tokenizer extension, whose token_get_all() function returns the lexical tokens of the source. This token stream is hard to use directly: PHP's grammar is complex, so any tool built on tokens has to reimplement that grammar in PHP.
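As a rough illustration of how low-level the token stream is, the sketch below uses the existing tokenizer functions token_get_all() and token_name() to list the tokens of a one-line script. The snippet and the variable names are illustrative only.

```php
<?php
// List the tokens of a small snippet using the tokenizer extension.
$code = '<?php $var = 42;';

$names = [];
foreach (token_get_all($code) as $token) {
    if (is_array($token)) {
        // Array tokens have the shape [id, text, line]
        $names[] = token_name($token[0]);
    } else {
        // Single-character tokens such as "=" and ";" are plain strings
        $names[] = $token;
    }
}

echo implode(' ', $names), "\n";
```

The result is a flat list with no nesting: nothing in it says that the assignment has a variable on its left and a number on its right, which is the structure an AST would provide.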

The latest version of PHP includes a powerful AST-based implementation of the compiler that is fully decoupled from the parser. This design allows for better code quality and maintainability in the engine. AST information can be useful in userland too, so I propose a parsing API that builds an AST, in addition to the existing tokenizer extension.

Why is an AST needed in userland?

Several libraries already provide a high-level API for accessing information about source code, including PHP-Parser (https://github.com/nikic/PHP-Parser), PHP-Token-Reflection (https://github.com/Andrewsville/PHP-Token-Reflection), Doctrine Annotations and other tools. Information about the structure of source code is also used by existing QA tools that perform static analysis, which rely heavily on the tokenizer extension and custom parsers. A built-in parsing API could simplify these tools and make them faster and more reliable.

Parser API

The structural unit of the Abstract Syntax Tree is a single node that holds information about one concrete element:

<?php
namespace Php\Parser;
 
class Node
{
    public $kind;
    public $flags;
    public $lineNumber;
    public $value; 
 
    /**
     * @var Node[]|array List of child nodes
     */
    public $children;

    /**
     * Returns the text representation of the current node,
     * applied recursively to all children
     *
     * @return string
     */
    public function dump() {}

    /**
     * Returns a user-friendly name of the node kind, e.g. "AST_ASSIGN"
     *
     * @return string
     */
    public function getKindName() {}

    /**
     * Whether the current node uses flags
     *
     * @return bool
     */
    public function isUsingFlags() {}
}

The `kind` property specifies the type of the node. It is an integer value that corresponds to one of the AST_* constants, for example AST_STMT_LIST. The getKindName() method returns the string name for this integer kind.

The `flags` property contains node-specific flags. It is always defined, but it is zero for most nodes. The isUsingFlags() method tells whether a node has a meaningful flags value.

The `value` property holds a value only for zval AST nodes. The `lineNumber` property specifies the starting line number of the node. The `children` property contains an array of child nodes.
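To make the roles of `kind`, `lineNumber`, `children`, getKindName() and dump() more concrete, here is a minimal userland sketch. Everything in it is an assumption made for illustration: the class name SketchNode, the constructor, the numeric constant values and the dump layout (modelled on this RFC's example output) are not part of the proposal or of the engine.

```php
<?php
// Hypothetical userland stand-in for the proposed Php\Parser\Node class.
// The constant values below are placeholders, not real engine values.
final class SketchNode
{
    const AST_STMT_LIST = 1;
    const AST_ASSIGN = 2;
    const AST_VAR = 3;

    // Lookup table from integer kind to its readable name
    const KIND_NAMES = [
        self::AST_STMT_LIST => 'AST_STMT_LIST',
        self::AST_ASSIGN => 'AST_ASSIGN',
        self::AST_VAR => 'AST_VAR',
    ];

    public $kind;
    public $flags = 0;
    public $lineNumber;
    public $children;

    public function __construct(int $kind, int $lineNumber, array $children = [])
    {
        $this->kind = $kind;
        $this->lineNumber = $lineNumber;
        $this->children = $children;
    }

    public function getKindName(): string
    {
        return self::KIND_NAMES[$this->kind];
    }

    // Renders this node and, recursively, all of its children
    public function dump(int $depth = 0): string
    {
        $pad = str_repeat('    ', $depth);
        $text = sprintf("%s @ %d {\n", $this->getKindName(), $this->lineNumber);
        foreach ($this->children as $index => $child) {
            if ($child instanceof self) {
                $childText = $child->dump($depth + 1);
            } elseif (is_string($child)) {
                $childText = '"' . $child . '"';
            } else {
                $childText = var_export($child, true);
            }
            $text .= sprintf("%s    %s: %s\n", $pad, $index, $childText);
        }
        return $text . $pad . '}';
    }
}

// Hand-built tree for the statement `$var = 42;`
$tree = new SketchNode(SketchNode::AST_STMT_LIST, 1, [
    new SketchNode(SketchNode::AST_ASSIGN, 1, [
        new SketchNode(SketchNode::AST_VAR, 1, ['var']),
        42,
    ]),
]);

echo $tree->dump(), "\n";
```

In this sketch children may be either nodes or plain scalar values, which is one possible reading of the `Node[]|array` annotation; a real implementation may represent leaf values differently.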

To obtain the AST for a piece of code, the `Php\Parser\Engine` class will be used:

<?php
namespace Php\Parser;
 
final class Engine
{
    /**
     * Parses the given code and returns an AST for it
     *
     * @param string $phpCode Source code to analyse
     *
     * @return Node
     */
    public static function parse(string $phpCode): Node {}
}

The static Engine::parse() method accepts a source code string (which is parsed in INITIAL mode, i.e. it should generally include an opening PHP tag) and returns an abstract syntax tree consisting of Node objects. The tree can later be compiled or pretty-printed back into PHP code.

Here is an example of getting an AST for simple code:

<?php
use Php\Parser\Engine as ParserEngine;
 
$code = <<<'EOC'
<?php
$var = 42;
EOC;
 
$astTree = ParserEngine::parse($code);
echo $astTree->dump(); 
 
// Output:
AST_STMT_LIST @ 1 {
    0: AST_ASSIGN @ 1 {
        0: AST_VAR @ 1 {
            0: "var"
        }
        1: 42
    }
}

This AST information can later be used by custom parser extensions, QA static analysis tools, source code rewriting tools and much more.
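As a sketch of how a static analysis tool might consume such a tree, the example below walks a hand-built tree depth-first and collects the names of variables that are assigned to. The WalkNode class, the walkNodes() helper and the use of string kind names are hypothetical simplifications; the proposed API exposes integer kinds through Node objects returned by Engine::parse().

```php
<?php
// Hypothetical stand-in for Php\Parser\Node with only the fields the walker needs.
final class WalkNode
{
    public $kindName;
    public $lineNumber;
    public $children;

    public function __construct(string $kindName, int $lineNumber, array $children = [])
    {
        $this->kindName = $kindName;
        $this->lineNumber = $lineNumber;
        $this->children = $children;
    }
}

// Depth-first traversal that yields every node in the tree
function walkNodes(WalkNode $node): Generator
{
    yield $node;
    foreach ($node->children as $child) {
        if ($child instanceof WalkNode) {
            yield from walkNodes($child);
        }
    }
}

// Hand-built tree for: $var = 42; $other = $var;
$tree = new WalkNode('AST_STMT_LIST', 1, [
    new WalkNode('AST_ASSIGN', 1, [
        new WalkNode('AST_VAR', 1, ['var']),
        42,
    ]),
    new WalkNode('AST_ASSIGN', 2, [
        new WalkNode('AST_VAR', 2, ['other']),
        new WalkNode('AST_VAR', 2, ['var']),
    ]),
]);

// Collect the names of variables on the left-hand side of assignments
$assigned = [];
foreach (walkNodes($tree) as $node) {
    if ($node->kindName !== 'AST_ASSIGN') {
        continue;
    }
    $target = $node->children[0];
    if ($target instanceof WalkNode && $target->kindName === 'AST_VAR') {
        $assigned[] = $target->children[0];
    }
}

echo implode(', ', $assigned), "\n";
```

The same traversal written against the token stream would have to track statement boundaries and operator positions by hand, which is the duplication of grammar knowledge that a parsing API is meant to remove.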

Note that this part was originally implemented and described by Nikita Popov as the experimental php-ast extension (https://github.com/nikic/php-ast), so it can be used as the base implementation.

Backward Incompatible Changes

No changes.

Proposed PHP Version(s)

Target: PHP 7.x

RFC Impact

To SAPIs

No impact on SAPIs.

To Existing Extensions

Existing extensions are not affected.

To Opcache

This RFC does not affect opcache logic, because it only provides an API for accessing AST information.

New Constants

The `Php\Parser\Node` class will contain several constants describing the different kinds of nodes, their names and flags.

php.ini Defaults

None.

Open Issues

  1. Should each node type be represented by its own class?
  2. Where should metadata be stored (flags, names of node kinds, relations between node types)? This information will be needed later for AST validation.

Future Scope

The implementation of this RFC can later be used for building userland parser extensions (based on the zend_ast_process() hook). We could allow userland extensions to hook into the compilation process. This would allow extensions to implement some types of language features, for example Design-by-Contract verification, Aspect-Oriented Programming, analysis of annotation metadata and much more.

Proposed Voting Choices

Target version:

  1. 7.0
  2. 7.x
  3. Do not include this API in core

Implementation paradigm:

  1. Object-oriented: Php\Parser\Engine
  2. Functional: ast_xxx() functions

Namespace:

  1. None (top-level)
  2. Php\Parser\
  3. Ast\

Patches and Tests

No patch is available at the moment.

Implementation

No information yet.

References

  1. PHP RFC: Abstract syntax tree https://wiki.php.net/rfc/abstract_syntax_tree
  2. Compiler hook for altering the AST pre-compilation https://github.com/php/php-src/commit/1010b0ea4f4b9f96ae744f04c1191ac228580e48
  3. Abstract Syntax Trees API in Python Language https://docs.python.org/2/library/ast.html

Rejected Features

  1. Userland parser extensions - need more time to clarify details, possible targets are 7.x or 8.0
rfc/parser-extension-api.txt · Last modified: 2021/03/27 14:49 by ilutov