rfc:context_sensitive_lexer
Differences
This shows you the differences between two versions of the page.
rfc:context_sensitive_lexer [2015/02/20 06:53] – marcio | rfc:context_sensitive_lexer [2017/09/22 13:28] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Context Sensitive Lexer ====== | ====== PHP RFC: Context Sensitive Lexer ====== | ||
- | * Version: 0.3 | + | * Version: 0.4.1 |
* Date: 2015-02-15 | * Date: 2015-02-15 | ||
* Author: Márcio Almada | * Author: Márcio Almada | ||
- | * Status: | + | * Status: |
* First Published at: http:// | * First Published at: http:// | ||
Line 18: | Line 18: | ||
class Collection { | class Collection { | ||
public function forEach(callable $callback) { /* */ } | public function forEach(callable $callback) { /* */ } | ||
+ | public function list() { /* */ } | ||
} | } | ||
- | class List { | ||
- | public function append(List $list) { /* */ } | ||
- | } | ||
</ | </ | ||
- | Notice that it's currently **not** possible to have the '' | + | Notice that it's currently **not** possible to have the '' |
PHP Parse error: Syntax error, unexpected T_FOREACH, expecting T_STRING on line 2 | PHP Parse error: Syntax error, unexpected T_FOREACH, expecting T_STRING on line 2 | ||
- | PHP Parse error: Syntax error, unexpected T_LIST, expecting T_STRING on line 5 | + | PHP Parse error: Syntax error, unexpected T_LIST, expecting T_STRING on line 3 |
===== Proposal ===== | ===== Proposal ===== | ||
This RFC revisits the topic of [[https:// | This RFC revisits the topic of [[https:// | ||
- | presenting a minimal and maintainable [[https:// | + | presenting a minimal and maintainable [[https:// |
- | restricted to OO scope only, consistently comprehending: | + | |
- | * Namespace, class, trait and interface names | ||
* Properties, constants and methods defined on classes, interfaces and traits | * Properties, constants and methods defined on classes, interfaces and traits | ||
* Access of properties, constants and methods from objects and classes | * Access of properties, constants and methods from objects and classes | ||
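A minimal sketch of what the two bullet points above permit, assuming PHP 7.0 or later (the version this RFC targets). `Collection`, `forEach` and `list` follow the RFC's introduction; the `DEFAULT` constant, the `new()` factory and the constructor are hypothetical additions for illustration only.

```php
<?php
// Hypothetical class: keywords used as method and constant names.
class Collection
{
    const DEFAULT = 'none';              // keyword as a class constant name

    private $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    // Keyword as a static method name; `new self` inside is the real operator.
    public static function new(array $items)
    {
        return new self($items);
    }

    // `foreach` as a method name (method names are case-insensitive).
    public function forEach(callable $callback)
    {
        foreach ($this->items as $item) {
            $callback($item);
        }
        return $this;
    }

    // `list` as a method name.
    public function list()
    {
        return $this->items;
    }
}

$seen = [];
Collection::new([1, 2, 3])->forEach(function ($item) use (&$seen) {
    $seen[] = $item * 2;
});
```

On an interpreter without this change the file does not parse at all, so the example cannot be guarded with a runtime version check.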
Line 43: | Line 39: | ||
- Reduce the surface of BC breaks whenever new keywords are introduced | - Reduce the surface of BC breaks whenever new keywords are introduced | ||
- | - Avoid restricting userland APIs. Dispensing the need for hacks like unecessary | + | - Avoid restricting userland APIs. Dispensing the need for hacks like unnecessary |
This is a list of currently **globally** reserved words that will become **semi-reserved** if the proposed change is approved: | This is a list of currently **globally** reserved words that will become **semi-reserved** if the proposed change is approved: | ||
Line 51: | Line 47: | ||
namespace | namespace | ||
print echo require | print echo require | ||
- | function | + | function |
==== Limitations ==== | ==== Limitations ==== | ||
- | On purporse, it's still forbidden to define a **namespace**, | + | On purpose, it's still forbidden to define a **class |
- | + | ||
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
<code php> | <code php> | ||
- | namespace|class|interface|trait Namespace | + | class Foo { |
- | namespace|class|interface|trait Self {} // Fatal error | + | const class = ' |
- | namespace|class|interface|trait Static {} // Fatal error | + | } |
- | namespace|class|interface|trait Parent {} // Fatal error | + | |
- | namespace|class|interface|trait Array {} // Fatal error | + | |
- | namespace|class|interface|trait Callable {} // Fatal error | + | |
- | // Fatal error: Cannot | + | // Fatal error: Cannot |
</ | </ | ||
- | On purporse, it's still forbidden | + | In practice, it means that we would drop from **64** |
+ | |||
+ | '' | ||
<code php> | <code php> | ||
class Foo { | class Foo { | ||
- | | + | |
} | } | ||
- | // Fatal error: Cannot redefine | + | (new Foo)->list; |
</ | </ | ||
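The remaining restriction on `class` as a constant name can be read alongside the existing `::class` name resolution. The following is a sketch assuming PHP 7.0 or later; the constant value is hypothetical, and the commented-out line is left disabled because it would abort compilation.

```php
<?php
class Foo
{
    // A semi-reserved word used as a class constant name.
    const list = 'semi-reserved word as a constant name';

    // const class = 'x'; // still rejected at compile time
}

echo Foo::list, PHP_EOL;
echo Foo::class, PHP_EOL; // resolves the class name: prints "Foo"
```

Because `Foo::class` already resolves to the class name, a user-defined constant called `class` would be ambiguous with it.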
Line 154: | Line 142: | ||
} | } | ||
</ | </ | ||
+ | |||
+ | ===== Impact On Other RFCs ===== | ||
+ | |||
+ | Some RFCs propose reserving new keywords, either to add features or to reserve type hint names: | ||
+ | |||
+ | * https:// | ||
+ | * https:// | ||
+ | * https:// | ||
+ | |||
+ | If the current RFC is approved, the surface of BC breaks would be much smaller in such cases. | ||
+ | |||
+ | One notable example is the **in** operator RFC. Without the context sensitive lexer proposed here, the new operator would cause a BC break in the **Doctrine** library and in many other SQL writers and ORMs: | ||
+ | |||
+ | https:// | ||
===== Implementation Details ===== | ===== Implementation Details ===== | ||
+ | |||
+ | ==== Patch 1 - Discarded ==== | ||
The lexer now keeps track of the context needed to have unreserved words on OO scope and makes use of a minimal amount of RE2C lookahead capabilities when disambiguation becomes inevitable. | The lexer now keeps track of the context needed to have unreserved words on OO scope and makes use of a minimal amount of RE2C lookahead capabilities when disambiguation becomes inevitable. | ||
- | For instance, the lexing rules to disambiguate '':: | + | For instance, the lexing rules to disambiguate '':: |
<code c++> | <code c++> | ||
Line 172: | Line 176: | ||
</ | </ | ||
- | One additional compile time check was created: | + | A few additional compile time checks were created: | ||
<code c> | <code c> | ||
- | if (zend_string_equals_literal_ci(name, " | + | if(ZEND_NOT_RESERVED != zend_check_reserved_method_name(decl->name)) { |
- | zend_error_noreturn(E_COMPILE_ERROR, | + | zend_error_noreturn(E_COMPILE_ERROR, |
- | ce->name-> | + | |
} | } | ||
</ | </ | ||
- | Others were just adapted because, surprisingly, | + | ==== Patch 2 ==== |
- | adjustments | + | |
+ | A new patch has been added during the voting phase. It's a different approach that proved to have many advantages over the first patch and therefore it is intended to supersede it. | ||
+ | |||
+ | The new patch just requires the maintenance | ||
+ | |||
+ | - It offers no regression or forward compatibility risks and is highly predictable | ||
+ | - It has a very small footprint when compared to the previous attempt involving a pure lexical approach | ||
+ | - Requires no compile time checks | ||
+ | - Is highly configurable, | ||
+ | |||
+ | In order to send information to the lexer about the context change, we just have to use '' | ||
<code c> | <code c> | ||
// before | // before | ||
- | if(ZEND_FETCH_CLASS_DEFAULT != zend_get_class_fetch_type(name)) { | + | method_modifiers function returns_ref T_STRING '(' |
- | zend_error_noreturn(E_COMPILE_ERROR, | + | |
- | } | + | |
// after | // after | ||
- | if(ZEND_FETCH_CLASS_DEFAULT != zend_check_reserved_name(name)) { | + | method_modifiers function returns_ref identifier '(' |
- | zend_error_noreturn(E_COMPILE_ERROR, | + | |
- | } | + | |
</ | </ | ||
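One way to observe the difference between the raw lexer output and the parser-assisted result from userland is ext/tokenizer. The sketch below assumes PHP 7.0 or later and relies on the `TOKEN_PARSE` flag of `token_get_all()` as documented in the PHP manual; `tokenNameOf()` is a hypothetical helper, and nothing here is taken from the patch itself.

```php
<?php
// Hypothetical helper: name of the first token whose text equals $text.
function tokenNameOf($source, $text, $flags = 0)
{
    foreach (token_get_all($source, $flags) as $token) {
        // Single-character tokens are plain strings; skip them.
        if (is_array($token) && $token[1] === $text) {
            return token_name($token[0]);
        }
    }
    return null;
}

$source = '<?php class A { const PUBLIC = 1; }';

// Lexer only: the keyword token is reported.
echo tokenNameOf($source, 'PUBLIC'), PHP_EOL;              // T_PUBLIC

// With parser context: the word is reported as a plain identifier.
echo tokenNameOf($source, 'PUBLIC', TOKEN_PARSE), PHP_EOL; // T_STRING
```

Tools that inspect member names through the tokenizer would typically pass `TOKEN_PARSE`, at the cost of requiring the source to be syntactically valid.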
- | Current proposed patch: | + | ===== Future Work And Maintenance |
- | + | ||
- | * Doesn' | + | |
- | * Keeps ext tokenizer functional | + | |
- | * Introduces no maintenance issues | + | |
- | * Has no performance impact | + | |
- | * Introduces a minimal amount of changes on lexer | + | |
- | + | ||
- | => Many experiments with parsing were done before the current proposed patch which involves only lexing changes. But turns out the patches involving parsing had too many disadvantages and maintence issues.\\ | + | |
- | => No performance loss was noticed but maybe the patch requires a better benchmark. | + | |
- | + | ||
- | ===== Impact on performance | + | |
- | No loss noticed. | + | * All php-src tests pass with the new patch, although some work remains. The new patch makes it more feasible to extend semi-reserved word support to namespace and class names, but the RFC author will pursue that more ambitious proposal only for PHP 7.1. | ||
- | -- Add benchmark here if asked on discussion | + | => The first patch has been discarded during discussion |
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
Line 216: | Line 216: | ||
This is proposed for the next PHP x, which at the time of this writing would be PHP 7. | This is proposed for the next PHP x, which at the time of this writing would be PHP 7. | ||
- | ===== Open Issues | + | ===== Votes ===== |
- | The patch may still contain small bugs related to the topics below, but this can be addressed during discussion phase: | + | This voting requires a 2/3 majority. |
+ | considered good enough, independently of the voting results. The RFC author encourages voting for the feature. | ||
- | * I still have to add more tests involving traits and trait conflict resolution syntax | + | <doodle title=" |
- | * I still have to add more tests involving '' | + | * Yes |
+ | | ||
+ | </ | ||
- | The patch is 98% implemented | + | Voting started on 2015-02-28 |
- | finishing these small details without impact | + | |
===== Patch ===== | ===== Patch ===== | ||
- | | + | ==== Patch 1 - Discarded ==== |
- | - Pull request with all the tests and regenerated ext tokenizer | + | |
+ | - Pull request with all the tests and regenerated ext tokenizer | ||
+ | |||
+ | ==== Patch 2 ==== | ||
+ | |||
+ | - Pull request with all the tests is at [[https:// | ||
+ | |||
+ | ==== Later Changes ==== | ||
+ | |||
+ | **Patch 2** was merged and, later, method modifiers were allowed as class member names. That restriction was inherited from the older implementation candidate, Patch 1, and there was no reason to keep it. The **Limitations** section was updated accordingly: only the keyword **class** remains reserved, and only as a class constant name. | ||
===== References ===== | ===== References ===== | ||
Line 237: | Line 248: | ||
===== Rejected Features ===== | ===== Rejected Features ===== | ||
- | None so far. | + | * Prior to voting, the support for '' |
+ | |||
+ | => The RFC author will try to solve the wider problem on PHP 7.1 | ||
===== Changelog ===== | ===== Changelog ===== | ||
Line 243: | Line 256: | ||
* 0.2: Additional support for namespace, class, interface and trait names | * 0.2: Additional support for namespace, class, interface and trait names | ||
* 0.3: Oops. Add forgotten support for typehints | * 0.3: Oops. Add forgotten support for typehints | ||
+ | * 0.4: Reverts to the 0.1 feature set because class name support would interfere with a possible future short lambda syntax and might block other language changes. | ||
+ | * 0.4.1: A new compatible implementation has been introduced | ||
+ | |||
+ | ===== Acknowledgements ===== | ||
+ | |||
+ | Thanks to: | ||
+ | |||
+ | * Bob Weinand, author of the last [[https:// | ||
+ | * Nikita Popov for providing accurate information about the PHP implementation and constructive criticism. | ||
+ | * Anthony Ferrara, Joe Watkins and Daniel Ackroyd for the quick reviews. | ||
+ | * All people on http:// | ||
rfc/context_sensitive_lexer.1424415239.txt.gz · Last modified: 2017/09/22 13:28 (external edit)