rfc:context_sensitive_lexer
Differences
This shows you the differences between two versions of the page.
rfc:context_sensitive_lexer [2015/02/25 01:55] – Reverts to 0.1 feature set because class name support created undesired situations regarding the future addition of a future short lambda syntax and possibly block other language changes marcio | rfc:context_sensitive_lexer [2017/09/22 13:28] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Context Sensitive Lexer ====== | ====== PHP RFC: Context Sensitive Lexer ====== | ||
- | * Version: 0.3 | + | * Version: 0.4.1 |
* Date: 2015-02-15 | * Date: 2015-02-15 | ||
* Author: Márcio Almada | * Author: Márcio Almada | ||
- | * Status: | + | * Status: |
* First Published at: http:// | * First Published at: http:// | ||
Line 18: | Line 18: | ||
class Collection { | class Collection { | ||
public function forEach(callable $callback) { /* */ } | public function forEach(callable $callback) { /* */ } | ||
- | public function | + | public function list() { /* */ } |
} | } | ||
</ | </ | ||
- | Notice that it's currently **not** possible to have the '' | + | Notice that it's currently **not** possible to have the '' |
PHP Parse error: Syntax error, unexpected T_FOREACH, expecting T_STRING on line 2 | PHP Parse error: Syntax error, unexpected T_FOREACH, expecting T_STRING on line 2 | ||
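As a minimal sketch of what the RFC enables, assuming PHP 7.0 or later, a class like the one above can be declared and called directly. The constructor, the method bodies and the sample data are illustrative assumptions, not part of the RFC text:

```php
<?php
// Assumes PHP >= 7.0; on older versions this file fails to parse
// with an "unexpected T_FOREACH" style error like the one quoted above.
class Collection
{
    private $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    // `forEach` collides with the `foreach` keyword (keywords are case-insensitive).
    public function forEach(callable $callback)
    {
        foreach ($this->items as $item) {
            $callback($item);
        }
    }

    // `list` collides with the `list` keyword.
    public function list()
    {
        return $this->items;
    }
}

$seen = [];
$collection = new Collection([1, 2, 3]);
$collection->forEach(function ($item) use (&$seen) {
    $seen[] = $item * 2;
});
```

Outside of class member positions, `foreach` and `list` keep their usual meaning, which is why the loop inside `forEach()` still works.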
Line 31: | Line 31: | ||
This RFC revisits the topic of [[https:// | This RFC revisits the topic of [[https:// | ||
- | presenting a minimal and maintainable [[https:// | + | presenting a minimal and maintainable [[https:// |
- | restricted to OO scope only, consistently comprehending: | + | |
* Properties, constants and methods defined on classes, interfaces and traits | * Properties, constants and methods defined on classes, interfaces and traits | ||
Line 40: | Line 39: | ||
- Reduce the surface of BC breaks whenever new keywords are introduced | - Reduce the surface of BC breaks whenever new keywords are introduced | ||
- | - Avoid restricting userland APIs. Dispensing the need for hacks like unecessary | + | - Avoid restricting userland APIs. Dispensing the need for hacks like unnecessary |
This is a list of currently **globally** reserved words that will become **semi-reserved** if the proposed change is approved: | This is a list of currently **globally** reserved words that will become **semi-reserved** if the proposed change is approved: | ||
Line 48: | Line 47: | ||
namespace | namespace | ||
print echo require | print echo require | ||
- | function | + | function |
==== Limitations ==== | ==== Limitations ==== | ||
- | On purporse, it's still forbidden to define '' | + | On purpose, it's still forbidden to define a **class constant** named as '' |
- | + | ||
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | * '' | + | |
- | + | ||
- | So the following code would still be invalid: | + | |
- | + | ||
- | <code php> | + | |
- | class Foo { | + | |
- | const public|protected|private|static|abstract|final|class = ' | + | |
- | public function public|protected|private|static|abstract|final(){} // Fatal error | + | |
- | } | + | |
- | + | ||
- | // Fatal error: Cannot use %s as class member name as it is reserved in %s on line %d | + | |
- | </ | + | |
- | + | ||
- | On purporse, it's still forbidden to define a **class constant** named as '' | + | |
<code php> | <code php> | ||
Line 82: | Line 61: | ||
</ | </ | ||
- | '' | + | In practice, it means that we would drop from **64** to only **1** reserved word that affects only class constant names. |
+ | |||
+ | '' | ||
<code php> | <code php> | ||
Line 91: | Line 72: | ||
(new Foo)-> | (new Foo)-> | ||
</ | </ | ||
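The remaining limitation can be illustrated with a short sketch, assuming PHP 7.0 or later. The class name `Token` and its constants are hypothetical and chosen only for the example:

```php
<?php
// Assumes PHP >= 7.0. `Token` and its constants are hypothetical names.
class Token
{
    // Semi-reserved words are accepted as class constant names...
    const LIST = 'list';
    const FOREACH = 'foreach';

    // ...but `const class = '...';` would still be rejected at compile time,
    // because `Token::class` is reserved for class name resolution.
}

// `::class` resolves to the fully qualified class name as a string.
$resolved = Token::class;

// Constants named after keywords are fetched like any other constant.
$constants = [Token::LIST, Token::FOREACH];
```

Keeping `class` reserved in this one position avoids any ambiguity between a user-defined constant and the built-in name resolution.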
- | |||
- | In practice, it means that we would drop from **64** to only **6** **globally** reserved words. | ||
===== Practical Examples ===== | ===== Practical Examples ===== | ||
Line 174: | Line 153: | ||
With the approval of the current RFC, the surface of BC breaks would be much smaller in such cases. | With the approval of the current RFC, the surface of BC breaks would be much smaller in such cases. | ||
- | One notable example is the **in** operator RFC. Without a context sensitive lexer, proposed here, the new operator would create a BC break on **Doctrine** library and pretty much many other SQL writers or ORMS out there: | + | One notable example is the **in** operator RFC. Without the context sensitive lexer proposed here, the new operator would create a BC break in the **Doctrine** library and in many other SQL writers or ORMs out there: |
https:// | https:// | ||
===== Implementation Details ===== | ===== Implementation Details ===== | ||
+ | |||
+ | ==== Patch 1 - Discarded ==== | ||
The lexer now keeps track of the context needed to have unreserved words on OO scope and makes use of a minimal amount of RE2C lookahead capabilities when disambiguation becomes inevitable. | The lexer now keeps track of the context needed to have unreserved words on OO scope and makes use of a minimal amount of RE2C lookahead capabilities when disambiguation becomes inevitable. | ||
- | For instance, the lexing rules to disambiguate '':: | + | For instance, the lexing rules to disambiguate '':: |
<code c++> | <code c++> | ||
Line 204: | Line 185: | ||
</ | </ | ||
- | Current proposed patch: | + | ==== Patch 2 ==== |
- | * Doesn't require '' | + | A new patch has been added during the voting phase. It's a different approach that proved |
- | * Keeps ext tokenizer functional | + | |
- | * Introduces no maintenance issues | + | |
- | * Has no performance impact | + | |
- | * Introduces a minimal amount of changes on lexer | + | |
- | => Many experiments | + | The new patch just requires the maintenance of a single inclusive parser rule listing all tokens that should be matched as a '' |
+ | |||
+ | - It offers no regression | forward compatibility risks and is highly predictable | ||
+ | - It has a very small footprint when compared to the previous attempt involving a pure lexical approach | ||
+ | - Requires no compile time checks | ||
+ | - Is highly configurable, | ||
+ | |||
+ | In order to send information to the lexer about the context change, we just have to use '' | ||
+ | |||
+ | <code c> | ||
+ | // before | ||
+ | method_modifiers function returns_ref T_STRING ' | ||
+ | |||
+ | // after | ||
+ | method_modifiers function returns_ref identifier ' | ||
+ | </ | ||
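One way to observe the context-dependent treatment of a keyword from userland is the tokenizer extension. The sketch below assumes PHP 7.0 or later with ext/tokenizer enabled and relies on the `TOKEN_PARSE` flag of `token_get_all()`, which the PHP manual documents as recognising reserved words used in specific contexts. The helper `tokenNameFor()` is a hypothetical name introduced for the example, and the snippet illustrates the idea rather than the patch itself:

```php
<?php
// Assumes PHP >= 7.0 with the tokenizer extension enabled.
$source = '<?php class Collection { public function list() {} }';

// Hypothetical helper: name of the first token whose text equals $text.
function tokenNameFor(array $tokens, $text)
{
    foreach ($tokens as $token) {
        // Multi-character tokens are arrays: [id, text, line].
        if (is_array($token) && $token[1] === $text) {
            return token_name($token[0]);
        }
    }
    return null;
}

// Plain lexing: the lexer alone has no notion of the surrounding context.
$plain = tokenNameFor(token_get_all($source), 'list');

// Lexing with parser feedback: the method name position is taken into account.
$parsed = tokenNameFor(token_get_all($source, TOKEN_PARSE), 'list');

echo "without TOKEN_PARSE: $plain\n";
echo "with TOKEN_PARSE:    $parsed\n";
```

Without the flag the method name is reported as the keyword token, while with the flag it is reported as an ordinary `T_STRING`, which matches the effect of replacing `T_STRING` with `identifier` in the grammar rule above.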
+ | |||
+ | ===== Future Work And Maintenance ===== | ||
+ | |||
+ | * All php-src tests are passing | ||
+ | |||
+ | => The first patch has been discarded during discussion on voting phase. It was considered | ||
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== | ||
This is proposed for the next PHP x, which at the time of this writing would be PHP 7. | This is proposed for the next PHP x, which at the time of this writing would be PHP 7. | ||
+ | |||
+ | ===== Votes ===== | ||
+ | |||
+ | This vote requires a 2/3 majority. The implementation will be evaluated on the internals mailing list and will only be merged if it's | ||
+ | considered good enough, independently of the voting results. The RFC author encourages voting for the feature. | ||
+ | |||
+ | <doodle title=" | ||
+ | * Yes | ||
+ | * No | ||
+ | </ | ||
+ | |||
+ | Voting started on 2015-02-28 and ends on 2015-03-14. | ||
===== Patch ===== | ===== Patch ===== | ||
- | | + | ==== Patch 1 - Discarded ==== |
- | - Pull request with all the tests and regenerated ext tokenizer | + | |
+ | - Pull request with all the tests and regenerated ext tokenizer | ||
+ | |||
+ | ==== Patch 2 ==== | ||
+ | |||
+ | - Pull request with all the tests is at [[https:// | ||
+ | |||
+ | ==== Later Changes ==== | ||
+ | |||
+ | **Patch 2** was merged and, later, method modifiers were allowed as class member names. This was a limitation of the older implementation candidate, Patch 1, and there was no reason to keep it. The **Limitations** section was updated accordingly. Only the keyword **class** remains reserved, and only for class constant names. | ||
===== References ===== | ===== References ===== | ||
Line 229: | Line 248: | ||
===== Rejected Features ===== | ===== Rejected Features ===== | ||
- | None so far. | + | * Prior to voting, the support for '' |
+ | |||
+ | => The RFC author will try to solve the wider problem on PHP 7.1 | ||
===== Changelog ===== | ===== Changelog ===== | ||
Line 236: | Line 257: | ||
* 0.3: Oops. Add forgotten support for typehints | * 0.3: Oops. Add forgotten support for typehints | ||
* 0.4: Reverts to the 0.1 feature set because class name support created undesired situations regarding the future addition of a short lambda syntax and could block other language changes. | * 0.4: Reverts to the 0.1 feature set because class name support created undesired situations regarding the future addition of a short lambda syntax and could block other language changes. | ||
+ | * 0.4.1: A new compatible implementation has been introduced | ||
===== Acknowledgements ===== | ===== Acknowledgements ===== |
rfc/context_sensitive_lexer.txt · Last modified: 2017/09/22 13:28 by 127.0.0.1