rfc:context_sensitive_lexer

Differences

This shows you the differences between two versions of the page.

rfc:context_sensitive_lexer [2015/03/02 04:23] – typos typos typos marciorfc:context_sensitive_lexer [2017/09/22 13:28] (current) – external edit 127.0.0.1
Line 1: Line 1:
 ====== PHP RFC: Context Sensitive Lexer ====== ====== PHP RFC: Context Sensitive Lexer ======
-  * Version: 0.4+  * Version: 0.4.1
   * Date: 2015-02-15   * Date: 2015-02-15
   * Author: Márcio Almada   * Author: Márcio Almada
-  * Status: Voting (previously Under Discussion)+  * Status: Implemented (in PHP 7.0)
   * First Published at: http://wiki.php.net/rfc/context_sensitive_lexer   * First Published at: http://wiki.php.net/rfc/context_sensitive_lexer
  
Line 31: Line 31:
  
 This RFC revisits the topic of [[https://wiki.php.net/rfc/keywords_as_identifiers|Keywords as Identifiers]] RFC. But this time This RFC revisits the topic of [[https://wiki.php.net/rfc/keywords_as_identifiers|Keywords as Identifiers]] RFC. But this time
-presenting a minimal and maintainable [[https://github.com/marcioAlmada/php-src/commit/d9d6f0c7e325dcd0d0ff3c3f2dc73c2364c3ad5f|patch]], +presenting a minimal and maintainable [[https://github.com/php/php-src/pull/1221|patch]], restricted to OO scope only, consistently comprehending:
-restricted to OO scope only, consistently comprehending:+
  
   * Properties, constants and methods defined on classes, interfaces and traits   * Properties, constants and methods defined on classes, interfaces and traits
Line 52: Line 51:
 ==== Limitations ==== ==== Limitations ====
  
-On purporse, it's still forbidden to define ''class|object'' constants and methods named as: +On purpose, it's still forbidden to define a **class constant** named as ''class'' because of the class name resolution ''::class'':
- +
-  * ''public'' +
-  * ''protected'' +
-  * ''private'' +
-  * ''abstract'' +
-  * ''final'' +
-  * ''static'' +
- +
-So the following code would still be invalid: +
- +
-<code php> +
-class Foo { +
-  const public|protected|private|static|abstract|final|class = 'foo'; // Fatal error +
-  function public|protected|private|static|abstract|final(){} // Fatal error +
-+
- +
-// Fatal error: Cannot declare a class const named as %d as it is reserved in %s on line %d +
-// Fatal error: Cannot declare a class method named as %d as it is reserved in %s on line %d +
-</code> +
- +
-On purporse, it's still forbidden to define a **class constant** named as ''class'' because of the class name resolution operator ''::class'':+
  
 <code php> <code php>
Line 82: Line 60:
 // Fatal error: Cannot redefine class constant Foo::CLASS as it is reserved in %s on line %d // Fatal error: Cannot redefine class constant Foo::CLASS as it is reserved in %s on line %d
 </code> </code>
 +
 +In practice, it means that we would drop from **64** to only **1** reserved word that affects only class constant names.
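
A minimal sketch of what this single remaining restriction looks like in practice, assuming PHP 7.0 or later. The class ''Keyword'' and its constants are hypothetical names chosen only for illustration:

```php
<?php
// Hypothetical example; assumes PHP 7.0+, where this RFC is implemented.
class Keyword
{
    // Reserved words other than `class` are accepted as class constant names.
    const LIST = 'list';
    const FOREACH = 'foreach';

    // const CLASS = 'class'; // still a fatal error: it would collide with ::class
}

echo Keyword::LIST, "\n";
echo Keyword::FOREACH, "\n";
echo Keyword::class, "\n"; // class name resolution, the reason `class` stays reserved
```

Uncommenting the ''const CLASS'' line should reproduce the fatal error quoted above.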
  
 ''class|object'' properties **can** have any name because PHP has sigils and code like the following has always been allowed: ''class|object'' properties **can** have any name because PHP has sigils and code like the following has always been allowed:
Line 92: Line 72:
 (new Foo)->list; (new Foo)->list;
 </code> </code>
- 
-In practice, it means that we would drop from **64** to only **6** **globally** reserved words. 
  
 ===== Practical Examples ===== ===== Practical Examples =====
Line 180: Line 158:
  
 ===== Implementation Details ===== ===== Implementation Details =====
 +
 +==== Patch 1 - Discarded ====
  
 The lexer now keeps track of the context needed to have unreserved words on OO scope and makes use of a minimal amount of RE2C lookahead capabilities when disambiguation becomes inevitable. The lexer now keeps track of the context needed to have unreserved words on OO scope and makes use of a minimal amount of RE2C lookahead capabilities when disambiguation becomes inevitable.
Line 205: Line 185:
 </code> </code>
  
-Current proposed patch:+==== Patch 2 ====
  
-  * Doesn't require ''lexical feedback'' (passing information from parser to lexer) +A new patch has been added during the voting phase. It's a different approach that proved to have many advantages over the first patch and therefore it is intended to supersede it.
-  * Keeps ext tokenizer functional +
-  * Introduces no maintenance issues +
-  * Has no performance impact +
-  * Introduces a minimal amount of changes on lexer+
  
-=> Many experiments with parsing were done before the current proposed patch which involves only lexing changes. But turns out the patches involving parsing had too many disadvantages and maintenance issues.\\+The new patch just requires the maintenance of a single inclusive parser rule listing all tokens that should be matched as a ''T_STRING'' on specific places: 
 + 
 +  - It offers no regression | forward compatibility risks and is highly predictable 
 +  - It has a very small footprint when compared to the previous attempt involving a pure lexical approach 
 +  - Requires no compile time checks 
 +  - Is highly configurable, to make a word semi-reserved you only have to edit an inclusive parser rule. 
 + 
 +In order to send information to the lexer about the context change, we just have to use ''identifier'' instead of ''T_STRING'' when applicable. For instance, these are the changes needed in the parser grammar to allow semi-reserved words as method names: 
 + 
 +<code c> 
 +// before 
 +method_modifiers function returns_ref T_STRING '(' parameter_list ')' //... 
 + 
 +// after 
 +method_modifiers function returns_ref identifier '(' parameter_list ')' //... 
 +</code>
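
From the userland side, the effect of that grammar change might look like the sketch below. It assumes PHP 7.0 or later, and the ''Collection'' class and its methods are hypothetical, not taken from the patch or its tests:

```php
<?php
// Hypothetical class for illustration; assumes PHP 7.0+.
class Collection
{
    private $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    // `list` is a keyword, but it is accepted as a method name in this position.
    public function list()
    {
        return $this->items;
    }

    // `foreach` is accepted in the same way.
    public function foreach(callable $callback)
    {
        foreach ($this->items as $item) {
            $callback($item);
        }

        return $this;
    }
}

$collection = new Collection([1, 2, 3]);

$sum = 0;
$collection->foreach(function ($item) use (&$sum) {
    $sum += $item;
});
```

Outside these member-name positions the words keep their usual meaning, which is why the ''foreach'' loop inside the method body still works as a statement.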
  
 ===== Future Work And Maintenance ===== ===== Future Work And Maintenance =====
  
-All php-src tests are passing and the patch was also verified against great number of open source libraries while trying to detect possible syntax regressions, therefore it's considered stable. But in case any bug appears, the RFC author will be available to pull request any necessary fix in time for PHP7 release.+  * All php-src tests are passing with the new patch, some work still has to be done. There is better possibility to expand semi reserved words support to namespaces and class names with the new patch, but this more ambitious proposal will be tailored only for PHP 7.1 by the RFC author.
 + 
 +=> The first patch has been discarded during discussion on voting phase. It was considered too "ad-hoc" and could cause issues for PHP 7.1 and ahead.
  
 ===== Proposed PHP Version(s) ===== ===== Proposed PHP Version(s) =====
Line 225: Line 218:
 ===== Votes ===== ===== Votes =====
  
-This voting requires a 2/3 majority.+This voting requires a 2/3 majority. The implementation will be evaluated on internals mailing list and will only be merged if it's 
 +considered good enough, independently of the voting results. The RFC author encourages voting for the feature.
  
-<doodle title="Should PHP7 have a context sensitive lexer?" auth="marcio" voteType="single" closed="false">+<doodle title="Should PHP7 have a context sensitive lexer?" auth="marcio" voteType="single" closed="true">
    * Yes    * Yes
    * No    * No
Line 236: Line 230:
 ===== Patch ===== ===== Patch =====
  
-  Most relevant commit is [[https://github.com/marcioAlmada/php-src/commit/5405e69b7c33885bb13231c60146d6ba95103afb|5405e69]], in case you would like to focus only on the important changes and skip the long tests. +==== Patch 1 - Discarded ==== 
-  - Pull request with all the tests and regenerated ext tokenizer is at [[https://github.com/php/php-src/pull/1054/files]]+ 
 +  - Pull request with all the tests and regenerated ext tokenizer is at [[https://github.com/php/php-src/pull/1054]] 
 + 
 +==== Patch 2 ==== 
 + 
 +  - Pull request with all the tests is at [[https://github.com/php/php-src/pull/1221/]] 
 + 
 +==== Later Changes ====
 + 
 +**Patch 2** was merged and, later, method modifiers were allowed as class member names. This was a limitation of the older implementation candidate (Patch 1), and there was no reason to keep it. The **Limitations** section was updated accordingly. Only the keyword ''class'' remains reserved, and only for class constants.
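
As a hedged sketch of this later change, assuming PHP 7.0 or later: method modifiers such as ''public'', ''private'' and ''final'' can appear as constant and method names. The ''Access'' class below is hypothetical:

```php
<?php
// Hypothetical example; assumes PHP 7.0+ with the merged Patch 2 and later changes.
class Access
{
    // Modifier keywords used as class constant names.
    const PUBLIC = 'public';
    const PRIVATE = 'private';

    // A modifier keyword used as a method name.
    public function final()
    {
        return self::PUBLIC;
    }
}

$access = new Access();
echo $access->final(), "\n";
echo Access::PRIVATE, "\n";
```

The words still act as modifiers in their usual position, so ''public function final()'' reads as a public method named ''final''.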
  
 ===== References ===== ===== References =====
Line 245: Line 248:
 ===== Rejected Features ===== ===== Rejected Features =====
  
-None so far.+ * Prior to voting, the support for ''namespaces|classes|traits|interfaces'' names has been removed from the first patch as it could create some possible issues. 
 + 
 +=> The RFC author will try to solve the wider problem in PHP 7.1.
  
 ===== Changelog ===== ===== Changelog =====
Line 252: Line 257:
   * 0.3: Oops. Add forgotten support for typehints   * 0.3: Oops. Add forgotten support for typehints
   * 0.4: Reverts to 0.1 feature set because class name support created undesired situations regarding the future addition of a future short lambda syntax and possibly block other language changes.   * 0.4: Reverts to 0.1 feature set because class name support created undesired situations regarding the future addition of a future short lambda syntax and possibly block other language changes.
 +  * 0.4.1: A new compatible implementation has been introduced
  
 ===== Acknowledgements ===== ===== Acknowledgements =====
rfc/context_sensitive_lexer.1425270234.txt.gz · Last modified: 2017/09/22 13:28 (external edit)