This is an old revision of the document!
PHP RFC: Context Sensitive Lexer
- Version: 0.4
- Date: 2015-02-15
- Author: Márcio Almada
- Status: Draft
- First Published at: http://wiki.php.net/rfc/context_sensitive_lexer
Introduction
PHP currently has around 64 globally reserved words. Not infrequently, these reserved words end up clashing with legit alternatives to userland API declarations. This RFC proposes a partial solution to this by adding minimal changes to have a context sensitive lexer with support for semi-reserved words.
For instance, if the RFC gets accepted, code like the following would become possible:
class Collection { public function forEach(callable $callback) { /* */ } public function list() { /* */ } }
Notice that it's currently not possible to have the foreach
and list
method delcared without having a syntax error:
PHP Parse error: Syntax error, unexpected T_FOREACH, expecting T_STRING on line 2 PHP Parse error: Syntax error, unexpected T_LIST, expecting T_STRING on line 3
Proposal
This RFC revisits the topic of Keywords as Identifiers RFC. But this time presenting a minimal and maintainable patch, restricted to OO scope only, consistently comprehending:
- Properties, constants and methods defined on classes, interfaces and traits
- Access of properties, constants and methods from objects and classes
The proposed changes could be especially useful to:
- Reduce the surface of BC breaks whenever new keywords are introduced
- Avoid restricting userland APIs. Dispensing the need for hacks like unecessary magic method calls, prefixed identifiers or the usage of a thesaurus to avoid naming conflicts.
This is a list of currently globally reserved words that will become semi-reserved in case proposed change gets approved:
callable class trait extends implements static abstract final public protected private const enddeclare endfor endforeach endif endwhile and global goto instanceof insteadof interface namespace new or xor try use var exit list clone include include_once throw array print echo require require_once return else elseif default break continue switch yield function if endswitch finally for foreach declare case do while as catch die self
Limitations
On purporse, it's still forbidden to define class|object
constants and methods named as:
public
protected
private
abstract
final
static
So the following code would still be invalid:
class Foo { const public|protected|private|static|abstract|final|class = 'foo'; // Fatal error function public|protected|private|static|abstract|final(){} // Fatal error } // Fatal error: Cannot decalre a class const named as %d as it is reserved in %s on line %d // Fatal error: Cannot declare a class method named as %d as it is reserved in %s on line %d
On purporse, it's still forbidden to define a class constant named as class
because of the class name resolution operator ::class
:
class Foo { const class = 'Foo'; // Fatal error } // Fatal error: Cannot redefine class constant Foo::CLASS as it is reserved in %s on line %d
class|object
properties can have any name because PHP has sigils and code like the followiing has always been allowed:
class Foo { public $list = 'list'; } (new Foo)->list;
In practice, it means that we would drop from 64 to only 6 globally reserved words.
Practical Examples
Some practical examples related to the impact this RFC could have on user space code:
The proposed change, if approved, gives more freedom to userland fluent interfaces or DSL like APIs.
// the following example works with patch // but currently fails because 'for', 'and', 'or', 'list' are globally reserved words: $projects = Finder::for('project') ->where('name')->like('%secret%') ->and('priority', '>', 9) ->or('code')->in(['4', '5', '7']) ->and()->not('created_at')->between([$time1, $time2]) ->list($limit, $offset);
// the following example works with the patch // but currently fails because 'foreach', 'list' and 'new' are globally reserved words: class Collection extends \ArrayAccess, \Countable, \IteratorAggregate { public function forEach(callable $callback) { //... } public function list() { //... } public static function new(array $itens) { return new self($itens); } } Collection::new(['foo', 'bar'])->forEach(function($index, $item){ /* callback */ })->list();
Globally reserved words end up limiting userland implementations on being the most expressive and semantic as possible:
// the following example works with the patch // but currently fails because 'include' is a globally reserved word: class View { public function include(View $view) { //... } } $viewA = new View('a.view'); $viewA->include(new View('b.view'));
Sometimes there is simply no better name for a class constant. One might want to define an HTTP agent class and would like to have some HTTP status constants:
class HTTP { const CONTINUE = 100; // works with patch // but currently fails because 'continue' is a globally reserved word const SWITCHING_PROTOCOLS = 101; /* ... */ }
Impact On Other RFCs
Some RFCs are proposing to reserve new keywords in order to add features or reserve typehints names:
With the approval of the current RFC, BC breaks surface would be much smaller in such cases.
One notable example is the in operator RFC. Without a context sensitive lexer, proposed here, the new operator would create a BC break on Doctrine library and pretty much many other SQL writers or ORMs out there:
https://github.com/doctrine/doctrine2/blob/master/lib/Doctrine/ORM/Query/Expr.php#L443
Implementation Details
The lexer now keeps track of the context needed to have unreserved words on OO scope and makes use of a minimal amount of RE2C lookahead capabilities when disambiguation becomes inevitable.
For instance, the lexing rules to disambiguate ::class
(class name resolution operator) from a class constant
or static method
access is:
<ST_IN_SCRIPTING>"::"/{OPTIONAL_WHITESPACE}"class" { return T_PAAMAYIM_NEKUDOTAYIM; } <ST_IN_SCRIPTING>"::"/{OPTIONAL_WHITESPACE}("$"|{LABEL}){OPTIONAL_WHITESPACE}"("? { yy_push_state(ST_LOOKING_FOR_SEMI_RESERVED_NAME); return T_PAAMAYIM_NEKUDOTAYIM; }
A few additional compile time check were created:
if(ZEND_NOT_RESERVED != zend_check_reserved_method_name(decl->name)) { zend_error_noreturn(E_COMPILE_ERROR, "Cannot use '%s' as class method name as it is reserved", decl->name->val); }
Current proposed patch:
- Doesn't require
lexical feedback
(passing information from parser to lexer) - Keeps ext tokenizer functional
- Introduces no maintenance issues
- Has no performance impact
- Introduces a minimal amount of changes on lexer
=> Many experiments with parsing were done before the current proposed patch which involves only lexing changes. But turns out the patches involving parsing had too many disadvantages and maintence issues.
Proposed PHP Version(s)
This is proposed for the next PHP x, which at the time of this writing would be PHP 7.
Patch
- Most relevant commit is 5405e69, in case you would like to focus only on the important changes and skip the long tests.
- Pull request with all the tests and regenerated ext tokenizer is at https://github.com/php/php-src/pull/1054/files
References
This is the previous rejected RFC that attempted to remove reserved words on all contexts: https://wiki.php.net/rfc/keywords_as_identifiers.
Rejected Features
None so far.
Changelog
- 0.1: Initial draft with support for class, interfaces and trait members
- 0.2: Additional support to namespaces, classes, interafces and traits names
- 0.3: Oops. Add forgotten support for typehints
- 0.4: Reverts to 0.1 feature set because class name support created undesired situations regarding the future addition of a future short lambda syntax and possibly block other language changes.
Acknowledgements
Thanks to:
- Bob Weinand, author of the last rejected RFC on the same topic, for giving honest feedback and being cooperative all the time.
- Nikita Popov for providing accurate information about the PHP implementation and constructive criticism.
- Anthony Ferrara, Joe Watkins and Daniel Ackroyd for the quick reviews.
- All people on http://chat.stackoverflow.com/rooms/11/php