PHP RFC: Context Sensitive Lexer

Introduction

PHP currently has around 64 globally reserved words. Fairly often, these reserved words clash with otherwise legitimate names in userland API declarations. This RFC proposes a partial solution: a minimal set of changes that makes the lexer context sensitive and adds support for semi-reserved words.

For instance, if the RFC gets accepted, code like the following would become possible:

class Collection {
    public function forEach(callable $callback) { /* */ }
}
 
class List {
    public function append(List $list) { /* */ }
}

Notice that it is currently not possible to declare the forEach method or the List class without getting a syntax error:

PHP Parse error: Syntax error, unexpected T_FOREACH, expecting T_STRING on line 2
PHP Parse error: Syntax error, unexpected T_LIST, expecting T_STRING on line 5

Proposal

This RFC revisits the topic of the Keywords as Identifiers RFC, this time with a minimal and maintainable patch restricted to OO scope only. It consistently covers:

  • Namespace, class, trait and interface names
  • Properties, constants and methods defined on classes, interfaces and traits
  • Access of properties, constants and methods from objects and classes

The proposed changes could be especially useful to:

  1. Reduce the surface of BC breaks whenever new keywords are introduced
  2. Avoid restricting userland APIs, removing the need for hacks such as unnecessary magic method calls, prefixed identifiers or reaching for a thesaurus to avoid naming conflicts.
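
A minimal sketch of the kind of hack point 2 refers to, combining a prefixed identifier with a magic method call. The LegacyCollection class and the underscore prefix convention are hypothetical and only illustrate the workaround; they do not come from any real library.

```php
<?php
// Hypothetical workaround: all names here are illustrative, not from the RFC.
class LegacyCollection
{
    private $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    // The real implementation carries a prefix because `list` is reserved.
    public function _list()
    {
        return $this->items;
    }

    // Magic dispatch: a call to $c->list() is routed to $c->_list().
    public function __call($name, array $arguments)
    {
        $target = '_' . $name;
        if (method_exists($this, $target)) {
            return call_user_func_array(array($this, $target), $arguments);
        }
        throw new BadMethodCallException("Unknown method: $name");
    }
}

$c = new LegacyCollection(array('foo', 'bar'));
$result = $c->list(); // resolved through __call()
```

With semi-reserved words, the method could simply be declared as list() and the __call() indirection would no longer be needed.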

The following globally reserved words will become semi-reserved if the proposed change is approved:

callable  class  trait  extends  implements  static  abstract  final  public  protected  private  const
enddeclare  endfor  endforeach  endif  endwhile  and  global  goto  instanceof  insteadof  interface
namespace  new  or  xor  try  use  var  exit  list  clone  include  include_once  throw  array
print  echo  require  require_once  return  else  elseif  default  break  continue  switch  yield
function  if  endswitch  finally  for  foreach  declare  case  do  while  as  catch  die  self

Limitations

On purpose, it is still forbidden to define a namespace, class, interface or trait named:

  • namespace
  • self
  • static
  • parent
  • array
  • callable

namespace|class|interface|trait Namespace {}  // Fatal error
namespace|class|interface|trait Self {}       // Fatal error
namespace|class|interface|trait Static {}     // Fatal error
namespace|class|interface|trait Parent {}     // Fatal error
namespace|class|interface|trait Array {}      // Fatal error
namespace|class|interface|trait Callable {}   // Fatal error
 
// Fatal error: Cannot use %s as %s name as it is reserved in %s on line %d
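
The restriction above can be sketched in userland as follows. The helper assertDeclarableName() is hypothetical and only mirrors the list and the error message format shown here; it is not the engine's implementation. The comparison is case-insensitive because PHP class names are case-insensitive.

```php
<?php
// Hypothetical helper mirroring the list above; not the engine's code.
function assertDeclarableName($name, $kind)
{
    // Names that stay forbidden for namespaces, classes, interfaces and traits.
    $reserved = array('namespace', 'self', 'static', 'parent', 'array', 'callable');

    if (in_array(strtolower($name), $reserved, true)) {
        throw new InvalidArgumentException(
            sprintf('Cannot use %s as %s name as it is reserved', $name, $kind)
        );
    }

    return $name;
}

// `List` is only semi-reserved, so it is accepted as a class name.
$accepted = assertDeclarableName('List', 'class');

// `Static` stays forbidden, whatever the letter case.
$message = null;
try {
    assertDeclarableName('Static', 'trait');
} catch (InvalidArgumentException $e) {
    $message = $e->getMessage();
}
```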

On purpose, it is still forbidden to define a class constant named class, because it would conflict with the class name resolution operator ::class:

class Foo {
  const class = 'Foo'; // Fatal error
}
 
// Fatal error: Cannot redefine Foo::class as ::class is reserved in %s on line 2
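
For context, a short sketch of why the name class has to stay reserved in this position: ::class already resolves to the class name, so a constant with the same name would be ambiguous. The Foo class and its NAME constant are illustrative only.

```php
<?php
class Foo
{
    // An ordinary class constant, accessed with the same :: syntax.
    const NAME = 'Foo';
}

// ::class is the class name resolution operator, not a constant lookup.
$resolved = Foo::class;

// A regular constant access looks syntactically identical.
$constant = Foo::NAME;
```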

Practical Examples

Some practical examples of the impact this RFC could have on userland code:

The proposed change, if approved, gives more freedom to userland fluent interfaces and DSL-like APIs.

// the following example works with the patch
// but currently fails because 'for', 'and', 'or', 'list' are globally reserved words:
 
$projects =
    Finder::for('project')
        ->where('name')->like('%secret%')
        ->and('priority', '>', 9)
        ->or('code')->in(['4', '5', '7'])
        ->and()->not('created_at')->between([$time1, $time2])
        ->list($limit, $offset);

// the following example works with the patch
// but currently fails because 'foreach', 'list' and 'new' are globally reserved words:
 
class Collection implements \ArrayAccess, \Countable, \IteratorAggregate {
 
    public function forEach(callable $callback) {
        //...
    }
 
    public function list() {
        //...
    }
 
    public static function new(array $items) {
        return new self($items);
    }
}
 
Collection::new(['foo', 'bar'])->forEach(function($index, $item){
  /* callback */
})->list();
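
The snippet above leaves the method bodies out. A self-contained variant could look like the sketch below. It assumes an engine with the proposed semi-reserved word support, drops the interfaces for brevity, and has forEach() return $this, which is an assumption needed for the chained ->list() call to work.

```php
<?php
// Sketch only: requires an engine that accepts semi-reserved method names.
class Collection
{
    private $items;

    private function __construct(array $items)
    {
        $this->items = $items;
    }

    // `new` as a static factory method name.
    public static function new(array $items)
    {
        return new self($items);
    }

    // `forEach` as a method name; the foreach statement still works inside.
    public function forEach(callable $callback)
    {
        foreach ($this->items as $index => $item) {
            $callback($index, $item);
        }

        return $this; // assumption: returned so that calls can be chained
    }

    // `list` as a method name.
    public function list()
    {
        return $this->items;
    }
}

$seen = [];
$listed = Collection::new(['foo', 'bar'])
    ->forEach(function ($index, $item) use (&$seen) {
        $seen[] = "$index:$item";
    })
    ->list();
```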

Globally reserved words keep userland implementations from being as expressive and semantic as possible:

// the following example works with the patch
// but currently fails because 'include' is a globally reserved word:
 
class View {
    public function include(View $view) {
        //...
    }
}
 
$viewA = new View('a.view');
$viewA->include(new View('b.view'));

Sometimes there is simply no better name for a class constant. One might want to define an HTTP agent class and would like to have some HTTP status constants:

class HTTP {
    const CONTINUE = 100; // works with the patch
                          // but currently fails because 'continue' is a globally reserved word
    const SWITCHING_PROTOCOLS = 101;
    /* ... */
}

Implementation Details

The lexer now keeps track of the context needed to unreserve words in OO scope and uses a minimal amount of RE2C lookahead when disambiguation becomes inevitable.

For instance, the lexing rules that disambiguate ::class (the class name resolution operator) from a class constant, static property or static method access are:

<ST_IN_SCRIPTING>"::"/{OPTIONAL_WHITESPACE}"class" {
  return T_PAAMAYIM_NEKUDOTAYIM;
}
 
<ST_IN_SCRIPTING>"::"/{OPTIONAL_WHITESPACE}("$"|{LABEL}){OPTIONAL_WHITESPACE}"("? {
  yy_push_state(ST_LOOKING_FOR_SEMI_RESERVED_NAME);
  return T_PAAMAYIM_NEKUDOTAYIM;
}
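
The effect of these two rules can be approximated in userland with regular expressions. The sketch below is not engine code: stateAfterDoubleColon() is a hypothetical function, the case-insensitive match on "class" is an assumption about how the scanner is generated, and RE2C's longest-match behaviour is reduced to a simple whole-word check.

```php
<?php
// Userland approximation of the two lookahead rules; not engine code.
function stateAfterDoubleColon($source)
{
    $ws        = '[ \n\r\t]*';             // OPTIONAL_WHITESPACE
    $labelTail = '[a-zA-Z0-9_\x80-\xff]';  // characters that continue a LABEL
    $label     = '[a-zA-Z_\x80-\xff]' . $labelTail . '*';

    // Rule 1: "::" followed by the whole word "class" keeps the current state,
    // so the keyword is lexed as usual and ::class resolves the class name.
    if (preg_match('/^::' . $ws . 'class(?!' . $labelTail . ')/i', $source)) {
        return 'ST_IN_SCRIPTING';
    }

    // Rule 2: "::" followed by "$" or any label pushes the state in which
    // reserved words are treated as plain names.
    if (preg_match('/^::' . $ws . '(?:\$|' . $label . ')/', $source)) {
        return 'ST_LOOKING_FOR_SEMI_RESERVED_NAME';
    }

    return 'ST_IN_SCRIPTING';
}

$forClass    = stateAfterDoubleColon('::class');
$forMethod   = stateAfterDoubleColon('::list($limit)');
$forProperty = stateAfterDoubleColon(':: $instances');
$forPrefix   = stateAfterDoubleColon('::classify()');
```

Here only the first input stays in the default state; the other three would switch to the semi-reserved name state.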

One additional compile time check was created:

if (zend_string_equals_literal_ci(name, "class")) {
  zend_error_noreturn(E_COMPILE_ERROR, "Cannot redefine %s::%s as ::%s is reserved",
    ce->name->val, name->val, name->val);
}

The others only needed adapting: surprisingly, most of the necessary compile-time checks were already in place and just needed adjustments to also restrict namespace, array and callable as names. For instance, the trait name validation:

// before
if(ZEND_FETCH_CLASS_DEFAULT != zend_get_class_fetch_type(name)) {
  zend_error_noreturn(E_COMPILE_ERROR, "Cannot use '%s' as trait name as it is reserved", name->val);
}
// after
if(ZEND_FETCH_CLASS_DEFAULT != zend_check_reserved_name(name)) {
  zend_error_noreturn(E_COMPILE_ERROR, "Cannot use '%s' as trait name as it is reserved", name->val);
}

The currently proposed patch:

  • Doesn't require lexical feedback (passing information from parser to lexer)
  • Keeps ext tokenizer functional
  • Introduces no maintenance issues
  • Has no performance impact
  • Introduces a minimal amount of changes to the lexer

=> Many experiments with parser changes were done before arriving at the currently proposed patch, which involves only lexing changes. It turned out that the patches involving the parser had too many disadvantages and maintenance issues.
=> No performance loss was noticed, but the patch may need a better benchmark.

Impact on performance

No loss noticed.

-- Add benchmark here if asked for during the discussion phase. --
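
One possible way to produce such a benchmark is sketched below, under the assumption that token_get_all() from ext tokenizer is an acceptable proxy for lexing cost. The benchmarkLexing() helper and the sample source are hypothetical, and no results are implied; the script would have to be run against both a patched and an unpatched build to compare timings.

```php
<?php
// Hypothetical micro benchmark: average time to tokenize a source string.
function benchmarkLexing($source, $iterations)
{
    $start = microtime(true);

    for ($i = 0; $i < $iterations; $i++) {
        token_get_all($source); // runs the lexer over the source
    }

    // Average number of seconds per tokenization run.
    return (microtime(true) - $start) / $iterations;
}

// Sample input; a real benchmark should use a large, representative code base.
$source = '<?php class View { public function render() { return 1; } }';

$secondsPerRun = benchmarkLexing($source, 1000);
printf("%.9f seconds per run\n", $secondsPerRun);
```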

Proposed PHP Version(s)

This is proposed for the next PHP x, which at the time of writing would be PHP 7.

Open Issues

The patch may still contain small bugs related to the topics below, but these can be addressed during the discussion phase:

  • I still have to add more tests involving traits and trait conflict resolution syntax
  • I still have to add more tests involving use X as Y syntax and entities with semi-reserved names

The patch is 98% implemented and the complexity around it will not grow. We could even vote on the RFC before finishing these small details without affecting the end quality.

Patch

  1. Most relevant commit is c01014f, in case you would like to focus only on the important changes and skip the long tests.
  2. Pull request with all the tests and regenerated ext tokenizer is at https://github.com/php/php-src/pull/1054/files

References

This is the previously rejected RFC that attempted to remove reserved words in all contexts: https://wiki.php.net/rfc/keywords_as_identifiers.

Rejected Features

None so far.

Changelog

  • 0.1: Initial draft with support for class, interface and trait members
  • 0.2: Additional support for namespace, class, interface and trait names
  • 0.3: Oops. Add forgotten support for typehints

Acknowledgements

Thanks to:

  • Bob Weinand, author of the last rejected RFC on the same topic, for giving honest feedback and being cooperative all the time.
  • Nikita Popov for providing accurate information about the PHP implementation and constructive criticism.
  • Anthony Ferrara, Joe Watkins and Daniel Ackroyd for the quick reviews.
rfc/context_sensitive_lexer.1424416176.txt.gz · Last modified: 2017/09/22 13:28 (external edit)