rfc:token_as_object

Differences

This shows you the differences between two versions of the page.

rfc:token_as_object [2020/02/15 15:46] nikic rfc:token_as_object [2020/04/23 10:50] nikic
Line 2: Line 2:
   * Date: 2020-02-13   * Date: 2020-02-13
   * Author: Nikita Popov <nikic@php.net>   * Author: Nikita Popov <nikic@php.net>
-  * Status: Under Discussion+  * Status: Implemented
   * Target Version: PHP 8.0   * Target Version: PHP 8.0
   * Implementation: https://github.com/php/php-src/pull/5176   * Implementation: https://github.com/php/php-src/pull/5176
Line 32: Line 32:
          
     final public function __construct(int $id, string $text, int $line = -1, int $pos = -1);     final public function __construct(int $id, string $text, int $line = -1, int $pos = -1);
 +
 +    /** Get the name of the token. */
 +    public function getTokenName(): ?string;
 +    
 +    /**
 +     * Whether the token has the given ID, the given text,
 +     * or has an ID/text part of the given array.
 +     *
 +     * @param int|string|array $kind
 +     */
 +    public function is($kind): bool;
 +
 +    /** Whether this token would be ignored by the PHP parser. */
 +    public function isIgnorable(): bool;
 } }
 </PHP> </PHP>
Line 73: Line 87:
  
 To guarantee a well-defined construction behavior, the ''PhpToken'' constructor is final and cannot be overridden by child classes. This matches the extension approach of the ''SimpleXMLElement'' class. To guarantee a well-defined construction behavior, the ''PhpToken'' constructor is final and cannot be overridden by child classes. This matches the extension approach of the ''SimpleXMLElement'' class.
- 
-===== Open Questions ===== 
  
 ==== Additional methods ==== ==== Additional methods ====
  
-There are a few useful helper methods that could be added to the ''PhpToken'' class. Three suggestions are given as PHP code below. The ''is()'' method is a useful helper, variations of which will be found in many libraries processing token streams. ''isIgnorable()'' helps the particularly common case of skipping whitespace-like tokens. ''getTokenName()'' avoids going through ''token_name()'' for debug output.+The ''PhpToken'' class defines a few additional methods, which are defined in terms of the reference implementations given below.
  
 <PHP> <PHP>
-class PhpToken { +public function getTokenName(): ?string {
-    /** Whether the token has the given ID, the given text, +    if ($this->id < 256) {
-     *  or has an ID/text part of the given array. */ +        return chr($this->id);
-    public function is($kind): bool { +    } elseif ('UNKNOWN' !== $name = token_name($this->id)) { 
-        if (is_array($kind)) { +        return $name; 
-            foreach ($kind as $singleKind) { +    } else { 
-                if (is_string($singleKind)) { +        return null; 
-                    if ($this->text === $singleKind) { +    } 
-                        return true; +}
-                    } +</PHP>
-                } else if (is_int($singleKind)) { + 
-                    if ($this->id === $singleKind) { +''getTokenName()'' is mainly useful for debugging purposes. For single-char tokens with IDs below 256, it returns the extended ASCII character corresponding to the ID. For known tokensit returns the same result as ''token_name()''. For unknown tokensit returns null. 
-                        return true+ 
-                    } +It should be noted that tokens that are not known to PHP are commonly used, for example when emulating lexer behavior from future PHP versions. In this case custom token IDs are used, so they should be handled gracefully
-                } else { + 
-                    throw new TypeError("Kind array must have elements of type int or string");+<PHP> 
 +public function is($kind): bool { 
 +    if (is_array($kind)) { 
 +        foreach ($kind as $singleKind) { 
 +            if (is_string($singleKind)) { 
 +                if ($this->text === $singleKind) { 
 +                    return true;
                 }                 }
 +            } else if (is_int($singleKind)) {
 +                if ($this->id === $singleKind) {
 +                    return true;
 +                }
 +            } else {
 +                throw new TypeError("Kind array must have elements of type int or string");
             }             }
-            return false; 
-        } else if (is_string($kind)) { 
-            return $this->text === $kind; 
-        } else if (is_int($kind)) { 
-            return $this->id === $kind;
-        } else { 
-            throw new TypeError("Kind must be of type int, string or array"); 
         }         }
 +        return false;
 +    } else if (is_string($kind)) {
 +        return $this->text === $kind;
 +    } else if (is_int($kind)) {
 +        return $this->id === $kind;
 +    } else {
 +        throw new TypeError("Kind must be of type int, string or array");
     }     }
 +}
 +</PHP>
  
-    /** Whether this token would be ignored by the PHP parser. */ +The ''is()'' method allows checking for certain tokens, while abstracting over whether it is a single-char token ''%%$token->is(';')%%'', a multi-char token ''%%$token->is(T_FUNCTION)%%'', or whether multiple tokens are allowed ''%%$token->is([T_CLASS, T_TRAIT, T_INTERFACE])%%''.
-    public function isIgnorable(): bool { +
-        return $this->is([ +
-            T_WHITESPACE, +
-            T_COMMENT, +
-            T_DOC_COMMENT, +
-            T_OPEN_TAG, +
-        ]);+
-    }+
  
-    /** Get the name of the token. */ +While non-generic code can easily check the appropriate property, such as ''%%$token->text == ';'%%'' or ''%%$token->id == T_FUNCTION%%'', token stream implementations commonly need to be generic over different token kinds and need to support specification of multiple token kinds. For example:
-    public function getTokenName(): string { +
-        if ($this->id < 256) { +<PHP>
-            return chr($this->id); +// An example, NOT part of the PhpToken interface.
-        } else { +public function findRight($pos, $findTokenKind) {
-            return token_name($this->id);+    $tokens = $this->tokens; 
 +    for ($count = \count($tokens); $pos < $count; $pos++) {
 +        if ($tokens[$pos]->is($findTokenKind)) { 
 +            return $pos;
         }         }
     }     }
 +    return -1;
 } }
 </PHP> </PHP>
 +
 +These kinds of search/skip/check APIs benefit from having an efficient native implementation of ''is()''.
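To make the generic search concrete, here is a minimal, self-contained sketch of how a token stream wrapper might build on ''is()''. The ''TokenStream'' class and its ''textAt()'' helper are hypothetical names used only for illustration and are not part of this RFC. The sketch assumes PHP 8.0 or later and relies on ''PhpToken::tokenize()'', which is not shown in this part of the page.

```php
<?php
// Sketch only: TokenStream and textAt() are hypothetical, not part of the RFC.
final class TokenStream {
    /** @var PhpToken[] */
    private array $tokens;

    public function __construct(string $code) {
        // Assumes PhpToken::tokenize() from PHP 8.0.
        $this->tokens = PhpToken::tokenize($code);
    }

    /** Index of the first token at or after $pos that matches $kind, or -1. */
    public function findRight(int $pos, int|string|array $kind): int {
        for ($count = \count($this->tokens); $pos < $count; $pos++) {
            if ($this->tokens[$pos]->is($kind)) {
                return $pos;
            }
        }
        return -1;
    }

    /** Text of the token at the given index. */
    public function textAt(int $pos): string {
        return $this->tokens[$pos]->text;
    }
}

$stream = new TokenStream('<?php interface Shape { public function area(); }');

// The same search method handles a list of IDs, a single-char text, and a single ID.
$keyword  = $stream->findRight(0, [T_CLASS, T_TRAIT, T_INTERFACE]);
$brace    = $stream->findRight($keyword, '{');
$function = $stream->findRight($brace, T_FUNCTION);
```

The caller never needs to know whether a kind is matched by ID or by text, which is the point of routing every comparison through ''is()''.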
 +
 +<PHP>
 +public function isIgnorable(): bool {
 +    return $this->is([
 +        T_WHITESPACE,
 +        T_COMMENT,
 +        T_DOC_COMMENT,
 +        T_OPEN_TAG,
 +    ]);
 +}
 +</PHP>
 +
 +As a special case, it is very common that whitespace and comments need to be skipped during token processing. The ''isIgnorable()'' method determines whether a token is ignored by the PHP parser.
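As a rough illustration of how ''isIgnorable()'' and ''getTokenName()'' might be combined, the sketch below collects the names of all tokens the parser would actually see. The function ''significantTokenNames()'' is a hypothetical helper, not part of this RFC. The sketch assumes PHP 8.0 or later and ''PhpToken::tokenize()''.

```php
<?php
// Sketch only: significantTokenNames() is a hypothetical helper.
function significantTokenNames(string $code): array {
    $names = [];
    foreach (PhpToken::tokenize($code) as $token) {
        if ($token->isIgnorable()) {
            continue; // skips whitespace, comments, doc comments and open tags
        }
        // Single-char tokens yield the character itself, others the T_* name.
        $names[] = $token->getTokenName();
    }
    return $names;
}

$names = significantTokenNames('<?php /* note */ echo 1;');
// $names is ['T_ECHO', 'T_LNUMBER', ';']
```

The open tag, the comment and both whitespace tokens are dropped, leaving only the tokens that matter for parsing.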
  
 ===== Rejected Features ===== ===== Rejected Features =====
Line 144: Line 181:
 ===== Vote ===== ===== Vote =====
  
-Yes No.+Voting opened 2020-03-06 and closes 2020-03-20. 
 + 
 +<doodle title="Add object-based token_get_all() alternative?" auth="nikic" voteType="single" closed="true"> 
 +   Yes 
 +   No 
 +</doodle>
  
rfc/token_as_object.txt · Last modified: 2020/11/12 13:33 by nikic