PHP RFC: token_get_all() flag to return consistent elements
- Version: 1.1
- Date: 2016-01-04
- Author: Sara Golemon pollita@php.net
- Status: Under Discussion
- First Published at: http://wiki.php.net/rfc/token-get-always-tokens
Introduction
token_get_all() returns an array of tokens where each token element is either a single-character (for single-character tokens), or an array describing the token's ID, line number, and text content. For example, token_get_all(“<?php ;”) returns:
Array ( [0] => Array ( [0] => int(374) [1] => string(6)"<?php " [2] => int(1) ) [1] => string(1)";" )
This makes writing tools which use the scanner awkward, and it actually hides scanner info (The line number, stored in sub-element [2]).
Proposal
This proposal aims to normalize the output of token_get_all (when requested) by always using associative arrays as the sub-elements in the output. For example, token_get_all(“<?php ;”, TOKEN_ASSOC) would output:
Array ( [0] => Array ( [id] => int(374) [text] => string(6)"<?php " [line] => int(1) ) [1] => Array ( [id] => int(59) // 59 == ord(';') [text] => string(1) ";" [line] => int(1) ) )
Note the use of a new constant TOKEN_ASSOC to be used with the flags parameter introduced in PHP 7.0
Additional changes
In order to reduce boilerplate in code which uses token_get_all(), the token_name() function will be updated to so that token_name($element['token']) is always a valid call. That is, single-character token values will return the character value for that ordinal.
In terms of psuedo-code:
function token_name($id) { if ($id < 256) { return chr($id); } return current_token_name($id); }
New Constants
TOKEN_ASSOC - When present, token_get_all() will use the new format
Future Scope
Possibly add additional fields such as character position, tokenizer state, etc...
Proposed Voting Choices
Introduce TOKEN_ASSOC and new scanner output format? 50% majority required