token_get_all() returns an array of tokens in which each element is either a single-character string (for single-character tokens) or an array holding the token's ID, text content, and line number. For example, token_get_all("<?php ;") returns:
Array (
    [0] => Array (
        [0] => int(374)
        [1] => string(6) "<?php "
        [2] => int(1)
    )
    [1] => string(1) ";"
)
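This makes tools that consume the scanner output awkward to write, because every element must be type-checked before use. It also hides scanner information: array tokens store the line number in sub-element [2], but single-character tokens carry no line number at all.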
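To make the awkwardness concrete, here is a minimal sketch of the branching that code consuming the current format typically needs. The helper name describe_tokens() is hypothetical and exists only for illustration; token_get_all() and token_name() are the existing tokenizer functions.

```php
<?php
// Illustrative only: describe_tokens() is a made-up helper, not part of the proposal.
function describe_tokens(string $source): array
{
    $out = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array form: [0] => ID, [1] => text, [2] => line number.
            [$id, $text, $line] = $token;
            $out[] = sprintf('%s %s (line %d)', token_name($id), json_encode($text), $line);
        } else {
            // String form: just the character, with no ID and no line number.
            $out[] = sprintf('char %s (line unknown)', json_encode($token));
        }
    }
    return $out;
}

print_r(describe_tokens('<?php ;'));
```

Every consumer has to repeat the is_array() check, and the string branch has no line number to report.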
This proposal aims to normalize the output of token_get_all() (when requested) by always using associative arrays as the sub-elements of the output. For example, token_get_all("<?php ;", TOKEN_ASSOC) would output:
Array (
    [0] => Array (
        [id] => int(374)
        [text] => string(6) "<?php "
        [line] => int(1)
    )
    [1] => Array (
        [id] => int(59) // 59 == ord(';')
        [text] => string(1) ";"
        [line] => int(1)
    )
)
Note the new constant TOKEN_ASSOC, which is passed in the flags parameter introduced in PHP 7.0.
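The proposed format can be approximated in userland today. The following is a minimal sketch, not the proposed implementation: the function name normalize_tokens() is hypothetical, and the line number of a single-character token is reconstructed by counting the newlines in the tokens that precede it.

```php
<?php
// Hypothetical userland helper; normalize_tokens() is not part of the proposal.
function normalize_tokens(string $source): array
{
    $line = 1; // running line number, used for single-character tokens
    $out = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array form already carries ID, text, and starting line.
            [$id, $text, $line] = $token;
        } else {
            // Single-character token: its ID is the character's ordinal.
            $id = ord($token);
            $text = $token;
        }
        $out[] = ['id' => $id, 'text' => $text, 'line' => $line];
        // Advance past newlines in this token so the next one starts on the right line.
        $line += substr_count($text, "\n");
    }
    return $out;
}

print_r(normalize_tokens('<?php ;'));
```

With this shape, a consumer can read $token['id'], $token['text'], and $token['line'] without first checking the element's type.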
To reduce boilerplate in code that uses token_get_all(), the token_name() function will be updated so that token_name($element['id']) is always a valid call. That is, for a single-character token, token_name() will return the character corresponding to that ordinal.
In terms of pseudo-code:
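function token_name($id) {
    // IDs below 256 are single-character tokens: return the character itself.
    if ($id < 256) {
        return chr($id);
    }
    // current_token_name() stands in for the existing token_name() behavior.
    return current_token_name($id);
}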
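The same behavior can be tried out today with a userland wrapper. This is only a sketch: the name token_name_compat() is hypothetical, and the proposal itself would change token_name() rather than add a new function.

```php
<?php
// Hypothetical wrapper; the proposal would build this behavior into token_name().
function token_name_compat(int $id): string
{
    // IDs below 256 are ordinals of single-character tokens.
    if ($id < 256) {
        return chr($id);
    }
    // Everything else is a regular tokenizer constant such as T_OPEN_TAG.
    return token_name($id);
}

echo token_name_compat(ord(';')), "\n";
echo token_name_compat(T_OPEN_TAG), "\n";
```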
TOKEN_ASSOC - When present, token_get_all() will use the new format
Possibly add further fields such as character position, tokenizer state, etc.
Introduce TOKEN_ASSOC and new scanner output format? 50% majority required