token_get_all() returns an array of tokens where each token element is either a single-character (for single-character tokens), or an array describing the token's ID, line number, and text content. For example, token_get_all(“<?php ;”) returns:
Array ( [0] => Array ( [0] => int(374) [1] => string(6)"<?php " [2] => int(1) ) [1] => string(1)";" )
This makes writing tools which use the scanner awkward, and it actually hides scanner info (The line number, stored in sub-element [2]).
This proposal aims to normalize the output of token_get_all (when requested) by always using associative arrays as the sub-elements in the output. For example, token_get_all(“<?php ;”, TOKEN_ASSOC) would output:
Array ( [0] => Array ( [id] => int(374) [text] => string(6)"<?php " [line] => int(1) ) [1] => Array ( [id] => int(59) // 59 == ord(';') [text] => string(1) ";" [line] => int(1) ) )
Note the use of a new constant TOKEN_ASSOC to be used with the flags parameter introduced in PHP 7.0
In order to reduce boilerplate in code which uses token_get_all(), the token_name() function will be updated to so that token_name($element['token']) is always a valid call. That is, single-character token values will return the character value for that ordinal.
In terms of psuedo-code:
function token_name($id) { if ($id < 256) { return chr($id); } return current_token_name($id); }
TOKEN_ASSOC - When present, token_get_all() will use the new format
Possibly add additional fields such as character position, tokenizer state, etc...
Introduce TOKEN_ASSOC and new scanner output format? 50% majority required