====== PHP RFC: token_get_all() flag to return consistent elements ======

  * Version: 1.1
  * Date: 2016-01-04
  * Author: Sara Golemon
  * Status: Under Discussion
  * First Published at: http://wiki.php.net/rfc/token-get-always-tokens

===== Introduction =====

token_get_all() returns an array in which each element is either a single-character string (for single-character tokens) or an array holding the token's ID, text content, and line number. For example:

<code php>
token_get_all("<?php ;")
</code>

<code>
Array (
  [0] => Array (
    [0] => int(374)
    [1] => string(6) "<?php "
    [2] => int(1)
  )
  [1] => string(1) ";"
)
</code>

This makes writing tools that use the scanner awkward, because every consumer has to handle both shapes. It also hides scanner information: single-character tokens carry no line number, which array tokens store in sub-element [2].

===== Proposal =====

This proposal aims to normalize the output of token_get_all() (when requested) by always using associative arrays as the sub-elements of the output. For example:

<code php>
token_get_all("<?php ;", TOKEN_ASSOC)
</code>

<code>
Array (
  [0] => Array (
    [id] => int(374)
    [text] => string(6) "<?php "
    [line] => int(1)
  )
  [1] => Array (
    [id] => int(59) // 59 == ord(';')
    [text] => string(1) ";"
    [line] => int(1)
  )
)
</code>

Note the use of a new constant, TOKEN_ASSOC, passed through the flags parameter introduced in PHP 7.0.

==== Additional changes ====

To reduce boilerplate in code that uses token_get_all(), the token_name() function will be updated so that token_name($element['id']) is always a valid call. That is, for a single-character token it will return the character for that ordinal value. In terms of pseudo-code:

<code php>
function token_name($id) {
    // Single-character tokens use their ordinal as the ID.
    if ($id < 256) {
        return chr($id);
    }
    // current_token_name() stands for the existing token_name() behavior.
    return current_token_name($id);
}
</code>

==== New Constants ====

  * TOKEN_ASSOC: when present, token_get_all() will use the new format.

===== Future Scope =====

Possibly add further fields such as character position, tokenizer state, etc.

===== Proposed Voting Choices =====

Introduce TOKEN_ASSOC and the new scanner output format? 50% majority required.
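
===== Appendix: userland sketch of the proposed format =====

The normalized shape described in the Proposal can be approximated today on top of the existing token_get_all() output. The following is a minimal sketch, not the RFC's implementation: the function names token_get_all_assoc() and token_name_any() are hypothetical, and the line number for single-character tokens is an assumption inferred by counting "\n" in the preceding token text, since the scanner does not report it for those tokens.

```php
<?php
// Hypothetical helper (not part of the RFC): converts the existing
// token_get_all() output into the associative shape the RFC proposes.
function token_get_all_assoc(string $source): array
{
    $line = 1;   // best-effort current line, used for single-character tokens
    $out  = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Existing array form: [id, text, line].
            [$id, $text, $line] = $token;
        } else {
            // Single-character token: use the character's ordinal as the ID.
            $id   = ord($token);
            $text = $token;
        }
        $out[] = ['id' => $id, 'text' => $text, 'line' => $line];
        // Assumption: lines are delimited by "\n".
        $line += substr_count($text, "\n");
    }
    return $out;
}

// Hypothetical wrapper mirroring the pseudo-code for the updated token_name().
function token_name_any(int $id): string
{
    return $id < 256 ? chr($id) : token_name($id);
}

// Usage: every element now has the same shape, so no is_array() checks are needed.
foreach (token_get_all_assoc("<?php ;") as $t) {
    printf("%d %s %s\n", $t['line'], token_name_any($t['id']), json_encode($t['text']));
}
```

Doing this in userland costs an extra pass over the token array, which is one reason a native flag is attractive.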