====== PHP RFC: var_type ====== * Version: 1.0 * Date: 2016-06-25 * Author: Richard Fussenegger * Status: Declined * First Published at: https://wiki.php.net/rfc/var_type ===== Introduction ===== The idiomatic way to retrieve type information of a variable in PHP is via the ''[[https://secure.php.net/gettype|gettype()]]'' function. This function suffers from upholding backwards compatibility promises, and returns unexpected results for some types. Hence, it does not reflect the current state of our type system. This RFC proposes to introduce a new function called ''var_type()'' that addresses these and more issues. ===== Proposal ===== ==== Status Quo ==== The existing ''gettype()'' function has the following mode of operation: echo gettype([]); // array echo gettype(true); // boolean echo gettype(0.0); // double echo gettype(0); // integer echo gettype(null); // NULL echo gettype(new stdClass); // object echo gettype(STDIN); // resource echo gettype(''); // string Most apparent is the usage of ''double'' instead of ''float'', as well as the longer alias versions of ''boolean'' and ''integer'' instead of ''bool'' and ''int''. These are highly unexpected results for users, who do not know the history of PHP's type system. Especially new users, who are going to learn PHP with the new and current type [[https://secure.php.net/functions.arguments#functions.arguments.type-declaration|declaration system]], will be surprised by those results. Note well that this group of users will grow in the future. Another anomaly is the all caps ''NULL'' that stands alone between all the other results. The usage of ''null'' vs. ''NULL'' is a topic that goes hand in hand with ''false'' vs. ''FALSE'' and ''true'' vs. ''TRUE''. The reason why ''NULL'' is written in all caps is most probably because it is good practice to do so in C.(([[http://pubs.opengroup.org/onlinepubs/7908799/xsh/stddef.h.html|IEEE Std 1003.1 stddef]])) However, this contradicts with the usage of ''false'' and ''true'' in lowercase everywhere in PHP, since they are written in all caps in C, too.(([[http://pubs.opengroup.org/onlinepubs/009695399/basedefs/stdbool.h.html|IEEE Std 1003.1 stdbool]])) Note that this is not meant to say anything about the usage of **NULL**, **FALSE**, and **TRUE** in documentation, like the PHP manual. It is a good idea to keep them uppercase there, as they are right now, to make them stand out from the surrounding text. $ php -r 'var_dump(null, true, false);' NULL bool(true) bool(false) These differences already casted various discussions (e.g. [[https://www.drupal.org/node/217379|this]] or [[https://groups.drupal.org/node/3308|this]] Drupal discussion). The fact that it was possible to redefine ''FALSE'' and ''TRUE'' in PHP in the past worsens the situation further. Additionally, userland seems to have settled with all lower caps already (e.g. [[http://www.php-fig.org/psr/psr-2/#2-5-keywords-and-true-false-null|PSR-2]]). It is believed that the casing of the returned values should be consistent in order to allow users to apply an appropriate transformation to fit their needs. This is not possible with results of mixed case since they always require special conditions to account for. === Unknown Type === $fh = tmpfile(); fclose($fh); echo gettype($fh); // unknown type Trying to get the type of a closed resource results in a very unhelpful ''unknown type'' result. This is due to the fact that the implementation only considers resources that have a known resource type to be of type resource. This is consistent with the result of ''is_resource()'', but that function serves a different purpose. Note that ''gettype()'' is at the same time not consistent with ''is_object()'', when an incomplete object is passed in (''is_object()'' returns **FALSE** in such a case, whereas ''gettype()'' returns ''object''). The result of ''unknown type'' is furthermore not helpful to the user, who is trying to determine what type a variable is of. Especially not, if the result of ''gettype()'' is used as part of an error message in a complex system. It is believed that consistency with ''is_resource()'' and ''is_object()'' is not desired nor required because these functions serve completely different purposes: determining if a given variable contains a valid value of certain type. A last, admittedly weak, point is the naming of the function that does not follow the coding standards, and is bad for auto-completion in editors and IDEs. According to the coding standard, the function should be called ''get_type()''. But then it would still not auto-complete effectively. The same is true for the complement function ''settype()''. Both are consistently named to each other, however, both suffer from the same problem, as described above. Please refer to the [[#future_scope|future scope]] section in regards to ''settype()''. ==== A Successor ==== The introduction of a new function to fix these inconsistencies has the advantage that it does not affect existing code, and users have time to adjust. The name //var_type// was chosen to expand on an already existing function prefix in the PHP language that is meant for operations on variables. The signature of the new ''var_type()'' function is as follows: /** * Get the type of a variable’s current value. * * The returned value is a string that corresponds to one of the `TYPE_*` * constants: * * - {@see TYPE_ARRAY} for [arrays](https://secure.php.net/language.types.array), * - {@see TYPE_BOOL} for [booleans](https://secure.php.net/language.types.boolean), * - {@see TYPE_FLOAT} for [floats](https://secure.php.net/language.types.float), * - {@see TYPE_INT} for [integers](https://secure.php.net/language.types.integer), * - {@see TYPE_NULL} for [null](https://secure.php.net/language.types.null), * - {@see TYPE_OBJECT} for [objects](https://secure.php.net/language.types.object), * - {@see TYPE_RESOURCE} for [resources](https://secure.php.net/language.types.resource), and * - {@see TYPE_STRING} for [strings](https://secure.php.net/language.types.string). * * There is the possibility that the returned value is `unknown`, however, * this should be impossible and a bug should be filed if this situation * is encountered. * * ## Differences to Other Functions * This function will return `TYPE_OBJECT` for incomplete objects (refer * to {@see unserialize}) whereas {@see is_object} returns **FALSE**. * This is because the type of an incomplete object is an object, however, * it is not an object that should be worked with and that is the reason * why {@see is_object} correctly returns **FALSE**. * * The same is true for {@see is_resource} which returns **FALSE** for * invalid or unknown resource types ({@see get_resource_type}) while * _var_type_ returns `TYPE_RESOURCE` for the same reason as above. * * These differences also illustrate the different purposes of them. * Functions like {@see is_object} and {@see is_resource} are meant for * validating if a given variable is of a legal type that can be used to * work with. The result of _var_type_ is meant for debugging purposes and * not type checks. Hence, conditions like * `if (var_type($var) === TYPE_RESOURCE)` are not encouraged and better * replaced with `if (is_resource($var))`. * * @param mixed $var * Variable to get the type for. * @return string * Type of the variable. */ function var_type($var): string; The mode of operation is similar to the existing ''gettype()'' function, as can be seen in the extensive documentation above. However, the returned values reflect the current state of the names used for regular data types, as well as the type declarations useable in function and method signatures: echo var_type([]); // array echo var_type(true); // bool echo var_type(0.0); // float echo var_type(0); // int echo var_type(null); // null echo var_type(new stdClass); // object echo var_type(STDIN); // resource echo var_type(''); // string $fh = tmpfile(); fclose($fh); echo var_type($fh); // resource === Differences to Other Functions === The ''var_type()'' function returns ''object'' for incomplete objects which are an instance of ''__PHP_Incomplete_Class'' (see [[https://secure.php.net/unserialize|unserialize()]]), whereas ''is_object()'' returns **FALSE**. The same is true for resources of unknown resource type where ''is_resource()'' returns **FALSE**. The reason for this difference is simple: these functions serve different purposes. The ''is_*'' functions are meant for flow control and validation, while the ''var_type()'' function is meant to retrieve type information in string form for later usage. In other words, it is not encouraged nor was it ever encouraged to implement something like ''if (var_type($var) === 'resource')''. The idiomatic way to perform this kind of check is ''if (is_resource($var))''. Consistency between these functions, in this regard, is not desirable because it makes it unclear to users when to use which function, as was already mentioned earlier. At the same time, it would be confusing for users if ''var_type()'' would return unknown for incomplete objects and/or closed/invalid resources. This is because the result might be used in debugging information and/or error messages, and the user reading this information might not know at that point what went wrong. Getting unknown is by far the most unhelpful result that we could present to the users at that point; like an unknown error occurred. Especially since the type is actually **not** unknown. === Unknown Type === There is still a default path in the C code that would result in the returned type being ''unknown''. This happens if none of the existing type checks available in C results in a positive check. In other words: this should never happen. The documentation above clearly states that this is actually an impossible situation, and encourages users to file a bug with a detailed description for us to account for it, if they manage to provoke such a situation. ==== Prefix Choice ==== The function prefix ''var_'' was chosen on purpose because another possibly more suitable prefix like ''val_'' or ''value_'' would introduce a new prefix to the PHP ecosystem. It is true that this function can be used with the return value of functions too as well as with literal values, however, exactly the same argument is true for ''var_dump()'' and ''var_export()''. The goal of this RFC is it to improve consistency and not to introduce more inconsistencies. The assumption that the type of a variable is the type of its current value is furthermore logical and comprehensible. Last but not least, this function is probably not going to be used with literal values at all, since the type of them is definitely known to the developer writing the code. It might be used for return values of functions but the use cases seem very, very limited without storing the actual value to a variable first. ==== Performance ==== The new ''​var_type()''​ function is faster if full utilization of interned strings is possible because it can utilize globally cached strings for the type names. It makes no sense to optimize the old ''gettype()'' function in the same manner since caching of the old type names makes no sense, they are not useful in any other context and would occupy additional memory globally for a single function. Note well that the various ''is_*'' functions are still faster and ''var_type()'' is not meant to compete with them, again, they serve different purposes. ==== Phasing Out of gettype ==== This RFC does **not** propose a deprecation of ''gettype()'' in PHP 7.x because it is a widely used function and there are currently no plans on how to deal with its counterpart ''settype()''. Even if there would be plans, a true deprecation of ''gettype()'' would be a bad idea in any context because libraries and applications need the ability to offer their users an upgrade path. This path would be very bumpy if PHP is emitting deprecation errors upon the usage of ''gettype()''. Such a library or application fallback might be necessary in situations like we see [[https://github.com/doctrine/annotations/blob/master/lib/Doctrine/Common/Annotations/DocParser.php#L756-L789|here it in Doctrine annotations]]. The solution for such situations would be something like: if ((var_type($var) === $user_type || gettype($var) === $user_type) || $var instanceof $user_type) { // ... } However, a //soft deprecation// of ''gettype()'' is recommended, this means that the documentation page of ''gettype()'' will be updated with an informational box that recommends the usage of ''var_type()'' in favor of ''gettype()'' and any references to ''gettype()'' in the manual should be replaced with ''var_type()''. It is further recommended to tackle the deprecation and a possible removal of the ''gettype()'' function together with a proper solution for ''settype()'' in a future major release. A possible approach could be to emit an ''E_STRICT'' in PHP 8, deprecate in PHP 9, and remove in PHP 10; or any other combination that results in a long adoption period. ===== Backward Incompatible Changes ===== None ===== Proposed PHP Version(s) ===== This RFC targets the next feature release, currently 7.1.0. ===== RFC Impact ===== ==== To SAPIs ==== None ==== To Existing Extensions ==== None but the usage of the new type name constants is highly encouraged to avoid typos or the usage of discouraged names (see also [[#future_scope|Future Scope]]). ==== To Opcache ==== None ==== New Constants ==== The introduction of ''TYPE_*'' constants for the various data types of PHP is a logical additional extension to minimize magic strings in userland software an to avoid typos that might lead to bugs. That being said, the existence of these constants is not essential to this feature and has a separate voting poll. Note that the usage of the constants within an e.g. ''switch'' actually makes the code slower due to the additional look ups for these constants. Their usage is only of interest to remove magic strings from userland code and improve the design of software in general. /** * Name of the regular compound data type array. * * @link https://secure.php.net/language.types.array */ const TYPE_ARRAY = 'array'; /** * Name of the regular scalar data type bool. * * @link https://secure.php.net/language.types.boolean */ const TYPE_BOOL = 'bool'; /** * Name of the pseudo data type callable. * * @link https://secure.php.net/language.types.callable */ const TYPE_CALLABLE = 'callable'; /** * Name of the regular scalar data type float. * * @link https://secure.php.net/language.types.float */ const TYPE_FLOAT = 'float'; /** * Name of the regular scalar data type int. * * @link https://secure.php.net/language.types.integer */ const TYPE_INT = 'int'; /** * Name of the pseudo data type iterable. * * @link https://secure.php.net/language.types.iterable */ const TYPE_ITERABLE = 'iterable'; /** * Name of the regular special data type null. * * @link https://secure.php.net/language.types.null */ const TYPE_NULL = 'null'; /** * Name of the regular compound data type object. * * @link https://secure.php.net/language.types.object */ const TYPE_OBJECT = 'object'; /** * Name of the regular special data type resource. * * @link https://secure.php.net/language.types.resource */ const TYPE_RESOURCE = 'resource'; /** * Name of the regular scalar data type string. * * @link https://secure.php.net/language.types.string */ const TYPE_STRING = 'string'; ==== php.ini Defaults ==== None ===== Open Issues ===== * Update of the ''[[https://secure.php.net/gettype|gettype()]]'' manual page with an informational box that recommends the usage of ''var_type()'' in favor of ''gettype()''. * Update the [[https://secure.php.net/language.types.intro|Types Introduction]] manual page with ''var_type()'' information, and remove ''gettype()''. * Update the [[https://secure.php.net/types.comparisons|PHP type comparison tables]] manual page with ''var_type()'' information, and remove ''gettype()''. * Update the ''[[https://secure.php.net/is_object|is_object()]]'' manual page to explain why it does not consider incomplete objects as valid, while ''var_type()'' reports them as such. * Update the ''[[https://secure.php.net/is_resource|is_resource()]]'' manual page to explain why it does not consider closed/invalid resources as valid, while ''var_type()'' reports them as such. ===== Unaffected PHP Functionality ===== Everything ===== Future Scope ===== ==== Userland ==== * New ''[[rfc:var_info|var_info()]]'' function that returns a human readable explanation of the variable in plain English for inclusion in error messages. * New ''resource_is_closed()'' function that allows direct checks whether a resource is closed/invalid to avoid constructs like: if (is_resource($var) === false && var_type($var) === 'resource') { // ... } ==== Internals ==== * All messages should be refactored to use the new type name constants. This ensures that all messages are consistent, and to avoid confusing messages like the following:Fatal error: Uncaught TypeError: Argument 1 passed to test() must be an instance of boolean, boolean given, called in - on line 1 and defined in -:1The combination of //boolean, boolean given// is confusing for users, especially new ones, and should be avoided. With the usage of the new constants the message would instead read as:Fatal error: Uncaught TypeError: Argument 1 passed to test() must be an instance of boolean, bool given, called in - on line 1 and defined in -:1 * Another topic that should be thought about is the usage of //double// and //long// to refer to the userland types of //float// and //int// in internals. For instance ''IS_DOUBLE'' and ''IS_LONG'' would be better defined as ''IS_FLOAT'' and ''IS_INT'' to avoid confusion. There are many more places where this could be refactored in order to increase readability, and lower confusion over what something refers to. ===== Proposed Voting Choices ===== This RFC will have two polls, one for the introduction of the ''var_type()'' function and one for the new ''TYPE_*'' constants in userland. Both require a 50%+1 majority to be accepted as they do not change the language's syntax. Voting opened on 2016-07-08 and will end on 2016-07-22 for both votes. == Function == * Yes * No == Constants == * Yes * No ===== Patches and Tests ===== The [[https://github.com/php/php-src/pull/1935|GitHub Pull Request #1935]] contains the implementation as well as tests for the new function. The changes in the PR are considered final, however, a thorough code review would be much appreciated and might result in minor changes. ===== References ===== * [[http://news.php.net/php.internals/93900|php-internals thread]] for this RFC. * [[http://news.php.net/php.internals/93762|php-internals thread]] for the initial proposal as ''typeof()''. ===== Rejected Features ===== The initial idea for a successor function of ''gettype()'' was named ''typeof()'' and not ''var_type()''. That name was chosen due to its popularity in other programming languages but dismissed in order to allow future usage of that name as an operator like ''instanceof''.