rfc:stricter_implicit_boolean_coercions

PHP RFC: Stricter implicit boolean coercions

Introduction

When not using strict_types in PHP, scalar type coercions have become less lossy/surprising in the last years - non-number-strings cannot be passed to an int type (leads to a TypeError), floats (or float-strings) with a fractional part cannot be passed to an int type (leads to a deprecation notice since 8.1 because it loses information). The big exception so far are booleans: you can give a typed boolean any scalar value and it will convert any non-zero (and non-empty-string) value to true without any notice.

Some examples how this might lead to surprising behavior and loss of information:

function toBool(bool $a) { var_dump($a); }
 
toBool('0'); // bool(false)
toBool(-0); // bool(false)
toBool('-0'); // bool(true)
toBool(0.0); // bool(false)
toBool('0.0'); // bool(true)
toBool(0.1); // bool(true)
toBool(-37593); // bool(true)
toBool('inactive'); // bool(true)
toBool('false'); // bool(true)

Proposal

In coercive typing mode, limit the allowed scalar values for typed boolean arguments, boolean return types and boolean class properties to the following:

  • 0 (and -0) integer (= false)
  • 0.0 (and -0.0) float (= false)
  • “0” string (= false)
  • “” (empty) string (= false)
  • 1 integer (= true)
  • 1.0 float (= true)
  • “1” string (= true)

Any other integers, floats and strings are always coerced to true (no behavior change) but will emit an E_DEPRECATED notice:

  • For coercions from string the deprecation notice is: Implicit conversion from string “%s” to true, only “”, “0” or “1” are allowed
  • For coercions from int the deprecation notice is: Implicit conversion from int %d to true, only 0 or 1 are allowed
  • For coercions from float the deprecation notice is: Implicit conversion from float %f to true, only 0 or 1 are allowed

These would be the notices generated for the examples in the introduction:

toBool('0');
toBool(-0);
toBool('-0'); // Implicit conversion from string "-0" to true, only "", "0" or "1" are allowed
toBool(0.0);
toBool('0.0'); // Implicit conversion from string "0.0" to true, only "", "0" or "1" are allowed
toBool(0.1); // Implicit conversion from float 0.1 to true, only 0 or 1 are allowed
toBool(-37593); // Implicit conversion from int -37593 to true, only 0 or 1 are allowed
toBool('inactive'); // Implicit conversion from string "inactive" to true, only "", "0" or "1" are allowed
toBool('false'); // Implicit conversion from string "false" to true, only "", "0" or "1" are allowed

In the long-term these deprecations should be raised to a warning or to a TypeError. The earliest for that is the next major version (PHP 9.0). At that time there will have been multiple years of experience with these new deprecation notices, making it easier to decide on how to continue and the impact in the PHP ecosystem.

Rationale

This RFC boils down to these questions:

  • Are you losing information when you reduce a value like -375, “false” or 0.01 to true for a typed boolean?
  • Would you want to know when a value like -375, “false” or 0.01 is given to a typed boolean in a codebase?
  • How likely is it that such a coercion is unintended?
  • What about other boolean coercions in PHP? (handled in the next section)
  • How does this RFC fit in with the existing scalar type coercions? (handled in the Overview of scalar type coercions section)

The main motivation for this RFC is to reduce the possibility of errors when using the boolean type in a similar way that you cannot give a typed int a non-number string - if you provide “false” or “49x” to an int argument it will result in a TypeError. It will not be silently coerced to 0 (or 49), as that loses information and can lead to subtle bugs. This RFC does the same thing for boolean types:

  • Avoid losing information when an unusual value is coerced to a boolean type
  • Make the boolean type and the type juggling system clearer and less surprising
  • Setting up only 7 scalar values as unambiguous boolean values is easy to document and reason about

When implementing this feature I found two bugs in php-src tests and the test runner that are most likely typical cases:

  • In the PHP test runner the strings “success” and “failed” were given to a boolean function argument called $status. Maybe that argument was a string previously and changed to a boolean by mistake, but it clearly was a bug that has never been noticed so far.
  • In an IMAP test a boolean argument $simpleMessages always got the string “multipart”. I found out that there was another function definition which had the argument $new_mailbox at that position. This was most likely a copy-paste error or the wrong function was looked up when writing the test.

Changing the type of an argument, return or property in a codebase happens often, and because the boolean type accepts everything with no complaint it makes it easy to miss problems when changing a type to bool. In current PHP codebases there are likely a few of these unintended coercions to booleans which would be easy to fix if a developer noticed that an unusual value is coerced to true.

While using strict_types is an option to avoid unintended type coercions, the goal of this RFC is to make coercions less error-prone when not using strict_types. Silently coercing “failed” (or -37486, or 0.01) to true seems like an invitation to unexpected behavior. By introducing this deprecation notice users will have the chance of finding surprising boolean coercions in their code while the coercion behavior will remain the same.

Other boolean coercions in PHP

Typed booleans (arguments, returns, properties) as discussed in this RFC are not the only part of PHP where implicit boolean coercions happen. They also occur in expressions like if, the ternary operator ?:, or logical operators && / ||. Whenever an expression in that context is not clearly true or false it is implicitly coerced to true or false.

Using strict_types is an established way to change how scalar type coercions work (by prohibiting any coercions) but it does not affect implicit boolean coercions in expressions. But even in coercive mode there is a big difference between boolean expressions and boolean type coercions:

if ($variable) { // identical to if ($variable == true)
  // the $variable in the if statement is coerced in the following way:
  // - true for a string if it is not empty and not '0'
  // - true for an int if it is not zero
  // - true for a float if it is not zero
  // - true for an array if it is not empty
  // - always true for a resource
  // - always true for an object
  // - always false for null
}
 
if ($array) {
  // executed for a non-empty array
}
 
toBool($array); // TypeError, must be of type bool, array given

Typed booleans behave differently compared to boolean expressions because they do not accept arrays, resources, objects and null. Further restricting typed booleans is therefore not a change which makes the language more inconsistent, on the contrary, it could be an opportunity to differentiate these two use cases more clearly from each other, as they often have different expectations already:

// often used to check if $string is not empty, and it is reasonably clear
if ($string) {
  // do something with $string here
}
 
$obj->boolProperty = $string; // did you want to check if $string is not empty here?
                              // is it a value from a form, API or DB that should be '', '0' or '1'?
                              // or is it a mistake because something is missing?

When giving a typed boolean a scalar value you are reducing an int, float or string to a boolean, possibly losing information, and not evaluating an expression where there is follow-up code to do something more as is the case with if, ?: or && / ||. By limiting the values of a typed boolean the previous example becomes less ambiguous:

$obj->boolProperty = $string; // $string must be '', '0' or '1', otherwise we get a deprecation notice
$obj->boolProperty = strlen($string) > 0; // instead check that $string is not empty

filter extension

The filter extension has its own way to validate booleans (FILTER_VALIDATE_BOOLEAN):

  • “1”, “true”, “on” and “yes” evaluate to true, everything else to false
  • if FILTER_NULL_ON_FAILURE is also used, only “0”, “false”, “off”, “no” and “” evaluate to false, everything else to null

This behavior is incompatible with how PHP handles boolean coercions, making it impossible to resolve the behaviors without massive BC breaks. But it does add another argument in favor of this RFC - somebody switching from the filter extension to built-in boolean types has a big chance of accidentally introducing behavior changes in their application:

  • PHP converts most values to true, while the filter extension converts these values to false (or null) - for example “success”, “false”, “off”, “5”, or “-30”
  • The deprecation notice would make all these occurences visible and easy to fix

Usages of FILTER_VALIDATE_BOOLEAN are otherwise not affected by this RFC - that behavior remains unchanged.

Considered alternatives

It was briefly considered to allow more values for typed booleans instead of only 0, 1 and an empty string - for example the string “on”. But it would be difficult and a bit arbitrary to determine where to draw the line for possible values, and an important goal of this RFC is for the coercion behavior to be simple and intuitive to understand. 0 and 1 are common alternative values to express a boolean in many programming languages, in databases and in APIs. Other values are not as widely used and would only make the coercion behavior more difficult to understand.

Another possibility would have been to also change the behavior of boolean coercions, for example coerce the string “false” to false instead of true. Yet this would be quite a substantial BC break with no obvious benefits. With this RFC there will be a deprecation notice when coercing “false” to true, therefore such behavior can be noticed instead of having to change it.

Overview of scalar type coercions

This is the status of coercions if this RFC passes (and any deprecation notices and TypeErrors are avoided) - the new behavior can be seen in the last column (To bool), it is quite symmetrical to the existing behavior in the “From bool” row:

To string To int To float To bool
From string only allowed for numeric values with no fractional part only allowed for numeric values only allowed for “”, “0” and “1”
From int always possible always possible only allowed for 0 and 1
From float always possible only allowed if there is no fractional part only allowed for 0 and 1
From bool always possible (coerced to “” or “1”) always possible (coerced to 0 or 1) always possible (coerced to 0 or 1)

This RFC would further reduce the gap between strict mode and coercive mode, as even in coercive mode no information would be lost when coercing a scalar value and only values that are reasonable are accepted (otherwise a deprecation notice is emitted). All allowed coercions can be reversed to end up with the original value or almost the same (“0” can become “”) - that is something this RFC makes possible, as without this RFC reversing a coercion to boolean will often not lead back to the original value. These examples illustrate reversibility and the loss of information:

function toBool(bool $a) { return $a; }
function toString(string $a) { return $a; }
function toInt(int $a) { return $a; }
function toFloat(float $a) { return $a; }
 
toString(toBool('')); // '' is coerced to false and then back to ''
toInt(toBool(0)); // 0 coerced to false and then back to 0
toFloat(toBool(0.0)); // 0.0 coerced to false and then back to 0.0
 
toString(toBool('success')); 
// => 'success' is coerced to true and then back to '1'
// the new deprecation notice of this RFC points out the loss of information
 
toInt(toBool(-33));
// => -33 is coerced to true and then back to 1
// the new deprecation notice of this RFC points out the loss of information
 
toFloat(toBool(0.01)); 
// => 0.01 is coerced to true and then back to 1
// the new deprecation notice of this RFC points out the loss of information
 
// Existing behavior leading to TypeErrors and deprecation notices:
toFloat('success'); // TypeError, not a numeric string
toInt('1.6'); // Deprecation notice because fractional part is lost
toString(['']); // TypeError, array cannot be implicitly coerced to string
toBool(null); // TypeError, null cannot be implicitly coerced to bool

Having as little information loss as possible when coercing scalar types makes them safer to use and more predictable.

Implementation notes

As this is my first RFC and my first contribution to php-src, I mimicked the code from the “Deprecate implicit non-integer-compatible float to int conversions” RFC (https://github.com/php/php-src/pull/6661). I added some tests and made sure the existing tests still pass. There might be some room for improvements on my implementation though, so any feedback is welcome!

Backward Incompatible Changes

The following operations will now emit an E_DEPRECATED if any scalar value other than “”, “0”, “1”, 0, 1, 0.0, 1.0 is used:

  • Assignment to a typed property of type bool in coercive typing mode
  • Argument for a parameter of type bool for both internal and userland functions in coercive typing mode
  • Returning such a value for userland functions declared with a return type of bool in coercive typing mode

The actual conversion to a boolean value remains unchanged - anything that was coerced to false before will still be coerced to false, and anything coerced to true will still be coerced to true.

The following shows typical ways to avoid a deprecation notice:

// Resolution 1: Check for an expected value or range
toBool($number > 0);
toBool($int === 5);
toBool($string === 'success');
toBool(strlen($string) > 0);
 
// Resolution 2: Check for truthiness
toBool($scalar == true);
 
// Resolution 3: Explicitly cast the argument
toBool((bool) $scalar);

With the many deprecation notices that appeared in PHP 8.0 and 8.1 there is some wariness if new deprecation notices are worth it. These are the arguments why the RFC author thinks it will not cause too much pain:

  • Each individual case is easy to fix, the easiest (but also least useful) is to loosly compare a value to true ($value == true) instead of directly giving the value to a typed bool
  • Most of the coercions that will lead to a deprecation notice are likely to be unintended and the information given in the notice should make it reasonably clear to a developer whether it is a bug and how to fix it
  • bool arguments for internal functions are usually optional, less numerous and are much more likely to be set by a constant expression than a variable, reducing the surface area where this deprecation notice will appear
  • deprecation notices do not demand immediate attention, and the “disadvantage” of the high number of deprecation notices with 8.0 and 8.1 should be that most tooling and codebases have gotten more used to dealing with them in their own time and not see them as an immediate call to action

Future Scope

While this RFC only targets boolean coercions when not using strict_types, this is just the last missing piece for the overall goal of having a solid and easy-to-understand foundation of type coercions between scalar values.

One benefit of these well-developed coercions could be to make them available in an explicit way to PHP developers. Having functions like is_coerceable_to_bool and coerce_to_bool (and with similar functions for int, float and string) that behave exactly as giving a value to a boolean argument could be useful when receiving input from a form or database. Compared to the current explicit type coercions ((bool), boolval, (int) or (float)) this would allow only a certain subset of values instead of coercing any value, giving developers an effective way to make sure they are dealing with values that make sense - or fail early if an unexpected value is encountered. And because it is based on the type coercion behavior of PHP the learning curve would be low and the knowledge would be universally useful within the language.

An example of how these functions could look like can be found on Github in squirrelphp/scalar-types (written in PHP). This is just a preliminary example that would need to be discussed further with a follow-up RFC.

Proposed PHP Version

Next minor version: PHP 8.2.

Unaffected PHP Functionality

  • Manually casting to boolean will not raise a notice.
  • Strict mode (strict_types) behavior is unaffected.
  • Implicit boolean expressions (as used in if, ternary, logic operators) are not affected.
  • FILTER_VALIDATE_BOOLEAN in the filter extension is not affected.

Vote

Voting started on 2022-06-06 and will end on 2022-06-20.

Accept Stricter implicit boolean coercions RFC as proposed?
Real name Yes No
aaronjunker (aaronjunker)  
cmb (cmb)  
danack (danack)  
didou (didou)  
galvao (galvao)  
girgias (girgias)  
jbnahan (jbnahan)  
kalle (kalle)  
mbeccati (mbeccati)  
mcmic (mcmic)  
nicolasgrekas (nicolasgrekas)  
ocramius (ocramius)  
pierrick (pierrick)  
reywob (reywob)  
sergey (sergey)  
svpernova09 (svpernova09)  
theodorejb (theodorejb)  
Final result: 3 14
This poll has been closed.

Patches and Tests

References

Initial mailing list discussion: <https://externals.io/message/117608>
RFC mailing list discussion: <https://externals.io/message/117732>

rfc/stricter_implicit_boolean_coercions.txt · Last modified: 2022/06/20 16:00 by iquito