PHP RFC: Stricter implicit boolean coercions

Introduction

When not using strict_types in PHP, scalar type coercions have become less lossy and less surprising in recent years: non-numeric strings cannot be passed to an int type (this leads to a TypeError), and floats (or float-strings) with a fractional part passed to an int type lead to a deprecation notice since 8.1 because information is lost. The big exception so far is booleans: you can give a typed boolean any scalar value, and every value other than zero, the empty string and the string “0” is converted to true without any notice.

Some examples of how this can lead to surprising behavior and loss of information:

function toBool(bool $a) { var_dump($a); }
 
toBool('0'); // bool(false)
toBool(-0); // bool(false)
toBool('-0'); // bool(true)
toBool(0.0); // bool(false)
toBool('0.0'); // bool(true)
toBool(0.1); // bool(true)
toBool(-37593); // bool(true)
toBool('inactive'); // bool(true)
toBool('false'); // bool(true)

Proposal

In coercive typing mode, limit the allowed scalar values for typed boolean arguments, boolean return types and boolean class properties to 0 and 1 for integers, 0.0 and 1.0 for floats, and “”, “0” and “1” for strings.

Any other integers, floats and strings are still coerced to true (no behavior change) but will emit an E_DEPRECATED notice.

These would be the notices generated for the examples in the introduction:

toBool('0');
toBool(-0);
toBool('-0'); // Implicit conversion from string "-0" to true, only "", "0" or "1" are allowed
toBool(0.0);
toBool('0.0'); // Implicit conversion from string "0.0" to true, only "", "0" or "1" are allowed
toBool(0.1); // Implicit conversion from float 0.1 to true, only 0 or 1 are allowed
toBool(-37593); // Implicit conversion from int -37593 to true, only 0 or 1 are allowed
toBool('inactive'); // Implicit conversion from string "inactive" to true, only "", "0" or "1" are allowed
toBool('false'); // Implicit conversion from string "false" to true, only "", "0" or "1" are allowed
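The rule behind these notices can be approximated in userland. The following is only a minimal sketch: describeBoolCoercion is a hypothetical helper name, not part of the RFC patch or of php-src, and the message format is copied from the examples above. The engine-level check may differ in edge cases.

```php
<?php
// Hypothetical helper, for illustration only (not part of the RFC patch).
// Returns the deprecation message described in this RFC, or null when the
// value is one of the allowed ones and no notice would be emitted.
function describeBoolCoercion(int|float|string $value): ?string
{
    if (is_string($value)) {
        // Only the exact strings "", "0" and "1" are allowed.
        if (in_array($value, ['', '0', '1'], true)) {
            return null;
        }
        return sprintf(
            'Implicit conversion from string "%s" to true, only "", "0" or "1" are allowed',
            $value
        );
    }

    // Ints and floats: only 0 and 1 (or 0.0 and 1.0) are allowed.
    if ($value == 0 || $value == 1) {
        return null;
    }
    return sprintf(
        'Implicit conversion from %s %s to true, only 0 or 1 are allowed',
        is_int($value) ? 'int' : 'float',
        (string) $value
    );
}

// The values from the introduction; allowed values yield no message.
foreach (['0', -0, '-0', 0.0, '0.0', 0.1, -37593, 'inactive', 'false'] as $value) {
    echo var_export($value, true), ' => ', describeBoolCoercion($value) ?? '(no notice)', "\n";
}
```

A helper like this could be used to audit logged input values before the deprecation is enabled, under the assumption that it mirrors the proposed rules closely enough.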

In the long term these deprecations should be raised to a warning or to a TypeError. The earliest point for that is the next major version (PHP 9.0). By then there will have been multiple years of experience with these new deprecation notices, making it easier to assess their impact on the PHP ecosystem and to decide how to continue.

Rationale

This RFC boils down to these questions:

The main motivation for this RFC is to reduce the possibility of errors when using the boolean type, in the same way that you cannot give a typed int a non-numeric string: if you provide “false” or “49x” to an int argument it will result in a TypeError. It will not be silently coerced to 0 (or 49), as that loses information and can lead to subtle bugs. This RFC does the same thing for boolean types:

When implementing this feature I found two bugs in php-src tests and the test runner that are most likely typical cases:

Changing the type of an argument, return or property in a codebase happens often, and because the boolean type accepts everything with no complaint it makes it easy to miss problems when changing a type to bool. In current PHP codebases there are likely a few of these unintended coercions to booleans which would be easy to fix if a developer noticed that an unusual value is coerced to true.

While using strict_types is an option to avoid unintended type coercions, the goal of this RFC is to make coercions less error-prone when not using strict_types. Silently coercing “failed” (or -37486, or 0.01) to true seems like an invitation to unexpected behavior. By introducing this deprecation notice users will have the chance of finding surprising boolean coercions in their code while the coercion behavior will remain the same.

Other boolean coercions in PHP

Typed booleans (arguments, returns, properties) as discussed in this RFC are not the only part of PHP where implicit boolean coercions happen. They also occur in expressions like if, the ternary operator ?:, or logical operators && / ||. Whenever an expression in that context is not clearly true or false it is implicitly coerced to true or false.

Using strict_types is an established way to change how scalar type coercions work (by prohibiting any coercions) but it does not affect implicit boolean coercions in expressions. But even in coercive mode there is a big difference between boolean expressions and boolean type coercions:

if ($variable) { // identical to if ($variable == true)
  // the $variable in the if statement is coerced in the following way:
  // - true for a string if it is not empty and not '0'
  // - true for an int if it is not zero
  // - true for a float if it is not zero
  // - true for an array if it is not empty
  // - always true for a resource
  // - always true for an object
  // - always false for null
}
 
if ($array) {
  // executed for a non-empty array
}
 
toBool($array); // TypeError, must be of type bool, array given

Typed booleans behave differently compared to boolean expressions because they do not accept arrays, resources, objects and null. Further restricting typed booleans is therefore not a change that makes the language more inconsistent. On the contrary, it could be an opportunity to differentiate these two use cases more clearly from each other, as they often have different expectations already:

// often used to check if $string is not empty, and it is reasonably clear
if ($string) {
  // do something with $string here
}
 
$obj->boolProperty = $string; // did you want to check if $string is not empty here?
                              // is it a value from a form, API or DB that should be '', '0' or '1'?
                              // or is it a mistake because something is missing?

When giving a typed boolean a scalar value you are reducing an int, float or string to a boolean and possibly losing information. This is different from evaluating an expression with if, ?: or && / ||, where follow-up code does something more with the value. By limiting the values of a typed boolean the previous example becomes less ambiguous:

$obj->boolProperty = $string; // $string must be '', '0' or '1', otherwise we get a deprecation notice
$obj->boolProperty = strlen($string) > 0; // instead check that $string is not empty

filter extension

The filter extension has its own way to validate booleans (FILTER_VALIDATE_BOOLEAN):

This behavior is incompatible with how PHP handles boolean coercions, making it impossible to resolve the behaviors without massive BC breaks. But it does add another argument in favor of this RFC - somebody switching from the filter extension to built-in boolean types has a big chance of accidentally introducing behavior changes in their application:

Usages of FILTER_VALIDATE_BOOLEAN are otherwise not affected by this RFC - that behavior remains unchanged.
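The mismatch between the filter extension and boolean coercion can be illustrated with a short comparison. This is a sketch only: compareBooleanHandling is a hypothetical helper name, and the (bool) cast stands in for the value a bool-typed parameter receives in coercive mode today.

```php
<?php
// Hypothetical helper, for illustration only: contrasts the filter extension
// with the result of PHP's boolean coercion for the same string.
function compareBooleanHandling(string $input): array
{
    return [
        // true for "1", "true", "on", "yes"; false for "0", "false", "off",
        // "no" and ""; null for anything else due to FILTER_NULL_ON_FAILURE.
        'filter' => filter_var($input, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE),
        // Same result a bool-typed parameter receives in coercive mode.
        'coerced' => (bool) $input,
    ];
}

foreach (['on', 'off', 'no', 'false', 'maybe'] as $input) {
    $result = compareBooleanHandling($input);
    printf(
        "%-8s filter: %-6s coerced: %s\n",
        var_export($input, true),
        var_export($result['filter'], true),
        var_export($result['coerced'], true)
    );
}
```

Strings such as “off”, “no” and “false” are false for the filter extension but true when coerced, which is the kind of silent behavior change the deprecation notice would surface.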

Considered alternatives

It was briefly considered to allow more values for typed booleans instead of only 0, 1 and an empty string - for example the string “on”. But it would be difficult and a bit arbitrary to determine where to draw the line for possible values, and an important goal of this RFC is for the coercion behavior to be simple and intuitive to understand. 0 and 1 are common alternative values to express a boolean in many programming languages, in databases and in APIs. Other values are not as widely used and would only make the coercion behavior more difficult to understand.

Another possibility would have been to also change the behavior of boolean coercions, for example coerce the string “false” to false instead of true. Yet this would be quite a substantial BC break with no obvious benefits. With this RFC there will be a deprecation notice when coercing “false” to true, therefore such behavior can be noticed instead of having to change it.

Overview of scalar type coercions

This is the status of coercions if this RFC passes (considering only coercions that avoid deprecation notices and TypeErrors). The new behavior can be seen in the last column (To bool); it is quite symmetrical to the existing behavior in the “From bool” row:

            | To string | To int | To float | To bool
From string | - | only allowed for numeric values with no fractional part | only allowed for numeric values | only allowed for “”, “0” and “1”
From int    | always possible | - | always possible | only allowed for 0 and 1
From float  | always possible | only allowed if there is no fractional part | - | only allowed for 0 and 1
From bool   | always possible (coerced to “” or “1”) | always possible (coerced to 0 or 1) | always possible (coerced to 0 or 1) | -

This RFC would further reduce the gap between strict mode and coercive mode, as even in coercive mode no information would be lost when coercing a scalar value and only values that are reasonable are accepted (otherwise a deprecation notice is emitted). All allowed coercions can be reversed to end up with the original value or almost the same (“0” can become “”) - that is something this RFC makes possible, as without this RFC reversing a coercion to boolean will often not lead back to the original value. These examples illustrate reversibility and the loss of information:

function toBool(bool $a) { return $a; }
function toString(string $a) { return $a; }
function toInt(int $a) { return $a; }
function toFloat(float $a) { return $a; }
 
toString(toBool('')); // '' is coerced to false and then back to ''
toInt(toBool(0)); // 0 coerced to false and then back to 0
toFloat(toBool(0.0)); // 0.0 coerced to false and then back to 0.0
 
toString(toBool('success')); 
// => 'success' is coerced to true and then back to '1'
// the new deprecation notice of this RFC points out the loss of information
 
toInt(toBool(-33));
// => -33 is coerced to true and then back to 1
// the new deprecation notice of this RFC points out the loss of information
 
toFloat(toBool(0.01)); 
// => 0.01 is coerced to true and then back to 1.0
// the new deprecation notice of this RFC points out the loss of information
 
// Existing behavior leading to TypeErrors and deprecation notices:
toFloat('success'); // TypeError, not a numeric string
toInt('1.6'); // Deprecation notice because fractional part is lost
toString(['']); // TypeError, array cannot be implicitly coerced to string
toBool(null); // TypeError, null cannot be implicitly coerced to bool

Having as little information loss as possible when coercing scalar types makes them safer to use and more predictable.

Implementation notes

As this is my first RFC and my first contribution to php-src, I mimicked the code from the “Deprecate implicit non-integer-compatible float to int conversions” RFC (https://github.com/php/php-src/pull/6661). I added some tests and made sure the existing tests still pass. There might be some room for improvements on my implementation though, so any feedback is welcome!

Backward Incompatible Changes

The following operations will now emit an E_DEPRECATED if any scalar value other than “”, “0”, “1”, 0, 1, 0.0, 1.0 is used:

The actual conversion to a boolean value remains unchanged - anything that was coerced to false before will still be coerced to false, and anything coerced to true will still be coerced to true.

The following shows typical ways to avoid a deprecation notice:

// Resolution 1: Check for an expected value or range
toBool($number > 0);
toBool($int === 5);
toBool($string === 'success');
toBool(strlen($string) > 0);
 
// Resolution 2: Check for truthiness
toBool($scalar == true);
 
// Resolution 3: Explicitly cast the argument
toBool((bool) $scalar);

With the many deprecation notices that appeared in PHP 8.0 and 8.1 there is some wariness about whether new deprecation notices are worth it. These are the arguments why the RFC author thinks it will not cause too much pain:

Future Scope

While this RFC only targets boolean coercions when not using strict_types, this is just the last missing piece for the overall goal of having a solid and easy-to-understand foundation of type coercions between scalar values.

One benefit of these well-developed coercions could be to make them available in an explicit way to PHP developers. Having functions like is_coerceable_to_bool and coerce_to_bool (with similar functions for int, float and string) that behave exactly like giving a value to a boolean argument could be useful when receiving input from a form or database. Compared to the current explicit type coercions ((bool), boolval, (int) or (float)) this would allow only a certain subset of values instead of coercing any value, giving developers an effective way to make sure they are dealing with values that make sense, or to fail early if an unexpected value is encountered. And because it is based on the type coercion behavior of PHP, the learning curve would be low and the knowledge would be universally useful within the language.

An example of what these functions could look like can be found on GitHub in squirrelphp/scalar-types (written in PHP). This is just a preliminary example that would need to be discussed further with a follow-up RFC.
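Independent of that library, the idea can be sketched in a few lines of userland PHP. The function names are the ones suggested above. The bodies are assumptions that follow the rules proposed in this RFC, and throwing a TypeError for unexpected values is one possible choice, not a settled design.

```php
<?php
// Hypothetical sketch: names taken from this RFC's Future Scope section,
// bodies are assumptions and not an existing PHP or library API.
function is_coerceable_to_bool(mixed $value): bool
{
    if (is_bool($value)) {
        return true;
    }
    if (is_int($value) || is_float($value)) {
        // Only 0 and 1 (or 0.0 and 1.0) round-trip without information loss.
        return $value == 0 || $value == 1;
    }
    if (is_string($value)) {
        return in_array($value, ['', '0', '1'], true);
    }
    // null, arrays, objects and resources are never accepted.
    return false;
}

function coerce_to_bool(mixed $value): bool
{
    if (!is_coerceable_to_bool($value)) {
        throw new TypeError(sprintf(
            'Value of type %s cannot be coerced to bool without losing information',
            get_debug_type($value)
        ));
    }
    return (bool) $value;
}

// Usage: accept form or database input only if it is a plausible boolean.
var_dump(coerce_to_bool('1'));           // accepted
var_dump(is_coerceable_to_bool('on'));   // rejected by the proposed rules
```

Unlike (bool) or boolval, this fails early on values such as “on” or -33 instead of turning them into true.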

Proposed PHP Version

Next minor version: PHP 8.2.

Unaffected PHP Functionality

Vote

Voting started on 2022-06-06 and will end on 2022-06-20.

Accept Stricter implicit boolean coercions RFC as proposed?
Real name Yes No
aaronjunker (aaronjunker)  
cmb (cmb)  
danack (danack)  
didou (didou)  
galvao (galvao)  
girgias (girgias)  
jbnahan (jbnahan)  
kalle (kalle)  
mbeccati (mbeccati)  
mcmic (mcmic)  
nicolasgrekas (nicolasgrekas)  
ocramius (ocramius)  
pierrick (pierrick)  
reywob (reywob)  
sergey (sergey)  
svpernova09 (svpernova09)  
theodorejb (theodorejb)  
Final result: 3 14
This poll has been closed.

Patches and Tests

Patch: https://github.com/php/php-src/pull/8565

References

Initial mailing list discussion: <https://externals.io/message/117608>
RFC mailing list discussion: <https://externals.io/message/117732>