PHP RFC: Stricter implicit boolean coercions

Introduction

When not using strict_types in PHP, scalar type coercions have become less lossy and less surprising in recent years: non-numeric strings cannot be passed to an int type (this leads to a TypeError), and floats (or float strings) with a fractional part cannot be passed to an int type without a deprecation notice (since 8.1, because the conversion loses information). The big exception so far is the boolean type: you can give a typed boolean any scalar value and it will convert any non-zero (and non-empty-string) value to true without any notice.

Some examples where this might lead to unexpected outcomes:

function toBool(bool $a)
{
  var_dump($a);
}
 
toBool('0'); // bool(false)
toBool(-0); // bool(false)
toBool('-0'); // bool(true)
toBool(0.0); // bool(false)
toBool('0.0'); // bool(true)
toBool(0.1); // bool(true)
toBool(-37593); // bool(true)
toBool('inactive'); // bool(true)
toBool('false'); // bool(true)

Proposal

In coercive typing mode, limit the allowed scalar values for typed boolean arguments, boolean return types and boolean class properties to the following:

  • 0 (and -0) integer (= false)
  • 0.0 (and -0.0) float (= false)
  • “0” string (= false)
  • “” (empty) string (= false)
  • 1 integer (= true)
  • 1.0 float (= true)
  • “1” string (= true)

Any other integers, floats and strings are always coerced to true (no behavior change) but will emit an E_DEPRECATED notice:

  • For coercions from string the deprecation notice is: Implicit conversion from string “%s” to true, only “”, “0” or “1” are allowed
  • For coercions from int the deprecation notice is: Implicit conversion from int %d to true, only 0 or 1 are allowed
  • For coercions from float the deprecation notice is: Implicit conversion from float %f to true, only 0 or 1 are allowed
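The rule above can be approximated in userland. The following is a minimal sketch, not the php-src implementation: the function name boolCoercionNotice() is hypothetical, and the float is formatted with %s instead of %f so that the output matches the examples in this RFC.

```php
<?php

/**
 * Hypothetical helper, not part of the RFC or php-src: returns the
 * deprecation message the proposal would emit for $value, or null if
 * $value is a bool or one of the seven allowed scalar values.
 */
function boolCoercionNotice(int|float|string|bool $value): ?string
{
    if (is_bool($value)) {
        return null; // already a boolean, nothing is coerced
    }

    if (is_int($value)) {
        return ($value === 0 || $value === 1)
            ? null
            : sprintf('Implicit conversion from int %d to true, only 0 or 1 are allowed', $value);
    }

    if (is_float($value)) {
        // -0.0 == 0.0 is true in PHP, so negative zero is covered as well
        return ($value == 0.0 || $value == 1.0)
            ? null
            : sprintf('Implicit conversion from float %s to true, only 0 or 1 are allowed', $value);
    }

    // Strings are compared strictly: '0.0' and '-0' are not in the allowed set
    return in_array($value, ['', '0', '1'], true)
        ? null
        : sprintf('Implicit conversion from string "%s" to true, only "", "0" or "1" are allowed', $value);
}

var_dump(boolCoercionNotice('0'));    // NULL
var_dump(boolCoercionNotice('0.0'));  // the string message for "0.0"
var_dump(boolCoercionNotice(-37593)); // the int message for -37593
```

A helper like this could be useful for checking values in existing code before the notices exist in the engine, but it only mirrors the list of allowed values given in this section.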

These would be the notices generated for the examples in the introduction:

toBool('0');
toBool(-0);
toBool('-0'); // Implicit conversion from string "-0" to true, only "", "0" or "1" are allowed
toBool(0.0);
toBool('0.0'); // Implicit conversion from string "0.0" to true, only "", "0" or "1" are allowed
toBool(0.1); // Implicit conversion from float 0.1 to true, only 0 or 1 are allowed
toBool(-37593); // Implicit conversion from int -37593 to true, only 0 or 1 are allowed
toBool('inactive'); // Implicit conversion from string "inactive" to true, only "", "0" or "1" are allowed
toBool('false'); // Implicit conversion from string "false" to true, only "", "0" or "1" are allowed

In the long term, these deprecations should be raised to a warning or to a TypeError. The earliest point for that is the next major version (PHP 9.0). By then there will have been multiple years of experience with these new deprecation notices, making it easier to assess the impact on the PHP ecosystem and to decide how to continue.

Rationale

This RFC boils down to these questions:

  • Are you losing information when you reduce a value like -375, “false” or NaN to true for a typed boolean?
  • Would you want to know when a value like -375, “false” or NaN is given to a typed boolean in a codebase?
  • How likely is it that such a coercion is unintended?
  • What about other boolean coercions in PHP? (this is handled in the next section)

The main motivation for this RFC is to reduce the possibility of errors when using the boolean type, in the same way that you cannot give a typed int a non-numeric string: if you provide “false” or “49x” to an int argument it will result in a TypeError. It will not be silently coerced to 0 (or 49), as that loses information and can lead to subtle bugs. This RFC does the same thing for boolean types:

  • Avoid losing information when an unusual value is coerced to a boolean type
  • Make the boolean type and the type juggling system safer and more consistent
  • Setting up only 7 scalar values as unambiguous boolean values is easy to document and reason about

When implementing this feature I found two bugs in php-src tests and the test runner that are most likely typical cases:

  • In the PHP test runner the strings “success” and “failed” were given to a boolean function argument called $status. Maybe that argument was a string previously and changed to a boolean by mistake, but it clearly was a bug that has never been noticed so far.
  • In an IMAP test a boolean argument $simpleMessages always got the string “multipart”. I found out that there was another function definition which had the argument $new_mailbox at that position. This was most likely a copy-paste error or the wrong function was looked up when writing the test.

Changing the type of an argument, return value or property in a codebase happens often, and because the boolean type accepts any scalar value without complaint, it is easy to miss problems when changing a type to bool. In current PHP codebases there are likely a few of these unintended coercions to booleans, which would be easy to fix if a developer noticed that an unusual value is coerced to true.

While using strict_types is an option to avoid unintended type coercions, the goal of this RFC is to make coercions less error-prone when not using strict_types. Silently coercing “failed” (or -37486, or 0.01) to true seems like an invitation to unexpected behavior. By introducing this deprecation notice users will have the chance of finding surprising boolean coercions in their code while the coercion behavior will remain the same.
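One way to find such coercions once the notice exists is to collect the deprecation messages instead of only logging them. The following is a rough sketch under stated assumptions: released PHP versions do not emit this notice, so it is simulated here with trigger_error() and E_USER_DEPRECATED, and the matching on the message text is an assumption based on the wording proposed in this RFC.

```php
<?php

// Collected notices, so that surprising boolean coercions can be reviewed later.
$collected = [];

set_error_handler(
    function (int $severity, string $message, string $file, int $line) use (&$collected): bool {
        // Match only the boolean coercion wording proposed in this RFC
        if (str_starts_with($message, 'Implicit conversion from')
            && str_contains($message, ' to true, only ')
        ) {
            $collected[] = sprintf('%s:%d %s', $file, $line, $message);
            return true; // handled, skip PHP's internal error handler
        }
        return false; // let PHP handle every other deprecation as usual
    },
    E_DEPRECATED | E_USER_DEPRECATED
);

// Simulation only: the notice is triggered manually with the proposed text.
trigger_error(
    'Implicit conversion from string "failed" to true, only "", "0" or "1" are allowed',
    E_USER_DEPRECATED
);

restore_error_handler();
print_r($collected);
```

Because the coercion result itself does not change, a handler like this can run in production without altering application behavior.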

Other boolean coercions in PHP

Typed booleans (arguments, returns, properties) as discussed in this RFC are not the only part of PHP where implicit boolean coercions happen. They also occur in constructs like if, the ternary operator ?:, or the logical operators && / ||. Whenever an expression in such a context is not clearly true or false, it is implicitly coerced to true or false.

However in these expressions you can use any values and are not restricted to scalar types like with typed booleans:

if ($variable) { // identical to if ($variable == true)
  // the $variable in the if statement is coerced in the following way:
  // - true for a string if it is not empty and not '0'
  // - true for an int if it is not zero
  // - true for a float if it is not zero
  // - true for an array if it is not empty
  // - always true for a resource
  // - always true for an object
  // - always false for null
}
 
if ($array) {
  // executed for a non-empty array
}
 
toBool($array); // TypeError, must be of type bool, array given

Typed booleans behave differently compared to these expressions because they do not accept arrays, resources, objects and null. Further restricting typed booleans is therefore not a change that makes the language more inconsistent. On the contrary, it could be an opportunity to differentiate these two use cases from each other, as they often have different expectations already:

// often used to check if $string is not empty, and it is reasonably clear
if ($string) {
  // do something with $string here
}
 
$obj->boolProperty = $string; // did you want to check if $string is not empty here?
                              // is it a value from a form, API or DB that should be '', '0' or '1'?
                              // or is it a mistake because something is missing?

When giving a typed boolean a scalar value, you are reducing an int, float or string to a boolean and possibly losing information. You are not evaluating an expression that is followed by code doing something more, as is the case with if, ?: or && / ||. By limiting the values of a typed boolean, the previous example becomes less ambiguous:

$obj->boolProperty = $string; // $string must be '', '0' or '1', otherwise we get a deprecation notice
$obj->boolProperty = strlen($string) > 0; // instead check that $string is not empty

filter extension

The filter extension has its own way to validate booleans (FILTER_VALIDATE_BOOLEAN):

  • “1”, “true”, “on” and “yes” evaluate to true, everything else to false
  • if FILTER_NULL_ON_FAILURE is also used, only “0”, “false”, “off”, “no” and “” evaluate to false, everything else to null

This behavior is incompatible with how PHP handles boolean coercions, making it impossible to reconcile the two behaviors without massive BC breaks. But it does add another argument in favor of this RFC: somebody switching from the filter extension to built-in boolean types has a good chance of accidentally introducing behavior changes in their application:

  • PHP converts most values to true, while the filter extension converts these values to false (or null) - for example “success”, “false”, “off”, “5”, or “-30”
  • The deprecation notice would make all these occurrences visible and easy to fix

Usages of FILTER_VALIDATE_BOOLEAN are otherwise not affected by this RFC - that behavior remains unchanged.
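The difference between the two behaviors can be seen side by side with a short script. This is only an illustrative sketch: the list of inputs is chosen for this example, and a plain (bool) cast stands in for the coercion of a typed boolean, since both follow the same truthiness rules.

```php
<?php

// Compare the filter extension with a plain boolean cast.
$inputs = ['1', 'on', 'success', 'false', 'off', '5', '-30', '0', ''];

foreach ($inputs as $input) {
    // null signals "not a recognized boolean value" because of FILTER_NULL_ON_FAILURE
    $filtered = filter_var($input, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
    $coerced  = (bool) $input;

    printf(
        "%-10s filter: %-6s bool: %s\n",
        var_export($input, true),
        var_export($filtered, true),
        var_export($coerced, true)
    );
}
```

For 'success', 'false', 'off', '5' and '-30' the two columns disagree: the cast yields true, while the filter extension yields false or NULL. These are exactly the values for which the proposed deprecation notice would be emitted.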

Considered alternatives

It was briefly considered to allow more values for typed booleans instead of only 0, 1 and an empty string - for example the string “on”. But it would be difficult and a bit arbitrary to determine where to draw the line for possible values, and an important goal of this RFC is for the coercion behavior to be simple and intuitive to understand. 0 and 1 are common alternative values to express a boolean in many programming languages, in databases and in APIs. Other values are not as widely used and would only make the coercion behavior more difficult to understand.

Another possibility would have been to also change the behavior of boolean coercions, for example coerce the string “false” to false instead of true. Yet this would be quite a substantial BC break with no obvious benefits. With this RFC there will be a deprecation notice when coercing “false” to true in order for such behavior to be noticed instead of having to change it.

Implementation notes

As this is my first RFC and my first contribution to php-src, I mimicked the code from the “Deprecate implicit non-integer-compatible float to int conversions” RFC (https://github.com/php/php-src/pull/6661). I added some tests and made sure the existing tests still pass. There might be some room for improvements on my implementation though, so any feedback is welcome!

Backward Incompatible Changes

The following operations will now emit an E_DEPRECATED if any scalar value other than “”, “0”, “1”, 0, 1, 0.0, 1.0 is used:

  • Assignment to a typed property of type bool in coercive typing mode
  • Argument for a parameter of type bool for both internal and userland functions in coercive typing mode
  • Returning such a value for userland functions declared with a return type of bool in coercive typing mode

The actual conversion to a boolean value remains unchanged - anything that was coerced to false before will still be coerced to false, and anything coerced to true will still be coerced to true.
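The three affected contexts can be illustrated with a short script. This is a sketch for illustration only: the class FeatureFlag and the functions describe() and isDone() are hypothetical names. On released PHP versions the script runs silently; the comments mark where the proposed notice would appear.

```php
<?php

// No strict_types declaration, so this file runs in coercive typing mode.

final class FeatureFlag // hypothetical class for illustration
{
    public bool $enabled = false;
}

function describe(bool $active): string
{
    return $active ? 'active' : 'inactive';
}

function isDone(string $status): bool
{
    return $status; // the string is coerced to bool by the return type
}

$flag = new FeatureFlag();
$flag->enabled = 'yes';     // property: stored as true, would emit E_DEPRECATED
echo describe(2), "\n";     // argument: prints "active", would emit E_DEPRECATED
var_dump(isDone('failed')); // return: bool(true), would emit E_DEPRECATED

$flag->enabled = '0';       // allowed value: stored as false, no notice
```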

The following shows typical ways to avoid a deprecation notice:

// Resolution 1: Check for an expected value or range
toBool($number > 0);
toBool($int === 5);
toBool($string === 'success');
toBool(strlen($string) > 0);
 
// Resolution 2: Check for truthiness
toBool($scalar == true);
 
// Resolution 3: Explicitly cast the argument
toBool((bool) $scalar);

With the many deprecation notices that appeared in PHP 8.0 and 8.1, there is some wariness about whether more deprecation notices are worth it. These are the reasons why the RFC author thinks it will be worth it without too much pain:

  • Each individual case is easy to fix. The easiest (but also least useful) fix is to loosely compare a value to true ($value == true) instead of directly giving the value to a typed bool
  • Most of the coercions that will lead to a deprecation notice are likely to be unintended and the information given in the notice should make it reasonably clear to a developer whether it is a bug and how to fix it
  • bool arguments for internal functions are usually optional, less numerous and are much more likely to be set by a constant expression than a variable
  • Deprecation notices do not demand immediate attention. One upside of the high number of deprecation notices in 8.0 and 8.1 should be that most tooling and codebases have become used to dealing with them in their own time instead of seeing them as an immediate call to action

Proposed PHP Version

Next minor version: PHP 8.2.

Unaffected PHP Functionality

  • Manually casting to boolean will not raise a notice.
  • Behavior under strict_types is unaffected.
  • Implicit boolean expressions (as used in if, ternary, logic operators) are not affected.
  • FILTER_VALIDATE_BOOLEAN in the filter extension is not affected.

Patches and Tests

References

Initial mailing list discussion: <https://externals.io/message/117608>
RFC mailing list discussion: <https://externals.io/message/117732>

rfc/named_params.1653579050.txt.gz · Last modified: 2022/05/26 15:30 by iquito