rfc:union_types_v2
Differences
This shows you the differences between two versions of the page.
rfc:union_types_v2 [2019/10/25 13:00] – Voting nikic | rfc:union_types_v2 [2019/12/18 08:42] (current) – -> implemented nikic | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Union Types 2.0 ====== | ====== PHP RFC: Union Types 2.0 ====== | ||
+ | |||
* Date: 2019-09-02 | * Date: 2019-09-02 | ||
* Author: Nikita Popov < | * Author: Nikita Popov < | ||
- | * Status: | + | * Status: |
- | * Rendered proposal: https:// | + | * Proposed Version: PHP 8.0 |
- | * Pull request discussion: https:// | + | * Pull request discussion: |
- | * Mailing list discussion: https:// | + | * Mailing list thread: https:// |
+ | * Implementation: | ||
- | This RFC is hosted | + | > This proposal was originally introduced and discussed |
+ | |||
+ | ===== Introduction ===== | ||
+ | |||
+ | A "union type" accepts values of multiple different types, rather than a single one. PHP already supports two special union types: | ||
+ | |||
+ | * '' | ||
+ | * '' | ||
+ | |||
+ | However, arbitrary union types are currently not supported by the language. Instead, phpdoc annotations have to be used, such as in the following example: | ||
+ | |||
+ | <code php> | ||
+ | class Number { | ||
+ | /** | ||
+ | * @var int|float $number | ||
+ | */ | ||
+ | private $number; | ||
+ | |||
+ | /** | ||
+ | * @param int|float $number | ||
+ | */ | ||
+ | public function setNumber($number) { | ||
+ | $this-> | ||
+ | } | ||
+ | |||
+ | /** | ||
+ | * @return int|float | ||
+ | */ | ||
+ | public function getNumber() { | ||
+ | return $this-> | ||
+ | } | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | The [[# | ||
+ | |||
+ | Supporting union types in the language allows us to move more type information from phpdoc into function signatures, with the usual advantages this brings: | ||
+ | |||
+ | * Types are actually enforced, so mistakes can be caught early. | ||
+ | * Because they are enforced, type information is less likely to become outdated or miss edge-cases. | ||
+ | * Types are checked during inheritance, | ||
+ | * Types are available through Reflection. | ||
+ | * The syntax is a lot less boilerplate-y than phpdoc. | ||
+ | |||
+ | After generics, union types are currently the largest " | ||
+ | |||
+ | ===== Proposal ===== | ||
+ | |||
+ | Union types are specified using the syntax '' | ||
+ | |||
+ | <code php> | ||
+ | class Number { | ||
+ | private int|float $number; | ||
+ | |||
+ | public function setNumber(int|float $number): void { | ||
+ | $this-> | ||
+ | } | ||
+ | |||
+ | public function getNumber(): | ||
+ | return $this-> | ||
+ | } | ||
+ | } | ||
+ | </ | ||
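To make the enforcement benefit from the introduction concrete, here is a minimal sketch, assuming PHP 8.0 or later. The `describe()` helper is hypothetical and not part of the RFC. It passes several values to a union-typed setter under `strict_types=1` and reports either the stored type or the resulting `TypeError`.

```php
<?php
declare(strict_types=1);

class Number {
    private int|float $number = 0;

    public function setNumber(int|float $number): void {
        $this->number = $number;
    }

    public function getNumber(): int|float {
        return $this->number;
    }
}

// Hypothetical helper (not from the RFC): returns the type that ended up
// stored in the object, or the string 'TypeError' if the value was rejected.
function describe(mixed $value): string {
    $n = new Number();
    try {
        $n->setNumber($value);
        return get_debug_type($n->getNumber());
    } catch (TypeError $e) {
        return 'TypeError';
    }
}

echo describe(42), "\n";   // int
echo describe(1.5), "\n";  // float
echo describe("42"), "\n"; // TypeError (strict mode: string is not in the union)
echo describe([]), "\n";   // TypeError
```

With phpdoc annotations alone, the last two calls would silently store a value of the wrong type.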
+ | |||
+ | ==== Supported Types ==== | ||
+ | |||
+ | Union types support all types currently supported by PHP, with some caveats outlined below. | ||
+ | |||
+ | === void type === | ||
+ | |||
+ | The '' | ||
+ | |||
+ | The '' | ||
+ | |||
+ | What is likely intended instead is ''? | ||
+ | |||
+ | === Nullable union types === | ||
+ | |||
+ | The '' | ||
+ | |||
+ | An earlier version of this RFC proposed to use ''? | ||
+ | |||
+ | ''? | ||
+ | |||
+ | The '' | ||
+ | |||
+ | Union types and the ''? | ||
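As a small illustration of a nullable union, the sketch below assumes PHP 8.0 or later and lists `null` as an explicit member of the union. The function name `formatId` is hypothetical and chosen only for this example.

```php
<?php
// null is written as an ordinary member of the union.
function formatId(int|string|null $id): string {
    if ($id === null) {
        return 'none';
    }
    // Integers get a '#' prefix, strings are returned unchanged.
    return is_int($id) ? "#$id" : $id;
}

echo formatId(7), "\n";     // #7
echo formatId('abc'), "\n"; // abc
echo formatId(null), "\n";  // none
```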
+ | |||
+ | === false pseudo-type === | ||
+ | |||
+ | While we nowadays encourage the use of '' | ||
+ | |||
+ | A classical example is the '' | ||
+ | |||
+ | While it would be possible to model this less accurately as '' | ||
+ | |||
+ | For this reason, support for the '' | ||
+ | |||
+ | The '' | ||
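The "value or false" convention can be sketched as follows, assuming PHP 8.0 or later. The wrapper `findPosition` is hypothetical. It forwards to the built-in `strpos()`, which returns an integer offset on success and `false` on failure.

```php
<?php
// Hypothetical wrapper: the return type states exactly "offset or false",
// which is more precise than a union that includes all of bool.
function findPosition(string $haystack, string $needle): int|false {
    return strpos($haystack, $needle);
}

var_dump(findPosition('union types', 'types')); // int(6)
var_dump(findPosition('union types', 'xyz'));   // bool(false)
```

Callers still need a strict comparison (`=== false`), because a valid offset of `0` is falsy.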
+ | |||
+ | === Duplicate and redundant types === | ||
+ | |||
+ | To catch some simple bugs in union type declarations, | ||
+ | |||
+ | * Each name-resolved type may only occur once. Types like '' | ||
+ | * If '' | ||
+ | * If '' | ||
+ | * If '' | ||
+ | |||
+ | This does not guarantee that the type is " | ||
+ | |||
+ | For example, if '' | ||
+ | |||
+ | <code php> | ||
+ | function foo(): int|INT {} // Disallowed | ||
+ | function foo(): bool|false {} // Disallowed | ||
+ | |||
+ | use A as B; | ||
+ | function foo(): A|B {} // Disallowed (" | ||
+ | |||
+ | class_alias(' | ||
+ | function foo(): X|Y {} // Allowed (redundancy is only known at runtime) | ||
+ | </ | ||
+ | |||
+ | === Type grammar === | ||
+ | |||
+ | Excluding the special '' | ||
+ | |||
+ | < | ||
+ | type: simple_type | ||
+ | | "?" | ||
+ | | union_type | ||
+ | ; | ||
+ | |||
+ | union_type: simple_type " | ||
+ | | union_type " | ||
+ | ; | ||
+ | |||
+ | simple_type: | ||
+ | | " | ||
+ | | " | ||
+ | | " | ||
+ | | " | ||
+ | | " | ||
+ | | " | ||
+ | | " | ||
+ | | " | ||
+ | | " | ||
+ | | " | ||
+ | | " | ||
+ | | namespaced_name | ||
+ | ; | ||
+ | </ | ||
+ | |||
+ | ==== Variance ==== | ||
+ | |||
+ | Union types follow the existing variance rules: | ||
+ | |||
+ | * Return types are covariant (child must be subtype). | ||
+ | * Parameter types are contravariant (child must be supertype). | ||
+ | * Property types are invariant (child must be subtype and supertype). | ||
+ | |||
+ | The only change is in how union types interact with subtyping, with three additional rules: | ||
+ | |||
+ | * A union '' | ||
+ | * The '' | ||
+ | * The '' | ||
+ | |||
+ | The following sections give some examples of what is allowed and what is not. | ||
+ | |||
+ | === Property types === | ||
+ | |||
+ | Property types are invariant, which means that types must stay the same during inheritance. However, the " | ||
+ | |||
+ | Union types expand the possibilities in this area: For example '' | ||
+ | |||
+ | <code php> | ||
+ | class A {} | ||
+ | class B extends A {} | ||
+ | |||
+ | class Test { | ||
+ | public A|B $prop; | ||
+ | } | ||
+ | class Test2 extends Test { | ||
+ | public A $prop; | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | In this example, the union '' | ||
+ | |||
+ | Formally, we arrive at this result as follows: First, '' | ||
+ | |||
+ | === Adding and removing union types === | ||
+ | |||
+ | It is legal to remove union types in return position and add union types in parameter position: | ||
+ | |||
+ | <code php> | ||
+ | class Test { | ||
+ | public function param1(int $param) {} | ||
+ | public function param2(int|float $param) {} | ||
+ | |||
+ | public function return1(): int|float {} | ||
+ | public function return2(): int {} | ||
+ | } | ||
+ | |||
+ | class Test2 extends Test { | ||
+ | public function param1(int|float $param) {} // Allowed: Adding extra param type | ||
+ | public function param2(int $param) {} // FORBIDDEN: Removing param type | ||
+ | |||
+ | public function return1(): int {} // Allowed: Removing return type | ||
+ | public function return2(): int|float {} // FORBIDDEN: Adding extra return type | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | === Variance of individual union members === | ||
+ | |||
+ | Similarly, it is possible to restrict a union member in return position, or widen a union member in parameter position: | ||
+ | |||
+ | <code php> | ||
+ | class A {} | ||
+ | class B extends A {} | ||
+ | |||
+ | class Test { | ||
+ | public function param1(B|string $param) {} | ||
+ | public function param2(A|string $param) {} | ||
+ | |||
+ | public function return1(): A|string {} | ||
+ | public function return2(): B|string {} | ||
+ | } | ||
+ | |||
+ | class Test2 extends Test { | ||
+ | public function param1(A|string $param) {} // Allowed: Widening union member B -> A | ||
+ | public function param2(B|string $param) {} // FORBIDDEN: Restricting union member A -> B | ||
+ | |||
+ | public function return1(): B|string {} // Allowed: Restricting union member A -> B | ||
+ | public function return2(): A|string {} // FORBIDDEN: Widening union member B -> A | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Of course, the same can also be done with multiple union members at a time, and be combined with the addition/ | ||
+ | |||
+ | ==== Coercive typing mode ==== | ||
+ | |||
+ | When '' | ||
+ | |||
+ | If the exact type of the value is not part of the union, then the target type is chosen in the following order of preference: | ||
+ | |||
+ | - '' | ||
+ | - '' | ||
+ | - '' | ||
+ | - '' | ||
+ | |||
+ | If the type both exists in the union and the value can be coerced to it under PHP's existing type checking semantics, then that type is chosen. Otherwise the next type is tried. | ||
+ | |||
+ | As an exception, if the value is a string and both '' | ||
+ | |||
+ | Types that are not part of the above preference list are not eligible targets for implicit coercion. In particular no implicit coercions to the '' | ||
+ | |||
+ | === Conversion Table === | ||
+ | |||
+ | The following table shows how the above order of preference plays out for different input types, assuming that the exact type is not part of the union: | ||
+ | |||
+ | ^Original type ^1st try ^2nd try ^3rd try ^ | ||
+ | |bool | ||
+ | |int |float | ||
+ | |float | ||
+ | |string | ||
+ | |object | ||
+ | |||
+ | === Examples === | ||
+ | |||
+ | <code php> | ||
+ | // int|string | ||
+ | 42 --> 42 // exact type | ||
+ | " | ||
+ | new ObjectWithToString --> " | ||
+ | // object never compatible with int, fall back to string | ||
+ | 42.0 --> 42 // float compatible with int | ||
+ | 42.1 --> 42 // float compatible with int | ||
+ | 1e100 --> " | ||
+ | INF | ||
+ | true --> 1 // bool compatible with int | ||
+ | [] --> TypeError | ||
+ | |||
+ | // int|float|bool | ||
+ | " | ||
+ | " | ||
+ | " | ||
+ | // int numeric string | ||
+ | "" | ||
+ | " | ||
+ | [] --> TypeError // array not compatible with int, float or bool | ||
+ | </ | ||
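A few of the coercion rules above can be checked with a runnable sketch, assuming PHP 8.0 or later and a file without `declare(strict_types=1)`. The function names `intOrString` and `intOrFloat` are hypothetical. Only cases whose results follow directly from the preference order and the numeric-string exception are shown.

```php
<?php
// No declare(strict_types=1): calls made from this file use coercive typing.

function intOrString(int|string $v): int|string { return $v; }
function intOrFloat(int|float $v): int|float { return $v; }

var_dump(intOrString(42));    // int(42): exact type, no coercion
var_dump(intOrString("42"));  // string(2) "42": exact type, no coercion
var_dump(intOrString(true));  // int(1): bool is coerced, int is preferred over string
var_dump(intOrFloat("42"));   // int(42): integer numeric string
var_dump(intOrFloat("42.0")); // float(42): float numeric string

try {
    intOrString([]);          // arrays are never coerced to scalar types
} catch (TypeError $e) {
    echo "TypeError\n";
}
```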
+ | |||
+ | ==== Alternatives ==== | ||
+ | |||
+ | There are two main alternatives to the preference-based approach used by this proposal: | ||
+ | |||
+ | The first is to specify that union types //always// use strict typing, thus avoiding any complicated coercion semantics altogether. Apart from the inconsistency this introduces in the language, this has two main disadvantages: | ||
+ | |||
+ | The second is to perform the coercions based on the order of types. This would mean that '' | ||
+ | |||
+ | ==== Property types and references ==== | ||
+ | |||
+ | References to typed properties with union types follow the semantics outlined in the [[rfc/ | ||
+ | |||
+ | > If typed properties are part of the reference set, then the value is checked against each property type. If a type check fails, a TypeError is generated and the value of the reference remains unchanged. | ||
+ | > | ||
+ | > There is one additional caveat: If a type check requires a coercion of the assigned value, it may happen that all type checks succeed, but result in different coerced values. As a reference can only have a single value, this situation also leads to a TypeError. | ||
+ | |||
+ | The [[rfc/ | ||
+ | |||
+ | <code php> | ||
+ | class Test { | ||
+ | public int|string $x; | ||
+ | public float|string $y; | ||
+ | } | ||
+ | $test = new Test; | ||
+ | $r = " | ||
+ | $test->x =& $r; | ||
+ | $test->y =& $r; | ||
+ | |||
+ | // Reference set: { $r, $test-> | ||
+ | // Types: { mixed, int|string, float|string } | ||
+ | |||
+ | $r = 42; // TypeError | ||
+ | </ | ||
+ | |||
+ | The basic issue is that the final assigned value (after type coercions have been performed) must be compatible with all types that are part of the reference set. However, in this case the coerced value will be '' | ||
+ | |||
+ | An alternative approach would be to cast the value to the only common type '' | ||
+ | |||
+ | ==== Reflection ==== | ||
+ | |||
+ | To support union types, a new class '' | ||
+ | |||
+ | <code php> | ||
+ | class ReflectionUnionType extends ReflectionType { | ||
+ | /** @return ReflectionType[] */ | ||
+ | public function getTypes(); | ||
+ | |||
+ | /* Inherited from ReflectionType */ | ||
+ | /** @return bool */ | ||
+ | public function allowsNull(); | ||
+ | |||
+ | /* Inherited from ReflectionType */ | ||
+ | /** @return string */ | ||
+ | public function __toString(); | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | The '' | ||
+ | |||
+ | For example, the type '' | ||
+ | |||
+ | The '' | ||
+ | |||
+ | The '' | ||
+ | |||
+ | For backwards-compatibility reasons, union types that only include '' | ||
+ | |||
+ | === Examples === | ||
+ | |||
+ | <code php> | ||
+ | // This is one possible output, getTypes() and __toString() could | ||
+ | // also provide the types in the reverse order instead. | ||
+ | function test(): float|int {} | ||
+ | $rt = (new ReflectionFunction(' | ||
+ | var_dump(get_class($rt)); | ||
+ | var_dump($rt-> | ||
+ | var_dump($rt-> | ||
+ | var_dump((string) $rt); // " | ||
+ | |||
+ | function test2(): float|int|null {} | ||
+ | $rt = (new ReflectionFunction(' | ||
+ | var_dump(get_class($rt)); | ||
+ | var_dump($rt-> | ||
+ | var_dump($rt-> | ||
+ | // | ||
+ | var_dump((string) $rt); // " | ||
+ | |||
+ | function test3(): int|null {} | ||
+ | $rt = (new ReflectionFunction(' | ||
+ | var_dump(get_class($rt)); | ||
+ | var_dump($rt-> | ||
+ | var_dump($rt-> | ||
+ | var_dump((string) $rt); // "? | ||
+ | </ | ||
+ | |||
+ | ===== Backwards Incompatible Changes ===== | ||
+ | |||
+ | This RFC does not contain any backwards incompatible changes. However, existing ReflectionType-based code will have to be adjusted in order to support processing of code that uses union types. | ||
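One possible shape for such an adjustment is sketched below, assuming PHP 8.0 or later. The helper `typeNames` and the function `example` are hypothetical. The helper handles both `ReflectionNamedType` and `ReflectionUnionType`, and sorts the member names because the order returned by `getTypes()` is not guaranteed to match the declaration.

```php
<?php
// Hypothetical helper: returns the member type names of a declared type.
function typeNames(?ReflectionType $type): array {
    if ($type === null) {
        return []; // no type declaration at all
    }
    if ($type instanceof ReflectionUnionType) {
        $names = [];
        foreach ($type->getTypes() as $member) {
            // Assumes every member is a named type, as in PHP 8.0 unions.
            $names[] = $member->getName();
        }
        sort($names); // normalise, since member order is unspecified
        return $names;
    }
    if ($type instanceof ReflectionNamedType) {
        return [$type->getName()];
    }
    return [(string) $type]; // fallback for other ReflectionType subclasses
}

function example(int|float $x, string $y, $z): void {}

foreach ((new ReflectionFunction('example'))->getParameters() as $param) {
    echo $param->getName(), ': ', implode('|', typeNames($param->getType())), "\n";
}
```

Code that previously assumed every type was a `ReflectionNamedType` and called `getName()` directly would fail on the first parameter here, which is the kind of adjustment the paragraph above refers to.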
===== Vote ===== | ===== Vote ===== | ||
Line 18: | Line 411: | ||
</ | </ | ||
+ | ===== Future Scope ===== | ||
+ | |||
+ | The features discussed in the following are **not** part of this proposal. | ||
+ | |||
+ | ==== Intersection Types ==== | ||
+ | |||
+ | Intersection types are the logical counterpart of union types. Instead of requiring that (at least) one type constraint is satisfied, all of them must be. | ||
+ | |||
+ | For example '' | ||
+ | |||
+ | ==== Mixed Type ==== | ||
+ | |||
+ | The '' | ||
+ | |||
+ | We've held off on adding a '' | ||
+ | |||
+ | ==== Literal Types ==== | ||
+ | |||
+ | The '' | ||
+ | |||
+ | <code php> | ||
+ | type ArrayFilterFlags = 0|ARRAY_FILTER_USE_KEY|ARRAY_FILTER_USE_BOTH; | ||
+ | array_filter(array $array, callable $callback, ArrayFilterFlags $flag): array; | ||
+ | |||
+ | </ | ||
+ | A benefit of using a union of literal types instead of an enum is that it works directly with values of the underlying type, rather than an opaque enum value. As such, it is easier to retrofit without breaking backwards compatibility. | ||
+ | |||
+ | This RFC intentionally supports the '' | ||
+ | |||
+ | * No values implicitly coerce to '' | ||
+ | * Only '' | ||
+ | |||
+ | ==== Type Aliases ==== | ||
+ | |||
+ | As types become increasingly complex, it may be worthwhile to allow reusing type declarations. There are two general ways in which this could work. One is a local alias, such as: | ||
+ | |||
+ | <code php> | ||
+ | use int|float as number; | ||
+ | |||
+ | function foo(number $x) {} | ||
+ | </ | ||
+ | |||
+ | In this case '' | ||
+ | |||
+ | The second possibility is an exported typedef: | ||
+ | |||
+ | <code php> | ||
+ | namespace Foo; | ||
+ | type number = int|float; | ||
+ | |||
+ | // Usable as \Foo\number from elsewhere | ||
+ | </ | ||
+ | |||
+ | ===== Statistics ===== | ||
+ | |||
+ | To illustrate the use of union types in the wild, the use of union types in '' | ||
+ | |||
+ | In the top two thousand composer packages there are: | ||
+ | |||
+ | * 25k parameter union types: [[https:// | ||
+ | * 14k return union types: [[https:// | ||
+ | |||
+ | In the PHP stubs for internal functions (these are incomplete right now, so the actual numbers should be at least twice as large) there are: | ||
+ | |||
+ | * 336 union return types | ||
+ | * of which 312 include '' | ||
+ | |||
+ | This illustrates that the '' |
rfc/union_types_v2.1572008417.txt.gz · Last modified: 2019/10/25 13:00 by nikic