====== PHP RFC: Pattern Matching ======
* Version: 0.9
* Date: 2020-11-11
* Author: Larry Garfield (larry@garfieldtech.com), Ilija Tovilo (tovilo.ilija@gmail.com)
* Status: Draft
* First Published at: http://wiki.php.net/rfc/pattern-matching
===== Introduction =====
This RFC introduces pattern matching syntax for PHP. //[[https://en.wikipedia.org/wiki/Pattern_matching|Pattern Matching]]// as a language concept contains two parts: Matching a variable against a potentially complex data structure pattern, and optionally extracting values out of that variable into their own variables. In a sense it serves a similar purpose for complex data structures as regular expressions do for strings. When properly applied, it can lead to very compact but still readable code, especially when combined with conditional structures such as ''match()''. It does not, however, nor is it intended to, represent every possible type of comparison that could be imagined: Just the most common, in a more compact and readable form.
Pattern matching is found in a number of languages, including [[https://peps.python.org/pep-0636/|Python]], [[https://learn.microsoft.com/en-us/dotnet/csharp/fundamentals/functional/pattern-matching|C#]], [[https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/pattern-matching|F#]], [[https://doc.rust-lang.org/book/ch18-03-pattern-syntax.html|Rust]], [[https://docs.scala-lang.org/tour/pattern-matching.html|Scala]], [[https://docs.ruby-lang.org/en/master/syntax/pattern_matching_rdoc.html|Ruby]], [[https://docs.swift.org/swift-book/documentation/the-swift-programming-language/patterns/|Swift]], Haskell, and ML, among others. The syntax offered here draws inspiration from several of them, but is not a direct port of any.
This RFC is part of the [[rfc:adts|Algebraic Data Types Epic]]. It is a stepping stone toward full Algebraic Data Types (Enums with associated values) but stands on its own as useful functionality.
===== Overview =====
This RFC proposes a number of patterns against which to match a value. A complete list is shown below, with minimal detail. Full detail of each option is described in the sections below.
//// Core functionality ////
The "is" keyword evaluates to a boolean.
if ($var is <pattern>) {
// Do stuff
}
// Basic type matching
$var is string;
$var is int|float;
$var is ?array;
$var is (Account&Authenticated)|User;
$var is mixed; // Matches anything, effectively a wildcard.
// Literal patterns
$var is "foo";
$var is 5;
$var is 3|5|null;
$var is 'heart'|'spade'|self::Wild;
//// Related syntax enhancement ////
// Support patterns in match() statements:
$result = match ($somevar) is {
Foo => 'foo',
Bar => 'bar',
Baz|Beep => 'baz',
};
//// Structural patterns ////
// Object patterns
$p is Point{ x: 3 }; // Matches any Point whose $x property is 3.
$p is Point{ x: 4|5 }; // Matches any Point whose $x property is 4 or 5.
$p is Point{ y: 3 }|null;
// Array sequence patterns
$list is [1, 2, 3, 4]; // Exact match.
$list is [1, 2, 3, ...]; // Begins with 1, 2, 3, but may have other entries.
$list is [1, 2, mixed, 4]; // Allows any value in the 3rd position.
$list is [1, 2, 3|4, 5]; // 3rd value may be 3 or 4.
// Associative array patterns
// Exact key/value match, but order doesn't matter.
$assoc is ['a' => 'A', 'b' => 'B'];
// Must have a 'b' key with value 'B', and may have other entries.
$assoc is ['b' => 'B', ...];
// Must have a 'b' key with any value, and may have other entries.
$assoc is ['b' => mixed, ...];
// Array shapes (really just a natural implication of array patterns.)
$assoc is ['a' => string, 'b' => int|float, 'c' => 'foo'|'bar'];
// Capturing values out of a pattern and binding them to variables if matched.
$p is Point {x: 3, y: $y}; // If $p->x === 3, bind $p->y to $y and return true.
$assoc is ['a' => 'A', 'b' => $b];
// If $assoc['a'] === 'A', bind $assoc['b'] to $b and return true.
//// Patterns that we may still include, TBD. ////
// Variable pinning
// Using a variable as part of a pattern.
$foo is ^$bar; // Matches $foo against the value of $bar
$p is Point {y: 37, x:^$x}; // $p->x === $x && $p->y == 37
// Nested patterns for captured variables
$p is Point { x: 3, y: $y is 5|6 };
// $p->x === 3, $p->y must be either 5 or 6, and bind $p->y to $y on match.
===== Proposal =====
This RFC introduces a new keyword and binary operator: ''is''. The ''is'' keyword indicates that its right-hand side is a pattern against which its left-hand side is matched. The ''is'' operator is technically a comparison operator, and always evaluates to a boolean ''true'' or ''false''.
if ($var is <pattern>) {
}
The left-hand side of ''is'' will be evaluated first until it is reduced to a single value (which could be an arbitrarily complex object or array). That value will then be compared to the pattern, and ''true'' or ''false'' returned.
While patterns may resemble other language constructs, whatever follows ''is'' is a pattern, not some other instruction.
''is'' may be used in any context in which a boolean result is permissible. That includes variable assignment, ''if'' conditions, ''while'' conditions, ''match()'' expressions, etc.
==== Pattern structure ====
A pattern is a rule that a given value must conform to. That is fairly generic, by design. Each pattern below may be used stand-alone or combined into a compound pattern. The following are all examples of "core patterns," explained in the next section:
// Basic pattern
$foo is string;
// Compound patterns
$foo is int|null; // Combines 2 type patterns.
$foo is 'a'|'b'|'c'; // Combines 3 literal patterns.
$foo is Account&Authenticated; // Combines 2 type patterns.
$foo is Point{x: 5, y: 3|4}; // An object pattern, with a compound sub pattern that combines two literal patterns.
==== Core patterns ====
The core patterns represent the "baseline" for pattern matching. Even if some are not particularly interesting on their own, the pattern matching structure doesn't really work without them.
=== Compound patterns ===
Two or more patterns may be combined into a single pattern using | and & conjunctions. Each of the subpatterns is itself a complete pattern, and may be any of the pattern types listed below (except where specifically noted). The compound pattern is also a pattern, and therefore may appear as a component of some other pattern. For instance:
// Combines two type patterns.
$foo is int|string;
// Combines three type patterns, using DNF conjunctions.
$foo is User|(Account&Authenticated);
If both | and & are used, patterns must be in Disjunctive Normal Form (an ORed list of ANDs), and each ANDed segment must be enclosed in parentheses. These are the same rules that apply to compound types already. The result is that any valid type syntax is also a valid pattern, with the exception of ''never'' and ''void'', which could never match anything and so are excluded.
=== Type pattern ===
A pattern may be a type signature, including both class and primitive types. In this case, ''is'' will match the left hand side value against the specified type. Technically, each type pattern is only for a single type, but as the syntax for compound patterns is deliberately identical to that for compound types, any compound type (DNF) is supported as well.
That is, the following are all legal:
$foo is string; // Equivalent to is_string($foo)
$foo is Request; // Equivalent to $foo instanceof Request
$foo is ?array; // Equivalent to is_array($foo) || is_null($foo)
$foo is float; // Equivalent to is_int($foo) || is_float($foo), for consistency with types.
// These are compound patterns, consisting of two sub-patterns each.
$foo is int|float; // Equivalent to is_int($foo) || is_float($foo)
$foo is User|int; // Equivalent to $foo instanceof User || is_int($foo)
$foo is string|Stringable; // Equivalent to is_string($foo) || $foo instanceof Stringable
// This is also a compound pattern. It is equivalent to:
// $foo instanceof User || ($foo instanceof Account && $foo instanceof Authenticated)
$foo is User|(Account&Authenticated);
// Iterable is a type, so this is also valid:
$foo is iterable; // Equivalent of $foo is \Traversable|array
// true, false, and null are now types in their own right, so will also work:
// Simple degenerate case patterns.
$foo is true; // Equivalent to $foo === true
$foo is null; // Equivalent to $foo === null
// More practical compound examples
$foo is array|null; // Equivalent to is_array($foo) || $foo === null
$foo is "Aardvark"|"Bear"|null; // Equivalent to $foo === "Aardvark" || $foo === "Bear" || $foo === null
Type patterns are always evaluated in strict mode, so as to be consistent with is_int() and its siblings.
In a sense, ''$foo is pattern'' is equivalent to "would $foo pass a type check if passed to a parameter with this type specification in strict mode." Should more complex type checks become allowed (such as type aliases, etc.) they will become valid in a pattern as well. Note that, as shown in the 4th example above, an integer will pass a pattern match for type float. That is consistent with how strict type declarations work today.
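The "would it pass a strict-mode parameter check" analogy can be tried out in current PHP. The following is a minimal sketch, not part of the proposal: ''wouldMatchFloat()'' is a hypothetical helper that probes a ''float''-typed closure parameter under ''strict_types'', to show why an integer satisfies a ''float'' type pattern while a numeric string does not.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (an assumption for illustration, not RFC API):
// emulates "$value is float" by asking whether $value would be accepted
// by a float-typed parameter when called in strict mode.
function wouldMatchFloat(mixed $value): bool
{
    $probe = static function (float $x): void {};
    try {
        $probe($value);
        return true;
    } catch (TypeError) {
        return false;
    }
}

var_dump(wouldMatchFloat(1.5));  // bool(true)
var_dump(wouldMatchFloat(5));    // bool(true): int widens to float even in strict mode
var_dump(wouldMatchFloat('5'));  // bool(false): no string coercion in strict mode
var_dump(wouldMatchFloat(null)); // bool(false)
```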
Any type may be used, with the exception of ''never'' and ''void'', which can only be used in a return type and would never match anything anyway.
Of particular note, the mixed type pattern would match any value, so becomes a de facto "wildcard" to use in complex patterns. (See some further examples below.)
=== Literal pattern ===
Any scalar literal may be a pattern. When used on its own it is not particularly useful (it's equivalent to ===), but can be used in a compound pattern to more complex effect. It is also valuable when used with ''match()'' (see below).
// Simple degenerate case patterns.
$foo is 5; // Equivalent to $foo === 5
$foo is 'yay PHP'; // Equivalent to $foo === 'yay PHP'
// More practical compound example
$foo is "beep"|"boop"; // Equivalent to $foo === "beep" || $foo === "boop"
Valid literals include:
* Any int
* Any float
* Any string literal that does no string interpolation, denoted with single quotes, double quotes, heredoc or nowdoc. (So ''"boop"'' is fine, but ''"boop your $nose"'' is not.)
Values that are dynamic at runtime (e.g., an interpolated string with a variable in it) are not literal patterns. However, see below on variable pinning.
=== Class constant pattern ===
Class constants may also be used as a pattern:
$foo is 'spade'|'heart'|self::Wild;
Global constants may not be used directly, as they cannot be differentiated from class names. However, they may be used in variable pinning (see below).
Enumeration cases are implemented as class constants, so are supported as well.
=== match() enhancement ===
Pattern matching is frequently used in conjunction with branching structures, in particular with enumerations. To that end, this RFC also enhances the ''match()'' structure. Specifically, if the ''is'' keyword is used in ''match()'' then ''match()'' will perform a pattern match rather than an identity comparison.
That is, this code:
$result = match ($somevar) is {
Foo => 'foo',
Bar => 'bar',
Baz|Beep => 'baz',
};
is equivalent to the following:
$result = match (true) {
$somevar is Foo => 'foo',
$somevar is Bar => 'bar',
$somevar is Baz|Beep => 'baz',
};
(See "Open Questions" below regarding the syntax for match() with patterns.)
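For comparison, the ''match (true)'' form above can be taken one step further into code that runs on current PHP, by replacing each type pattern with ''instanceof''. This is only a sketch: the empty ''Foo'', ''Bar'', ''Baz'' and ''Beep'' classes and the ''classify()'' function are placeholders invented for the example.

```php
<?php
// Placeholder classes standing in for the names used in the example above.
final class Foo {}
final class Bar {}
final class Baz {}
final class Beep {}

// Today's equivalent of:  match ($somevar) is { Foo => ..., Bar => ..., Baz|Beep => ... }
function classify(mixed $somevar): string
{
    return match (true) {
        $somevar instanceof Foo => 'foo',
        $somevar instanceof Bar => 'bar',
        // An ORed pattern becomes a comma-separated list of conditions.
        $somevar instanceof Baz, $somevar instanceof Beep => 'baz',
    };
}

echo classify(new Bar()), "\n";  // bar
echo classify(new Beep()), "\n"; // baz

// As with any match() lacking a default arm, an unmatched value throws.
try {
    classify('not an object');
} catch (\UnhandledMatchError $e) {
    echo "no arm matched\n";
}
```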
==== Structure patterns ====
These are where pattern matching really shines. They are more involved/complex, but have more "bang for the buck" than the basic patterns above. Specifically, there are two kinds of structure patterns: Object property and array patterns. Both can also leverage a third concept, variable binding.
=== Object property pattern ===
A pattern may also name a class and match against properties of that object. The properties must be get-accessible in the scope in which the pattern executes. That is, a pattern evaluated outside the class may only match against public properties; a pattern inside the class may match against public, protected, or private properties; a pattern in a child class may match against protected properties of its parent but not private ones; etc.
The "value" to match each property against is itself a pattern, so can leverage any of the above pattern combinations.
Note that matching against a property's value implies reading that property's value. Therefore, a property match behaves as though the property were read into a temporary variable and then used. That means, for example:
- If a ''get'' hook is defined for that property, it will be called.
- If the property is uninitialized, an error will be thrown.
- If the property is undefined, an error will be thrown.
- If the property is undefined but __isset() is defined and returns false, it will never match anything.
- If the property is undefined but __isset() returns true or is not defined, then the return of invoking __get() will be used. It will then be matched against the pattern the same as if it were a defined property value.
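The rules in the list above can be approximated in current PHP. The sketch below is an assumption about how they compose, not the implementation: ''readForMatch()'', ''Lazy'' and ''Magic'' are hypothetical names, and the scope (visibility) rule is deliberately not modeled.

```php
<?php
// Hypothetical helper: returns [true, $value] when the property can be read
// for matching, or [false, null] when it can never match anything.
function readForMatch(object $obj, string $name): array
{
    if (property_exists($obj, $name)) {
        // A plain read: any get hook runs here, and an uninitialized
        // typed property throws an Error.
        return [true, $obj->$name];
    }
    if (method_exists($obj, '__isset') && !$obj->__isset($name)) {
        return [false, null]; // __isset() returned false: never matches
    }
    if (method_exists($obj, '__get')) {
        return [true, $obj->__get($name)]; // value supplied by __get()
    }
    throw new Error('Undefined property: ' . $obj::class . '::$' . $name);
}

final class Lazy
{
    public int $count; // declared but never initialized
}

final class Magic
{
    public function __isset(string $name): bool
    {
        return $name === 'known';
    }

    public function __get(string $name): mixed
    {
        return 42;
    }
}

$known   = readForMatch(new Magic(), 'known');   // [true, 42]
$unknown = readForMatch(new Magic(), 'unknown'); // [false, null]

try {
    readForMatch(new Lazy(), 'count');
} catch (Error $e) {
    echo 'Error thrown: ', $e->getMessage(), "\n";
}
```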
class Point {
public function __construct(
public int $x,
public int $y,
public int $z,
) {}
}
$p = new Point(3, 4, 5);
$p is Point {x: 3};
// Equivalent to:
$p instanceof Point && $p->x === 3;
$p is Point {y: 37, x: 2,};
// Equivalent to:
$p instanceof Point && $p->y === 37 && $p->x === 2;
// A multi-segment pattern that includes an object pattern.
$p is Point {x: 2}|null;
// Equivalent to:
($p instanceof Point && $p->x === 2) || $p === null;
// The $x property is matched against an ORed pattern.
$p is Point { x: 2|3 };
// Equivalent to:
$p instanceof Point && ($p->x === 2 || $p->x === 3);
// x must be 3, and y must be defined and initialized but we don't care what it is.
$p is Point{ x: 3, y: mixed };
// The following is NOT allowed.
$p is (Product|Point){ x: 3 };
// This is allowed, but will be interpreted like the second line.
$p is Product|Point{ x: 3 };
$p is (Product)|(Point{ x: 3 });
// This is allowed, but has the same effect as the line after it.
$p is Point{};
$p is Point;
Properties may be listed in any order, but must be named. A trailing comma is permitted.
=== Array structure pattern ===
Array patterns match elements of an array individually against a collection of sub-patterns. There are two variants, positional and associative. That is, the pattern MUST be entirely positional, or must specify a key for every position. (This is in contrast to array literals, which allow keys to be omitted at random to get an integer assigned.) If an associative pattern is used, the order of keys is explicitly irrelevant.
By default, array matching is exhaustive. That is, the arity of the array and pattern must match. Alternatively, the pattern may include a ... sequence as its last item to disable that arity checking, rendering any unspecified array keys explicitly irrelevant.
The value for each array element is itself a pattern. While the most common use case would normally be a literal match, it also supports a type match, ORed pattern, etc. This means that array patterns can function as "array shapes" if desired. This ability becomes more powerful as more future-scope patterns (such as range or regex) are adopted, as they would also be supported for each element.
The mixed pattern may be used to assert that a key is defined without constraining what its value may be.
Sequential arrays:
// Given:
$list = [1, 3, 5, 7];
// Degenerate, not very useful case.
if ($list is [1, 3, 5, 7]) {
print "Yes";
}
// True. Equivalent to:
if (is_array($list)
&& count($list) === 4
&& array_key_exists(0, $list) && $list[0] === 1
&& array_key_exists(1, $list) && $list[1] === 3
&& array_key_exists(2, $list) && $list[2] === 5
&& array_key_exists(3, $list) && $list[3] === 7
) {
print "Yes";
}
if ($list is [1, 3]) {
print "Yes";
}
// False. Equivalent to:
if (is_array($list)
&& count($list) === 2
&& array_key_exists(0, $list) && $list[0] === 1
&& array_key_exists(1, $list) && $list[1] === 3
) {
print "Yes";
}
if ($list is [1, 3, ...]) {
print "Yes";
}
// True. Equivalent to:
if (is_array($list)
&& array_key_exists(0, $list) && $list[0] === 1
&& array_key_exists(1, $list) && $list[1] === 3
) {
print "Yes";
}
if ($list is [1, 3, mixed, 7]) {
print "Yes";
}
// True. Equivalent to:
if (is_array($list)
&& count($list) === 4
&& array_key_exists(0, $list) && $list[0] === 1
&& array_key_exists(1, $list) && $list[1] === 3
&& array_key_exists(2, $list)
&& array_key_exists(3, $list) && $list[3] === 7
) {
print "Yes";
}
if ($list is [1, 3, 5|6, ...]) {
print "Yes";
}
// True. Equivalent to:
if (is_array($list)
&& array_key_exists(0, $list) && $list[0] === 1
&& array_key_exists(1, $list) && $list[1] === 3
&& array_key_exists(2, $list) && ($list[2] === 5 || $list[2] === 6)
) {
print "Yes";
}
// A sequential "array shape".
if ($list is [int, int, int, mixed]) {
print "Yes";
}
// True. Equivalent to:
if (is_array($list)
&& count($list) === 4
&& array_key_exists(0, $list) && is_int($list[0])
&& array_key_exists(1, $list) && is_int($list[1])
&& array_key_exists(2, $list) && is_int($list[2])
&& array_key_exists(3, $list)
) {
print "Yes";
}
Associative arrays:
// Given:
$assoc = ['a' => 'A', 'b' => 'B'];
// Degenerate, not very useful case.
if ($assoc is ['a' => 'A', 'b' => 'B']) {
print "Yes";
}
// True. Equivalent to:
if (is_array($assoc)
&& count($assoc) === 2
&& array_key_exists('a', $assoc) && $assoc['a'] === 'A'
&& array_key_exists('b', $assoc) && $assoc['b'] === 'B'
) {
print "Yes";
}
if ($assoc is ['b' => 'B']) {
print "Yes";
}
// False. Equivalent to:
if (is_array($assoc)
&& count($assoc) === 1
&& array_key_exists('b', $assoc) && $assoc['b'] === 'B'
) {
print "Yes";
}
if ($assoc is ['b' => 'B', ...]) {
print "Yes";
}
// True. Equivalent to:
if (is_array($assoc) && array_key_exists('b', $assoc) && $assoc['b'] === 'B') {
print "Yes";
}
if ($assoc is ['b' => mixed, ...]) {
print "Yes";
}
// True. Equivalent to:
if (is_array($assoc) && array_key_exists('b', $assoc) ) {
print "Yes";
}
// An "array shape" pattern.
if ($assoc is ['a' => 'A'|'a', 'b' => string]) {
print "Yes";
}
// True. Equivalent to:
if (is_array($assoc)
&& count($assoc) === 2
&& array_key_exists('a', $assoc) && ($assoc['a'] === 'A' || $assoc['a'] === 'a')
&& array_key_exists('b', $assoc) && is_string($assoc['b'])
) {
print "Yes";
}
Of particular note, the pattern matching approach automatically handles array_key_exists() checking. That means a missing array element will not trigger a warning, whereas with a traditional if ($foo['bar'] === 'baz') approach missing values must be accounted for by the developer manually. An associative array pattern match is also, as mentioned, explicitly unordered, whereas a ''==='' comparison also considers order. That provides some benefit in even the degenerate case of just checking a selection of keys against literal values, as missing values are handled automatically.
$foo = ['a' => 1, 'b' => 2];
// True
$foo is ['b' => 2, 'a' => 1];
// True, because == doesn't consider order.
$foo == ['b' => 2, 'a' => 1];
// False, because === does consider order.
$foo === ['b' => 2, 'a' => 1];
// False, but no error.
$foo is ['a' => 1, 'c' => 3, ...];
// Warning: $foo['c'] is not defined.
if ($foo['a'] == 1 && $foo['c'] == 3) { ... }
// Warning: $foo['c'] is not defined.
if ($foo['a'] === 1 && $foo['c'] === 3) { ... }
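The exhaustive-versus-open and unordered behavior described above can be emulated in current PHP with a small helper. This is a sketch under stated assumptions: ''matchesShape()'' is a hypothetical function, each sub-pattern is represented by a predicate callable, and the ''$open'' flag plays the role of a trailing ''...''.

```php
<?php
/**
 * Hypothetical helper emulating an associative array pattern.
 * $shape maps each required key to a predicate standing in for a sub-pattern.
 */
function matchesShape(mixed $value, array $shape, bool $open = false): bool
{
    if (!is_array($value)) {
        return false;
    }
    // Exhaustive by default: arity must match unless the pattern is open.
    if (!$open && count($value) !== count($shape)) {
        return false;
    }
    foreach ($shape as $key => $check) {
        // array_key_exists() guards the read, so a missing key never warns.
        if (!array_key_exists($key, $value) || !$check($value[$key])) {
            return false;
        }
    }
    return true;
}

$assoc = ['a' => 'A', 'b' => 'B'];
$isB = fn (mixed $v): bool => $v === 'B';
$any = fn (mixed $v): bool => true; // stands in for the "mixed" wildcard

var_dump(matchesShape($assoc, ['b' => $isB]));                     // bool(false): arity differs
var_dump(matchesShape($assoc, ['b' => $isB], open: true));         // bool(true)
var_dump(matchesShape($assoc, ['b' => $any, 'a' => 'is_string'])); // bool(true): order is irrelevant
var_dump(matchesShape($assoc, ['c' => $any], open: true));         // bool(false), and no warning
```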
=== Variable binding ===
One of the prime uses of pattern matching is to extract a value from a larger structure, such as an object (or Enumeration/ADT, in the future). This RFC supports such variable binding by specifying the variable to populate. If the input variable matches the rest of the pattern, then the corresponding value will be extracted and assigned to a variable of that name in the current scope. It will remain in scope as long as normal variable rules say it should. Only local variables may be bound; that is, you cannot bind to a property of an object or a variable-variable.
The entire pattern either succeeds or fails. No variables will be bound unless the entire pattern matches. (That also means if a variable exists before the pattern is evaluated, its value will be unchanged if the pattern does not match.)
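The all-or-nothing rule can be illustrated in current PHP by staging the extracted value and committing it only after every sub-pattern has passed. This is a sketch, not the engine's mechanism: ''matchPoint()'' is a hypothetical function emulating the pattern ''Point {x: 3, y: $y, z: 5}''.

```php
<?php
final class Point
{
    public function __construct(
        public int $x,
        public int $y,
        public int $z,
    ) {}
}

// Hypothetical emulation of:  $p is Point {x: 3, y: $y, z: 5}
function matchPoint(mixed $p, mixed &$y): bool
{
    if (!($p instanceof Point) || $p->x !== 3) {
        return false;
    }
    $candidate = $p->y;   // stage the binding, do not assign yet
    if ($p->z !== 5) {
        return false;     // a later sub-pattern failed: $y stays untouched
    }
    $y = $candidate;      // commit only once the whole pattern has matched
    return true;
}

$y = 'untouched';

var_dump(matchPoint(new Point(3, 4, 6), $y)); // bool(false)
var_dump($y);                                 // string(9) "untouched"

var_dump(matchPoint(new Point(3, 4, 5), $y)); // bool(true)
var_dump($y);                                 // int(4)
```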
Among the currently planned patterns, variable binding is only relevant to object and array patterns.
Object binding examples:
class Point {
public function __construct(
public int $x,
public int $y,
public int $z,
) {}
}
$p = new Point(3, 4, 5);
if ($p is Point {x: 3, y: $y} ) {
print "x is 3 and y is $y.";
}
// Equivalent to:
if ($p instanceof Point && $p->x === 3) {
$y = $p->y;
print "x is 3 and y is $y.";
}
if ($p is Point {z: $z, x: 3, y: $y} ) {
print "x is 3 and y is $y and z is $z.";
}
// Equivalent to:
if ($p instanceof Point && $p->x === 3) {
$y = $p->y;
$z = $p->z;
print "x is 3 and y is $y and z is $z.";
}
Array binding examples:
if ($list is [1, 3, $third, 7]) {
print "Yes: $third";
}
// True. Equivalent to:
if (is_array($list)
&& count($list) === 4
&& array_key_exists(0, $list) && $list[0] === 1
&& array_key_exists(1, $list) && $list[1] === 3
&& array_key_exists(2, $list)
&& array_key_exists(3, $list) && $list[3] === 7
) {
$third = $list[2];
print "Yes: $third";
}
if ($list is [1, 3, $third, ...]) {
print "Yes: $third";
}
// True. Equivalent to:
if (is_array($list)
&& array_key_exists(0, $list) && $list[0] === 1
&& array_key_exists(1, $list) && $list[1] === 3
&& array_key_exists(2, $list)
) {
$third = $list[2];
print "Yes: $third";
}
if ($assoc is ['a' => 'A', 'b' => $b]) {
print "Yes: $b";
}
// True. Equivalent to:
if (is_array($assoc)
&& count($assoc) === 2
&& array_key_exists('a', $assoc) && $assoc['a'] === 'A'
&& array_key_exists('b', $assoc)
) {
$b = $assoc['b'];
print "Yes: $b";
}
Additionally, on array patterns, a variable name may be listed after the ... to capture any array elements not otherwise matched. On a positional pattern, the captured array will be reindexed starting from 0. On an associative pattern, the keys will be retained. This is true regardless of the nature of the array being used. Be aware that the reindexing process necessarily has non-trivial cost. (See notes in the examples below.) If there are no unmatched elements, the trailing variable will be set to an empty array.
if ($list is [1, 3, ...$rest]) {
print "Yes";
}
// True. Equivalent to:
if (is_array($list)
&& array_key_exists(0, $list) && $list[0] === 1
&& array_key_exists(1, $list) && $list[1] === 3
) {
$rest = array_slice($list, 2);
print "Yes";
}
// While this will work as described, be aware that its
// performance will not be as trivial as it typically is
// in functional languages. PHP still needs to create
// a new array and replicate values over to it, which
// has linear cost. It should be safe to do on its own,
// but doing so inside a loop or with recursion is not
// advisable.
if ($list is [$head, ...$tail]) {
print "Yes";
}
// True. Equivalent to:
if (is_array($list)
&& array_key_exists(0, $list)
) {
$head = $list[0];
$tail = array_slice($list, 1);
print "Yes";
}
// In this case, there are no other elements
// so $tail will be [].
if ($list is [1, 3, 5, 7, ...$tail]) {
print "Yes";
}
// True. Equivalent to:
if (is_array($list)
&& array_key_exists(0, $list) && $list[0] === 1
&& array_key_exists(1, $list) && $list[1] === 3
&& array_key_exists(2, $list) && $list[2] === 5
&& array_key_exists(3, $list) && $list[3] === 7
) {
$tail = array_slice($list, 4);
print "Yes";
}
if ($assoc is ['b' => 'B', ...$other]) {
print "Yes";
}
// True. Equivalent to:
if (is_array($assoc) && array_key_exists('b', $assoc) && $assoc['b'] === 'B') {
$other = array_diff_key($assoc, ['b' => true]);
print "Yes";
}
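The difference between the positional and associative "rest" captures comes down to the behavior of the two functions used in the equivalents above. The following runs on current PHP and only demonstrates those existing functions; the variable names are chosen for the example.

```php
<?php
$list  = [1, 3, 5, 7];
$assoc = ['a' => 'A', 'b' => 'B', 'c' => 'C'];

// Positional rest: everything after the two matched positions, renumbered from 0.
$rest = array_slice($list, 2);

// Associative rest: everything except the matched key 'b', with keys retained.
$other = array_diff_key($assoc, ['b' => true]);

// No unmatched elements left: the rest is an empty array.
$tail = array_slice($list, 4);

var_dump($rest === [5, 7]);                    // bool(true)
var_dump($other === ['a' => 'A', 'c' => 'C']); // bool(true)
var_dump($tail === []);                        // bool(true)
```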
A pattern that includes variable binding may not be ORed with another pattern: depending on which segment matches, the variable may or may not end up defined, and there's no reliable way to determine that other than isset(). By extension, a mixed AND/OR pattern is also not supported. An AND-only compound pattern is permitted, however, and elements of the structure pattern (object properties or array keys) may contain ORed and ANDed patterns.
// NOT allowed, as its behavior is ambiguous.
$p is Point { $x } | Circle { $radius }
// But this is allowed.
$p is Point { x: 3|5, y: $y }
// Equivalent to
if ($p instanceof Point && ($p->x === 3 || $p->x === 5)) {
$y = $p->y;
// ...
}
// This is also allowed:
$p is Colorable&Point { x: 3|5, y: $y }
For object patterns (only), if the variable name to extract to is the same as the name of the property, then the property name may be omitted. That is, the following two examples are exactly equivalent:
if ($p is Point {z: $z, x: 3, y: $y} ) {
print "x is 3 and y is $y and z is $z.";
}
// Shorthand
if ($p is Point {$z, x: 3, $y} ) {
print "x is 3 and y is $y and z is $z.";
}
Variable binding is especially useful in ''match()'' statements, where there is no simple logical equivalent that doesn't involve additional functions.
$result = match ($p) is {
// These will match only some Point objects, depending on their property values.
Point{x: 3, y: 9, $z} => "x is 3, y is 9, z is $z",
Point{$z, $x, y: 4} => "x is $x, y is 4, z is $z",
Point{x: 5, $y} => "x is 5, y is $y, and z doesn't matter",
// This will match any Point object.
Point{$x, $y, $z} => "x is $x, y is $y, z is $z",
};
Note that in this case, the variables ''$x'', ''$y'', and ''$z'' may or may not be defined after the ''match()'' statement executes depending on which pattern was matched.
=== Applying patterns to bound variables ===
Some languages support a dedicated syntax to apply additional restrictions (pattern based or otherwise) to a matched variable. Say, only match if a given string property is one of a particular set of values, but if it is, then bind that value to the variable.
Because the proposed syntax supports full DNF pattern combinations, such behavior is achievable without a dedicated syntax. Specifically, a bind pattern and a filtering pattern may be combined with an ''&''.
For example:
if ($foo is Foo{a: ^$someA, b: $b & Point(x: 5, y: ^$someY) }) {
print "x is 5, y is $someY, z is $b->z";
}
// Equivalent to:
if ($foo instanceof Foo
&& $foo->a === $someA
&& $foo->b instanceof Point
&& $foo->b->x === 5
&& $foo->b->y === $someY
) {
$b = $foo->b;
print "x is 5, y is $someY, z is $b->z";
}
In the above code, the pattern will match only if both ''$b'' and ''Point(x: 5, y: ^$someY)'' match $foo->b. A binding variable will always match, so is always true. The object pattern may or may not match, which is the desired behavior.
if ($params is ['user' => $user & AuthenticatedUser{role: 'admin'}, ...]) {
print "Congrats, $user->name, you can do admin things!";
}
// Equivalent to:
if (is_array($params)
&& array_key_exists('user', $params)
&& $params['user'] instanceof AuthenticatedUser
&& $params['user']->role === 'admin'
) {
$user = $params['user'];
print "Congrats, $user->name, you can do admin things!";
}
Similarly, in the above code, the ''user'' key needs to validate against the ''AuthenticatedUser'' pattern, and against the always-true ''$user'' binding pattern.
This does limit the available restrictions to those expressible as patterns, which is not all possible restrictions. However, this behavior is available "for free" by virtue of how ''&'' pattern combinations work. It will also naturally become more powerful in the future if and when additional pattern types are added. At this time, we don't see a necessity to add additional syntax for more complex cases. However, such additions could be made in future RFCs if a need is found. (In most languages, it's a suffix like ''if'' or ''when'' that evaluates a boolean expression, which could itself be a pattern.)
==== Variable pinning ====
Scalar variables may be used in a pattern by prefixing them with ''^''. The caret indicates that the variable that follows is referencing an existing variable, rather than binding a variable. This particular syntax was chosen as it is the same as in Ruby, the only other language we know of that has this functionality.
Global constants may also be referenced using ''^'' the same way.
Only scalar (int, float, string) variables may be pinned. The behavior of an object or array in a pattern is ambiguous, as technically neither is a valid pattern. (There are pattern syntaxes that look like them, but are not the same thing.) This restriction may be lifted by a future RFC if a reasonable implementation is found.
Expressions may not be pinned, only defined variables. Allowing expressions opens up a whole host of potential side-effects and other weirdness, none of which we want to address. If an expression is needed, calculate it first and assign it to a variable, which can then be pinned.
As with literals, variable pinning is useful mainly in compound patterns and match().
// Simple degenerate case patterns.
$foo is ^$bar; // Equivalent to $foo === $bar
$foo is ^PHP_VERSION; // Equivalent to $foo === PHP_VERSION
// More practical compound patterns
$foo is ^Errors::$notFound|^Errors::$invalid; // Equivalent to $foo === Errors::$notFound || $foo === Errors::$invalid
// An object pattern with pinned variables.
$p is Point {y: 37, x:^$x,};
// Equivalent to:
$p instanceof Point && $p->y === 37 && $p->x === $x;
// An array pattern with pinned variables.
if ($assoc is ['a' => 'A', 'b' => ^$b]) {
print "Yes";
}
// True. Equivalent to:
if (is_array($assoc)
&& count($assoc) === 2
&& array_key_exists('a', $assoc) && $assoc['a'] === 'A'
&& array_key_exists('b', $assoc) && $assoc['b'] === $b
) {
print "Yes";
}
===== Backward Incompatible Changes =====
A new keyword is added, ''is''. That will conflict with any user-defined global constant named ''is''.
No other BC breaks are expected.
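As an illustration of the kind of code affected, the following is valid on current PHP because ''is'' is not yet a reserved word. How exactly it would fail once ''is'' becomes a keyword depends on the final implementation; the constant and its value are invented for the example.

```php
<?php
// Valid today: `is` is an ordinary identifier, so it can name a global constant.
const is = 'legacy flag';

echo is, "\n"; // legacy flag
```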
===== Proposed PHP Version(s) =====
PHP 8.5/9.0.
===== RFC Impact =====
===== Open Issues =====
==== Include other patterns in the initial RFC? ====
Do any other patterns need to be included in the initial RFC? Are there any listed in Future Scope that are must-have for the initial release?
==== match() "is" placement ====
The authors are split as to how the syntax for pattern matching match() should work. There are two options:
$result = match ($somevar) is {
Foo => 'foo',
Bar => 'bar',
Baz|Beep => 'baz',
};
$result = match ($somevar) {
is Foo => 'foo',
is Bar => 'bar',
is Baz|Beep => 'baz',
};
The former is shorter, and applies pattern matching to all arms. The latter is more explicit, and would allow individual arms to be pattern matched or not depending on the presence of ''is''. Of course, these options are not mutually exclusive and supporting both would be possible. We are looking for feedback on this question.
Of note, among the languages we surveyed, only Ruby uses the same construct for both equality and pattern matching. The ''case'' construct can use all ''when'' arms to indicate an equality match, or all ''in'' arms to indicate a pattern match, but they may not be mixed. All other languages are always-pattern.
===== Future Scope =====
Numerous other patterns can be supported in the future. The following additional patterns and use cases are possible future additions for other RFCs. (Please don't bikeshed them here; they are shown as an example of where pattern matching can extend to in the future.)
==== Enum/ADT pattern ====
A key goal of this RFC is to lay the groundwork for supporting patterns with Algebraic Data Types, aka, Enums with associated values. We believe that a good pattern matching mechanism is a prerequisite for those being fully usable in the future.
Depending on the implementation, the syntax may be identical to that used for objects above, or it may be positional (using ''()''). If this RFC passes, a future ADT RFC would include a new enum-targeted pattern if needed.
// Example of what is possible with both pattern matching and ADTs,
// All syntax subject to change.
enum Move {
case TurnLeft;
case TurnRight;
case Forward(int $amount);
}
match ($move) is {
Move::TurnLeft => $this->orientation--,
Move::TurnRight => $this->orientation++,
Move::Forward{$amount} => $this->distance += $amount,
};
enum Option {
case None;
case Some(mixed $val);
}
match ($maybe) is {
Option::Some {$val} => compute_something($val),
Option::None => 'default value',
};
==== Range pattern ====
Applicable to numeric variables, this pattern would validate that a value is within a given range. Verifying that the value is numeric is implicitly included.
$foo is 0..=10;
// Equivalent to:
$foo >= 0 && $foo <= 10;
$foo is 0..<10;
// Equivalent to:
$foo >= 0 && $foo < 10;
$foo is >10;
// Equivalent to:
$foo > 10;
Earlier discussion suggested that such a syntax should only be implemented as part of a broader "range" syntax for the whole language, for instance one that would also allow ''foreach (1..10 as $i) { ... }''. For that reason we have omitted ranges for now.
(The syntax shown here is not fully developed. Please do not nitpick it yet. If there is interest in including ranges out of the gate, we will flesh this out further, possibly modeling on [[https://docs.raku.org/type/Range|Raku]] or similar.)
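For comparison, the checks a range pattern would perform can be approximated in user-space today. The following is a minimal sketch, not part of the proposal: the ''in_range()'' helper and its ''$inclusive'' flag are hypothetical names, and it assumes "numeric" means a strict ''int'' or ''float'' (no numeric strings), in line with the strict-by-default matching described in this RFC.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of the RFC): approximates
// `$value is $min..=$max` (inclusive) and `$value is $min..<$max` (exclusive).
function in_range(mixed $value, int|float $min, int|float $max, bool $inclusive = true): bool
{
    // Assumption: the implicit numeric check is strict, so numeric strings fail.
    if (!is_int($value) && !is_float($value)) {
        return false;
    }

    return $value >= $min && ($inclusive ? $value <= $max : $value < $max);
}

var_dump(in_range(10, 0, 10));        // bool(true):  10 is within 0..=10
var_dump(in_range(10, 0, 10, false)); // bool(false): 10 is outside 0..<10
var_dump(in_range('5', 0, 10));       // bool(false): a string is not numeric here
```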
==== Regex pattern ====
Applicable only to ''string'' (and possibly ''Stringable''?) values. This pattern validates that a value conforms to a provided regular expression, and potentially extracts values from it if appropriate. (Extracted values would only be assigned if the pattern matches.)
$foo is /^https:\/\/(?<hostname>[^\/]*)/
// Equivalent to:
$matches = [];
if (preg_match('/^https:\/\/(?<hostname>[^\/]*)/', $foo, $matches)) {
    $hostname = $matches['hostname'];
}
(Note: This pattern is only in the ideation stage, so the syntax has not been fully thought through.)
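As a point of reference, the "match, then bind only on success" behavior can be sketched with existing functions. The ''match_regex()'' helper below is a hypothetical name used purely for illustration; it returns the named captures on a match and ''null'' otherwise, so no variable is assigned when the pattern fails.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of the RFC): returns the named captures
// on a match, or null when the subject does not match.
function match_regex(string $pattern, string $subject): ?array
{
    if (preg_match($pattern, $subject, $matches) !== 1) {
        return null;
    }

    // Keep only the named groups (string keys); drop the numeric indexes.
    return array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}

$captures = match_regex('/^https:\/\/(?<hostname>[^\/]*)/', 'https://wiki.php.net/rfc');
if ($captures !== null) {
    // Binding happens only when the match succeeded.
    $hostname = $captures['hostname'];
}
```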
==== Array-application pattern ====
One possible extension of patterns is the built-in ability to apply a pattern across an array. While that could be done straightforwardly with a foreach loop over an array, it may be more performant if the entire logic could be pushed into engine-space. One possible approach would look like this:
$ints = [1, 2, 3, 4];
$someFloats = [1, 2, 3.14, 4];
$ints is array<int>; // True
$someFloats is array<int>; // False
$someFloats is array<int|float>; // True
// Equivalent to:
$result = true;
foreach ($ints as $v) {
if (!is_int($v)) {
$result = false;
break;
}
}
It is not yet clear if it would indeed be more performant than the user-space alternative, or how common that usage would be. For that reason it has been left out of the RFC for now, but we mention it as a possible future extension.
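The user-space alternative referred to here can be written once as a reusable helper. This is a minimal sketch under the assumption that each element check is expressed as a callable; ''all_elements()'' is a hypothetical name, not an existing or proposed function.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of the RFC): true if every element
// of $values satisfies $check, stopping at the first failure.
function all_elements(array $values, callable $check): bool
{
    foreach ($values as $v) {
        if (!$check($v)) {
            return false;
        }
    }

    return true;
}

$ints = [1, 2, 3, 4];
$someFloats = [1, 2, 3.14, 4];

var_dump(all_elements($ints, 'is_int'));       // bool(true)
var_dump(all_elements($someFloats, 'is_int')); // bool(false): 3.14 is a float
var_dump(all_elements($someFloats, fn ($v) => is_int($v) || is_float($v))); // bool(true)
```

An engine-level pattern would need to outperform a loop of this shape to justify itself on performance grounds alone.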
==== Optional array key marker ====
As described above, array patterns support "this key must be defined and match this pattern" or "I don't care if it's defined or not" (using the ''...'' suffix). However, there is no obvious way to indicate "this key is optional, but if it is defined it must match this pattern." Such a marker would be useful to include, although we have not yet explored a syntax for it. One possibility would be:
// $arr must have an 'a' key holding a string, MAY have a 'b' key (which must
// hold a string if present), and any other keys are irrelevant.
$arr is ['a' => string, ?'b' => string, ...]
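To make the intended semantics concrete, here is a minimal user-space sketch of the three cases (required key, optional key, ignored extra keys). The ''matches_shape()'' helper and its parameters are hypothetical and exist only for illustration; each key maps to a callable that validates the value.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of the RFC). $required and $optional
// map array keys to callables that validate the corresponding value.
function matches_shape(array $arr, array $required, array $optional = []): bool
{
    // Required keys must exist and pass their check.
    foreach ($required as $key => $check) {
        if (!array_key_exists($key, $arr) || !$check($arr[$key])) {
            return false;
        }
    }

    // Optional keys may be absent, but must pass their check when present.
    foreach ($optional as $key => $check) {
        if (array_key_exists($key, $arr) && !$check($arr[$key])) {
            return false;
        }
    }

    // Any other keys are ignored, mirroring the trailing "..." marker.
    return true;
}

var_dump(matches_shape(['a' => 'x'], ['a' => 'is_string'], ['b' => 'is_string']));           // bool(true)
var_dump(matches_shape(['a' => 'x', 'b' => 5], ['a' => 'is_string'], ['b' => 'is_string'])); // bool(false)
```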
==== as keyword ====
In some cases, the desired result of a failed match is not a boolean but an error condition. One possible way to address that would be a second keyword, ''as'', which behaves the same as ''is'' but returns the matched value or throws an ''Error'' rather than returning false.
// This either evaluates to true and assigns $username and $password to the matching properties of Foo, OR it evaluates to false.
$foo is Foo { $username, $password };
// This either evaluates to $foo and assigns $username and $password to the matching properties of Foo, OR it throws an Error.
$value = $foo as Foo { $username, $password };
This pattern could potentially be combined with the "weak mode flag" (see below) to offer object validation with embedded coercion.
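The "return the value or throw" behavior can be approximated today for the simple class-check case. The sketch below is illustrative only: ''as_type()'' is a hypothetical helper, it covers only an ''instanceof'' check (no property binding), and it assumes ''TypeError'' (a subclass of ''Error'') is an acceptable stand-in for whatever error a real implementation would throw.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of the RFC): approximates `$value as SomeClass`
// by returning the value unchanged on success and throwing on failure.
function as_type(mixed $value, string $class): object
{
    if (!$value instanceof $class) {
        // Assumption: TypeError stands in for the Error a real `as` would throw.
        throw new TypeError(sprintf('Value does not match %s', $class));
    }

    return $value;
}

$list = as_type(new ArrayObject([1, 2]), ArrayObject::class); // Returns the ArrayObject.

try {
    as_type('not an object', ArrayObject::class);
} catch (Error $e) {
    // Reached: the string does not match, so an Error is thrown.
}
```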
==== "weak mode" flag ====
By default, pattern matching uses strict comparisons. However, there are use cases where a weak comparison is more appropriate. Setting a pattern or sub-pattern to weak-mode would permit standard PHP type coercion to determine if a value matches.
For example:
$s = "5";
// Default, strict mode
$s is int; // False
// Opt-in weak mode
$s is ~int; // True
This would be particularly useful in combination with an array application pattern, to verify that, for instance, all elements in an array are numeric.
$a = [2, 4, "6", 8];
$a is array<int>; // False
$a is array<~int>; // True
It is possible that we could extend the ''as'' keyword here as well to save the coercion. That is, if the value is weakly compatible, the ''as'' keyword would convert it safely (or throw if it cannot be). That would allow validation across an object or array in a single operation.
For example:
$a = [2, 4, "6", 8];
$intifiedA = $a as array<~int>;
// $intifiedA is now [2, 4, 6, 8]
$b = [2, 4, 'six', 8];
$intifiedB = $b as array<~int>; // Throws, because 'six' is not coerce-able to an integer.
We have not yet investigated how feasible this sort of coercion would be, but it is a potentially valuable feature.
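To illustrate the intended outcome only, the "coerce or throw" behavior for an array of integers can be sketched in user-space. The ''as_int_array()'' helper is hypothetical, and it uses ''filter_var()'' with ''FILTER_VALIDATE_INT'' as a stand-in for weak-mode coercion; that filter's rules are not identical to PHP's parameter coercion rules, so this is an approximation rather than a specification.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of the RFC): approximates
// `$values as array<~int>` by converting each element or throwing.
function as_int_array(array $values): array
{
    $out = [];
    foreach ($values as $key => $v) {
        // Assumption: FILTER_VALIDATE_INT approximates weak-mode int coercion.
        $int = is_int($v) ? $v : filter_var($v, FILTER_VALIDATE_INT);
        if ($int === false) {
            throw new TypeError(sprintf('Element "%s" is not coercible to int', $key));
        }
        $out[$key] = $int;
    }

    return $out;
}

$intified = as_int_array([2, 4, "6", 8]); // [2, 4, 6, 8]

try {
    as_int_array([2, 4, 'six', 8]);
} catch (TypeError $e) {
    // Reached: 'six' cannot be converted to an integer.
}
```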
==== Property guards ====
Something that became apparent during the development of property hooks is that a great many set hooks will be simple validation, often that a number is within a range or a string matches some criteria. At present, those use cases are achievable with hooks but can be somewhat verbose. Applying a pattern rule to a property would allow that rule to be applied on the set operation for that property, without having to implement it manually.
class Test
{
// These two properties have equivalent restrictions.
public string $name is /\w{3,}/;
public string $name {
set {
if (!preg_match('/\w{3,}/', $value)) {
throw new \Exception();
}
$this->name = $value;
}
}
}
This more compact syntax would be considerably easier to read and maintain when used within a promoted constructor parameter, too. Note that variable binding would not be supported in a property guard, as it makes little logical sense.
Elevating such checks to a pattern would also make the pattern more readily available to static analysis tools (IDEs or otherwise), which would then be better able to validate if a value is about to be passed to a parameter that would not satisfy the pattern (eg, because the string is too short).
(We're not sure if ''is'' or ''as'' would make more sense to use here. That's an implementation detail we don't need to worry about until this feature is actually getting implemented.)
==== Parameter or return guards ====
In concept, parameters and returns could have a similar guard syntax to properties. The use case is arguably smaller, but it might be possible to allow variable binding. (Unclear.)
As an example, the following would be equivalent.
function test(string $name is /\w{3,}/): string is /\w{10,}/ {
return $name . ' (retired)';
}
function test(string $name): string {
$name as /\w{3,}/; // Throws if it doesn't match.
$return = $name . ' (retired)';
$return as /\w{10,}/; // Throws if it doesn't match.
return $return;
}
Naturally type-only pattern checks are entirely redundant. It would be most useful with regex or range patterns. However, it would allow literal matches, which is a feature that has been requested in the past:
function query(array $args, string $sort is 'ASC'|'DESC') { ... }
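For contrast, a literal-match guard like this currently has to be written as a manual check in the function body. The following is a minimal sketch of that status quo; ''sort_direction()'' is a hypothetical helper, and the choice of ''ValueError'' is an assumption rather than anything this RFC specifies.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of the RFC): today's manual equivalent
// of a `string $sort is 'ASC'|'DESC'` parameter guard.
function sort_direction(string $sort): string
{
    // Strict comparison, matching the strict-by-default pattern semantics.
    if (!in_array($sort, ['ASC', 'DESC'], true)) {
        throw new ValueError(sprintf('Expected "ASC" or "DESC", got "%s"', $sort));
    }

    return $sort;
}

var_dump(sort_direction('DESC')); // string(4) "DESC"
```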
==== Patterns as variables/types ====
With complex array or object patterns, especially if guards are adopted, it becomes natural to want to reuse the same pattern in multiple places. At this time we are not sure how to do so, although it is a space we are considering. Possibilities include (unvetted):
// Wrap the pattern into an object that can be referenced, possibly with some distinguishing marker.
$naturalNum = new Pattern(int&>0);
$foo is $naturalNum; // Would need some way to disambiguate it from a binding variable.
// Put this in the "use" section of a file.
use pattern int&>0 as NaturalNum;
$foo is NaturalNum;
// Make this exposed to other files, like a constant would be.
pattern int&>0 as NaturalNum;
$foo is NaturalNum;
This is an area that requires more exploration, but we mention it here for completeness.
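Until such a mechanism exists, the closest user-space analogue is a named predicate passed around as a callable. This sketch is illustrative only: ''natural_num()'' is a hypothetical function standing in for a reusable ''int&>0'' pattern, and it relies on first-class callable syntax, which requires PHP 8.1 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in (not part of the RFC) for a reusable "int&>0" pattern.
function natural_num(mixed $value): bool
{
    return is_int($value) && $value > 0;
}

// First-class callable syntax (PHP 8.1+) lets the check be stored and reused.
$naturalNum = natural_num(...);

var_dump($naturalNum(5));   // bool(true)
var_dump($naturalNum(0));   // bool(false): not greater than zero
var_dump($naturalNum('5')); // bool(false): strict, so a string is not an int
```

Unlike a real pattern, a callable of this kind is opaque to static analysis and cannot bind variables.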
===== Proposed Voting Choices =====
This is a simple up-or-down vote, requiring 2/3 Yes to pass.
===== Patches and Tests =====
Links to any external patches and tests go here.
If there is no patch, make it clear who will create a patch, or whether a volunteer to help with implementation is needed.
Make it clear if the patch is intended to be the final patch, or is just a prototype.
For changes affecting the core language, you should also provide a patch for the language specification.
===== Implementation =====
After the project is implemented, this section should contain
- the version(s) it was merged into
- a link to the git commit(s)
- a link to the PHP manual entry for the feature
- a link to the language specification section (if any)
===== References =====
Links to external references, discussions or RFCs
===== Rejected Features =====
Keep this updated with features that were discussed on the mail lists.