
PHP RFC: Pattern Matching

Introduction

This RFC introduces pattern matching syntax for PHP. Pattern Matching as a language concept contains two parts: Matching a variable against a potentially complex data structure pattern, and optionally extracting values out of that variable into their own variables. In a sense it serves a similar purpose for complex data structures as regular expressions do for strings. When properly applied, it can lead to very compact but still readable code, especially when combined with conditional structures such as match(). It does not, however, nor is it intended to, represent every possible type of comparison that could be imagined: Just the most common, in a more compact and readable form.

Pattern matching is found in a number of languages, including Python, Haskell, C#, ML, Rust, and Swift, among others. The syntax offered here draws inspiration from several of them, but is not a direct port of any.

This RFC is part of the Algebraic Data Types Epic. It is a stepping stone toward full Algebraic Data Types (Enums with associated values) but stands on its own as useful functionality.

Overview

This RFC proposes a number of patterns against which to match a value. A complete list is shown below, with minimal detail. Each option is described in full in the sections that follow.

//// Core functionality ////
 
// The "is" keyword evaluates to a boolean.
if ($var is <pattern>) {
  // Do stuff
}
 
// Basic type matching
$var is string;
$var is int|float;
$var is ?array;
$var is (Account&Authenticated)|User;
$var is mixed; // Matches anything, effectively a wildcard.
 
// Literal patterns
$var is "foo";
$var is 5;
$var is 3|5|null;
$var is 'heart'|'spade'|self::Wild;
 
//// Related syntax enhancement ////
 
// Support patterns in match() statements:
$result = match ($somevar) is {
    Foo => 'foo',
    Bar => 'bar',
    Baz|Beep => 'baz',
};

//// Structural patterns ////
 
// Object patterns
$p is Point{ x: 3 }; // Matches any Point whose $x property is 3.
$p is Point{ x: 4|5 }; // Matches any Point whose $x property is 4 or 5.
$p is Point{ y: 3 }|null;
 
// Array sequence patterns
$list is [1, 2, 3, 4];   // Exact match.
$list is [1, 2, 3, ...]; // Begins with 1, 2, 3, but may have other entries.
$list is [1, 2, mixed, 4];   // Allows any value in the 3rd position.
$list is [1, 2, 3|4, 5]; // 3rd value may be 3 or 4.
 
// Associative array patterns
 
// Exact key/value match, but order doesn't matter.
$assoc is ['a' => 'A', 'b' => 'B']; 
// Must have a 'b' key with value 'B', and may have other entries.
$assoc is ['b' => 'B', ...];
// Must have a 'b' key with any value, and may have other entries.
$assoc is ['b' => mixed, ...];
 
// Array shapes (really just a natural implication of array patterns.)
$assoc is ['a' => string, 'b' => int|float, 'c' => 'foo'|'bar'];
 
// Capturing values out of a pattern and binding them to variables if matched.
$p is Point {x: 3, y: $y}; // If $p->x === 3, bind $p->y to $y and return true.
$assoc is ['a' => 'A', 'b' => $b];
// If $assoc['a'] === 'A', bind $assoc['b'] to $b and return true.

//// Patterns that we may still include, TBD. ////
 
// Variable reference expressions
// The syntax for this one is still in flux, don't worry about the details.
// This would allow variables from the current scope to be part of a pattern.
$foo is @($bar); // Matches $foo against the value of $bar
$p is Point {y: 37, x:@($x)}; // $p->x === $x && $p->y === 37
 
// Nested patterns for captured variables
$p is Point { x: 3, y: $y is 5|6 }; 
// $p->x === 3, $p->y must be either 5 or 6, and bind $p->y to $y on match.

Proposal

This RFC introduces a new keyword and binary operator: is. The is keyword indicates that its right hand side is a pattern against which its left hand side should be applied. The is operator is technically a comparison operator, and always returns a boolean true or false.

if($var is <pattern>) {
 
}

The left-hand side of is will be evaluated first until it is reduced to a single value (which could be an arbitrarily complex object or array). That value will then be compared to the pattern, and true or false returned.

While patterns may resemble other language constructs, whatever follows is is a pattern, not some other instruction.

is may be used in any context in which a boolean result is permissible. That includes variable assignment, if conditions, while conditions, match() expressions, etc.

Pattern structure

A pattern is a rule that a given value must conform to. That is fairly generic, by design. Each pattern below may be used stand-alone or combined into a compound pattern. The following are all examples of “core patterns,” explained in the next section:

// Basic pattern
$foo is string;
 
// Compound patterns
$foo is int|null;              // Combines 2 type patterns.
$foo is 'a'|'b'|'c';           // Combines 3 literal patterns.
$foo is Account&Authenticated; // Combines 2 type patterns.
$foo is Point{x: 5, y: 3|4};   // An object pattern, with a compound sub pattern that combines two literal patterns.

Core patterns

The core patterns represent the “baseline” for pattern matching. Even if some are not particularly interesting on their own, the pattern matching structure doesn't really work without them.

Compound patterns

Two or more patterns may be combined into a single pattern using the | (OR) and & (AND) operators. Each of the subpatterns is itself a complete pattern, and may be any of the pattern types listed below (except where specifically noted). The compound pattern is also a pattern, and therefore may appear as a component of some other pattern. For instance:

// Combines two type patterns.
$foo is int|string;
 
// Combines three type patterns, using DNF conjunctions.
$foo is User|(Account&Authenticated);

If both | and & are used, patterns must be in Disjunctive Normal Form (an ORed list of ANDs), and each ANDed segment must be enclosed in parentheses. These are the same rules that already apply to compound types. The result is that any valid type syntax is also a valid pattern, with the exception of never and void, which could never match anything and so are excluded.

Type pattern

A pattern may be a type signature, including both class and primitive types. In this case, is will match the left hand side value against the specified type. Technically, each type pattern is only for a single type, but as the syntax for compound patterns is deliberately identical to that for compound types, any compound type (DNF) is supported as well.

That is, the following are all legal:

$foo is string;    // Equivalent to is_string($foo)
$foo is Request;   // Equivalent to $foo instanceof Request
$foo is ?array;    // Equivalent to is_array($foo) || is_null($foo)
$foo is float;     // Equivalent to is_int($foo) || is_float($foo), for consistency with types.
 
// These are compound patterns, consisting of two sub-patterns each.
$foo is int|float; // Equivalent to is_int($foo) || is_float($foo)
$foo is User|int;  // Equivalent to $foo instanceof User || is_int($foo)
$foo is string|Stringable; // Equivalent to is_string($foo) || $foo instanceof Stringable
 
// This is also a compound pattern. It is equivalent to:
// $foo instanceof User || ($foo instanceof Account && $foo instanceof Authenticated)
$foo is User|(Account&Authenticated);
 
// Iterable is a type, so this is also valid:
$foo is iterable;  // Equivalent of $foo is \Traversable|array
 
// true, false, and null are now types in their own right, so will also work:
// Simple degenerate case patterns.
$foo is true;      // Equivalent to $foo === true
$foo is null;      // Equivalent to $foo === null
 
// More practical compound examples
$foo is array|null; // Equivalent to is_array($foo) || $foo === null
$foo is "Aardvark"|"Bear"|null; // Equivalent to $foo === "Aardvark" || $foo === "Bear" || $foo === null

Type patterns are always evaluated in strict mode, so as to be consistent with is_int() and its siblings.

In a sense, $foo is pattern is equivalent to “would $foo pass a type check if passed to a parameter with this type specification in strict mode.” Should more complex type checks become allowed (such as type aliases, etc.) they will become valid in a pattern as well. Note that, as shown in the 4th example above, an integer will pass a pattern match for type float. That is consistent with how strict type declarations work today.

Any type may be used, with the exception of never and void, which can only be used in a return type and would never match anything anyway.

Of particular note, the mixed type pattern would match any value, so becomes a de facto “wildcard” to use in complex patterns. (See some further examples below.)
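
To make the equivalences above concrete, the following minimal sketch hand-expands three type patterns into current PHP. The helper names (matchesFloat, matchesNullableArray, matchesStringLike) are hypothetical and not part of this RFC; they only illustrate the checks that the corresponding patterns are described as performing.

```php
<?php
// Hypothetical helpers: hand-written expansions of three type patterns,
// written in current PHP so they run without the proposed `is` operator.

// $v is float (per the text above, an int also matches a float pattern)
function matchesFloat(mixed $v): bool
{
    return is_int($v) || is_float($v);
}

// $v is ?array
function matchesNullableArray(mixed $v): bool
{
    return is_array($v) || $v === null;
}

// $v is string|Stringable
function matchesStringLike(mixed $v): bool
{
    return is_string($v) || $v instanceof Stringable;
}

var_dump(matchesFloat(5));                        // bool(true): ints satisfy a float pattern
var_dump(matchesFloat("5"));                      // bool(false): numeric strings are not coerced
var_dump(matchesNullableArray(null));             // bool(true)
var_dump(matchesStringLike(new Exception('x')));  // bool(true): Exception implements Stringable
```

The second call is the notable one: because type patterns are strict, a numeric string does not match, even though an integer does.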

Literal pattern

Any scalar literal may be a pattern. When used on its own it is not particularly useful (it's equivalent to ===), but can be used in a compound pattern to more complex effect. It is also valuable when used with match() (see below).

// Simple degenerate case patterns.
$foo is 5;         // Equivalent to $foo === 5
$foo is 'yay PHP'; // Equivalent to $foo === 'yay PHP'
 
// More practical compound example
$foo is "beep"|"boop"; // Equivalent to $foo === "beep" || $foo === "boop"

Valid literals include:

  • Any int
  • Any float
  • Any string literal that does no string interpolation, denoted with single quotes, double quotes, heredoc or nowdoc. (So “boop” is fine, but “boop your $nose” is not.)

Values that are dynamic at runtime (eg, an interpolated string with a variable in it) are not literal patterns. However, see below on “limited expression patterns.”

Class constant pattern

Class constants may also be used as a pattern:

$foo is 'spade'|'heart'|self::Wild;

Global constants may not be used directly, as they cannot be differentiated from class names. However, they may be used in expression patterns (see below).

Enumeration cases are implemented as class constants, so are supported as well.

match() enhancement

Pattern matching is frequently used in conjunction with branching structures, in particular with enumerations. To that end, this RFC also enhances the match() structure. Specifically, if the is keyword is used in match() then match() will perform a pattern match rather than an identity comparison.

That is, this code:

$result = match ($somevar) is {
    Foo => 'foo',
    Bar => 'bar',
    Baz|Beep => 'baz',
};

is equivalent to the following:

$result = match (true) {
    $somevar is Foo => 'foo',
    $somevar is Bar => 'bar',
    $somevar is Baz|Beep => 'baz',
};

(See “Open Questions” below regarding the syntax for match() with patterns.)
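
For comparison, here is a sketch of how the same branching can be written in PHP today, without the is keyword. The empty classes Foo, Bar, Baz, and Beep and the describe() function are stand-ins invented for this illustration.

```php
<?php
// Stand-in types for the Foo, Bar, Baz and Beep used in the example above.
final class Foo {}
final class Bar {}
final class Baz {}
final class Beep {}

// Hypothetical function: a current-PHP rendering of the pattern-matching match().
function describe(object $somevar): string
{
    return match (true) {
        $somevar instanceof Foo => 'foo',
        $somevar instanceof Bar => 'bar',
        // A comma-separated arm plays the role of the Baz|Beep pattern.
        $somevar instanceof Baz, $somevar instanceof Beep => 'baz',
    };
}

echo describe(new Foo()), "\n";  // foo
echo describe(new Beep()), "\n"; // baz
```

Each arm has to repeat $somevar and the instanceof check, which is the repetition the proposed syntax is meant to remove.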

Structure patterns

These are where pattern matching really shines. They are more involved/complex, but have more “bang for the buck” than the basic patterns above. Specifically, there are two kinds of structure patterns: Object property and array patterns. Both can also leverage a third concept, variable binding.

Object property pattern

A pattern may also name a class and match against properties of that object. The properties must be accessible in the scope in which the pattern executes. That is, a pattern evaluated outside the class may only match against public properties; a pattern inside the class may match against public, private, or protected; a pattern in a child class may match against protected properties of its parent but not private; etc.

The “value” to match each property against is itself a pattern, so can leverage any of the above pattern combinations.

Note that matching against a property's value implies reading that property's value. Therefore, a property match behaves as though the property were read into a temporary variable and then used. That means, for example:

  1. If a get hook is defined for that property, it will be called.
  2. If the property is uninitialized, an error will be thrown.
  3. If the property is undefined, an error will be thrown.
  4. If the property is undefined but __isset() is defined and returns false, it will never match anything.
  5. If the property is undefined but __isset() returns true or is not defined, then the return of invoking __get() will be used. It will then be matched against the pattern the same as if it were a defined property value.

class Point {
    public function __construct(
        public int $x, 
        public int $y, 
        public int $z,
    ) {}
}
 
$p = new Point(3, 4, 5);
 
$p is Point {x: 3};
// Equivalent to:
$p instanceof Point && $p->x === 3;
 
$p is Point {y: 37, x: 2,};
// Equivalent to:
$p instanceof Point && $p->y === 37 && $p->x === 2;
 
// A multi-segment pattern that includes an object pattern.
$p is Point {x: 2}|null;
// Equivalent to:
($p instanceof Point && $p->x === 2) || $p === null;
 
// The $x property is matched against an ORed pattern.
$p is Point { x: 2|3 };
// Equivalent to:
$p instanceof Point && ($p->x === 2 || $p->x === 3);
 
// x must be 3, and y must be defined and initialized but we don't care what it is.
$p is Point{ x: 3, y: mixed };
 
// The following is NOT allowed.
$p is (Product|Point){ x: 3 };
 
// This is allowed, but will be interpreted like the second line.
$p is Product|Point{ x: 3 };
$p is (Product)|(Point{ x: 3 });
 
// This is allowed, but has the same effect as the line after it.
$p is Point{};
$p is Point;

Properties may be listed in any order, but must be named. A trailing comma is permitted.
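
As a runnable illustration of the expansions above, the following sketch writes out by hand what the compound pattern Point{ x: 2|3 }|null is described as checking. The function name matchesPointOrNull is hypothetical.

```php
<?php
final class Point
{
    public function __construct(
        public int $x,
        public int $y,
        public int $z,
    ) {}
}

// Hypothetical helper: hand expansion of `$p is Point{ x: 2|3 }|null`.
function matchesPointOrNull(mixed $p): bool
{
    // The instanceof check guards the property read, so non-objects
    // never reach $p->x.
    return ($p instanceof Point && ($p->x === 2 || $p->x === 3))
        || $p === null;
}

var_dump(matchesPointOrNull(new Point(3, 4, 5))); // bool(true)
var_dump(matchesPointOrNull(new Point(7, 4, 5))); // bool(false)
var_dump(matchesPointOrNull(null));               // bool(true)
var_dump(matchesPointOrNull('point'));            // bool(false)
```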

Array structure pattern

Array patterns match the elements of an array individually against a collection of patterns. There are two variants, positional and associative: the pattern MUST either be entirely positional or specify a key for every position. (This is in contrast to array literals, which allow keys to be omitted at random to get an integer assigned.) If an associative pattern is used, the order of keys is explicitly irrelevant.

By default, array matching is exhaustive. That is, the arity of the array and pattern must match. Alternatively, the pattern may include a ... sequence as its last item to disable that arity checking, rendering any unspecified array keys explicitly irrelevant.

The value for each array element is itself a pattern. While the most common use case would normally be a literal match, it also supports a type match, ORed pattern, etc. This means that array patterns can function as “array shapes” if desired. This ability becomes more powerful as more future-scope patterns (such as range or regex) are adopted, as they would also be supported for each element.

The mixed pattern may be used to assert that a key is defined without constraining what its value may be.

Sequential arrays:

// Given:
$list = [1, 3, 5, 7];
 
// Degenerate, not very useful case.
if ($list is [1, 3, 5, 7]) {
  print "Yes";
}
// True.  Equivalent to:
if (is_array($list) 
    && count($list) === 4 
    && array_key_exists(0, $list) && $list[0] === 1 
    && array_key_exists(1, $list) && $list[1] === 3 
    && array_key_exists(2, $list) && $list[2] === 5 
    && array_key_exists(3, $list) && $list[3] === 7
    ) {
    print "Yes";
}
 
 
if ($list is [1, 3]) {
  print "Yes";
}
// False.  Equivalent to:
if (is_array($list) 
    && count($list) === 2
    && array_key_exists(0, $list) && $list[0] === 1 
    && array_key_exists(1, $list) && $list[1] === 3
    ) {
    print "Yes";
}
 
if ($list is [1, 3, ...]) {
  print "Yes";
}
// True.  Equivalent to:
if (is_array($list) 
    && array_key_exists(0, $list) && $list[0] === 1 
    && array_key_exists(1, $list) && $list[1] === 3
    ) {
    print "Yes";
}
 
if ($list is [1, 3, mixed, 7]) {
  print "Yes";
}
// True.  Equivalent to:
if (is_array($list) 
    && count($list) === 4
    && array_key_exists(0, $list) && $list[0] === 1 
    && array_key_exists(1, $list) && $list[1] === 3
    && array_key_exists(2, $list)
    && array_key_exists(3, $list) && $list[3] === 7
    ) {
    print "Yes";
}
 
 
if ($list is [1, 3, 5|6, ...]) {
  print "Yes";
}
// True.  Equivalent to:
if (is_array($list) 
    && array_key_exists(0, $list) && $list[0] === 1 
    && array_key_exists(1, $list) && $list[1] === 3
    && array_key_exists(2, $list) && ($list[2] === 5 || $list[2] === 6)
    ) {
    print "Yes";
}
 
// A sequential "array shape".
if ($list is [int, int, int, mixed]) {
  print "Yes";
}
// True.  Equivalent to:
if (is_array($list) 
    && count($list) === 4
    && array_key_exists(0, $list) && is_int($list[0])
    && array_key_exists(1, $list) && is_int($list[1])
    && array_key_exists(2, $list) && is_int($list[2])
    && array_key_exists(3, $list)
    ) {
    print "Yes";
}

Associative arrays:

// Given:
$assoc = ['a' => 'A', 'b' => 'B'];
 
// Degenerate, not very useful case.
if ($assoc is ['a' => 'A', 'b' => 'B']) {
  print "Yes";
}
// True.  Equivalent to:
if (is_array($assoc) 
    && count($assoc) === 2 
    && array_key_exists('a', $assoc) && $assoc['a'] === 'A'
    && array_key_exists('b', $assoc) && $assoc['b'] === 'B'
    ) {
    print "Yes";
}
 
if ($assoc is ['b' => 'B']) {
  print "Yes";
}
// False.  Equivalent to:
if (is_array($assoc) 
    && count($assoc) === 1 
    && array_key_exists('b', $assoc)  && $assoc['b'] === 'B'
    ) {
    print "Yes";
}
 
if ($assoc is ['b' => 'B', ...]) {
  print "Yes";
}
// True.  Equivalent to:
if (is_array($assoc) && array_key_exists('b', $assoc) && $assoc['b'] === 'B') {
    print "Yes";
}
 
if ($assoc is ['b' => mixed, ...]) {
  print "Yes";
}
// True.  Equivalent to:
if (is_array($assoc) && array_key_exists('b', $assoc) ) {
    print "Yes";
}
 
// An "array shape" pattern.
if ($assoc is ['a' => 'A'|'a', 'b' => string]) {
  print "Yes";
}
// True.  Equivalent to:
if (is_array($assoc)
    && count($assoc) === 2
    && array_key_exists('a', $assoc) && ($assoc['a'] === 'A' || $assoc['a'] === 'a')
    && array_key_exists('b', $assoc) && is_string($assoc['b'])
   ) {
    print "Yes";
}

Of particular note, the pattern matching approach automatically handles array_key_exists() checking. That means a missing array element will not trigger a warning, whereas with a traditional if ($foo['bar'] === 'baz') approach missing values must be accounted for by the developer manually. An associative array pattern match is also, as mentioned, explicitly unordered, whereas a === comparison also considers order. That provides some benefit in even the degenerate case of just checking a selection of keys against literal values, as missing values are handled automatically.

$foo = ['a' => 1, 'b' => 2];
 
// True
$foo is ['b' => 2, 'a' => 1];
 
// True, because == doesn't consider order.
$foo == ['b' => 2, 'a' => 1];
 
// False, because === does consider order.
$foo === ['b' => 2, 'a' => 1];
 
// False, but no error.
$foo is ['a' => 1, 'c' => 3, ...];
 
// Warning: $foo['c'] is not defined.
if ($foo['a'] == 1 && $foo['c'] == 3) { ... }
 
// Warning: $foo['c'] is not defined.
if ($foo['a'] === 1 && $foo['c'] === 3) { ... }
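
The behavior described above (implicit key-existence checks, order-insensitive comparison, and the optional ... suffix) can be approximated in current PHP with a small helper. This is only a sketch for patterns whose values are all literals; the function matchesAssoc and its $open parameter are hypothetical names, with $open standing in for a trailing ... in the pattern.

```php
<?php
/**
 * Hypothetical helper approximating an associative array pattern whose
 * values are all literals. $open = true plays the role of a trailing `...`.
 */
function matchesAssoc(mixed $value, array $expected, bool $open = false): bool
{
    if (!is_array($value)) {
        return false;
    }
    // Without `...` the pattern is exhaustive: the arity must match.
    if (!$open && count($value) !== count($expected)) {
        return false;
    }
    foreach ($expected as $key => $literal) {
        // array_key_exists() avoids the "undefined array key" warning.
        if (!array_key_exists($key, $value) || $value[$key] !== $literal) {
            return false;
        }
    }
    return true;
}

$foo = ['a' => 1, 'b' => 2];

var_dump(matchesAssoc($foo, ['b' => 2, 'a' => 1]));             // bool(true): key order is ignored
var_dump(matchesAssoc($foo, ['a' => 1]));                       // bool(false): arity differs
var_dump(matchesAssoc($foo, ['a' => 1], open: true));           // bool(true)
var_dump(matchesAssoc($foo, ['a' => 1, 'c' => 3], open: true)); // bool(false), and no warning
```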

Variable binding

One of the prime uses of pattern matching is to extract a value from a larger structure, such as an object (or Enumeration/ADT, in the future). This RFC supports such variable binding by specifying the variable to populate. If the input variable matches the rest of the pattern, then the corresponding value will be extracted and assigned to a variable of that name in the current scope. It will remain in scope as long as normal variable rules say it should. Only local variables may be bound; that is, you cannot bind to a property of an object or a variable-variable.

The entire pattern either succeeds or fails. No variables will be bound unless the entire pattern matches. (That also means if a variable exists before the pattern is evaluated, its value will be unchanged if the pattern does not match.)

In the currently planned patterns, variable binding is only relevant for object and array pattern matching.

Object binding examples:

class Point {
    public function __construct(
        public int $x, 
        public int $y, 
        public int $z,
    ) {}
}
 
$p = new Point(3, 4, 5);
 
if ($p is Point {x: 3, y: $y} ) {
    print "x is 3 and y is $y.";
}
// Equivalent to:
if ($p instanceof Point && $p->x === 3) {
    $y = $p->y;
    print "x is 3 and y is $y.";
}
 
if ($p is Point {z: $z, x: 3, y: $y} ) {
  print "x is 3 and y is $y and z is $z.";
}
// Equivalent to:
if ($p instanceof Point && $p->x === 3) {
    $y = $p->y;
    $z = $p->z;
    print "x is 3 and y is $y and z is $z.";
}

Array binding examples:

if ($list is [1, 3, $third, 7]) {
  print "Yes: $third";
}
// True.  Equivalent to:
if (is_array($list) 
    && count($list) === 4
    && array_key_exists(0, $list) && $list[0] === 1 
    && array_key_exists(1, $list) && $list[1] === 3
    && array_key_exists(2, $list) 
    && array_key_exists(3, $list) && $list[3] === 7
    ) {
    $third = $list[2];
    print "Yes: $third";
}
 
if ($list is [1, 3, $third, ...]) {
  print "Yes: $third";
}
// True.  Equivalent to:
if (is_array($list) 
    && array_key_exists(0, $list) && $list[0] === 1 
    && array_key_exists(1, $list) && $list[1] === 3
    && array_key_exists(2, $list) 
    ) {
    $third = $list[2];
    print "Yes: $third";
}
 
if ($assoc is ['a' => 'A', 'b' => $b]) {
  print "Yes: $b";
}
// True.  Equivalent to:
if (is_array($assoc) 
    && count($assoc) === 2 
    && array_key_exists('a', $assoc) && $assoc['a'] === 'A'
    && array_key_exists('b', $assoc) 
    ) {
    $b = $assoc['b'];
    print "Yes: $b";
}
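
The all-or-nothing rule can be sketched in current PHP by separating the test from the assignment. In the example below, matchAB is a hypothetical hand expansion of the pattern ['a' => $a, 'b' => 'B']: it returns the values to bind on success and null on failure, so that a pre-existing $a is left untouched when the match fails.

```php
<?php
// Hypothetical helper: hand expansion of `$assoc is ['a' => $a, 'b' => 'B']`.
// Returns the bindings on success and null on failure, so the caller only
// assigns once the whole pattern is known to match.
function matchAB(mixed $assoc): ?array
{
    if (is_array($assoc)
        && count($assoc) === 2
        && array_key_exists('a', $assoc)
        && array_key_exists('b', $assoc) && $assoc['b'] === 'B'
    ) {
        return ['a' => $assoc['a']];
    }
    return null;
}

$a = 'untouched';

// 'a' is present, but 'b' does not match, so nothing is bound.
if (($bound = matchAB(['a' => 'A', 'b' => 'X'])) !== null) {
    ['a' => $a] = $bound;
}
echo $a, "\n"; // untouched

// The whole pattern matches, so $a is now bound.
if (($bound = matchAB(['a' => 'A', 'b' => 'B'])) !== null) {
    ['a' => $a] = $bound;
}
echo $a, "\n"; // A
```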

A pattern that includes variable binding may not be ORed with another pattern: depending on which segment matches, the variable may or may not end up defined, and there's no reliable way to determine that other than isset(). By extension, a mixed AND/OR pattern is also not supported. An AND-only compound pattern is permitted, however, and elements of the structure pattern (object properties or array keys) may contain ORed and ANDed patterns.

// NOT allowed, as its behavior is ambiguous.
$p is Point { $x } | Circle { $radius };
 
// But this is allowed.
$p is Point { x: 3|5, y: $y };
// Equivalent to:
if ($p instanceof Point && ($p->x === 3 || $p->x === 5)) {
    $y = $p->y;
    // ...
}
 
// This is also allowed:
$p is Colorable&Point { x: 3|5, y: $y };

For object patterns (only), if the variable name to extract to is the same as the name of the property, then the property name may be omitted. That is, the following two examples are exactly equivalent:

if ($p is Point {z: $z, x: 3, y: $y} ) {
    print "x is 3 and y is $y and z is $z.";
}
 
// Shorthand
if ($p is Point {$z, x: 3, $y} ) {
    print "x is 3 and y is $y and z is $z.";
}

Variable binding is especially useful in match() statements, where there is no simple logical equivalent that doesn't involve additional functions.

$result = match ($p) is {
  // These will match only some Point objects, depending on their property values.
  Point{x: 3, y: 9, $z} => "x is 3, y is 9, z is $z",
  Point{$z, $x, y: 4} => "x is $x, y is 4, z is $z",
  Point{x: 5, $y} => "x is 5, y is $y, and z doesn't matter",
  // This will match any Point object.
  Point{$x, $y, $z} => "x is $x, y is $y, z is $z",
};

Note that in this case, the variables $x, $y, and $z may or may not be defined after the match() statement executes depending on which pattern was matched.

Still in-development patterns

The following patterns are nominally optional, and we are still exploring them and the proper syntax. They will be included if we can figure out how to make them work nicely.

Limited expression pattern

The use of variables directly in a pattern is not supported, as it would conflict with variable binding. However, they may be included by delineating them within @(). This approach also works for global constants. As with literals, they are useful mainly in compound patterns and match(). (NOTE: We hate the @() syntax, too. Alternative suggestions very welcome. Please just consider the feature itself for the moment.)

// Simple degenerate case patterns.
$foo is @($bar); // Equivalent to $foo === $bar
$foo is @(PHP_VERSION); // Equivalent to $foo === PHP_VERSION
 
// More practical compound expressions
$foo is @(Errors::$notFound)|@(Errors::$invalid); // Equivalent to $foo === Errors::$notFound || $foo === Errors::$invalid
 
// An object pattern with expressions to reference variables.
$p is Point {y: 37, x:@($x),};
// Equivalent to:
$p instanceof Point && $p->y === 37 && $p->x === $x;
 
// An array pattern with expressions to reference variables.
if ($assoc is ['a' => 'A', 'b' => @($b)]) {
  print "Yes";
}
// True.  Equivalent to:
if (is_array($assoc) 
    && count($assoc) === 2 
    && array_key_exists('a', $assoc) && $assoc['a'] === 'A'
    && array_key_exists('b', $assoc) && $assoc['b'] === $b
    ) {
    print "Yes";
}

It would be possible to expand this pattern to support arbitrary expressions within the delimiters, including function calls. However, that has been omitted at this time in the interest of simplicity. If a good use case for it can be shown in the future, it can be added in a backward compatible way.

Variable binding pattern matching

When binding to a variable, the is keyword may be nested. In that case, the entire pattern must succeed or fail. Values will be bound if and only if all binding patterns match as well.

For example:

if ($foo is Foo{a: @($someA), $b is Point{x: 5, y: @($someY)} }) {
  print "x is 5, y is $someY, z is $b->z";
}
// Equivalent to:
if ($foo instanceof Foo
    && $foo->a === $someA
    && $foo->b instanceof Point
    && $foo->b->x === 5
    && $foo->b->y === $someY
    ) {
    $b = $foo->b;
    print "x is 5, y is $someY, z is $b->z";
}
 
if ($params is ['user' => $user is AuthenticatedUser{role: 'admin'}, ...]) {
    print "Congrats, $user->name, you can do admin things!";
}
// Equivalent to:
if (is_array($params)
    && array_key_exists('user', $params)
    && $params['user'] instanceof AuthenticatedUser
    && $params['user']->role === 'admin'
    ) {
    $user = $params['user'];
    print "Congrats, $user->name, you can do admin things!";
}

(Note: Some languages use a different syntax than above for this behavior. We are still investigating the ideal syntax to use. Rust, for instance, uses an @ suffix on a pattern to indicate further restrictions to apply.)

Backward Incompatible Changes

A new keyword is added, is. That will conflict with any user-defined global constant named is.

No other BC breaks are expected.

Proposed PHP Version(s)

PHP 8.5/9.0.

RFC Impact

Open Issues

Include other patterns in the initial RFC?

Do any other patterns need to be included in the initial RFC? Are there any listed in Future Scope that are must-have for the initial release?

Expression pattern syntax

The @() syntax for expression patterns is still an open question. It needs some kind of delimiter to differentiate it from class names and binding variables, but we are flexible on the specific syntax.

match() "is" placement

The authors are split as to how the syntax for pattern matching match() should work. There are two options:

$result = match ($somevar) is {
    Foo => 'foo',
    Bar => 'bar',
    Baz|Beep => 'baz',
};

$result = match ($somevar) {
    is Foo => 'foo',
    is Bar => 'bar',
    is Baz|Beep => 'baz',
};

The former is shorter, and applies pattern matching to all arms. The latter is more explicit, and would allow individual arms to be pattern matched or not depending on the presence of is. Of course, these options are not mutually exclusive and supporting both would be possible. We are looking for feedback on this question.

Future Scope

Numerous other patterns can be supported in the future. The following additional patterns and use cases are possible future additions for other RFCs. (Please don't bikeshed them here; they are shown as an example of where pattern matching can extend to in the future.)

Enum/ADT pattern

A key goal of this RFC is to lay the groundwork for supporting patterns with Algebraic Data Types, aka, Enums with associated values. We believe that a good pattern matching mechanism is a prerequisite for those being fully usable in the future.

Depending on the implementation, the syntax may be identical to that used for objects above, or it may be positional (using ()). If this RFC passes, a future ADT RFC would include a new enum-targeted pattern if needed.

// Example of what is possible with both pattern matching and ADTs,
// All syntax subject to change.
 
enum Move {
    case TurnLeft;
    case TurnRight;
    case Forward(int $amount);
}
 
match ($move) is {
    Move::TurnLeft => $this->orientation--,
    Move::TurnRight => $this->orientation++,
    Move::Forward{$amount} => $this->distance += $amount,
};
 
 
enum Option {
    case None;
    case Some(mixed $val);
}
 
match ($maybe) is {
    Option::Some {$val} => compute_something($val),
    Option::None => 'default value',
};

Range pattern

Applicable to numeric variables, this pattern would validate that a value is within a given range. Verifying that the value is numeric is implicitly included.

$foo is 0..=10;
 
// Equivalent to:
$foo >= 0 && $foo <= 10;
 
$foo is 0..<10;
 
// Equivalent to:
$foo >= 0 && $foo < 10;
 
$foo is >10;
 
// Equivalent to:
$foo > 10;

(The syntax shown here is not fully developed. Please do not nitpick it yet. If there is interest in including ranges out of the gate, we will flesh this out further, possibly modeling on Raku or similar.)

Regex pattern

Applicable only to string (and possibly Stringable?) values. This pattern validates that a value conforms to a provided regular expression, and potentially extracts values from it if appropriate. (Extracted values would only be assigned if the pattern matches.)

$foo is /^https:\/\/(?<hostname>[^\/]*)/;
 
// Equivalent to:
$matches = [];
if (preg_match('/^https:\/\/(?<hostname>[^\/]*)/', $foo, $matches)) {
    // Only assigned when the pattern matches.
    $hostname = $matches['hostname'];
}

(Note: This pattern is only in the ideation stage, so the syntax has not been fully thought through.)

Array-application pattern

One possible extension of patterns is the built-in ability to apply a pattern across an array. While that could be done straightforwardly with a foreach loop over an array, it may be more performant if the entire logic could be pushed into engine-space. One possible approach would look like this:

$ints = [1, 2, 3, 4];
$someFloats = [1, 2, 3.14, 4];
 
$ints is array<int>; // True
$someFloats is array<int>; // False
$someFloats is array<int|float>; // True
 
// Equivalent to:
$result = true;
foreach ($ints as $v) {
  if (!is_int($v)) {
    $result = false;
    break;
  }
}

It is not yet clear if it would indeed be more performant than the user-space alternative, or how common that usage would be. For that reason it has been left out of the RFC for now, but we mention it as a possible future extension.
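
For reference, the user-space alternative mentioned above can be sketched as a small generic helper. The function allMatch is hypothetical, and the per-element pattern is represented here by an ordinary predicate callable rather than by pattern syntax.

```php
<?php
/**
 * Hypothetical userland stand-in for `$items is array<pattern>`, where the
 * per-element pattern is passed as a predicate.
 */
function allMatch(mixed $items, callable $pattern): bool
{
    if (!is_array($items)) {
        return false;
    }
    foreach ($items as $v) {
        // Stop at the first element that fails, as in the loop above.
        if (!$pattern($v)) {
            return false;
        }
    }
    return true;
}

$ints = [1, 2, 3, 4];
$someFloats = [1, 2, 3.14, 4];

var_dump(allMatch($ints, 'is_int'));       // bool(true)
var_dump(allMatch($someFloats, 'is_int')); // bool(false)
var_dump(allMatch($someFloats, fn ($v) => is_int($v) || is_float($v))); // bool(true)
```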

Optional array key marker

As described above, array patterns support “this key must be defined and match this pattern” or “I don't care if it's defined or not” (using the ... suffix). However, there is no obvious way to indicate “this key is optional, but if it is defined it must match this pattern.” Such a marker would be useful to include, although we have not yet explored a syntax for it. One possibility would be:

// $arr must have an 'a' key holding a string, MAY have a 'b' key (which must
// hold a string if present), and any other keys are irrelevant.
$arr is ['a' => string, ?'b' => string, ...];

as keyword

In some cases, the desired result is not a boolean but an error condition. One possible way to address that would be with a second keyword, as, which behaves the same as is but returns the matched value or throws an Error rather than returning false.

 
// This either evaluates to true and assigns $username and $password to the matching properties of Foo, OR it evaluates to false.
$foo is Foo { $username, $password };
 
// This either evaluates to $foo and assigns $username and $password to the matching properties of Foo, OR it throws an Error.
$value = $foo as Foo { $username, $password };

This pattern could potentially be combined with the “weak mode flag” (see below) to offer object validation with embedded coercion.

"weak mode" flag

By default, pattern matching uses strict comparisons. However, there are use cases where a weak comparison is more appropriate. Setting a pattern or sub-pattern to weak-mode would permit standard PHP type coercion to determine if a value matches.

For example:

$s = "5";
 
// Default, strict mode
 
$s is int; // False
 
// Opt-in weak mode
 
$s is ~int; // True

This would be particularly useful in combination with an array application pattern, to verify that, for instance, all elements in an array are numeric.

$a = [2, 4, "6", 8];
 
$a is array<int>; // False
 
$a is array<~int>; // True

It is possible that we could extend the as keyword here as well, so that it returns the coerced value. That is, if the value is weakly compatible, the as keyword would convert it safely (or throw if it cannot be converted). That would allow validation across an object or array in a single operation.

For example:

$a = [2, 4, "6", 8];
 
$intifiedA = $a as array<~int>;
 
// $intifiedA is now [2, 4, 6, 8]
 
$b = [2, 4, 'six', 8];
 
$intifiedB = $b as array<~int>; // Throws, because 'six' is not coerce-able to an integer.

We have not yet investigated how feasible this sort of coercion would be, but it is a potentially valuable feature.
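
As a rough illustration of the intended result, the coercing behavior can be emulated in current PHP (8.0 or later). This is a sketch under assumptions: coerceToInt() and asIntArray() are hypothetical names, and FILTER_VALIDATE_INT only approximates PHP's standard coercion rules (it is stricter in some cases, for example it rejects "6.0").

```php
<?php

/**
 * Hypothetical stand-in for matching a single value against ~int.
 * Returns the integer, or null when the value is not coercible.
 */
function coerceToInt(mixed $value): ?int
{
    if (is_int($value)) {
        return $value;
    }
    if (is_string($value)) {
        // Approximation of weak-mode coercion for integer-like strings.
        $int = filter_var($value, FILTER_VALIDATE_INT);
        return $int === false ? null : $int;
    }
    return null;
}

/**
 * Hypothetical stand-in for applying "as" with a weak int element pattern
 * to a whole array. Keys are preserved.
 */
function asIntArray(array $values): array
{
    $result = [];
    foreach ($values as $key => $value) {
        $int = coerceToInt($value);
        if ($int === null) {
            throw new \TypeError('Element at key ' . $key . ' is not coercible to int');
        }
        $result[$key] = $int;
    }
    return $result;
}

var_dump(asIntArray([2, 4, "6", 8]) === [2, 4, 6, 8]); // bool(true)
```

Passing [2, 4, 'six', 8] to asIntArray() throws, matching the second example above.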

Property guards

Something that became apparent during the development of property hooks is that a great many set hooks will be simple validation, often that a number is within a range or a string matches some criteria. At present, those use cases are achievable with hooks but can be somewhat verbose. Applying a pattern rule to a property would allow that rule to be applied on the set operation for that property, without having to implement it manually.

class Test
{
    // These two declarations of $name are alternatives with equivalent restrictions.
    // (A real class would contain only one of them.)
 
    public string $name is /\w{3,}/;
 
    public string $name { 
        set {
           if (!preg_match('/\w{3,}/', $value)) {
               throw new \Exception();
           }
           $this->name = $value;
        }
    }
}

This more compact syntax would be considerably easier to read and maintain when used within a promoted constructor parameter, too. Note that variable binding would not be supported in a property guard, as it makes little logical sense.

Elevating such checks to a pattern would also make the pattern more readily available to static analysis tools (IDEs or otherwise), which would then be better able to detect when a value that does not satisfy the pattern is about to be assigned to the property (e.g., because the string is too short).

(We're not sure if is or as would make more sense to use here. That's an implementation detail we don't need to worry about until this feature is actually getting implemented.)

Parameter or return guards

In principle, parameters and return values could have a guard syntax similar to that of properties. The use case is arguably smaller, but it might be possible to allow variable binding, although it is unclear whether that would be feasible.

As an example, the following two functions would be equivalent:

function test(string $name is /\w{3,}/): string is /\w{10,}/ {
    return $name . ' (retired)';
}
 
function test(string $name): string {
    $name as /\w{3,}/; // Throws if it doesn't match.
 
    $return = $name . ' (retired)';
    $return as /\w{10,}/; // Throws if it doesn't match.
    return $return;
}
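
The second form relies on the proposed as keyword, so it does not run today either. A version in current PHP (8.0 or later) might look like the sketch below. The helper assertMatches() and the function name retire() are hypothetical, and ValueError is an assumed choice of Error subclass.

```php
<?php

/**
 * Hypothetical helper standing in for an "as" check against a regex pattern.
 * Returns the value unchanged when it matches, throws otherwise.
 */
function assertMatches(string $value, string $regex): string
{
    if (preg_match($regex, $value) !== 1) {
        throw new \ValueError("Value does not match $regex");
    }
    return $value;
}

function retire(string $name): string
{
    // Parameter guard: at least three consecutive word characters.
    assertMatches($name, '/\w{3,}/');

    // Return guard: at least ten consecutive word characters.
    return assertMatches($name . ' (retired)', '/\w{10,}/');
}

var_dump(retire('Christopher')); // string(21) "Christopher (retired)"
```

Because the regexes are unanchored, they match anywhere in the string. A short name such as 'Bob' passes the parameter guard but fails the return guard, since 'Bob (retired)' contains no run of ten word characters.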

Naturally, type-only pattern checks would be entirely redundant here, since the type declaration already enforces them. Guards would be most useful with regex or range patterns. However, they would also allow literal matches, which is a feature that has been requested in the past:

function query(array $args, string $sort is 'ASC'|'DESC') { ... }
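
For contrast, enforcing the same restriction today requires a manual check in the function body. A minimal sketch in current PHP (8.0 or later) follows. The function checkedSort() is hypothetical, and throwing ValueError is an assumption.

```php
<?php

/**
 * Hypothetical function: the in_array() check stands in for a literal
 * guard restricting $sort to 'ASC' or 'DESC'.
 */
function checkedSort(string $sort): string
{
    // Strict comparison, so only the exact strings 'ASC' and 'DESC' pass.
    if (!in_array($sort, ['ASC', 'DESC'], true)) {
        throw new \ValueError('$sort must be "ASC" or "DESC"');
    }
    return $sort;
}

var_dump(checkedSort('ASC')); // string(3) "ASC"
```

A guard in the signature would move this restriction out of the body and make it visible to callers and static analysis tools.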

Patterns as variables/types

With complex array or object patterns, especially if guards are adopted, it becomes natural to want to reuse the same pattern in multiple places. At this time we are not sure how to do so, although it is a space we are considering. Possibilities include (unvetted):

// Wrap the pattern into an object that can be referenced, possibly with some distinguishing marker.
$naturalNum = new Pattern(int&>0);
$foo is $naturalNum;    // Would need some way to disambiguate it from a binding variable.
 
// Put this in the "use" section of a file.
use pattern int&>0 as NaturalNum;
$foo is NaturalNum;
 
// Make this exposed to other files, like a constant would be.
pattern int&>0 as NaturalNum;
$foo is NaturalNum;

This is an area that requires more exploration, but we mention it here for completeness.
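
Until such a mechanism exists, reuse can be approximated in current PHP (8.0 or later) by storing predicates in closures. This is only a sketch of the idea, not a proposed design: allOf() and $isNaturalNum are hypothetical names, and the closures stand in for the int&>0 pattern.

```php
<?php

/**
 * Hypothetical combinator: builds a predicate that passes only when
 * every given predicate passes, similar in spirit to "&" in a pattern.
 */
function allOf(callable ...$patterns): \Closure
{
    return static function (mixed $value) use ($patterns): bool {
        foreach ($patterns as $pattern) {
            if (!$pattern($value)) {
                return false; // Short-circuit on the first failure.
            }
        }
        return true;
    };
}

// Reusable stand-in for the int&>0 pattern.
$isNaturalNum = allOf('is_int', static fn(mixed $v): bool => $v > 0);

var_dump($isNaturalNum(5));   // bool(true)
var_dump($isNaturalNum(0));   // bool(false)
var_dump($isNaturalNum('5')); // bool(false)
```

Unlike a first-class pattern, a closure is opaque to the engine and to static analysis, which is part of what a dedicated syntax would improve.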

Proposed Voting Choices

This is a simple up-or-down vote, requiring 2/3 Yes to pass.

Patches and Tests

Links to any external patches and tests go here.

If there is no patch, make it clear who will create a patch, or whether a volunteer to help with implementation is needed.

Make it clear if the patch is intended to be the final patch, or is just a prototype.

For changes affecting the core language, you should also provide a patch for the language specification.

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged into
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature
  4. a link to the language specification section (if any)

References

Links to external references, discussions or RFCs

Rejected Features

Keep this updated with features that were discussed on the mail lists.

rfc/pattern-matching.txt · Last modified: 2024/07/22 16:13 by crell