
PHP RFC: Match expression

Proposal

The switch statement is a fundamental control structure in almost every programming language. Unfortunately, in PHP it has some long-standing issues that make it hard to use correctly, namely:

  • Type coercion
  • No return value
  • Fallthrough
  • Inexhaustiveness

This RFC proposes a new control structure called match to resolve these issues.

match ($condition) {
    1 => {
        foo();
        bar();
    },
    2 => baz(),
}
 
$expressionResult = match ($condition) {
    1, 2 => foo(),
    3, 4 => bar(),
    default => baz(),
};

Issues

We're going to take a look at each issue and how the new match expression resolves them.

Type coercion

The switch statement loosely compares (==) the given value to the case values. This can lead to some very surprising results.

switch ('foo') {
    case 0:
        echo "Oh no!\n";
        break;
}

The match expression uses strict comparison (===) instead. The comparison is strict regardless of strict_types.

match ('foo') {
    0 => {
        echo "Never reached\n";
    },
}
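
The difference between the two comparison modes can be observed without the proposed syntax. The following is a minimal sketch that runs on current PHP; looseCase() and strictCase() are hypothetical helpers, and the strict variant only approximates the proposed match semantics with an explicit === check.

```php
<?php
// Hypothetical helper names, used only for illustration.

// switch selects a case with loose comparison (==).
function looseCase($value): string
{
    switch ($value) {
        case 1:
            return 'matched 1';
        default:
            return 'no match';
    }
}

// Approximates the proposed match semantics with strict comparison (===).
function strictCase($value): string
{
    if ($value === 1) {
        return 'matched 1';
    }
    return 'no match';
}

echo looseCase('1'), "\n";  // matched 1 (the string '1' is coerced to an int)
echo strictCase('1'), "\n"; // no match (a string is never identical to an int)
echo strictCase(1), "\n";   // matched 1
```

The string '1' selects the integer case under switch, while the strict check only accepts a value of the same type.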

No return value

It is very common that the switch produces some value that is used afterwards.

switch (1) {
    case 0:
        $y = 'Foo';
        break;
    case 1:
        $y = 'Bar';
        break;
    case 2:
        $y = 'Baz';
        break;
}
 
echo $y;
//> Bar

It is easy to forget to assign $y in one of the cases. It is also visually unintuitive to find $y first assigned inside a more deeply nested block. match is an expression that evaluates to the result of the executed arm. This removes a lot of boilerplate and makes it impossible to forget to assign a value in an arm.

echo match (1) {
    0 => 'Foo',
    1 => 'Bar',
    2 => 'Baz',
};
//> Bar

Fallthrough

The switch fallthrough has been a large source of bugs in many languages. Each case must explicitly break out of the switch statement or the execution will continue into the next case even if the condition is not met.

switch ($pressedKey) {
    case Key::ENTER:
        save();
        // Oops, forgot the break
    case Key::DELETE:
        delete();
        break;
}

This was intended to be a feature so that multiple conditions can execute the same block of code. However, it is often hard to tell whether a missing break was the author's intention or a mistake.

switch ($x) {
    case 1:
    case 2:
        // Same for 1 and 2
        break;
    case 3:
        // Only 3
    case 4:
        // Same for 3 and 4
        break;
}
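
To make the control flow of this switch concrete, the sketch below records which case bodies run for a given value. trace() and the recorded labels are hypothetical and exist only for illustration.

```php
<?php
// Records which case bodies of a fallthrough switch run for a given $x.
// trace() is a hypothetical helper.
function trace(int $x): array
{
    $ran = [];
    switch ($x) {
        case 1:
        case 2:
            $ran[] = 'same for 1 and 2';
            break;
        case 3:
            $ran[] = 'only 3';
            // No break: execution continues into case 4.
        case 4:
            $ran[] = 'same for 3 and 4';
            break;
    }
    return $ran;
}

echo implode(' | ', trace(3)), "\n"; // only 3 | same for 3 and 4
echo implode(' | ', trace(4)), "\n"; // same for 3 and 4
```

For 3 both bodies run, for 4 only the shared one does. Nothing in the source distinguishes this intentional fallthrough from a forgotten break.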

The match expression resolves this problem by adding an implicit break after every arm. Multiple conditions can be comma-separated to execute the same block of code. There's no way to achieve the same result as 3 and 4 in the example above without an additional if statement. This is a little bit more verbose but makes the intention very obvious.

match ($x) {
    1, 2 => {
        // Same for 1 and 2
    },
    3, 4 => {
        if ($x === 3) {
            // Only 3
        }
        // Same for 3 and 4
    },
}

Inexhaustiveness

Another large source of bugs is not handling all the possible cases supplied to the switch statement.

switch ($configuration) {
    case Config::FOO:
        // ...
        break;
    case Config::BAR:
        // ...
        break;
}

This will go unnoticed until the program crashes in a weird way, causes strange behavior or, even worse, becomes a security hole. Many languages can check at compile time whether all cases are handled, or force you to write a default case if they can't. For a dynamic language like PHP the only alternative is throwing an error at run time. This is exactly what the match expression does: it throws an UnhandledMatchError if none of the arms matches the condition.

match ($x) {
    1 => ...,
    2 => ...,
}
 
// $x can never be 3
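
For comparison, the same guarantee has to be written by hand with switch today. The following is a rough sketch only: describeLevel() and its values are hypothetical, and UnexpectedValueException stands in for the proposed UnhandledMatchError because it already exists in current PHP.

```php
<?php
// A switch that refuses unknown values instead of silently skipping them.
// describeLevel() and the level numbers are hypothetical.
function describeLevel(int $level): string
{
    switch ($level) {
        case 1:
            return 'low';
        case 2:
            return 'high';
        default:
            // The proposed match would raise UnhandledMatchError here on its own;
            // UnexpectedValueException is a stand-in that exists today.
            throw new UnexpectedValueException("Unhandled value: $level");
    }
}

echo describeLevel(2), "\n"; // high

try {
    describeLevel(3);
} catch (UnexpectedValueException $e) {
    echo $e->getMessage(), "\n"; // Unhandled value: 3
}
```

The default branch is exactly the part that is easy to forget, which is why the proposal makes the error implicit.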

Blocks

Sometimes passing a single expression to a match arm isn't enough, either because you need to use a statement or the code is just too long for a single expression. In those cases you can pass a block to the arm.

match ($x) {
    0 => {
        foo();
        bar();
        baz();
    },
}

When you're making use of the result value of the match expression you must return a value from each block to the match expression. This can be done by omitting the last semicolon in the block. This syntax is borrowed (pun intended) from Rust 1).

$y = match ($x) {
    0 => {
        foo();
        bar();
        baz() // This value is returned
    },
};

Not returning a value from a block will lead to a compilation error.

$y = match ($x) {
    0 => {
        foo();
        bar();
        baz();
    },
};
 
//> Match block must return a value. Did you mean to omit the last semicolon?

This is not the same as using the return keyword.

function test() {
    $y = match ($x) {
        0 => {
            foo();
            bar();
            // baz() will be returned from test(), $y will never be assigned
            return baz();
        },
    };
}

Semicolon

When using match as part of some other expression it is necessary to terminate the statement with a semicolon.

$x = match ($y) { ... };

The same would usually be true if the match expression were used as a standalone expression.

match ($y) {
    ...
};

However, to make the match expression more similar to other statements like if and switch it is allowed to drop the semicolon in this case only.

match ($y) {
    ...
}

This introduces some ambiguities with prefix operators that are also binary operators, namely + and -.

match ($y) { ... }
-1;
 
// Could be parsed as
 
// 1
match ($y) { ... };
-1;
 
// 2
match ($y) { ... } - 1;

When match appears as the first element of a statement it will always be parsed as option 1.

Miscellaneous

Arbitrary expressions

A match condition can be any arbitrary expression. Analogous to switch, the conditions are checked from top to bottom until the first one matches.

match ($x) {
    functionCall() => ...,
    $this->methodCall() => ...,
    $this->property => ...,
    // etc.
}
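
The top-to-bottom evaluation order can be observed with switch, which the text above names as the model for this behavior. In this sketch candidate() and firstMatch() are hypothetical helpers; candidate() logs its name whenever it is evaluated.

```php
<?php
// Hypothetical helper: records its name in $log when evaluated, then returns $value.
function candidate(string $name, int $value, array &$log): int
{
    $log[] = $name;
    return $value;
}

// Case expressions are evaluated in order until one equals $x.
function firstMatch(int $x, array &$log): string
{
    switch ($x) {
        case candidate('a', 1, $log):
            return 'a';
        case candidate('b', 2, $log):
            return 'b';
        case candidate('c', 3, $log):
            return 'c';
    }
    return 'none';
}

$log = [];
echo firstMatch(2, $log), "\n"; // b
echo implode(',', $log), "\n";  // a,b (the expression for 'c' was never evaluated)
```

Conditions with side effects therefore run only up to and including the first one that matches.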

break/continue

Just like with switch, you can use break to break out of the executed arm.

match ($x) {
    $y => {
        if ($condition) {
            break;
        }
 
        // Not executed if $condition is true
    },
}

Unlike with switch, using continue to target the match expression triggers a compilation error.

match ($i) {
    default => {
        continue;
    },
}
//> Fatal error: "continue" targeting match is disallowed. Did you mean to use "break" or "continue 2"?

It is not allowed to break out of a match expression that makes use of the return value.

$x = match ($i) {
    default => {
        break;
        'foo'
    },
};
 
echo $x;
 
//> Fatal error: Breaking out of match with result value disallowed

goto

Like with switch, you can use goto to jump out of match expressions.

match ($a) {
    1 => {
        match ($b) {
            2 => {
                goto end_of_match;
            },
        }
    },
}
 
end_of_match:

It is not allowed to jump into match expressions.

goto match_arm;
 
match ($b) {
    1 => {
        match_arm:
    },
}
 
//> Fatal error: 'goto' into loop, switch or match is disallowed

Nor is it allowed to jump out of match expressions that make use of the return value.

$a = match ($b) {
    1 => {
        goto end_of_match;
        'foo'
    },
};
 
end_of_match:
 
//> Fatal error: 'goto' out of match with result value disallowed

return

return behaves the same as in any other context. It will return from the function.

function foo($x) {
    match ($x) {
        1 => {
            return;
        },
    }
 
    // Not executed if $x is 1
}

Future scope

Pattern matching

I have experimented with pattern matching 2) for this RFC. Realistically it could sometimes save a few keystrokes. In my opinion this does not justify the significant complexity added to the language at the moment. It would be mostly useful for algebraic data types which PHP currently does not have.

// With pattern matching
match ($value) {
    let $a => ..., // Identifier pattern
    let 0..<10 => ..., // Range pattern
    let is string => ..., // Type pattern
    let [1, 2, $c] => ..., // Array pattern
    let Foo { foo: 1, getBar(): 2 } => ..., // Object pattern
    let $str @ is string if $str !== '' => ..., // Guard
}
 
// Without pattern matching
match (true) {
    true => $value ..., // Identifier pattern
    $value >= 0 && $value < 10 => ..., // Range pattern
    is_string($value) => ..., // Type pattern
    count($value) === 3
        && isset($value[0]) && $value[0] === 1
        && isset($value[1]) && $value[1] === 2
        && isset($value[2]) => $value[2] ..., // Array pattern
    $value instanceof Foo
        && $value->foo === 1
        && $value->getBar() === 2 => ..., // Object pattern
    is_string($value) && $value !== '' => ..., // Guard
}

While some patterns are significantly shorter (namely the array pattern), code like that is relatively rare. At the moment the argument for such a big language change is pretty weak. If the situation ever changes we can always add pattern matching at a later point in time.

Language wide blocks

While it is possible to make blocks a language-wide feature, there are simply not enough use cases for this at the moment. One potential use case is arrow functions.

$callable = fn() => {
    return 'foo';
};

However, allowing the same blocks here would allow for two semantically equivalent versions with different syntax.

$callable = fn() => {
    'foo'
};

We would likely want to disallow this. So even here blocks should behave slightly differently. Thus we're better off implementing blocks for arrow functions separately.

There were no other legitimate use cases I could think of.

"Why don't you just use x"

if statements

if ($x === 1) {
    $y = ...;
} elseif ($x === 2) {
    $y = ...;
} elseif ($x === 3) {
    $y = ...;
}

Needless to say, this is incredibly verbose and there's a lot of repetition. It also can't make use of the jumptable optimization. You must also not forget to write an else branch to catch unexpected values.

Hash maps

$y = [
    1 => ...,
    2 => ...,
][$x];

This code will execute every single “arm”, not just the one that is finally returned. It will also build a hash map in memory every time it is executed. And again, you must not forget to handle unexpected values.
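
The eager evaluation is easy to verify. In the sketch below arm() and lookup() are hypothetical helpers; arm() records that it ran before returning its label.

```php
<?php
// Hypothetical helper: records that an "arm" ran, then returns its label.
function arm(string $label, array &$ran): string
{
    $ran[] = $label;
    return $label;
}

function lookup(int $x, array &$ran): string
{
    // Every value expression is evaluated while the array is built,
    // before the [$x] lookup selects one of them.
    return [
        1 => arm('one', $ran),
        2 => arm('two', $ran),
        3 => arm('three', $ran),
    ][$x];
}

$ran = [];
echo lookup(2, $ran), "\n";    // two
echo implode(',', $ran), "\n"; // one,two,three
```

Only one value is returned, but all three expressions ran, so any side effects or expensive calls in the other entries are paid for on every lookup.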

Nested ternary operators

$y = $x === 1 ? ...
  : ($x === 2 ? ...
  : ($x === 3 ? ...
  : 0));

The parentheses make it hard to read, it's easy to make mistakes, and there is no jumptable optimization. Adding more cases makes the situation worse.

Backward Incompatible Changes

match was added as a keyword (reserved_non_modifiers). This means it can't be used in the following contexts anymore:

  • namespaces
  • class names
  • function names
  • global constants

Note that it will continue to work in method names and class constants.

Proposed PHP Version(s)

The proposed version is PHP 8.

Proposed Voting Choices

As this is a language change, a 2/3 majority is required. The vote is a straight Yes/No vote for accepting the RFC and merging the patch.

rfc/match_expression.1587678836.txt.gz · Last modified: 2020/04/23 21:53 by ilijatovilo