This is an old revision of the document!


PHP RFC: Match expression

Proposal

The switch statement is a fundamental control structure in almost every programming language. Unfortunately, in PHP it has some long-standing issues that make it hard to use correctly, namely:

  • Type coercion
  • No return value
  • Fallthrough
  • Inexhaustiveness

This RFC proposes a new control structure called match to resolve these issues.

match ($condition) {
    1 => {
        foo();
        bar();
    },
    2 => baz(),
}
 
$expressionResult = match ($condition) {
    1, 2 => foo(),
    3, 4 => bar(),
    default => baz(),
};

Issues

We're going to take a look at each issue and how the new match expression resolves them.

Type coercion

The switch statement loosely compares (==) the given value to the case values. This can lead to some very surprising results.

switch ('foo') {
    case 0:
        echo "Oh no!\n";
        break;
}

The match expression uses strict comparison (===) instead. The comparison is strict regardless of strict_types.

match ('foo') {
    0 => {
        echo "Never reached\n";
    },
}
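
The difference between the two comparison modes can be sketched side by side. This is a minimal illustration, assuming a PHP 8.0+ runtime where match is available with single-expression arms; the function names viaSwitch and viaMatch are hypothetical. It uses the numeric string '1' so that the outcome does not depend on how a given PHP version compares non-numeric strings to integers.

```php
<?php
// Hypothetical helpers, named for illustration only.
function viaSwitch(mixed $value): string
{
    switch ($value) {
        case 1:
            return 'matched int 1'; // also reached for '1', because '1' == 1
        default:
            return 'no match';
    }
}

function viaMatch(mixed $value): string
{
    return match ($value) {
        1 => 'matched int 1',       // only reached for the integer 1
        default => 'no match',
    };
}

echo viaSwitch('1'), PHP_EOL; // matched int 1
echo viaMatch('1'), PHP_EOL;  // no match
```

Both functions agree for the integer 1; they only diverge when the operand types differ.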

No return value

It is very common that the switch produces some value that is used afterwards.

switch (1) {
    case 0:
        $y = 'Foo';
        break;
    case 1:
        $y = 'Bar';
        break;
    case 2:
        $y = 'Baz';
        break;
}
 
echo $y;
//> Bar

It is easy to forget to assign $y in one of the cases. It is also visually unintuitive to find $y declared in a more deeply nested scope. match is an expression that evaluates to the result of the executed arm. This removes a lot of boilerplate and makes it impossible to forget to assign a value in an arm.

echo match (1) {
    0 => 'Foo',
    1 => 'Bar',
    2 => 'Baz',
};
//> Bar

Fallthrough

The switch fallthrough has been a large source of bugs in many languages. Each case must explicitly break out of the switch statement or the execution will continue into the next case even if the condition is not met.

switch ($pressedKey) {
    case Key::RETURN:
        save();
        // Oops, forgot the break
    case Key::DELETE:
        delete();
        break;
}

This was intended to be a feature so that multiple conditions can execute the same block of code. It is often hard to tell whether a missing break was the author's intention or a mistake.

switch ($x) {
    case 1:
    case 2:
        // Same for 1 and 2
        break;
    case 3:
        // Only 3
    case 4:
        // Same for 3 and 4
        break;
}

The match expression resolves this problem by adding an implicit break after every arm. Multiple conditions can be comma-separated to execute the same block of code. There's no way to achieve the same result as 3 and 4 in the example above without an additional if statement. This is a little bit more verbose but makes the intention very obvious.

match ($x) {
    1, 2 => {
        // Same for 1 and 2
    },
    3, 4 => {
        if ($x === 3) {
            // Only 3
        }
        // Same for 3 and 4
    },
}
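
The same idea can be written as a runnable sketch with single-expression arms. It assumes a PHP 8.0+ runtime; the function handlersFor and the handler labels are hypothetical and only stand in for the commented sections above. Instead of executing code, it returns the list of steps that would run for a given $x.

```php
<?php
// Hypothetical helper: lists the steps that would run for $x.
function handlersFor(int $x): array
{
    return match ($x) {
        // Comma-separated conditions share one arm; there is no fallthrough.
        1, 2 => ['shared by 1 and 2'],
        // The extra step for 3 has to be spelled out explicitly.
        3, 4 => array_merge(
            $x === 3 ? ['only 3'] : [],
            ['shared by 3 and 4']
        ),
        default => [],
    };
}

print_r(handlersFor(3)); // 'only 3', then 'shared by 3 and 4'
print_r(handlersFor(4)); // 'shared by 3 and 4' only
```

The explicit conditional makes it visible that 3 does more work than 4, which a missing break in a switch would hide.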

Inexhaustiveness

Another large source of bugs is not handling all the possible cases supplied to the switch statement.

switch ($configuration) {
    case Config::FOO:
        // ...
        break;
    case Config::BAR:
        // ...
        break;
}

This will go unnoticed until the program crashes in a weird way, causes strange behavior or, even worse, opens a security hole. Many languages can check at compile time whether all cases are handled, or force you to write a default case if they can't. For a dynamic language like PHP, the only alternative is throwing an error at runtime. This is exactly what the match expression does: it throws an UnhandledMatchError if none of the arms matches the condition.

match ($x) {
    1 => ...,
    2 => ...,
}
 
// $x can never be 3
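
How a caller observes the error can be sketched as follows. This assumes a PHP 8.0+ runtime in which UnhandledMatchError is a built-in class; the functions label and tryLabel are hypothetical names used only for this illustration.

```php
<?php
// Hypothetical helper with no default arm.
function label(int $x): string
{
    return match ($x) {
        1 => 'one',
        2 => 'two',
    };
}

// Hypothetical wrapper that turns the error into a sentinel string.
function tryLabel(int $x): string
{
    try {
        return label($x);
    } catch (\UnhandledMatchError $e) {
        // Reached when no arm matched $x.
        return 'unhandled';
    }
}

echo tryLabel(2), PHP_EOL; // two
echo tryLabel(3), PHP_EOL; // unhandled
```

In real code the error would usually be left uncaught so that the missing case is noticed early; the wrapper exists here only to make the behavior observable.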

Blocks

Sometimes passing a single expression to a match arm isn't enough, either because you need to use a statement or the code is just too long for a single expression. In those cases you can pass a block to the arm.

match ($x) {
    0 => {
        foo();
        bar();
        baz();
    },
}

Originally, this RFC included a way to return a value from a block by omitting the semicolon of the last expression. This syntax is borrowed from Rust 1). Due to memory management difficulties and a lot of negative feedback on the syntax, this is no longer part of this proposal and will be discussed in a separate RFC.

// Original proposal
$y = match ($x) {
    0 => {
        foo();
        bar();
        baz() // This value is returned
    },
};
 
// Alternative syntax, <=
$y = match ($x) {
    0 => {
        foo();
        bar();
        <= baz();
    },
};
 
// Alternative syntax, separate keyword
$y = match ($x) {
    0 => {
        foo();
        bar();
        pass baz();
    },
};
 
// Alternative syntax, automatically return last expression regardless of semicolon
$y = match ($x) {
    0 => {
        foo();
        bar();
        baz();
    },
};

For the time being the following code will result in a compilation error:

$y = match ($x) {
    0 => {},
};
 
//> Match that not used as a statement can't contain blocks

Optional semicolon for match in statement form

When using match as part of some other expression it is necessary to terminate the statement with a semicolon.

$x = match ($y) { ... };

The same would usually be true if the match expression were used as a standalone expression.

match ($y) {
    ...
};

However, to make the match expression more similar to other statements like if and switch, we could allow dropping the semicolon in this case only.

match ($y) {
    ...
}

This introduces an ambiguity with the + and - prefix operators.

match ($y) { ... }
-1;
 
// Could be parsed as
 
// 1
match ($y) { ... };
-1;
 
// 2
match ($y) { ... } - 1;

When match appears as the first element of a statement, it would always be parsed as option 1 because there are no legitimate use cases for binary operations at the statement level. All other cases work as expected.

// These work fine
$x = match ($y) { ... } - 1;
foo(match ($y) { ... } - 1);
$x[] = fn() => match ($y) { ... };
// etc.

Because there was some controversy around this feature, it was moved to a secondary vote.

Miscellaneous

Arbitrary expressions

A match condition can be any arbitrary expression. As with switch, the conditions are checked from top to bottom until the first one matches. Once a condition matches, the remaining conditions are not evaluated.

match ($x) {
    functionCall() => ...,
    $this->methodCall() => ..., // methodCall isn't called if functionCall() matched with $x
    $this->property => ...,
    // etc.
}
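
The top-to-bottom, stop-at-first-match evaluation can be made visible with a small sketch. It assumes a PHP 8.0+ runtime; evaluationOrder and the $probe closure are hypothetical names. Each condition records its name in a log before returning its value.

```php
<?php
// Hypothetical helper: returns [result, names of the conditions evaluated].
function evaluationOrder(int $x): array
{
    $calls = [];
    $probe = function (string $name, int $value) use (&$calls): int {
        $calls[] = $name; // record that this condition was evaluated
        return $value;
    };

    $result = match ($x) {
        $probe('first', 1) => 'a',
        $probe('second', 2) => 'b',
        $probe('third', 3) => 'c', // not evaluated when $x is 1 or 2
    };

    return [$result, $calls];
}

print_r(evaluationOrder(2)); // result 'b'; only 'first' and 'second' were evaluated
```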

break/continue

Just like with the switch you can use break to break out of the executed arm.

match ($x) {
    $y => {
        if ($condition) {
            break;
        }
 
        // Not executed if $condition is true
    },
}

Unlike with the switch, a continue that targets the match expression will trigger a compilation error.

match ($i) {
    default => {
        continue;
    },
}
//> Fatal error: "continue" targeting match is disallowed. Did you mean to use "break" or "continue 2"?

goto

Like with the switch you can use goto to jump out of match expressions.

match ($a) {
    1 => {
        match ($b) {
            2 => {
                goto end_of_match;
            },
        }
    },
}
 
end_of_match:

It is not allowed to jump into match expressions.

goto match_arm;
 
match ($b) {
    1 => {
        match_arm:
    },
}
 
//> Fatal error: 'goto' into loop, switch or match is disallowed

return

return behaves the same as in any other context. It will return from the function.

function foo($x) {
    match ($x) {
        1 => {
            return;
        },
    }
 
    // Not executed if $x is 1
}

Future scope

Block expressions

As mentioned above, block expressions will be discussed in a separate RFC. We'll also use this opportunity to think about blocks in arrow functions.

Pattern matching

I have experimented with pattern matching 2) and decided not to include it in this RFC. Pattern matching is a complex topic and requires a lot of thought. Each pattern should be discussed in detail in a separate RFC.

// With pattern matching
match ($value) {
    let $a => ..., // Identifier pattern
    let 'foo' => ..., // Scalar pattern
    let 0..<10 => ..., // Range pattern
    let is string => ..., // Type pattern
    let [1, 2, $c] => ..., // Array pattern
    let Foo { foo: 1, getBar(): 2 } => ..., // Object pattern
    let $str @ is string if $str !== '' => ..., // Guard
 
    // Algebraic data types if we ever get them
    let Ast\BinaryExpr($lhs, '+', $rhs) => ...,
}
 
// Without pattern matching
match (true) {
    true => $value ..., // Identifier pattern
    $value === 'foo' => ..., // Scalar pattern
    $value >= 0 && $value < 10 => ..., // Range pattern
    is_string($value) => ..., // Type pattern
    count($value) === 3
        && isset($value[0]) && $value[0] === 1
        && isset($value[1]) && $value[1] === 2
        && isset($value[2]) => $value[2] ..., // Array pattern
    $value instanceof Foo
        && $value->foo === 1
        && $value->getBar() === 2 => ..., // Object pattern
    is_string($value) && $value !== '' => ..., // Guard
}
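
A runnable variant of the match (true) emulation might look like the sketch below. It assumes a PHP 8.0+ runtime; the function describe and its return strings are hypothetical. Every condition has to evaluate to exactly true, because the comparison against the match subject is strict.

```php
<?php
// Hypothetical helper that emulates a few of the patterns with match (true).
function describe(mixed $value): string
{
    return match (true) {
        // Scalar pattern
        $value === 'foo' => 'scalar foo',
        // Range pattern
        is_int($value) && $value >= 0 && $value < 10 => 'int in 0..<10',
        // Type pattern with a guard
        is_string($value) && $value !== '' => 'non-empty string',
        // Array pattern [1, 2, $c]
        is_array($value)
            && count($value) === 3
            && ($value[0] ?? null) === 1
            && ($value[1] ?? null) === 2
            && array_key_exists(2, $value) => 'array ending in ' . var_export($value[2], true),
        default => 'other',
    };
}

echo describe('foo'), PHP_EOL;     // scalar foo
echo describe(5), PHP_EOL;         // int in 0..<10
echo describe([1, 2, 9]), PHP_EOL; // array ending in 9
```

The order of the arms matters: 'foo' is also a non-empty string, so the scalar arm has to come first.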

"Why don't you just use x"

if statements

if ($x === 1) {
    $y = ...;
} elseif ($x === 2) {
    $y = ...;
} elseif ($x === 3) {
    $y = ...;
}

Needless to say, this is incredibly verbose and repetitive. It also can't make use of the jumptable optimization. You must also remember to write an else branch to catch unexpected values.

Hash maps

$y = [
    1 => ...,
    2 => ...,
][$x];

This code will execute every single “arm”, not just the one that is finally returned. It will also build a hash map in memory every time it is executed. And again, you must not forget to handle unexpected values.
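
The eager evaluation of the array approach can be contrasted with match in a short sketch. It assumes a PHP 8.0+ runtime; viaArray, viaMatch and the $arm closure are hypothetical names. Each function returns the selected value together with a log of the "arms" that were actually evaluated.

```php
<?php
// Hypothetical helpers: each returns [selected value, log of evaluated arms].
function viaArray(int $x): array
{
    $log = [];
    $arm = function (string $v) use (&$log): string {
        $log[] = $v; // record that this "arm" was evaluated
        return $v;
    };
    // Both elements are computed while the array literal is built.
    $y = [
        1 => $arm('one'),
        2 => $arm('two'),
    ][$x];

    return [$y, $log];
}

function viaMatch(int $x): array
{
    $log = [];
    $arm = function (string $v) use (&$log): string {
        $log[] = $v;
        return $v;
    };
    // Only the arm whose condition matches is evaluated.
    $y = match ($x) {
        1 => $arm('one'),
        2 => $arm('two'),
    };

    return [$y, $log];
}

print_r(viaArray(1)); // value 'one'; the log contains 'one' and 'two'
print_r(viaMatch(1)); // value 'one'; the log contains only 'one'
```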

Nested ternary operators

$y = $x === 1 ? ...
  : ($x === 2 ? ...
  : ($x === 3 ? ...
  : 0));

The parentheses make it hard to read, it's easy to make mistakes, and there is no jumptable optimization. Adding more cases will make the situation worse.

Backward Incompatible Changes

match was added as a keyword (reserved_non_modifiers). This means it can't be used in the following contexts anymore:

  • namespaces
  • class names
  • function names
  • global constants

Note that it will continue to work in method names and class constants.
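
The contexts that keep working can be sketched as follows. This assumes a PHP 8.0+ runtime where match is a keyword; the class Pattern and its members are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical class showing where the name "match" remains usable.
final class Pattern
{
    // Class constant named after the keyword.
    public const MATCH = 'matched';

    // Method named after the keyword.
    public function match(string $subject): bool
    {
        return $subject !== '';
    }
}

$pattern = new Pattern();
var_dump($pattern->match('abc')); // bool(true)
echo Pattern::MATCH, PHP_EOL;     // matched
```

A global function declared as match(), by contrast, falls under the restricted contexts listed above.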

Proposed PHP Version(s)

The proposed version is PHP 8.

Proposed Voting Choices

As this is a language change, a 2/3 majority is required. The vote is a straight Yes/No vote for accepting the RFC and merging the patch.

Would you like to add match expressions to the language?
Real name Yes No
ajf (ajf)  
alec (alec)  
ashnazg (ashnazg)  
bwoebi (bwoebi)  
danack (danack)  
daverandom (daverandom)  
derick (derick)  
dmitry (dmitry)  
galvao (galvao)  
jasny (jasny)  
kalle (kalle)  
kguest (kguest)  
klaussilveira (klaussilveira)  
krakjoe (krakjoe)  
lcobucci (lcobucci)  
levim (levim)  
lstrojny (lstrojny)  
marandall (marandall)  
narf (narf)  
nicolasgrekas (nicolasgrekas)  
nikic (nikic)  
ocramius (ocramius)  
pollita (pollita)  
ramsey (ramsey)  
rasmus (rasmus)  
reywob (reywob)  
salathe (salathe)  
sebastian (sebastian)  
sergey (sergey)  
svpernova09 (svpernova09)  
tandre (tandre)  
trowski (trowski)  
yunosh (yunosh)  
zimt (zimt)  
Final result: 6 28
This poll has been closed.

Secondary vote (choice with the most votes is picked):

Should the semicolon for match in statement form be optional?
Real name Yes No
ashnazg (ashnazg)  
bwoebi (bwoebi)  
derick (derick)  
galvao (galvao)  
jasny (jasny)  
kalle (kalle)  
kguest (kguest)  
lcobucci (lcobucci)  
lstrojny (lstrojny)  
marandall (marandall)  
narf (narf)  
nicolasgrekas (nicolasgrekas)  
ocramius (ocramius)  
ramsey (ramsey)  
reywob (reywob)  
salathe (salathe)  
sebastian (sebastian)  
sergey (sergey)  
stas (stas)  
svpernova09 (svpernova09)  
tandre (tandre)  
trowski (trowski)  
zimt (zimt)  
Final result: 3 20
This poll has been closed.


If you voted no, why?
Real name Not interested Don't want blocks Want return val in blocks Not useful w/o pattern matching BC break Other
ajf (ajf)       
ashnazg (ashnazg)       
danack (danack)       
daverandom (daverandom)      
derick (derick)      
kguest (kguest)     
klaussilveira (klaussilveira)      
krakjoe (krakjoe)       
lcobucci (lcobucci)      
lstrojny (lstrojny)     
marandall (marandall)      
nicolasgrekas (nicolasgrekas)       
ocramius (ocramius)      
ramsey (ramsey)      
sebastian (sebastian)       
sergey (sergey)       
svpernova09 (svpernova09)      
trowski (trowski)      
zimt (zimt)      
Final result: 0 10 0 1 0 3
This poll has been closed.
rfc/match_expression.1587757575.txt.gz · Last modified: 2020/04/24 19:46 by ilijatovilo