PHP RFC: Match blocks

Proposal

The match expression was added in PHP 8.0 with the goal of being a safer and more useful alternative to the switch statement. In its current form, each match arm is limited to a single expression. Any arm that does not fold nicely into a single expression prevents match from being used. This RFC proposes to lift this restriction and allow blocks to be placed at the match arm site.

// Before
switch (true) {
    // InputParameter
    case $entityExpr instanceof AST\InputParameter:
        $dqlParamKey = $entityExpr->name;
        $entitySql   = '?';
        break;
 
    // snip
}
 
// After
$entitySql = match (true) {
    // InputParameter
    $entityExpr instanceof AST\InputParameter => {
        $dqlParamKey = $entityExpr->name;
        <- '?';
    },
 
    // snip
};

Source: doctrine/orm SqlWalker.php

Semantics

Return value

Match blocks may have return values. The match expression will propagate the return value of the executed match arm. The return value is the last expression after an optional list of statements, preceded by a <- symbol to denote the value flowing out of the block. If the match return value is used, each block is expected to return a value, unless it terminates early.

$result = match($foo) {
    'bar' => {
        $l = 1;
        $r = 2;
        <- $l + $r;
    },
    'baz' => {
        throw new Exception();
    },
    'qux' => {
        // Forgot to return something
        // This will throw a MatchBlockNoValueError if executed
    },
};
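
For comparison, the closest approximation in current PHP wraps each multi-statement arm in an immediately invoked closure. The following is only a sketch under that assumption, not the proposed syntax: the function name evaluate is hypothetical, and the closures introduce their own scope, unlike match blocks.

```php
<?php

// Hypothetical emulation of block arms using immediately invoked closures.
function evaluate(string $foo): ?int
{
    return match ($foo) {
        'bar' => (function () {
            $l = 1;
            $r = 2;
            return $l + $r; // plays the role of "<- $l + $r;"
        })(),
        'baz' => throw new Exception('baz arm'),
        'qux' => (function () {
            // No return statement: the closure silently yields null,
            // whereas the proposal would throw MatchBlockNoValueError.
        })(),
    };
}
```

Note the difference in the 'qux' arm: the emulation cannot detect a forgotten value, which is one of the checks the proposed blocks would add.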

If the match return value is not used, the blocks must not return a value.

FIXME: This is subject to change. Since single expressions and blocks may be mixed, it might make sense to be more lax.

match($foo) {
    'bar' => {
        echo 'This branch does not return a value';
    },
    'baz' => {
        $l = 1;
        $r = 2;
        // Compile time error: Blocks of match expression with unused result must not return a value
        <- $l + $r;
    },
    // Single expressions are still allowed, even if they are non-void
    'qux' => qux(),
};

Note that the return keyword is not reused in place of <- because it would be ambiguous whether the user meant to return from the match block, or return from the function. Similarly, yield can already refer to pausing a generator in this context.

Control statements

return, break, continue and goto statements are allowed in match blocks only if the return value of match is not used, or if they don't escape the block (e.g. continue in a loop contained in the block).

match ($foo) {
    'bar' => {
        // This is ok
        return 'baz';
        // Hint: return here means return from function, not return from block,
        // like everywhere else.
    },
};
 
var_dump(match ($foo) {
    'bar' => {
        // This is **not** ok
        break;
    },
});
 
var_dump(match ($foo) {
    'bar' => {
        for ($i = 0; $i < 10; $i++) {
            // This is ok
            continue;
        }
        <- 42;
    },
});

The rationale for this decision is twofold:

  • It attempts to avoid confusing and potentially unsound control flow. For example:
var_dump(match (1) {
    1 => {
        break;
        <- 42;
    },
});
// What is the return value of match? A value was never returned, but the var_dump must receive a value nonetheless.
  • There are technical challenges to correctly implementing control flow that escapes mid-expression. For the interested, this is explained in more detail under “Technical implications of control statements” below. Disallowing escaping of the match block completely dodges this problem.

Scoping

Match blocks behave just like any other statement list in PHP in terms of scoping. That is, no new scope is created. All variables assigned inside the block are visible outside the block, in the same function.

match ($foo) {
    'bar' => {
        $bar = 'I can see this';
    },
};
echo $bar; // I can see this
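
The same function-level scoping can already be observed in current PHP with any braced statement list. The following minimal sketch uses an ordinary if block rather than the proposed match block syntax; the function name scopingDemo is hypothetical.

```php
<?php

// Hypothetical helper, for illustration only.
function scopingDemo(string $foo): string
{
    if ($foo === 'bar') {
        // Braces do not open a new scope in PHP.
        $bar = 'I can see this';
    }

    // $bar is visible here if the block ran; otherwise fall back.
    return $bar ?? 'not set';
}

echo scopingDemo('bar'), PHP_EOL;
```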

Motivation

The match expression was introduced to address some shortcomings of the switch statement, but it currently fails to cover approximately half of the use cases of switch. switch cases commonly contain more than one statement. popular-package-analysis revealed that 3 507 of 6 012 switch statements contained at least one case with more than one statement (excluding breaks). Moreover, 29 690 of 67 563 cases were multi-statement. Since match expressions are limited to one expression per arm, a single arm that does not fold nicely into a single expression prevents a match expression from being used entirely.

It has previously been argued that limiting match arms to single expressions is beneficial for enforcing clean code. While keeping functions and consequently match arms short certainly has its merits, I personally find excessively small functions disorienting and hard to name well. Moreover, some statements (e.g. control statements) cannot be moved into separate functions.
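
To illustrate the last point, the sketch below is written in current PHP, and the function sumTokens and its token format are hypothetical. It uses continue 2 inside a switch. That statement has to stay in the function that owns the loop, so the case cannot be reduced to a single helper call and the switch cannot be converted to match today.

```php
<?php

// Hypothetical example: sum numeric tokens and skip comment tokens.
function sumTokens(array $tokens): int
{
    $sum = 0;
    foreach ($tokens as $token) {
        switch (true) {
            case str_starts_with($token, '#'):
                // Skipping to the next iteration cannot be delegated
                // to a separate function.
                continue 2;
            case is_numeric($token):
                $value = (int) $token;
                break;
            default:
                throw new InvalidArgumentException("Unexpected token: $token");
        }
        $sum += $value;
    }

    return $sum;
}
```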

Furthermore, the pattern matching RFC plans to add enhancements to the match expression. Specifically, each match arm will be able to specify a pattern to match the expression against, including type checks. The Doctrine example from the introduction could become the following:

$entitySql = match ($entityExpr) {
    // InputParameter
    is AST\InputParameter => {
        $dqlParamKey = $entityExpr->name;
        <- '?';
    },
 
    // snip
};

Why not language-level blocks?

Instead of implementing blocks only for match expressions, it has been suggested to implement them as a language-level concept. There are three evident use cases for block expressions.

  • Match blocks
  • Arrow function blocks
  • Short-circuiting operators (??=, ??, ?:, ? :)

Unfortunately, these three use cases are all slightly different.

  • For match, whether the block should return a value depends on whether the match itself returns a value.
  • Arrow function blocks should never return a value, because function return values are controlled by return. No return value should mean null, to stay consistent with other functions.
  • For the remaining cases, a value should always be returned.

Furthermore, blocks for arrow functions have been discussed and rejected in two separate RFCs.

It seems that most concerns for both of these RFCs were related to auto-capturing, which language-level blocks cannot properly address.

It is also worth noting that the general use of blocks is quite limited due to PHP's scoping rules. In other languages, blocks can be used to prevent pollution of the current scope.

let foo = {
    let tmp = tmp();
    // ...
    Foo { tmp }
};

In this case, tmp resides in an isolated scope and is inaccessible outside of the block. However, given that PHP only has a single scope per function, there is no point in lexically nesting temporary variables, other than potential visual benefits. The benefits are mainly limited to some of the short-circuiting operators (??=, ??, ?:, ? :), as they may skip the execution of the block under certain conditions.

$foo ??= {
    // This is only executed if $foo was null/undefined.
    $tmp = tmp();
    // ...
    <- new Foo($tmp);
};
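
The short-circuiting behaviour can be approximated in current PHP with an immediately invoked closure on the right-hand side. This is only a sketch under that assumption: initOnce and the $calls counter are hypothetical, and unlike a language-level block the closure runs in its own scope.

```php
<?php

// Hypothetical helper: returns the resulting value and how many times
// the initializer on the right-hand side of ??= was executed.
function initOnce(?string $foo): array
{
    $calls = 0;
    $foo ??= (function () use (&$calls) {
        // This is only executed if $foo was null.
        $calls++;
        $tmp = 'computed';
        return strtoupper($tmp);
    })();

    return [$foo, $calls];
}
```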

Technical implications of control statements

PHP's VM is in three-address form. As opposed to most machines, PHP opcodes are destructive in that they consume their operands. A consumed operand may not be consumed again. Moreover, an unconsumed operand may result in leaked memory. Control statements in match expression blocks pose a problem when they skip over the consuming opcodes of temporary VARs.

new Foo() + match (1) {
    1 => { return; },
};

0000 V0 = NEW 0 string("Foo")
0001 DO_FCALL
0002 T2 = IS_IDENTICAL int(1) int(1)
0003 JMPNZ T2 0006
0004 JMP 0005
0005 MATCH_ERROR int(1)
0006 RETURN null
0007 MATCH_BLOCK_NO_VALUE_ERROR
0008 T3 = QM_ASSIGN null
0009 JMP 0010
0010 T4 = ADD V0 T3
0011 FREE T4
0012 RETURN int(1)

The opcode 0006 (RETURN) is always executed, skipping the 0010 (ADD) instruction, not consuming V0 and thus leaking the Foo object. This problem may be avoided by emitting a FREE opcode before RETURN. The same issue can occur when breaking out of switch statements, continuing in loops, using goto, etc. This approach is implemented in this PR. However, it has proven to be much more complex for questionable benefit.

Similarly, we run into an issue in this code.

foo()->bar(match (1) {
    1 => { return; },
});

0000 INIT_FCALL_BY_NAME 0 string("foo")
0001 V0 = DO_FCALL_BY_NAME
0002 INIT_METHOD_CALL 1 V0 string("bar")
0003 T1 = IS_IDENTICAL int(1) int(1)
0004 JMPNZ T1 0007
0005 JMP 0006
0006 MATCH_ERROR int(1)
0007 RETURN null
0008 MATCH_BLOCK_NO_VALUE_ERROR
0009 T2 = QM_ASSIGN null
0010 JMP 0011
0011 SEND_VAL_EX T2 1
0012 DO_FCALL
0013 RETURN int(1)

The 0007 (RETURN) instruction skips over 0012 (DO_FCALL). However, the 0002 (INIT_METHOD_CALL) instruction has already received V0 (foo()) and increased its refcount to make sure the value is not released before the method bar() is called on it. Given that 0012 (DO_FCALL) is never executed, the result of foo() is never released and thus leaks.

Both of these issues arise because there are unfreed VARs at the time the escaping control statements in the match blocks are executed, skipping over their consuming opcodes. Disallowing the escaping of the match blocks prevents skipping over the consuming opcodes, and thus circumvents the issue.

Backwards incompatible changes

There are no backwards incompatible changes in this RFC.

Vote

Voting starts ????-??-?? and ends ????-??-??.

As this is a language change, a 2/3 majority is required.

Add support for blocks at match arms in PHP 8.x?
Real name Yes No
Final result: 0 0
rfc/match_blocks.1694203419.txt.gz · Last modified: 2023/09/08 20:03 by ilutov