PHP RFC: Chaining Comparison

Introduction

This RFC proposes a syntax change that allows comparison and equality operators [==, !=, !==, ===, <, <=, >, >=] to be chained together into arbitrary comparisons. The request that spawned this RFC[1] was originally only for interval checking. Discussion on the thread widened the scope from strict interval checking to an arbitrary number of comparisons, and from there to the majority of the comparison operators.

<?php
$a = 10;
 
/*
 * The initial request of this proposal was to change the following syntax
 */
if (0 < $a && $a < 100) {
    echo "Value is between 0 and 100\n";
}
 
/*
 * To allow this to be functionally the same
 */
if (0 < $a < 100) {
    echo "Value is between 0 and 100\n";
}

Proposal

Each proposal below includes a dump of the relevant AST nodes (php-ast) and OPCodes (vld) to help visualize compilation and execution.

Comparison Chaining

The proposal creates a new AST operation type, ZEND_AST_COMPARE_OP, which is compiled in a manner that requires left recursion (see: Open Issues). Compiling it this way ensures that operations further to the right are short-circuited once the left side has evaluated to false. To accomplish this we introduce a new means of emitting an operation, by noting where a JMPZ_EX may need to exist (see zend_emit_op_at in the implementations). Any operations emitted while compiling the right side of the comparison are shifted so that they can be jumped over when the left side evaluates to false. I believe this mechanism is necessary because we cannot short-circuit simply because the left operand is false: false < $a++ should still evaluate the right part of the expression. The JMPZ_EX ops should be injected only if the left child is itself a chained ZEND_AST_COMPARE_OP. The proposal also changes the associativity of the equality and comparison operators to left associative.

<?php
$a = 1;
$b = 10;
 
var_dump($a < 5 < $b++); // bool(true)
 
/*
 * AST Dump
 *
 * 2:  AST_CALL
 *      expr: AST_NAME
 *          flags: NAME_NOT_FQ (1)
 *        name: "var_dump"
 *      args: AST_ARG_LIST
 *         0: AST_COMPARE_OP
 *             flags: COMPARE_IS_SMALLER (19)
 *             left: AST_COMPARE_OP
 *                 flags: COMPARE_IS_SMALLER (19)
 *                 left: AST_VAR
 *                     name: "a"
 *                 right: 5
 *             right: AST_POST_INC
 *                 var: AST_VAR
 *                     name: "b"
 */
 
/*
 * OPCodes
 *
 *    2        INIT_FCALL                                               'var_dump'
 *    3        IS_SMALLER                                       ~4      !0, 5
 *    4      > JMPZ_EX                                          ~6      ~4, ->7
 *    5    >   POST_INC                                         ~5      !1
 *    6    >   IS_SMALLER                                       ~6      ~4, ~5
 *    7    >   SEND_VAL                                                 ~6
 *    8        DO_ICALL                                                 
 */

This shows what the feature does internally. Looking at the OPCodes, we can see how the JMPZ_EX injection works. The first IS_SMALLER op is the left-recursive side of the expression, so its result is determined first. If that result is true (the code also checks for an extended_value flag), execution continues to the POST_INC; otherwise it jumps straight to sending the value, which is false.
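For comparison, the short-circuit behaviour can be approximated in PHP 7.1 by writing the chain out by hand. This is only a sketch of the intended semantics, not the patch itself. The helper name chained_less_than is hypothetical, and it assumes the middle operand (rather than the boolean result) is carried into the second comparison, as described under Open Issues. The right operand is passed as a closure so that its side effects only happen when the left comparison succeeds.

```php
<?php
// Hypothetical helper emulating `$left < $middle < <right>` with the
// short-circuit described above. It is not part of either implementation.
function chained_less_than($left, $middle, callable $right)
{
    if (!($left < $middle)) {
        return false;            // mirrors JMPZ_EX jumping over the right side
    }
    return $middle < $right();   // compare the middle operand, not the bool
}

$a = 1;
$b = 10;
$result = chained_less_than($a, 5, function () use (&$b) { return $b++; });
var_dump($result); // bool(true)
var_dump($b);      // int(11), the post-increment ran

$c = 7;
$d = 10;
$skipped = chained_less_than($c, 5, function () use (&$d) { return $d++; });
var_dump($skipped); // bool(false)
var_dump($d);       // int(10), the post-increment was skipped
```

The second call shows the case the JMPZ_EX exists for: once 7 < 5 fails, the increment on the right is never executed.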

Equality Chaining

The proposal also extends to chaining of equality operators. However, equality operators have a lower precedence (they bind less tightly) than the comparison operators. As a result, equality operators end up operating either on booleans or on values that will be compared to a boolean. Example:

<?php
$a = 1;
$b = 10;
 
var_dump($a == 1 === true); // bool(true)
 
/*
 * AST Dump
 *
 *   2: AST_CALL
 *       expr: AST_NAME
 *           flags: NAME_NOT_FQ (1)
 *           name: "var_dump"
 *       args: AST_ARG_LIST
 *           0: AST_COMPARE_OP
 *               flags: COMPARE_IS_IDENTICAL (15)
 *               left: AST_COMPARE_OP
 *                   flags: COMPARE_IS_EQUAL (17)
 *                   left: AST_VAR
 *                       name: "a"
 *                   right: 1
 *               right: AST_CONST
 *                   name: AST_NAME
 *                       flags: NAME_NOT_FQ (1)
 *                       name: "true"
 */
 
/*
 * OPCodes
 *
 *   2        INIT_FCALL                                               'var_dump'
 *   3        IS_EQUAL                                         ~4      !0, 1
 *   4      > JMPZ_EX                                          ~5      ~4, ->6
 *   5    >   IS_IDENTICAL                                     ~5      ~4, <true>
 *   6    >   SEND_VAL                                                 ~5
 */
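For reference, PHP 7.1 treats the equality operators as non-associative, so the chained form above is a parse error there and the grouping has to be written explicitly. A minimal sketch of the hand-grouped equivalent for this particular input (the variable name $grouped is only illustrative):

```php
<?php
$a = 1;

// PHP 7.1 rejects `$a == 1 === true`, so the left comparison is grouped
// by hand and its boolean result is then tested for identity with true.
$grouped = ($a == 1) === true;
var_dump($grouped); // bool(true)
```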

Backward Incompatible Changes

No BC Breaking changes expected (see: Open Issues)

Proposed PHP Version(s)

Next PHP (currently 7.2)

RFC Impact

To Opcache

Yes, we're adding new JMPZ_EX opcodes when chaining, to ensure that a false result correctly jumps over any pre/post increment/decrement ops so they are not evaluated.

Open Issues

Should equality and comparison expressions be treated as same precedence?

This is a harder question than it seems. What we are asking is how we should parse a seemingly simple expression: 1 < 2 == 3 < 4

Why is this even a question, much less a challenging one? What seems to be a majority of languages [C, C++, Java, Ruby, Perl] would tell you that the expression evaluates to true. However, some [WolframAlpha, Python] evaluate it to false, and others [Numbers, LibreOffice] raise a syntax error or give awkward answers. The question is which way PHP should go when evaluating this expression. The true-evaluating languages clearly give the less-than operator a higher precedence than equality, so they check true == true. The false-evaluating languages treat comparison and equality with the same precedence, so they compare 1 less than 2, then 2 is-equal 3. The latter group are apparently more strictly typed and won't compare bools to numbers, but even there we can see the precedence is equal, as the result of the first expression is carried into the next: (1 < 2) == 3

It is important to point out that the example syntax is already valid in PHP 7.1. PHP 7.1 has a C-like precedence where [<, <=, >, >=] have a higher precedence than [==, !=, ===, !==]. Below are expressions and their return values in PHP 7.1, and under each of the two potential methods of evaluating that expression.

<?php
 
/*
 * PHP <= 7.1
 */
var_dump(1 < 2 == 3 < 4); // bool(true)
var_dump(1 < 2 == 3 < 4 == 5 < 6); // Syntax Error
 
/*
 * Proposed Chaining, comparators evaluated first; equality second [See: Implementation #1]
 */
var_dump(1 < 2 == 3 < 4); // bool(true)
var_dump(1 < 2 == 3 < 4 == 5 < 6); // bool(true)
 
/*
 * Proposed Strict Chaining [See: Implementation #2]
 */
var_dump(1 < 2 == 3 < 4); // bool(false)
var_dump(1 < 2 == 3 < 4 == 5 < 6); // bool(false)
var_dump((1 < 2) == (3 < 4) == (5 < 6)); // bool(true)
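The two readings of 1 < 2 == 3 < 4 can be spelled out with explicit grouping that already runs on PHP 7.1. This is a sketch of what each interpretation is assumed to compute, not output from either patch; the variable names are illustrative.

```php
<?php
// Reading 1 (Implementation #1): comparisons bind tighter than equality,
// so the expression is an equality test between two boolean results.
$comparisonsFirst = (1 < 2) == (3 < 4);

// Reading 2 (Implementation #2, strict chaining): all operators share one
// precedence level and each neighbouring pair of operands is tested in turn.
$strictChain = (1 < 2) && (2 == 3) && (3 < 4);

var_dump($comparisonsFirst); // bool(true)
var_dump($strictChain);      // bool(false)
```

The strict reading fails on its middle link, 2 == 3, which is why the same source text gives opposite answers.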

Should we allow user-defined right recursion?

Both proposed implementations currently require a left-recursive chain for non-equality operations. With this approach, when the left comparison evaluates to true, its right node is returned up the tree for the next comparison.

1 < 2 < 3

To illustrate with this example: the compiler sees a top-level comparison-op AST node whose left side is another comparison-op and whose right side is 3. It recurses and evaluates the left child, 1 < 2. If that node evaluates to true, the value passed up is not the bool true but the child's right node, in this case 2. So when the parent node compiles its left side, it sees 2 rather than true, and then compares 2 < 3. Because the parent is not itself a child node, it sets the result to true here.
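The rule of carrying the right operand up the chain can be sketched in userland PHP. The function evaluate_chain is hypothetical and exists only to illustrate the rule; it takes operands that are already evaluated, so it does not model the skipping of side effects on the right.

```php
<?php
// Hypothetical evaluator for a flat chain such as 1 < 2 < 3, given as
// operands [1, 2, 3] and operators ['<', '<'].
function evaluate_chain(array $operands, array $operators)
{
    $left = $operands[0];
    foreach ($operators as $i => $op) {
        $right = $operands[$i + 1];
        switch ($op) {
            case '<':  $ok = $left <  $right; break;
            case '<=': $ok = $left <= $right; break;
            case '>':  $ok = $left >  $right; break;
            case '>=': $ok = $left >= $right; break;
            default:
                throw new InvalidArgumentException("Unsupported operator: $op");
        }
        if (!$ok) {
            return false;   // a failed link ends the chain
        }
        $left = $right;     // carry the right operand up, not the bool
    }
    return true;
}

var_dump(evaluate_chain([1, 2, 3], ['<', '<'])); // bool(true)
var_dump(evaluate_chain([1, 3, 2], ['<', '<'])); // bool(false)
```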

The question is whether we should allow users to define right recursion in the manner of

1 < (2 < 3)

This would instruct the compiler to give the 'top' node a left side of 1 and a right side that is a comparison-op. Should we evaluate this as 1 < true, or allow right-side defined recursion and return the left node of the child for comparison, so that 1 < (result of expr) becomes 1 < 2?

This is a matter of personal preference, and of how expressions short-circuit. For example:

1 < 1 < $a++

With the above expression, $a++ would never run, so $a would be unaltered after the line. However, we could allow right recursion with

1 < (1 < $a++)

Written this way, $a++ is guaranteed to be evaluated in the chain of less-than expressions. However, the above could just as easily be written with greater-than expressions to avoid right recursion

$a++ > 1 > 1
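The side-effect difference can be observed in PHP 7.1 today. The first half of this sketch writes the left-recursive chain as the conjunction it is assumed to stand for; the second half is the parenthesized form under existing rules, where the inner result is a bool.

```php
<?php
// Left-recursive chain `1 < 1 < $a++`, written out as a conjunction:
// the first link fails, so $a++ never runs.
$a = 5;
$chained = (1 < 1) && (1 < $a++);
var_dump($chained); // bool(false)
var_dump($a);       // int(5)

// Parenthesized form under PHP 7.1 rules: the inner comparison always
// runs, and its boolean result is then compared with 1.
$b = 5;
$nested = 1 < (1 < $b++);
var_dump($nested);  // bool(false), because 1 < true compares as booleans
var_dump($b);       // int(6)
```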

Unaffected PHP Functionality

Does not alter the operation of the Spaceship [<=>] comparison operator.

Proposed Voting Choices

Requires 2/3 vote

Patches and Tests

Implementation #1: comparisons evaluated before equality: https://github.com/php/php-src/compare/master...bp1222:multi-compare

Implementation #2: comparisons and equality evaluated together: https://github.com/php/php-src/compare/master...bp1222:multi-compare-equal-prec

These patches need review from those more familiar with the AST/VM.

For changes affecting the core language, you should also provide a patch for the language specification.

Implementation

References

Rejected Features

Keep this updated with features that were discussed on the mail lists.

rfc/chaining_comparison.1481763012.txt.gz · Last modified: 2017/09/22 13:28 (external edit)