rfc:pipe-operator

PHP RFC: Pipe Operator

Introduction

A common PHP OOP pattern is the use of method chaining, or what is also known as “Fluent Expressions”. So named for the way one method flows into the next to form a conceptual hyper-expression.

For example, the following shows a SQL query expression built out of component pieces, then executed:

$rs = $db
  ->select()->fields('id', 'name', 'email')
  ->from('user_table')
  ->whereLike('email'=>'%@example.com')
  ->orderAsc('id')
  ->execute();

This works well enough for OOP classes which were designed for fluent calling, however it is impossible, or at least unnecessarily arduous, to adapt non-fluent classes to this usage style, harder still for functional interfaces.

While decomposing these expressions to make use of multiple variables is an option, this can lead to reduced readability, polluted symbol tables, or static-analysis defying type inconsistency such as in the following two examples:

$config = loadConfig();
$dic = buildDic($config);
$app = getApp($dic);
$router = getRouter($app);
$dispatcher = getDispatcher($router, $request);
$logic = dispatchBusinessLogic($dispatcher, $request, new Response());
$render = renderResponse($logic);
$psr7 = buildPsr7Response($render);
$response =emit($psr7);  

Or:

$x = loadConfig();
$x = buildDic($x);
$x = getApp($x);
$x = getRouter($x);
$x = getDispatcher($x, $request);
$x = dispatchBusinessLogic($x, $request, new Response());
$x = renderResponse($x);
$x = buildPsr7Response($x);
$response =emit($x);  

These sorts of chains could also, conceivably, be handled using deeply nested expressions, but the code becomes even more unreadable this way:

$dispatcher = getDispatcher(getRouter(getApp(buildDic(loadConfig()))), $request);
$response = emit(buildPsr7Response(renderResponse(dispatchBusinessLogic($dispatcher, $request, new Response()))));

This RFC aims to improve code readability by bringing fluent expressions to functional and OOP libraries not originally designed for the task.

Proposal

Introduce the “Pipe Operator” `|>`, mirroring the method call operator `→`. This expression acts as a binary operation using the result of the LHS as an input to the RHS expression at an arbitrary point denoted by the additional “Pipe Replacement Variable” expression `$$`. The result of the RHS expression, after substitution, is used as the result of the operator.

This feature is being culled from HackLang 3.13 and the manual page for it may be referenced at: https://docs.hhvm.com/hack/operators/pipe-operator

PSR7 Example

Here's the equivalent chain of function calls as demonstrated in the intro section above:

  $response = loadConfig()
   |> buildDic($$)
   |> getApp($$)
   |> getRouter($$)
   |> getDispatcher($$, $request)
   |> dispatchBusinessLogic($$, $request, new Response())
   |> renderResponse($$)
   |> buildPsr7Response($$)
   |> emit($$);

File collection Example

As an example, consider the following real block of code I wrote while creating a test importer (to migrate HHVM format tests into PHPT format). Please try not to get hung up into whether or not it's “good” PHP code, but rather that it's solving a problem, which is precisely what PHP is designed to do:

$ret = 
  array_merge(
    $ret,
    getFileArg(
      array_map(
        function ($x) use ($arg) { return $arg . '/' . $x; },
        array_filter(
          scandir($arg),
          function ($x) { return $x !== '.' && $x !== '..'); }
        )
      )
    )
  );

This block of code is readable, but one must carefully examine the nesting to determine what the initial input it, and what order it traverses the steps involved.

With this proposal, the above could be rewritten as:

$ret = scandir($arg)
    |> array_filter($$, function($x) { return $x !== '.' && $x != '..'; })
    |> array_map(function ($x) use ($arg) { return $arg . '/' . $x; }, $$)
    |> getFileArg($$)
    |> array_merge($ret, $$);

This clearly, and unambiguously shows `scandir()` as the initial source of data, that it goes through an `array_filter` to avoid recursion, an `array_map` to requalify the paths, some local function, and finally a merge to combine the result with a collector variable.

FBShipIt Example

Also consider the follow segment of code which is used in production by FBShipIt to translate and export nearly all of Facebook's Opensource projects to github:

return $changeset
    |> self::skipIfAlreadyOnGitHub($$)
    |> self::stripCommonFiles(
        $$,
        $config['stripCommonFiles/exceptions'] ?? ImmVector {},
      )
    |> self::stripSubjectTags($$)
    |> self::stripCommands($$)
    |> self::delinkifyDifferentialURLs($$)
    |> self::restoreGitHubAuthor($$)
    |> ShipItUserFilters::rewriteSVNAuthor(
        $$,
        FBToGitHubUserInfo::class,
      )
    |> self::filterMessageSections(
        $$,
        $config['filterMessageSections/keepFields']
          ?? self::getDefaultMessageSectionNames(),
      )
    |> self::rewriteMentions($$)
    |> self::rewriteReviewers($$)
    |> self::rewriteAuthor($$);

This presents every step taken by the common filter chain in an easy to follow list of actions.

Backward Incompatible Changes

While most ambiguities of `$$` between pipe replacement variable and variable variables are covered in the lexer rule, the following case is not accounted for:

$a = 1;
$b = 'a';
var_dump($$ /* comment */ {'b'});
// Expected: int(1)
// Actual: Use of $$ outside of a pipe expression

This particular quirk of the parser (allowing comments in the middle of a variable-variable-brace-expression) is doubtlessly a rare occurrence in the wild, so the current implementation stopped short of trying to resolve it.

Potential resolutions:

  • Use a less-ambiguous token. `$>` which mirrors `|>` is my personal favorite. Downshot: doesn't match HackLang
  • Get very creative in the parser. Since '$' '{' expr '}' is handled by the parser, then perhaps '$' '$' should as well. So far, attempts to resolve this result in conflicts. More work may yet yield results.

Note that HHVM does not handle this case either. Nor, in fact, does it handle mere whitespace between `$$` and `{expr}`, which the attached PHP implementation does.

Update: HackLang is normally supposed to disallow variable-variables, so the use of `$$` was seen as non-conflicting. A bug in the 3.13 implementation of pipe operator meant that variable-variables temporarily wound up working where they should not have. So whatever we propose for Pipe Operator's substitution will face the same issues in both lexers eventually anyhow.

Proposed PHP Version(s)

7.2

Open Issues

See BC issues

Future Scope

The current proposal limits use of the `$$` to a single replacement per expression. This feature could potentially be expanded to allow multiple uses of `$$` within a single RHS expression.

Third-party Arguments

Informal Twitter poll (821 respondents) results: https://twitter.com/SaraMG/status/727305412807008256

  • 62% “Love It”
  • 24% “Don't Care”
  • 14% “Hate It”

In favor

  • Produces cleaner, more readable code
  • Doesn't pollute local symbol table with intermediates of potentially varying types

Against

  • The new tokens are inobvious and difficult to google for
    • Pipe chaining in other languages follows different rules
      (e.g. implicit first arg, rather than explicit placeholder)
    • Potentially confusable with variable-variables
  • No opportunity for error catching/handling
  • Can be implemented using intermediate variables

Proposed Voting Choices

Adopt the PIpe Operator yes/no? Requires a 2/3 + 1 majority.

Patches and Tests

rfc/pipe-operator.txt · Last modified: 2016/07/21 02:08 by pollita