PHP RFC: Pipe operator v3

Introduction

In object-oriented code, “composition” generally means “one object having a reference to another.” In functional programming, “composition” generally means “sticking two functions together end-to-end to make a new function.” Both are valid and useful techniques, especially in a multi-paradigm language like PHP.

Composition generally takes two forms: immediate and delayed. The immediate execution of chained functions is typically implemented with a “pipe” operator. Delayed execution is typically implemented with a composition operator, which takes two functions and produces a new function that will call each one in turn. The combination of the two cleanly enables “point-free style,” an approach to programming that limits the use of unnecessary intermediary variables. Point-free style has been gaining popularity in JavaScript circles, so it will already be familiar to many JavaScript developers.

This RFC introduces the “pipe” operator, in the form used by most other languages with such functionality. A function composition operator is saved for a follow up RFC. (See Future Scope.)

For example:

function getUsers(): array {
    return [
        new User('root', isAdmin: true),
        new User('john.doe', isAdmin: false),
    ];
}
 
function isAdmin(User $user): bool {
  return $user->isAdmin;
}
 
// This is the new syntax.
$numberOfAdmins = getUsers()
    |> fn ($list) => array_filter($list, isAdmin(...)) 
    |> count(...);
 
var_dump($numberOfAdmins); // int(1);

Proposal

This RFC introduces a new operator:

mixed |> callable;

The |> operator, or “pipe,” accepts a single-parameter callable on the right and passes the left-side value to it, evaluating to the callable's result.

Pipe (|>) evaluates left to right by passing the value (or expression result) on the left as the first and only parameter to the callable on the right. That is, the following two code fragments are logically equivalent:

$result = "Hello World" |> strlen(...);
 
$result = strlen("Hello World");

For a single call that is not especially useful. It becomes useful when multiple calls are chained together. That is, the following two code fragments are effectively equivalent:

$result = "Hello World"
    |> htmlentities(...)
    |> str_split(...)
    |> fn($x) => array_map(strtoupper(...), $x)
    |> fn($x) => array_filter($x, fn($v) => $v != 'O');

$temp = "Hello World";
$temp = htmlentities($temp);
$temp = str_split($temp);
$temp = array_map(strtoupper(...), $temp);
$temp = array_filter($temp, fn($v) => $v != 'O');
$result = $temp;

The left-hand side of the pipe may be any value or expression. The right-hand side may be any valid PHP callable that takes a single parameter, or any expression that evaluates to such a callable. Functions with more than one required parameter are not allowed and will fail as if the function were called normally with insufficient arguments. If the right-hand side does not evaluate to a valid callable it will throw an Error.

A pipe chain is an expression, and therefore may be used anywhere an expression is valid.
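
The failure modes described above are those of ordinary calls, so they can be observed on current PHP without the pipe operator. The following is a minimal sketch under that assumption: needsTwo() and the undefined function name are hypothetical, and the pipe forms appear only in comments.

```php
<?php
// Hypothetical function with two required parameters.
function needsTwo(string $a, string $b): string
{
    return $a . $b;
}

// Pipe form (as proposed): 'x' |> needsTwo(...)
// Per the RFC, this fails the same way an ordinary one-argument call does.
try {
    $f = needsTwo(...);
    $f('x');
    $tooFewArgs = 'no error';
} catch (ArgumentCountError $e) {
    $tooFewArgs = 'ArgumentCountError';
}

// Pipe form (as proposed): 'x' |> $notCallable
// Per the RFC, a right-hand side that is not a valid callable throws an Error.
// An ordinary dynamic call to an undefined function does the same.
$notCallable = 'no_such_function_for_this_sketch';
try {
    $notCallable('x');
    $invalidCallable = 'no error';
} catch (Error $e) {
    $invalidCallable = 'Error';
}

echo $tooFewArgs, "\n", $invalidCallable, "\n";
```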

Precedence

The pipe operator is left-associative. The left side will be evaluated first, before the right side.

The pipe operator has a deliberately low binding order, so that most surrounding operators will execute first. In particular, arithmetic operations, null coalesce, and ternaries all have higher binding priority, allowing for the RHS to have arbitrarily complex expressions in it that will still evaluate to a callable. For example:

// These are equivalent.
$res1 = 5 + 2 |>  someFunc(...);
$res1 = (5 + 2) |>  someFunc(...);
 
// These are equivalent.
$res1 = 5 |> $null_func ?? defaultFunc(...);
$res1 = 5 |> ($null_func ?? defaultFunc(...));
 
// These are equivalent.
$res1 = 5 |> $config['flag'] ? enabledFunc(...) : disabledFunc(...);
$res1 = 5 |> ($config['flag'] ? enabledFunc(...) : disabledFunc(...));

One notable exception is if other binding orders would result in nonsensical semantics. In particular:

// This
$x ? $y |> strlen(...) : $z;
 
// Is interpreted like this:
$x ? ($y |> strlen(...)) : $z;
 
// As the alternative (processing the ? first) would not be syntactically valid.

Also of note, PHP's comparison operators (==, ===, <, etc.) have a relatively high binding priority. Therefore, |> necessarily binds lower than those, as doing otherwise would require rethinking the entire binding order and that is entirely out of scope. As a result, comparing the result of a pipe to something requires parentheses around the pipe chain.

// Without the parens here, PHP would try to
// compare the strlen closure against an integer, which is nonsensical.
$res1 = ('beep' |> strlen(...)) == 4;

Performance

The current implementation works entirely at the compiler level, and effectively transforms the first example above into the second at compile time. The result is that pipe has virtually no runtime overhead.

Callable styles

Pipe supports any callable syntax supported by PHP. At present, the most common form is first-class callables (e.g., strlen(...)), which dovetail with this syntax very cleanly. Should further improvements be made in the future, such as a revised Partial Function Application RFC, they would be supported naturally.

References

As usual, references are an issue. Supporting pass-by-ref parameters in simple cases is quite easy, and a naive implementation would support it. However, passing a value from a compound value (an object property or array element) by reference does not work, and throws an “Argument could not be passed by reference” error. In practice, it is easier to forbid pass-by-ref parameters in pipe than to allow them.

$arr = ['a' => 'A', 'b' => 'B'];
 
$val = 'C';
 
function inc_print(&$v) {
  $v++;
  print $v;
}
 
// This can be made to work.
$val |> inc_print(...);
 
// This cannot be easily made to work, and it might not even be possible.
$arr['a'] |> inc_print(...);

That is also consistent with the typical usage patterns. The whole design of the pipe operator is that data flows through it from left to right, in a pure-functional way. Passing by reference would introduce all sorts of potential “spooky action at a distance.” In practice, there are few if any use cases where it would be appropriate in the first place.

For that reason, pass-by-ref callables are disallowed on the right-hand side of a pipe operator. That is, both examples above would error.

One exception to this is “prefer-ref” functions, which only exist in the stdlib and cannot be implemented in user-space. There are a small handful of functions that will accept either a reference or a direct value, and vary their behavior depending on which they get. When those functions are used with the pipe operator, the value will be passed by value, and the function will behave accordingly.

Syntax choice

F#, Elixir, and OCaml all use the |> operator already for this exact same behavior. There has been a long-standing discussion in JavaScript about adding a |> operator as described here. It is the standard operator for this task.

Use cases

The use cases for a pipe operator are varied. They include, among others, encouraging shallow-function-nesting, encouraging pure functions, expressing a complex process in a single expression, and emulating extension functions.

The following examples are all simplified from real-world use cases in code I have written.

String manipulation

// Convert a string to snake_case
 
$result = 'Fred Flintstone'
    |> splitString(...)           // Produces an array of individual words.
    |> fn($x) => implode('_', $x) // Join those words with _
    |> strtolower(...)            // Lowercase everything.
;
 
// $result is 'fred_flintstone'
 
// Convert a string to lowerCamelCase
 
$result = 'Fred Flintstone'
    |> splitString(...)
    |> fn($x) => array_map(ucfirst(...), $x)  // Uppercase the first letter of each word
    |> fn($x) => implode('', $x)              // Join those words
    |> lcfirst(...)                           // Now lowercase just the first letter
;
 
// $result is 'fredFlintstone'
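
The examples above rely on a splitString() helper that this RFC does not define. The sketch below assumes it splits on runs of whitespace (a hypothetical definition) and spells out each chain as ordinary nested calls, so the expected results can be checked on current PHP.

```php
<?php
// Hypothetical helper: assumed here to split a string on runs of whitespace.
function splitString(string $s): array
{
    return preg_split('/\s+/', trim($s));
}

$words = splitString('Fred Flintstone');

// snake_case: join the words with "_" and lowercase everything.
$snake = strtolower(implode('_', $words));

// lowerCamelCase: uppercase each word's first letter, join, then lowercase the first letter.
$camel = lcfirst(implode('', array_map(ucfirst(...), $words)));

echo $snake, "\n", $camel, "\n";
```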

Array combination

$arr = [
  new Widget(tags: ['a', 'b', 'c']),
  new Widget(tags: ['c', 'd', 'e']),
  new Widget(tags: ['x', 'y', 'a']),
];
 
$tags = $arr
    |> fn($x) => array_column($x, 'tags') // Gets an array of arrays
    |> fn($x) => array_merge(...$x)       // Flatten that array into one big array
    |> array_unique(...)                  // Remove duplicates
    |> array_values(...)                  // Reindex the array.
;
 
// $tags is now ['a', 'b', 'c', 'd', 'e', 'x', 'y']

The single-expression alternative today would be:

array_values(array_unique(array_merge(...array_column($arr, 'tags'))));

Which I believe is indisputably worse.

Shallow calls

The use of a pipe for function composition also helps to separate closely related tasks so they can be developed and tested in isolation. For a (slightly) contrived and simple example, consider:

function loadWidget($id): Widget
{
    $record = DB::query("something");
    return makeWidget($record);
}
 
function loadMany(array $ids): array
{
    $data = DB::query("something");
    $ret = [];
    foreach ($data as $record) {
        $ret[] = makeWidget($record);
    }
    return $ret;
}
 
function makeWidget(array $record): Widget
{
    // Assume this is more complicated.
    return new Widget(...$record);
}

In this code, it is impossible to test loadWidget() or loadMany() without also executing makeWidget(). While in this trivial example that's not a huge problem, in a more complex example it often is, especially if several functions/methods are nested more deeply. Dependency injection cannot fully solve this problem, unless each step is in a separate injected class.

By making it easy to chain functions together, however, that can be rebuilt like this:

function loadWidget($id): array
{
    return DB::query("something");
}
 
function loadMany(array $ids): array
{
    return DB::query("something else");
}
 
function makeWidget(array $record): Widget
{
    // Assume this is more complicated.
    return new Widget(...$record);
}
 
$widget = loadWidget(5) |> makeWidget(...);
 
$widgets = [1, 4, 5] 
    |> loadMany(...) 
    |> fn(array $records) => array_map(makeWidget(...), $records);

And the latter could be further simplified with either a higher-order function or partial function application. Those chains could also be wrapped up into their own functions/methods for trivial reuse. They can also be extended. For instance, the result of loadMany() is most likely going to be used in a foreach() loop. That's a simple further step in the chain.

$profit = [1, 4, 5] 
    |> loadMany(...)
    |> fn(array $records) => array_map(makeWidget(...), $records)
    |> fn(array $ws) => array_filter($ws, isOnSale(...))
    |> fn(array $ws) => array_map(sellWidget(...), $ws)
    |> array_sum(...);

Moreover, because a pipe can take any callable, a pipe chain can be easily packaged up, either as a named function or anon function.

// This would be the "real" API that most code uses.
function loadSeveral($id) {
  return $id 
      |> loadMany(...) 
      |> fn(array $records) => array_map(makeWidget(...), $records);
}
 
$profit = [1, 4, 5] 
    |> loadSeveral(...)
    |> fn(array $ws) => array_filter($ws, isOnSale(...))
    |> fn(array $ws) => array_map(sellWidget(...), $ws)
    |> array_sum(...);

That neatly encapsulates the entire logic flow of a process in a clear, compact, highly-testable set of operations.
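
To illustrate the testability claim, the sketch below exercises makeWidget() on its own, with no database involved. The Widget class and its fields are hypothetical, since this RFC does not show them.

```php
<?php
// Hypothetical Widget: the field names are assumptions for this sketch.
final class Widget
{
    public function __construct(
        public readonly string $name,
        public readonly int $price,
    ) {}
}

function makeWidget(array $record): Widget
{
    // String keys are spread as named arguments.
    return new Widget(...$record);
}

// Feed makeWidget() the kind of record a loader would return.
$widget = makeWidget(['name' => 'sprocket', 'price' => 5]);

// The mapping stage of the chain can be checked the same way.
$widgets = array_map(makeWidget(...), [
    ['name' => 'a', 'price' => 1],
    ['name' => 'b', 'price' => 2],
]);

echo $widget->name, ' ', count($widgets), "\n";
```

Because each stage takes a plain value and returns a plain value, a test can supply the input directly instead of mocking the preceding stage.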

Single-expression pipelines

Of particular note, all of the above examples are a single expression. That makes them trivial to use in places where only a single-expression is allowed, such as match() arms, short-get property hooks, short-closures, etc. For example:

$string = 'something goes HERE';
 
$newString = match ($format) {
    'snake_case' => $string
        |> splitString(...)
        |> fn($x) => implode('_', $x)
        |> strtolower(...),
    'lowerCamel' => $string
        |> splitString(...)
        |> fn($x) => array_map(ucfirst(...), $x)
        |> fn($x) => implode('', $x)
        |> lcfirst(...),
    // Other case options here.
};
 
class BunchOfTags
{
    private array $widgets = [];
 
    public array $tags {
        get => $this->widgets
            |> fn($x) => array_column($x, 'tags')
            |> fn($x) => array_merge(...$x)
            |> array_unique(...)
            |> array_values(...);
    }
}
 
$loadSeveral = fn($id) => $id 
      |> loadMany(...) 
      |> fn(array $records) => array_map(makeWidget(...), $records);

Pseudo-extension functions

“Extension functions” are a feature of Kotlin and C# (and possibly other languages) that allow for a function to act as though it is a method of another object. It has only public-read access, but has the ergonomics of a method. While not a perfect substitute, pipes do offer similar capability with a little more work.

For instance, we could easily make utility higher-order functions that will map or filter an array that is piped to them. (A more robust version that also handles iterables is only slightly more work.)

function amap(callable $c): \Closure
{
    return fn(array $a) => array_map($c, $a);
}
 
function afilter(callable $c): \Closure
{
    return fn(array $a) => array_filter($a, $c);
}

That allows them to be used, via pipes, in a manner similar to “scalar methods.”

$result = $array 
    |> afilter(is_even(...)) 
    |> amap(some_transformation(...)) 
    |> afilter(a_filter(...))
    |> count(...);

Which is not far off from what it would look like with scalar methods:

$result = $array 
    ->filter(is_even(...)) 
    ->map(some_transformation(...)) 
    ->filter(a_filter(...))
    ->count();

But pipes can work with any value type, object or scalar. It also entirely removes the “does the subject come first or last” question: the subject is piped, and the arguments to the higher-order function are the modifiers. It also eliminates the need to discuss which methods deserve to be “first class” operations that turn into methods. Any function can be chained onto a value of any type.

While I do not believe pipes can completely replace extension functions, they provide a reasonable emulation and most of the benefits, for trivial cost.

This RFC does not propose any such higher-order functions for the PHP standard library, as most are quite easy to implement in user space. However, such could be easily added in the future if desired for especially common cases.
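
The amap() and afilter() helpers from this section can be tried on current PHP by applying each returned closure by hand. The sketch below does so; is_even() is a hypothetical predicate (it is not a built-in), and the pipe form appears only as a comment.

```php
<?php
function amap(callable $c): \Closure
{
    return fn(array $a) => array_map($c, $a);
}

function afilter(callable $c): \Closure
{
    return fn(array $a) => array_filter($a, $c);
}

// Hypothetical predicate used for illustration.
function is_even(int $n): bool
{
    return $n % 2 === 0;
}

// Pipe form (as proposed):
//   [1, 2, 3, 4, 5, 6] |> afilter(is_even(...)) |> amap(fn($n) => $n * 10) |> count(...)
// Step-by-step equivalent:
$evens  = afilter(is_even(...))([1, 2, 3, 4, 5, 6]); // array_filter preserves keys
$scaled = amap(fn(int $n) => $n * 10)($evens);
$result = count($scaled);

echo $result, "\n";
```

Each helper takes only the modifier and returns a single-parameter closure, which is exactly the shape the right-hand side of a pipe expects.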

Existing implementations

Multiple user-space libraries exist in PHP that attempt to replicate pipe-like or compose-like behavior. All are clunky and complex by necessity compared to a native solution. There is clear demand for this functionality, but user-space's ability to provide it is currently limited. The number of such libraries has only grown since the Pipes v2 RFC, indicating an even stronger benefit to the PHP ecosystem from a solid built-in composition syntax.

Those libraries would be mostly obsoleted by this RFC (in combination with the compose follow on, as noted in future-scope), with a more compact, more universal, better-performing syntax.

Why in the engine?

The biggest limitation of any user-space implementation is performance. Even the most minimal implementation (Crell/fp) requires adding 2-3 function calls to every operation, which is relatively expensive in PHP. A native implementation would not have that additional overhead. Crell/fp also results in somewhat awkward function nesting, like this:

pipe($someVal,
    htmlentities(...),
    str_split(...),
    fn($x) => array_map(strtoupper(...), $x),
    fn($x) => array_filter($x, fn($v) => $v != 'O'),
);
 
// (Or worse if you need conditional stages.)
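
For context, a user-space pipe() helper of this kind can be very small. The following is a minimal sketch of the general technique, not the actual Crell/fp source; the function name and signature are assumptions made for illustration.

```php
<?php
// Sketch of a user-space pipe: pass the value through each stage in order.
// Every stage costs at least one extra function call at runtime.
function pipe(mixed $value, callable ...$stages): mixed
{
    foreach ($stages as $stage) {
        $value = $stage($value);
    }
    return $value;
}

$result = pipe(
    'Hello World',
    htmlentities(...),
    str_split(...),
    fn($x) => array_map(strtoupper(...), $x),
    fn($x) => array_filter($x, fn($v) => $v != 'O'),
);

echo implode('', $result), "\n";
```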

More elaborate implementations tend to involve magic methods (which are substantially slower than normal function/method calls) or multi-layer middlewares, which are severe overkill for sticking two functions together.

A native implementation eliminates all of those challenges.

Additionally, a native operator would make it much easier for static analysis tools to ensure compatible types. The SA tools would know the input value's type and, in most cases, the type of the callable on the RHS, and could compare them directly without several layers of obfuscated user-space function calls between them.

Future Scope

This RFC is deliberately “step 1” of several closely related features to make composition-based code easier and more ergonomic. It offers benefit on its own, but deliberately dovetails with several other features that are worthy of their own RFCs.

A compose operator for closures (likely +). Where pipe executes immediately, compose creates a new callable (Closure) that composes two or more other Closures. That allows a new operation to be defined simply and easily and then saved for later in a variable. Because it is “just” an operator, it is compatible with all other language features. That means, for example, conditionally building up a pipeline is just a matter of throwing if statements around as appropriate. The author firmly believes that a compose operator is a necessary companion to pipe, and the functionality will be incomplete without it. However, while pipe can be implemented trivially in the compile step, a compose operator will require non-trivial runtime work. For that reason it has been split out to its own RFC.
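
The difference between immediate and delayed execution can be sketched in user space today. The compose() function below is hypothetical and only stands in for the idea; it is not the proposed operator or its implementation.

```php
<?php
// Hypothetical user-space compose(): returns a Closure that runs each function in turn.
function compose(callable ...$fns): \Closure
{
    return function (mixed $value) use ($fns): mixed {
        foreach ($fns as $fn) {
            $value = $fn($value);
        }
        return $value;
    };
}

// Conditionally build up a pipeline with ordinary if statements.
$stages = [trim(...)];
$lowercase = true; // stand-in for some runtime condition
if ($lowercase) {
    $stages[] = strtolower(...);
}
$stages[] = ucfirst(...);

// Nothing has run yet: $normalize is a callable saved for later.
$normalize = compose(...$stages);
$name = $normalize('  FRED ');

echo $name, "\n";
```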

Generic partial function application. While the prior RFC was declined due to its perceived use cases being insufficient to justify its complexity, there was clear interest in it, and it would vastly improve the usability of function composition. If a less complex implementation can be found, it would most likely pass and complement this RFC well.
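
As a rough illustration of why partial application helps, the sketch below uses a hypothetical user-space partial() helper that binds leading arguments. It is not the syntax or semantics of the earlier RFC, only the general idea of turning a multi-parameter function into a single-parameter callable.

```php
<?php
// Hypothetical helper: bind the leading arguments now, supply the rest later.
function partial(callable $fn, mixed ...$bound): \Closure
{
    return fn(mixed ...$rest) => $fn(...$bound, ...$rest);
}

// array_map($callback, $array) becomes a one-parameter callable,
// which is the shape a pipe stage needs.
$upperAll = partial(array_map(...), strtoupper(...));
$result = $upperAll(['a', 'b']);

echo implode(',', $result), "\n";
```

A helper like this only covers functions whose piped value is the last argument; functions such as array_filter(), which take the array first, still need a wrapping closure.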

A __bind method or similar on objects, possibly with a dedicated operator of its own (such as >>=). If implemented by an object on the left-hand side, the right-hand side would be passed to that method to invoke as it sees fit. Such a feature would be sufficient to support arbitrary monadic behavior in PHP in a type-friendly way.
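
To make the idea concrete, the sketch below shows an ordinary bind() method on a hypothetical Maybe class. All names here are assumptions for illustration; no __bind method or >>= operator exists, and this is not a proposed design.

```php
<?php
// Hypothetical Maybe type: either holds a value or is empty.
final class Maybe
{
    private function __construct(
        private readonly mixed $value,
        private readonly bool $present,
    ) {}

    public static function some(mixed $value): self
    {
        return new self($value, true);
    }

    public static function none(): self
    {
        return new self(null, false);
    }

    // Stand-in for a bind-style method: the object decides how to invoke $next.
    // An empty Maybe skips the call entirely.
    public function bind(callable $next): self
    {
        return $this->present ? $next($this->value) : $this;
    }

    public function getOr(mixed $default): mixed
    {
        return $this->present ? $this->value : $default;
    }
}

function parsePositiveInt(string $s): Maybe
{
    return preg_match('/^[1-9][0-9]*$/', $s) === 1
        ? Maybe::some((int) $s)
        : Maybe::none();
}

function half(int $n): Maybe
{
    return $n % 2 === 0 ? Maybe::some(intdiv($n, 2)) : Maybe::none();
}

$ok  = parsePositiveInt('42')->bind(half(...))->getOr(-1);
$bad = parsePositiveInt('abc')->bind(half(...))->getOr(-1);

echo $ok, ' ', $bad, "\n";
```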

Backward Incompatible Changes

None

Proposed PHP Version(s)

8.5

Open Issues

None

Proposed Voting Choices

Yes or no vote. 2/3 required to pass.

Add the pipe operator?

Patches and Tests

* PR is available here: https://github.com/php/php-src/pull/17118

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged into
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature
  4. a link to the language specification section (if any)

References

Links to external references, discussions or RFCs

Rejected Features