PHP RFC: Enumerations

  • Date: 2020-11-13
  • Author: Larry Garfield (larry@garfieldtech.com), Ilija Tovilo (tovilo.ilija@gmail.com)
  • Status: Draft
  • Target Version: PHP 8.1
  • Implementation: TBD

Introduction

This RFC introduces Enumerations to PHP. The scope of this RFC is limited to “unit enumerations,” that is, enumerations that are themselves a value, rather than simply a fancy syntax for a primitive constant, and do not include additional associated information. This capability offers greatly expanded support for data modeling, custom type definitions, and monad-style behavior. Enums enable the modeling technique of “make invalid states unrepresentable,” which leads to more robust code with less need for exhaustive testing.

Many languages have support for enumerations of some variety. A survey we conducted of various languages found that they could be categorized into three general groups: Fancy Constants, Fancy Objects, and full Algebraic Data Types (ADTs).

This RFC is part of a larger effort to introduce full Algebraic Data Types. It implements the “Fancy Objects” variant of enumerations in such a way that it may be extended to full ADTs by future RFCs. It draws both conceptually and semantically from Swift, Rust, and Kotlin, although it is not directly modeled on any of them.

The most familiar example of an enumeration is boolean, which is an enumerated type with legal values true and false. This RFC allows developers to define their own arbitrarily robust enumerations.

Proposal

Basic enumerations

This RFC introduces a new language construct, enum. Enums are similar to classes, and share the same namespaces as classes, interfaces, and traits. They are also autoloadable the same way. An Enum defines a new type, which has a fixed, limited number of possible legal values.

enum Suit {
  case Hearts;
  case Diamonds;
  case Clubs;
  case Spades;
}

This declaration creates a new enumerated type named Suit, which has four and only four legal values: Suit::Hearts, Suit::Diamonds, Suit::Clubs, and Suit::Spades. Variables may be assigned to one of those legal values. A function may be type checked against an enumerated type, in which case only values of that type may be passed.

$val = Suit::Diamonds;
 
function pick_a_card(Suit $suit) { ... }
 
pick_a_card($val);        // OK
pick_a_card(Suit::Clubs); // OK
pick_a_card('Spades');    // throws TypeError

An Enumeration must have at least one case definition; there is no maximum.

Cases are not intrinsically backed by a primitive value. That is, Suit::Hearts is not equal to 0. Instead, each case is backed by a singleton object of that name. That means that:

$a = Suit::Spades;
$b = Suit::Spades;
 
$a === $b; // true
 
 
$a instanceof Suit;         // true
$a instanceof Suit::Spades; // true

Enumerated Case Methods

As both Enum Types and Enum Cases are implemented using classes, they may take methods. The Enum Type may also implement an interface, which all Cases must then fulfill, directly or indirectly. Enum Cases may not implement interfaces themselves.

interface Colorful {
  public function color(): string;
}
 
enum Suit implements Colorful {
  case Hearts {
    public function color(): string {
      return "Red";
    }
  }
 
  case Diamonds {
    public function color(): string {
      return "Red";
    }
  }
 
  case Clubs {
    public function color(): string {
      return "Black";
    }
  }
 
  case Spades {
    public function color(): string {
      return "Black";
    }
  }
 
  public function shape(): string {
    return "Rectangle";
  }
}
 
function paint(Colorful $c) { ... }
 
paint(Suit::Clubs);  // Works

In this example, all four Enum cases will have a method shape inherited from Suit, and will all have their own method color, which they implement themselves. Case methods may be arbitrarily complex, and function the same as any other method. Additionally, magic methods such as __toString and friends may also be implemented and will behave like a normal method on an object. The one exception is __construct, which is not permitted. (See below.)

Inside a method on a Case, the $this variable is defined and refers to the Case instance.

Note that in this case it would be a better data modeling practice to also define a SuitColor Enum Type with values Red and Black and return that instead. However, that would complicate this example.

The above hierarchy is logically similar to the following class structure:

interface Colorful {
  public function color(): string;
}
 
abstract class Suit implements Colorful {
  public function shape(): string {
    return "Rectangle";
  }
}
 
class Hearts extends Suit {
  public function color(): string {
    return "Red";
  }
}
 
class Diamonds extends Suit {
  public function color(): string {
    return "Red";
  }
}
 
class Clubs extends Suit {
  public function color(): string {
    return "Black";
  }
}
 
class Spades extends Suit {
  public function color(): string {
    return "Black";
  }
}

Comparison to objects

Although Enums are implemented using classes under the hood and share much of their semantics, some object-style functionality is forbidden. These features either do not make sense in the scope of enums, have debatable value (but could be re-added in the future), or have unclear semantics.

Specifically, the following features of objects are not allowed on enumerations:

  • Static methods
  • Constructors
  • Destructors
  • Class/Enum inheritance. (Interfaces are allowed, but not parent classes.)
  • Enum/Case constants
  • Enum/Case properties
  • Dynamic properties
  • Magic methods except for those specifically listed below.

If you need any of that functionality, classes as they already exist are the superior option.

The following object functionality is available, and behaves just as it does on any other object:

  • Public, private, and protected methods. (Protected methods are effectively identical to private as inheritance is not allowed.)
  • __get, __call, __serialize, __unserialize, and __invoke magic methods
  • __CLASS__ and __FUNCTION__ constants behave as normal

The ::class magic constant on an Enum type evaluates to the type name including any namespace, exactly the same as an object.

The ::class magic constant on a Case evaluates to the FQCN of the Type, followed by ::, followed by the name of the case. For example, Foo\Bar\Baz\Suit::Spades.

Primitive-Equivalent Cases

By default, Enumerated Cases have no primitive equivalent. They are simply singleton objects. However, there are ample cases where an Enumerated Case needs to be able to round-trip to a database or similar datastore, so having a built-in primitive (and thus trivially serializable) equivalent defined intrinsically is useful.

To define a primitive equivalent for an Enumeration, the syntax is as follows:

enum Suit: string {
  case Hearts = 'H';
  case Diamonds = 'D';
  case Clubs = 'C';
  case Spades = 'S';
}

Primitive backing types of int, string, or float are supported, and a given enumeration supports only a single type at a time. (That is, no union of int|string.) If an enumeration is marked as having a primitive equivalent, then all cases must have a unique primitive equivalent defined explicitly. There are no auto-generated primitive equivalents (e.g., sequential integers).

A Primitive-Equivalent Case will automatically down-cast to its primitive when used in a primitive context, for example when used with print.

print Suit::Clubs;
// prints "C"
print "I hope I draw a " . Suit::Spades;
// prints "I hope I draw a S".

Passing a Primitive Case to a primitive-typed parameter or return will produce the primitive value in weak-typing mode, and produce a TypeError in strict-typing mode.

A Primitive-Backed enumeration also has a static method from() that is automatically generated. The from() method will up-cast from a primitive to its corresponding Enumerated Case. Invalid primitives with no matching Case will throw a ValueError.

$record = get_stuff_from_database($id);
print $record['suit'];
// Prints "H"
$suit = Suit::from($record['suit']);
$suit === Suit::Hearts; // True
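
As a minimal sketch of the failure path, assuming an engine that implements the enum construct and the from() behavior described above, an unknown primitive could be handled as follows. The helper name parse_suit and the input 'X' are hypothetical and used only for illustration.

```php
<?php
enum Suit: string {
  case Hearts = 'H';
  case Diamonds = 'D';
  case Clubs = 'C';
  case Spades = 'S';
}

// Hypothetical helper (not part of this RFC): returns the matching Case,
// or null if the primitive does not correspond to any Case.
function parse_suit(string $raw): ?Suit {
  try {
    return Suit::from($raw);
  } catch (ValueError $e) {
    // from() found no Case with this primitive equivalent.
    return null;
  }
}

var_dump(parse_suit('H') === Suit::Hearts); // bool(true)
var_dump(parse_suit('X'));                  // NULL
```

Whether to translate the ValueError into null, a default Case, or a domain-specific exception is up to the calling code.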

Primitive-backed Cases are not allowed to define a __toString() method, as that would create confusion with the primitive value itself. However, primitive-backed Cases are allowed to have other methods just like any other enum:

enum Suit: string {
  case Hearts = 'H';
  case Diamonds = 'D';
  case Clubs = 'C';
  case Spades = 'S' {
    public function color(): string { return 'Black'; }
  }
 
  public function color(): string
  {
    // ...
  }
}

Value listing

The enumeration itself has an automatically generated static method cases(). cases() returns an array of all defined Cases in lexical order.

Suit::cases();
// Produces: [Suit::Hearts, Suit::Diamonds, Suit::Clubs, Suit::Spades]

If the enumeration has no primitive equivalent, the array will be packed (indexed sequentially starting from 0). If the enumeration has a primitive equivalent, the keys will be the corresponding primitive for each Case. If the enumeration is of type float, the keys will be rendered as strings. (So a primitive equivalent of 1.5 will result in a key of “1.5”.)

Attributes

Enums and cases may have attributes attached to them, like any other language construct. The Attribute class has two additional target constants defined: TARGET_ENUM to target only the Enum itself, and TARGET_CASE to target an Enum Case, specifically.

No engine-defined attributes are included. User-defined attributes can do whatever.

Match expressions

match expressions offer a natural and convenient way to branch logic depending on the enum value. Since every instance of a Unit Case is a singleton, it will always pass an identity check. Therefore:

$val = Suit::Diamonds;
 
$str = match ($val) {
    Suit::Spades => "The swords of a soldier",
    Suit::Clubs => "Weapons of war",
    Suit::Diamonds => "Money for this art",
    default => "The shape of my heart",
};

This usage requires no modification of match. It is a natural implication of the current functionality.
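
Because match compares with identity (===) and already throws an UnhandledMatchError when no arm matches, a match that lists every Case needs no default arm. The following is a rough sketch, assuming an engine that implements the enum construct proposed here; the helper suit_color is hypothetical.

```php
<?php
enum Suit {
  case Hearts;
  case Diamonds;
  case Clubs;
  case Spades;
}

// Hypothetical helper: every Case is covered by an arm, so no default is needed.
function suit_color(Suit $suit): string {
  return match ($suit) {
    Suit::Hearts, Suit::Diamonds => 'Red',
    Suit::Clubs, Suit::Spades => 'Black',
  };
}

print suit_color(Suit::Diamonds) . "\n"; // prints "Red"
print suit_color(Suit::Spades) . "\n";   // prints "Black"
```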

WeakMaps

As objects, Enum cases cannot be used as keys in an array. However, they can be used as keys in a WeakMap. Because they are singletons they never get garbage collected, and thus will never be removed from a WeakMap. The result is that WeakMap can be used as a reliable map from enum cases to some other value, should the need arise.

This usage requires no modification to WeakMap. It is a natural implication of the current functionality.
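
A small sketch of that pattern, assuming an engine that implements the enum construct proposed here. The point values are made up for illustration.

```php
<?php
enum Suit {
  case Hearts;
  case Diamonds;
  case Clubs;
  case Spades;
}

// Map some Cases to an arbitrary value (here, a hypothetical point score).
$points = new WeakMap();
$points[Suit::Hearts] = 4;
$points[Suit::Spades] = 1;

print $points[Suit::Hearts] . "\n";    // prints "4"
var_dump(isset($points[Suit::Clubs])); // bool(false)
print count($points) . "\n";           // prints "2"
```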

Reflection

Enums are reflectable using a ReflectionEnum class. It is similar to ReflectionClass but is of course missing irrelevant methods. The following methods are present and behave the same as on ReflectionObject:

  • getDocComment
  • getEndLine
  • getExtension
  • getExtensionName
  • getFileName
  • getInterfaceNames
  • getInterfaces
  • getMethod
  • getMethods
  • getName
  • getNamespaceName
  • getShortName
  • getStartLine
  • hasMethod
  • implementsInterface
  • inNamespace
  • isIterable
  • isUserDefined
  • getAttributes

It additionally has:

  • hasCase(string $name): bool - Returns true if there is a Case defined with that name. For instance, $r->hasCase('Hearts') returns true.
  • getCases(): array - Returns an array of ReflectionCase objects.
  • getCase(string $name): ReflectionCase - Returns a single ReflectionCase object for the corresponding case. If not found, it throws a ReflectionException.
  • hasType(): bool - Returns true if the Enum has a primitive equivalent type. False if not.
  • getType(): ?string - Returns the primitive equivalent type of the Enum, if any (the string int, string, or float). If it doesn't have one, returns null.

ReflectionCase represents an individual Case in an enumeration. It also contains the following methods that mirror the version for objects:

  • getDocComment
  • getEndLine
  • getExtension
  • getExtensionName
  • getFileName
  • getInterfaceNames
  • getInterfaces
  • getMethod
  • getMethods
  • getName
  • getNamespaceName
  • getShortName
  • getStartLine
  • hasMethod
  • implementsInterface
  • inNamespace
  • isIterable
  • isUserDefined
  • getAttributes

It also has the following methods:

  • getEnum(): ReflectionEnum - Returns a reflection of the Enum that contains the Case.
  • getPrimitive(): int|string|float|null - Returns the primitive equivalent value defined for the case, if defined. If one is not defined, it returns null.
  • getInstance(): Enum - Returns the singleton instance of the Case, as though it were read off of the Enum.

Examples

Below are a few examples of Enums in action.

Basic limited values

enum SortOrder {
  case ASC;
  case DESC;
}
 
function query($fields, $filter, SortOrder $order) { ... }

The query function can now proceed safe in the knowledge that $order is guaranteed to be either SortOrder::ASC or SortOrder::DESC. Any other value would have resulted in a TypeError, so no further error checking or testing is needed.
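
To make that guarantee concrete, here is a minimal sketch, assuming an engine that implements the enum construct proposed here. The helper sort_values is hypothetical and stands in for query().

```php
<?php
enum SortOrder {
  case ASC;
  case DESC;
}

// Hypothetical stand-in for query(): sorts a list of values in the given order.
function sort_values(array $values, SortOrder $order): array {
  sort($values); // ascending, in place (on the local copy)
  return $order === SortOrder::ASC ? $values : array_reverse($values);
}

print implode(',', sort_values([3, 1, 2], SortOrder::DESC)) . "\n"; // prints "3,2,1"

try {
  sort_values([3, 1, 2], 'DESC'); // a string is not a SortOrder
} catch (TypeError $e) {
  print "TypeError\n"; // prints "TypeError"
}
```

The function body only needs to distinguish the two Cases; anything else is rejected at the call boundary.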

Advanced Exclusive values

enum UserStatus: string {
  case Pending = 'pending' {
    public function label(): string { 
      return 'Pending';
    }
  }
  case Active = 'active' {
    public function label(): string { 
      return 'Active';
    }
  }
  case Suspended = 'suspended' {
    public function label(): string { 
      return 'Suspended';
    }
  }
  case CanceledByUser = 'canceled' {
    public function label(): string { 
      return 'Canceled by user';
    }
  }
}

In this example, a user's status may be one of, and exclusively, UserStatus::Pending, UserStatus::Active, UserStatus::Suspended, or UserStatus::CanceledByUser. A function can type a parameter against UserStatus and then only accept those four values, period.

All four values have a polymorphic label() method, which returns a human-readable string. That string is independent of the “machine name” primitive equivalent string, which can be used in, for example, a database field or an HTML select box.

foreach (UserStatus::cases() as $key => $val) {
  printf('<option value="%s">%s</option>' . "\n", $key, $val->label());
}

label() could alternatively be implemented as a single method using a match:

enum UserStatus: string {
  case Pending = 'pending';
  case Active = 'active';
  case Suspended = 'suspended';
  case CanceledByUser = 'canceled';
 
  public function label(): string {
    return match($this) {
      UserStatus::Pending => 'Pending',
      UserStatus::Active => 'Active',
      UserStatus::Suspended => 'Suspended',
      UserStatus::CanceledByUser => 'Canceled by user',
    };
  }
}

Which approach is better will depend on the particulars of what the method is supposed to do, and is left at the discretion of the developer.

State machine

Enums make it straightforward to express finite state machines.

enum OvenStatus {
 
  case Off {
    public function turnOn() { return OvenStatus::On; }
  }
 
  case On {
    public function turnOff() { return OvenStatus::Off; }
    public function idle() { return OvenStatus::Idle; }
  }
 
  case Idle {
    public function on() { return OvenStatus::On; }
  }
}

In this example, the oven can be in one of three states (Off, On, and Idle, meaning the flame is not on, but it will turn back on when it detects it needs to). However, it can never go from Off to Idle or Idle to Off; it must go through the On state first. That means no tests need to be written or code paths defined for going from Off to Idle, because it’s literally impossible to even describe that transition.

(Additional methods are of course likely in a real implementation.)

Backward Incompatible Changes

“enum” becomes a language keyword, with the usual potential for naming conflicts with existing global constants.

Open questions

  • Is the case keyword necessary?
  • Should it be possible to type against a specific enum case? E.g.:

public function stuff(Suit::Hearts|Suit::Diamonds $card) { ... }

Future Scope

Grouped syntax

It would be possible, in the simple case, to allow multiple cases to be defined together, like so:

enum Suit {
  case Hearts, Diamonds, Clubs, Spades;
}

That would only work on the simple, non-primitive-backed case with no methods defined. Given that it is unclear how common that will be in practice, that grouped syntaxes have a controversial history, and that it is easy enough to add later if needed, we have omitted that shorthand at this time.

Static methods

The value of static methods on either Enums or Cases is unclear. As such they are not permitted at this time. Should a good use case for them be demonstrated in the future (perhaps only on full ADTs?), they could be supported then.

Voting

This is a simple yes/no vote to include Enumerations. 2/3 required to pass.

References

Survey of enumerations supported by various languages, conducted by Larry: https://github.com/Crell/enum-comparison

rfc/enumerations.txt · Last modified: 2020/11/30 23:41 by crell