rfc:enumerations

This is an old revision of the document!


PHP RFC: Enumerations

  • Date: 2020-12-04
  • Author: Larry Garfield (larry@garfieldtech.com), Ilija Tovilo (tovilo.ilija@gmail.com)
  • Status: In Discussion
  • Target Version: PHP 8.1

Introduction

This RFC introduces Enumerations to PHP. The scope of this RFC is limited to “unit enumerations,” that is, enumerations that are themselves a value, rather than simply a fancy syntax for a primitive constant, and do not include additional associated information. This capability offers greatly expanded support for data modeling, custom type definitions, and monad-style behavior. Enums enable the modeling technique of “make invalid states unrepresentable,” which leads to more robust code with less need for exhaustive testing.

Many languages have support for enumerations of some variety. A survey we conducted of various languages found that they could be categorized into three general groups: Fancy Constants, Fancy Objects, and full Algebraic Data Types (ADTs).

This RFC is part of a larger effort to introduce full Algebraic Data Types. It implements the “Fancy Objects” variant of enumerations in such a way that it may be extended to full ADTs by future RFCs. It draws both conceptually and semantically from Swift, Rust, and Kotlin, although it is not directly modeled on either.

The most popular case of enumerations is boolean, which is an enumerated type with legal values true and false. This RFC allows developers to define their own arbitrarily robust enumerations.

Proposal

Enumerations are built on top of classes and objects. That means, except where otherwise noted, “how would Enums behave in situation X” can be answered “the same as any other object instance.” They would, for example, pass an object type check. Enum names are case-insensitive, but subject to the same caveat about autoloading on case-sensitive file systems that already applies to classes generally. Case names are internally implemented as class constants, and thus are case-sensitive.

Basic enumerations

This RFC introduces a new language construct, enum. Enums are similar to classes, and share the same namespaces as classes, interfaces, and traits. They are also autoloadable the same way. An Enum defines a new type, which has a fixed, limited number of possible legal values.

enum Suit {
  case Hearts;
  case Diamonds;
  case Clubs;
  case Spades;
}

This declaration creates a new enumerated type named Suit, which has four and only four legal values: Suit::Hearts, Suit::Diamonds, Suit::Clubs, and Suit::Spades. Variables may be assigned to one of those legal values. A function may be type checked against an enumerated type, in which case only values of that type may be passed.

function pick_a_card(Suit $suit) { ... }
 
$val = Suit::Diamonds;
 
pick_a_card($val);        // OK
pick_a_card(Suit::Clubs); // OK
pick_a_card('Spades');    // TypeError: pick_a_card(): Argument #1 ($suit) must be of type Suit, string given

An Enumeration may have zero or more case definitions, with no maximum. A zero-case enum is syntactically valid, if rather useless.

By default, cases are not intrinsically backed by a scalar value. That is, Suit::Hearts is not equal to 0. Instead, each case is backed by a singleton object of that name. That means that:

$a = Suit::Spades;
$b = Suit::Spades;
 
$a === $b; // true
 
$a instanceof Suit;  // true

It also means that enum values are never < or > each other, since those comparisons are not meaningful on objects. Those comparisons will always return false when working with enum values.

This type of case, with no related data, is called a “Pure Case.” An Enum that contains only Pure Cases is called a Pure Enum.

All Pure Cases are implemented as instances of their enum type. The enum type is represented internally as a class.

All Cases have a read-only property, name, that is the case-sensitive name of the case itself. That is largely an implementation artifact, but may also be used for debugging purposes.

print Suit::Spades->name;
// prints "Spades"

Backed Enums

By default, Enumerated Cases have no scalar equivalent. They are simply singleton objects. However, there are ample cases where an Enumerated Case needs to be able to round-trip to a database or similar datastore, so having a built-in scalar (and thus trivially serializable) equivalent defined intrinsically is useful.

To define a scalar equivalent for an Enumeration, the syntax is as follows:

enum Suit: string {
  case Hearts = 'H';
  case Diamonds = 'D';
  case Clubs = 'C';
  case Spades = 'S';
}

A case that has a scalar equivalent is called a Backed Case, as it is “Backed” by a simpler value. An Enum that contains all Backed Cases is called a “Backed Enum.” A Backed Enum may contain only Backed Cases. A Pure Enum may contain only Pure Cases.

A Backed Enum may be backed by types of int or string, and a given enumeration supports only a single type at a time. (That is, no union of int|string.) If an enumeration is marked as having a scalar equivalent, then all cases must have a unique scalar equivalent defined explicitly. There are no auto-generated scalar equivalents (e.g., sequential integers). Value cases must be unique; two backed enum cases may not have the same scalar equivalent. (However, a constant may refer to a case, effectively creating an alias.)

Equivalent values must be literals or literal expressions. Constants and constant expressions are not supported. That is, 1+1 is allowed, but 1 + SOME_CONST is not. This is primarily due to implementation complexity. (See Future Scope below.)

Value Cases have an additional read-only property, value, which is the value specified in the definition.

print Suit::Clubs->value;
// Prints "C"

In order to enforce the value property as read-only, a variable cannot be assigned as a reference to it. That is, the following throws an error:

$suit = Suit::Clubs;
$ref = &$suit->value;
// Error: Cannot acquire reference to property Suit::$value

Backed enums implement an internal BackedEnum interface, which exposes two additional methods:

  • from(int|string): self will take a scalar and return the corresponding Enum Case. If one is not found, it will throw a ValueError. This is mainly useful in cases where the input scalar is trusted and a missing enum value should be considered an application-stopping error.
  • tryFrom(int|string): ?self will take a scalar and return the corresponding Enum Case. If one is not found, it will return null. This is mainly useful in cases where the input scalar is untrusted and the caller wants to implement their own error handling or default-value logic.

The “tryX” idiom is common in C# and Rust (albeit in somewhat different ways) to indicate that the result may be null/optional. It would be new to PHP, but not incompatible with any current conventions.

$record = get_stuff_from_database($id);
print $record['suit'];
 
$suit =  Suit::from($record['suit']);
// Invalid data throws a ValueError: "X" is not a valid scalar value for enum "Suit"
print $suit->value;
 
$suit = Suit::tryFrom('A') ?? Suit::Spades;
// Invalid data returns null, so Suit::Spades is used instead.
print $suit->value;

Manually defining a from() or tryFrom() method on a Backed Enum will result in a fatal error.

Enumerated Methods

Enums (both Pure Enums and Backed Enums) may contain methods, and may implement interfaces. If an Enum implements an interface, then any type check for that interface will also accept all cases of that Enum.

interface Colorful {
  public function color(): string;
}
 
enum Suit implements Colorful {
  case Hearts;
  case Diamonds;
  case Clubs;
  case Spades;
 
  // Fulfills the interface contract.
  public function color(): string {
    return match($this) {
      Suit::Hearts, Suit::Diamonds => 'Red',
      Suit::Clubs, Suit::Spaces => 'Black',
    };
  }
 
  // Not part of an interface; that's fine.
  public function shape(): string {
    return "Rectangle";
  }
}
 
function paint(Colorful $c) { ... }
 
paint(Suit::Clubs);  // Works
 
print Suit::Diamonds->shape(); // prints "rectangle"

In this example, all four instances of Suit have two methods, color() and shape(). As far as calling code and type checks are concerned, they behave exactly the same as any other object instance.

Inside a method, the $this variable is defined and refers to the Case instance.

Methods may be arbitrarily complex, but in practice will usually return a static value or match on $this to provide different results for different cases.

Note that in this case it would be a better data modeling practice to also define a SuitColor Enum Type with values Red and Black and return that instead. However, that would complicate this example.

The above hierarchy is logically similar to the following class structure (although this is not the actual code that runs):

interface Colorful {
  public function color(): string;
}
 
final class Suit implements IterableEnum, Colorful {
 
  public const Hearts = new self('Hearts');
  public const Diamonds = new self('Diamonds');
  public const Clubs = new self('Clubs');
  public const Spades = new self('Spades');
 
  private function __construct(public string $name) {}
 
  public function color(): string {
    return match($this) {
      Suit::Hearts, Suit::Diamonds => 'Red',
      Suit::Clubs, Suit::Spaces => 'Black',
    };
  }
 
  public function shape(): string {
    return "Rectangle";
  }
 
  public static function cases(): array {
    // See below.
  }
}

The case instance objects may be assigned to constants because they are created internally in the engine rather than in user-space. Additionally, the differentiating flag for each case is not actually a constructor parameter.

Methods may be public, private, or protected, although in practice private and protected are equivalent as inheritance is not allowed.

Enumeration static methods

Enumerations may also have static methods. The use for static methods on the enumeration itself is primarily for alternative constructors. E.g.:

enum Size {
  case Small;
  case Medium;
  case Large;
 
  public static function fromLength(int $cm) {
    return match(true) {
      $cm < 50 => static::Small,
      $cm < 100 => static::Medium,
      default => static::Large,
    };
  }
}

Static methods may be public, private, or protected, although in practice private and protected are equivalent as inheritance is not allowed.

Enumeration constants

Enumerations may include constants, which may be public, private, or protected, although in practice private and protected are equivalent as inheritance is not allowed.

An enum constant may refer to an enum case:

enum Size {
  case Small;
  case Medium;
  case Large;
 
  public const Huge = self::Large;
}

Traits

Enumerations may leverage traits, which will behave the same as on classes. The caveat is that traits used in an enum must not contain properties. They may only include methods and static methods. A trait with properties will result in a fatal error.

interface Colorful {
  public function color(): string;
}
 
trait Rectangle {
  public function shape(): string {
    return "Rectangle";
  }
}
 
enum Suit implements Colorful {
  use Rectangle;
 
  case Hearts;
  case Diamonds;
  case Clubs;
  case Spades;
 
  public function color(): string {
    return match($this) {
      Suit::Hearts, Suit::Diamonds => 'Red',
      Suit::Clubs, Suit::Spaces => 'Black',
    };
  }
}

Enum values in constant expressions

Because cases are represented as constants on the enum itself, they may be used as static values in most constant expressions: property defaults, static variable defaults, parameter defaults, global and class constant values. They may not be used in other enum case values due to implementation complexity. (That restriction may be lifted in the future, but since they can be used by constants on an enum it is not a significant limitation.)

However, implicit magic method calls such as ArrayAccess on enums are not allowed in static or constant definitions as we cannot absolutely guarantee that the resulting value is deterministic or that the method invocation is free of side effects. Function calls, method calls, and property access continue to be invalid operations in constant expressions.

In code:

// This is an entirely legal Enum definition.
enum Direction implements ArrayAccess {
  case Up;
  case Down;
 
  public method offsetGet($val) { ... }
  public method offsetExists($val) { ... }
  public method offsetSet($val) { throw new Exception(); }
  public method offsetUnset($val) { throw new Exception(); }
}
 
class Foo {
  // This is allowed.
  const Bar = Direction::Down;
 
  // This is disallowed, as it may not be deterministic.
  const Bar = Direction::Up['short'];
  // Fatal error: Cannot use [] on enums in constant expression
}
 
// This is entirely legal, because it's not a constant expression.
$x = Direction::Up['short'];

Comparison to objects

Although Enums are implemented using classes under the hood and share much of their semantics, some object-style functionality is forbidden. These either do not make sense in the scope of enums, their value is debatable (but could be re-added in the future), or their semantics are unclear.

Specifically, the following features of objects are not allowed on enumerations:

  • Constructors - Not relevant without data/state.
  • Destructors - Not relevant without data/state.
  • Class/Enum inheritance. - Enums are by design a closed list, which inheritance would violate. (Interfaces are allowed, but not parent classes.)
  • Enum/Case properties - Properties are a form of state, and enum cases are stateless singletons. Metadata about an enum or case can always be exposed via methods.
  • Dynamic properties - Avoid state. Plus, they're a bad idea on classes anyway.
  • Magic methods except for those specifically listed below - Most of the excluded ones involve state.
  • Cloning of enum cases. Enum cases must be single instances in order to behave predictably.

If you need any of that functionality, classes as they already exist are the superior option.

The following object functionality is available, and behaves just as it does on any other object:

  • Public, private, and protected methods.
  • Public, private, and protected static methods.
  • Public, private, and protected constants.
  • __call, __callStatic, and __invoke magic methods
  • __CLASS__ and __FUNCTION__ constants behave as normal

The ::class magic constant on an Enum type evaluates to the type name including any namespace, exactly the same as an object. The ::class magic constant on a Case instance also evaluates to the Enum type, as it is an instance of that type.

Additionally, enum cases may not be instantiated directly with new, nor with newInstanceWithoutConstructor in reflection. Both will result in an error.

$clovers = new Suit();
// Error: Cannot instantiate enum Suit
$mace = (new ReflectionClass(Suit::class))->newInstanceWithoutConstructor()
// Error: Cannot instantiate enum Suit

Value listing

Both Pure Enums and Backed Enums implement an internal interface named IterableEnum. IterableEnum includes a static method cases(). cases() returns a packed array of all defined Cases in the order of declaration.

Suit::cases();
// Produces: [Suit::Hearts, Suit::Diamonds, Suit::Clubs, Suit:Spades]

Manually defining a cases() method on an Enum will result in a fatal error.

Non-iterable Enums are not yet supported, but are expected to be part of the future ADT/Tagged Union RFC. (Those will not have a finite set of possible values.)

Note that IterableEnum does not extend Iterator, as the enum case instances themselves are not iterable; it's the Enum type that is iterable. An Enum could implement Iterator or IteratorAggregate if it so chose, however.

Serialization

Enumerations are serialized differently from objects. Specifically, they have a new serialization code, “E”, that specifies the name of the enum case. The deserialization routine is then able to use that to set a variable to the existing singleton value. That ensures that:

Suit::Hearts === unserialize(serialize(Suit::Hearts));
 
print serialize(Suit::Hearts);
// E:11:"Suit:Hearts";

On deserialization, if an enum and case cannot be found to match a serialized value a warning will be issued and false returned. (That is standard existing behavior for unserialize().)

If a Pure Enum is serialized to JSON, an error will be thrown. If a Backed Enum is serialized to JSON, it will be represented by its value scalar only, in the appropriate type. The behavior of both may be overridden by implementing JsonSerializable.

For print_r(), the output of an enum case has been modified to not confuse it with objects, although it is still similar to objects.

enum Foo {
  case Bar;
}
 
enum Baz: int {
  case Beep = 5;
}
 
print_r(Foo::Bar);
print_r(Baz::Beep);
Foo Enum (
  [name] => Bar
)
Baz Enum:int {
  [name] => Beep
  [value] => 5
}

Attributes

Enums and cases may have attributes attached to them, like any other language construct. The TARGET_CLASS target filter will include Enums themselves. The TARGET_CLASS_CONST target filter will include Enum Cases.

No engine-defined attributes are included. User-defined attributes can do whatever.

Match expressions

match expressions offer a natural and convenient way to branch logic depending on the enum value. Since every instance of an Enum is a singleton, it will always pass an identity check. Therefore:

$val = Suit::Diamonds;
 
$str = match ($val) {
  Suit::Spades => "The swords of a soldier",
  Suit::Clubs => "Weapons of war",
  Suit::Diamonds => "Money for this art",
  default => "The shape of my heart",
}

This usage requires no modification of match. It is a natural implication of the current functionality.

SplObjectStorage and WeakMaps

As objects, Enum cases cannot be used as keys in an array. However, they can be used as keys in a SplObjectStorage or WeakMap. Because they are singletons they never get garbage collected, and thus will never be removed from a WeakMap, making these two storage mechanisms effectively equivalent.

This usage requires no modification to SplObjectStorage or WeakMap. It is a natural implication of the current functionality.

Reflection

Enums are reflectable using a ReflectionEnum class, which extends ReflectionClass. Their cases are reflectable using ReflectionEnumPureCase and ReflectionEnumBackedCase, which extend ReflectionClassConstant. They are defined as follows:

class ReflectionEnum extends ReflectionClass {
 
  // Returns true if there is a Case defined with that name.  
  // For instance, ''$r->hasCase('Hearts')'' returns true.
  public function hasCase(string $name): bool {}
 
  // Returns an array of ReflectionEnumPureCase|ReflectionEnumBackedCase objects.
  public function getCases(): array {}
 
  // Returns a single reflection object for the corresponding case.
  // If not found, throws, ReflectionException.
  public function getCase(string $name): ReflectionEnumPureCase|ReflectionEnumBackedCase
 
  // True if this enum has a backing type, false otherwise.
  public function isBacked(): bool {}
 
  // Returns the type of the backing values of this enum, if any.
  // On a Pure Enum, returns null.
  public getBackingType(): ?ReflectionType {}
}
 
class ReflectionEnumPureCase extends ReflectionClassConstant {
 
  // Pre-existing. This will return the corresponding enum instance for this case.
  public function getValue() {}
}
 
class ReflectionEnumBackedCase extends ReflectionEnumPureCase {
 
  // Returns the scalar equivalent defind for the case.
  public function getBackingValue(): int|string {}
}

Additionally, a new function enum_exists(string $enum, bool $autoload = true): bool returns true if the value passed is the name of an Enum class.

Examples

Below are a few examples of Enums in action.

Basic limited values

enum SortOrder {
  case ASC;
  case DESC;
}
 
function query($fields, $filter, SortOrder $order = SortOrder::ASC) { ... }

The query() function can now proceed safe in the knowledge that $order is guaranteed to be either SortOrder::ASC or SortOrder::DESC. Any other value would have resulted in a TypeError, so no further error checking or testing is needed.

Advanced Exclusive values

enum UserStatus: string {
  case Pending = 'P';
  case Active = 'A';
  case Suspended = 'S';
  case CanceledByUser = 'C';
 
  public function label(): string {
    return match($this) {
      static::Pending => 'Pending',
      static::Active => 'Active',
      static::Suspended => 'Suspended',
      static::CanceledByUser => 'Canceled by user',
    };
  }
}

In this example, a user's status may be one of, and exclusively, UserStatus::Pending, UserStatus::Active, UserStatus::Suspended, or UserStatus::CanceledByUser. A function can type a parameter against UserStatus and then only accept those four values, period.

All four values have a label() method, which returns a human-readable string. That string is independent of the “machine name” scalar equivalent string, which can be used in, for example, a database field or an HTML select box.

foreach (UserStatus::cases() as $case) {
  printf('<option value="%s">%s</option>\n', $case->value, $case->label());
}

New interfaces

As noted above, this RFC defines two additional internal interfaces. These interfaces are available to make it possible for user code to determine if a given object is an enumeration, and if so what type. User-defined classes may not implement or extend these interfaces directly.

interface IterableEnum {
  public static function cases(): array;
}
 
interface BackedEnum {
  public string $value;
 
  public static function from(int|string $scalar): static;
  public static function tryFrom(int|string $scalar): ?static;
}

Backward Incompatible Changes

“enum” becomes a language keyword, with the usual potential for naming conflicts with existing global constants and class/interface/trait names.

The global scoped internal interfaces IterableEnum, and BackedEnum are defined.

The global function enum_exists is defined.

Future Scope

Grouped syntax

It would be possible, in the simple case, to allow multiple cases to be defined together, like so:

enum Suit {
  case Hearts, Diamonds, Clubs, Spades;
}

However, that may cause syntactic issues with the planned addition of tagged unions, which may or may not end up including per-case methods. Until that future extension is settled, we opted to skip this syntactic optimization. Grouped syntaxes have a somewhat controversial history anyway (they're not universally loved, and often unused entirely in many situations), and it's easy enough to add later if needed, so we have omitted that shorthand at this time. Once the dust settles, they may get added in the future.

Enums as array keys

Because they are objects, enum cases may not be used as keys in an associative array. It may be possible to support that in the future, but that is not covered at this time. For now, SplObjectStorage and WeakMaps are good enough.

Enum Sets

An enum set is the logical OR of two other cases. For instance, $red = Suit::Hearts | Suit::Diamonds. Those are not supported at this time.

Adding support for enum sets is a possibility for a future RFC, should an appropriate implementation be determined.

Auto-scalar conversion

Whether or not a Backed Enum can be viewed as “close enough” to its corresponding scalar value is debatable, and of debatable value. For instance, is a string-backed enum Stringable? Should an int type check accept an int-backed enum value? Should a string-backed enum work in a print statement? What about up-converting a scalar to its corresponding enum automatically?

The optimal behavior here, if any, will likely not become apparent until enums see widespread use. We have therefore opted to omit all auto-conversion at this time. If clear and compelling use cases for auto-conversion appear in the future, later PHP versions can re-introduce such auto-conversion in a more targeted, well-informed way.

Magic read-methods

The __get and __isset magic methods are likely safe, as they cannot manipulate state (or at least no more than any other method). They have been omitted at this time largely to avoid BC breaks in future planned extensions of enumerations, such as Tagged Unions/ADTs. (See the Meta RFC linked above.) It is possible that the introduction of associated values will require internal changes that result in additional property names becoming reserved. For that reason, we have for now omitted those potentially conflicting magic methods. In practice, there is no functionality they offer that couldn't be implemented using methods.

If when the dust settles it appears that __get would not cause a conflict after all, it may be permitted at a later date.

Constant-reference expression values

Currently, a Backed Enum value may only be a constant literal or an arithmetic expression involving only constant literals. They cannot reference other constant symbols, such as const constants or other Enum cases. That is not out of a lack of desire but simply because it turns out to be quite difficult to do. It's not a blocker for the remainder of the functionality listed here. If we or someone else can figure out how to make it work in the future it would be a good addition, but for now it is infeasible.

Voting

This is a simple yes/no vote to include Enumerations. 2/3 required to pass.

References

rfc/enumerations.1612194944.txt.gz · Last modified: 2021/02/01 15:55 by crell