rfc:enumerations_and_adts

Differences

This shows you the differences between two versions of the page.

rfc:enumerations_and_adts [2020/09/19 21:57] crell
rfc:enumerations_and_adts [2020/12/04 23:26] (current) crell
Line 1: Line 1:
 ====== PHP RFC: Enumerations and Algebraic Data Types ====== ====== PHP RFC: Enumerations and Algebraic Data Types ======
-  * Version: 0.9+
   * Date: 2020-09-19   * Date: 2020-09-19
   * Author: Larry Garfield (larry@garfieldtech.com), Ilija Tovilo (tovilo.ilija@gmail.com)   * Author: Larry Garfield (larry@garfieldtech.com), Ilija Tovilo (tovilo.ilija@gmail.com)
   * Status: Draft   * Status: Draft
-  * First Published at: http://wiki.php.net/rfc/your_rfc_name+  * Target Version: PHP 8.1
 +  * Implementation: TBD
 + 
 +This RFC has been supplanted by [[rfc:adts|PHP RFC: Algebraic Data Types]]. 
 + 
 +Please ignore this page.
  
  
 ===== Introduction ===== ===== Introduction =====
  
-This RFC introduces Enumerations to PHP.  Specifically, it introduces what are variously called "Algebraic Data Types", "tagged unions", or simply "enumerations" depending on the language.  This capability offers greatly expanded support for data modeling, custom type definitions, and monad-style behavior.  Enums enable the modeling technique of "make invalid states unrepresentable", which leads to more robust code with less need for exhaustive testing.+This RFC introduces Enumerations to PHP. Specifically, it introduces what are variously called “Algebraic Data Types,” “tagged unions,” or simply “enumerations” depending on the language. This capability offers greatly expanded support for data modeling, custom type definitions, and monad-style behavior. Enums enable the modeling technique of “make invalid states unrepresentable,” which leads to more robust code with less need for exhaustive testing.
  
-Many languages have support for enumerations of some variety.  A [[https://github.com/Crell/enum-comparison|survey we conducted of various languages]] found that they could be categorized into three general groups: Fancy Constants, Fancy Objects, and full Algebraic Data Types.  For this implementation we opted to implement full Algebraic Data Types, as that offers the most robust set of functionality while also degrading gracefully to simpler use cases.  (Or it progressively enhances to more complex use cases, depending on your point of view.)+Many languages have support for enumerations of some variety. A [[https://github.com/Crell/enum-comparison|survey we conducted of various languages]] found that they could be categorized into three general groups: Fancy Constants, Fancy Objects, and full Algebraic Data Types. For this implementation we opted to implement full Algebraic Data Types, as that offers the most robust set of functionality while also degrading gracefully to simpler use cases. (Or it progressively enhances to more complex use cases, depending on your point of view.)
  
-The specific implementation here draws inspiration primarily from Swift, Rust, and Kotlin, but is not (nor is it intended as) a perfect 1:1 port of any of them.+The specific implementation here draws inspiration primarily from Swift, Rust, and Kotlin, but is not (nor is it intended as) a perfect 1:1 port of any of them. Enumerations take many forms depending on the language, and we opted to implement the most robust combination of functionality feasible. Every piece of functionality described here exists in a similar form in at least one, usually several, other enumeration-supporting languages. It is implemented as a single RFC rather than a series of RFCs as the functionality all inter-relates, and if full ADTs are the goal (as we believe they should be) then it’s easier to implement them at once rather than to dribble-in functionality in potentially disjoint pieces.
  
-The most popular case of enumerations is ''%%boolean%%'', which is an enumerated type with legal values ''%%true%%'' and ''%%false%%''.  This RFC allows developers to define their own arbitrarily robust enumerations.+The most popular case of enumerations is ''%%boolean%%'', which is an enumerated type with legal values ''%%true%%'' and ''%%false%%''. This RFC allows developers to define their own arbitrarily robust enumerations.
  
 ===== Proposal ===== ===== Proposal =====
Line 21: Line 26:
 ==== Basic enumerations ==== ==== Basic enumerations ====
  
-This RFC introduces a new language construct, ''%%enum%%''.  Enums are similar to classes, and share the same namespaces as classes, interfaces, and traits.  They are also autoloadable the same way.  An Enum defines a new type, which has a fixed, limited number of possible legal values.+This RFC introduces a new language construct, ''%%enum%%''. Enums are similar to classes, and share the same namespaces as classes, interfaces, and traits. They are also autoloadable the same way. An Enum defines a new type, which has a fixed, limited number of possible legal values.
  
 <code php> <code php>
Line 31: Line 36:
 } }
 </code> </code>
- +This declaration creates a new enumerated type named ''%%Suit%%'', which has four and only four legal values: ''%%Suit::Hearts%%'', ''%%Suit::Diamonds%%'', ''%%Suit::Clubs%%'', and ''%%Suit::Spades%%''. Variables may be assigned to one of those legal values. A function may be type checked against an enumerated type, in which case only values of that type may be passed.
-This declaration creates a new enumerated type named %%Suit%%, which has four and only four legal values: ''%%Suit::Hearts%%'', ''%%Suit::Diamonds%%'', ''%%Suit::Clubs%%'', and ''%%Suit::Spades%%''.  Variables may be assigned to one of those legal values.  A function may be type checked against an enumerated type, in which case only values of that type may be passed.+
  
 <code php> <code php>
Line 39: Line 43:
 function pick_a_card(Suit $suit) { ... } function pick_a_card(Suit $suit) { ... }
  
-pick_a_card($val);       // OK +pick_a_card($val);        // OK 
-pick_a_card(Suit:Clubs); // OK +pick_a_card(Suit::Clubs); // OK 
-pick_a_card('Spades');   // throws TypeError+pick_a_card('Spades');    // throws TypeError
 </code> </code>
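
The type check described above can be tried with a minimal sketch. It assumes the pure enum syntax that later shipped in PHP 8.1, which postdates this draft and may differ from it in details; the function body and the use of ''%%->name%%'' are illustrative choices, not part of the proposal.

<code php>
<?php
declare(strict_types=1);

// Assumption: PHP 8.1 enum syntax, one case per line.
enum Suit {
    case Hearts;
    case Diamonds;
    case Clubs;
    case Spades;
}

// Only values of type Suit are accepted here.
function pick_a_card(Suit $suit): string {
    return $suit->name;
}

$val = Suit::Diamonds;
echo pick_a_card($val), "\n";        // prints "Diamonds"
echo pick_a_card(Suit::Clubs), "\n"; // prints "Clubs"

try {
    pick_a_card('Spades');           // a string is not a Suit
} catch (TypeError $e) {
    echo "TypeError\n";              // prints "TypeError"
}
</code>

The string ''%%'Spades'%%'' is rejected even though it matches a case name, because the parameter type is the enumeration itself and not a string.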
- +In the simple case, multiple cases may be defined on a single line. The following is semantically equivalent to the definition above.
-In the simple case, multiple cases may be defined on a single line.  The following is semantically equivalent to the definition above.+
  
 <code php> <code php>
Line 51: Line 54:
 } }
 </code> </code>
- 
 An Enumeration may have one or more ''%%case%%'' definitions, with no maximum, although at least one is required. An Enumeration may have one or more ''%%case%%'' definitions, with no maximum, although at least one is required.
  
-Cases are not backed by a primitive value.  That is, ''%%Suit::Hearts%%'' is not equal to 0.  Instead, each case is backed by a singleton object of that name.  That means that:+Cases are not backed by a primitive value. That is, ''%%Suit::Hearts%%'' is not equal to 0. Instead, each case is backed by a singleton object of that name. That means that:
  
 <code php> <code php>
Line 66: Line 68:
 $a instanceof Suit::Spades; // true $a instanceof Suit::Spades; // true
 </code> </code>
- 
-[Note to Iliya: The last line there is the tricksy one we haven't figured out.] 
- 
-Each Case class includes a default ''%%__toString()%%'' implementation that returns the name of the Case as a string, without the Enum type.  That is: 
- 
-<code php> 
-print Suit::Clubs;  
-// prints "Clubs", not "Suit::Clubs". 
-</code> 
- 
-That function may be overridden if desired.  (See below.) 
- 
-[To Ilija: Do we want this part or not?  I only thought of it while writing this.  I don't know if it's good or bad.] 
- 
-Enumerated type Cases may be used in union type definitions.  For example: 
- 
-<code php> 
-function gimmie_red_card(Suit::Hearts|Suit::Diamonds $card) { ... } 
-</code> 
- 
 ==== Enumerated Case Methods ==== ==== Enumerated Case Methods ====
  
-As both Enum Types and Enum Cases are implemented using classes, they may take methods.  The Enum Type may also implement an interface, which all Cases must then fulfill, directly or indirectly.+As both Enum Types and Enum Cases are implemented using classes, they may take methods. The Enum Type may also implement an interface, which all Cases must then fulfill, directly or indirectly.
  
 <code php> <code php>
Line 129: Line 111:
 paint(Suit::Clubs);  // Works paint(Suit::Clubs);  // Works
 </code> </code>
- +In this example, all four Enum cases will have a method ''%%shape%%'' inherited from ''%%Suit%%'', and will all have their own method ''%%color%%'', which they implement themselves. Case methods may be arbitrarily complex, and function the same as any other method. Additionally, magic methods such as ''%%__toString%%'' and friends may also be implemented and will behave like a normal method on an object. The one exception is ''%%__construct%%'', which is not permitted. (See below.)
-In this example, all four Enum cases will have a method ''%%shape%%'' inherited from ''%%Suit%%'', and will all have their own method ''%%color%%'', which they implement themselves.  Case methods may be arbitrarily complex, and function the same as any other method.  Additionally, magic methods such as ''%%__toString%%'' and friends may also be implemented and will behave like a normal method on an object.  The one exception is ''%%__construct%%'', which is not permitted.  (See below.)+
  
 Enum Cases may not implement interfaces themselves. Enum Cases may not implement interfaces themselves.
  
-Static methods on Cases are not supported.  Static methods on the Enum Type are supported.+Static methods on Cases are not supported. Static methods on the Enum Type are supported.
  
-[Ilija: We haven't discussed static methods at all.  This is what makes the most sense to me at the moment but we can easily revisit this.  I'm flexible.)+(Ilija: We haven’t discussed static methods at all. This is what makes the most sense to me at the moment but we can easily revisit this. I’m flexible.)
  
-Inside a method on a Case, the ''%%$this%%'' variable is defined and refers to the Case instance.  (That is mainly useful with Associated Values.  See below.)+Inside a method on a Case, the ''%%$this%%'' variable is defined and refers to the Case instance. (That is mainly useful with Associated Values. See below.)
  
-(Note that in this case it would be a better data modeling practice to also define a ''%%SuitColor%%'' Enum Type with values Red and Black and return that instead.  However, that would complicate this example.)+(Note that in this case it would be a better data modeling practice to also define a ''%%SuitColor%%'' Enum Type with values Red and Black and return that instead. However, that would complicate this example.)
  
 The above hierarchy is logically similar to the following class structure: The above hierarchy is logically similar to the following class structure:
Line 179: Line 160:
 } }
 </code> </code>
 +==== Value listing ====
  
 +The enumeration itself has an automatically generated static method ''%%values()%%''. ''%%values()%%'' returns a packed array of all defined Cases in lexical order.
 +
 +<code php>
 +Suit::values();
 +// Produces: [Suit::Hearts, Suit::Diamonds, Suit::Clubs, Suit::Spades]
 +</code>
 +==== Primitive-Equivalent Cases ====
 +
 +By default, Enumerated Cases have no primitive equivalent. They are simply singleton objects. However, there are ample cases where an Enumerated Case needs to be able to round-trip to a database or similar datastore, so having a built-in primitive (and thus trivially serializable) equivalent defined intrinsically is useful.
 +
 +To define a primitive equivalent for an Enumeration, the syntax is as follows:
 +
 +<code php>
 +enum Suit: string {
 +  case Hearts = 'H';
 +  case Diamonds = 'D';
 +  case Clubs = 'C';
 +  case Spades = 'S';
 +}
 +</code>
 +Primitive backing types of ''%%int%%'', ''%%string%%'', or ''%%float%%'' are supported, and a given enumeration supports only a single type at a time. (That is, no union of ''%%int|string%%''.) If an enumeration is marked as having a primitive equivalent, then all cases must have a unique primitive equivalent defined.
 +
 +A Primitive-Equivalent Case will automatically down-cast to its primitive when used in a primitive context, for example when used with ''%%print%%''.
 +
 +<code php>
 +print Suit::Clubs;
 +// prints "C"
 +print "I hope I draw a " . Suit::Spades;
 +// prints "I hope I draw a S".
 +</code>
 +Passing a Primitive-Equivalent Case to a primitive-typed parameter or return type will produce the primitive value in weak-typing mode, and produce a ''%%TypeError%%'' in strict-typing mode.
 +
 +A Primitive-Backed enumeration also has a static method ''%%from()%%'' that is automatically generated. The ''%%from()%%'' method will up-cast from a primitive to its corresponding Enumerated Case. Invalid primitives with no matching Case will throw a ''%%ValueError%%''.
 +
 +<code php>
 +$record = get_stuff_from_database($id);
 +print $record['suit'];
 +// Prints "H"
 +$suit = Suit::from($record['suit']);
 +$suit === Suit::Hearts; // True
 +</code>
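
The up-cast and its failure mode can be exercised with a short sketch. It assumes the backed enum syntax that later shipped in PHP 8.1, whose ''%%from()%%'' behaves as described here; the ''%%$record%%'' array is a hypothetical stand-in for the database row.

<code php>
<?php
declare(strict_types=1);

enum Suit: string {
    case Hearts = 'H';
    case Diamonds = 'D';
    case Clubs = 'C';
    case Spades = 'S';
}

// Hypothetical stand-in for a row loaded from a datastore.
$record = ['suit' => 'H'];

$suit = Suit::from($record['suit']);
var_dump($suit === Suit::Hearts); // bool(true)

try {
    Suit::from('X');              // no Case is backed by 'X'
} catch (ValueError $e) {
    echo "ValueError\n";          // prints "ValueError"
}
</code>

Throwing on an unknown primitive keeps bad stored data from silently becoming a valid-looking Case; callers that expect unknown values can catch the ''%%ValueError%%''.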
 +A Primitive-Backed enumeration additionally has a method ''%%list()%%'' that returns an associative array of cases, in lexical order, keyed by their primitive equivalent.
 +
 +<code php>
 +$list = Suit::list();
 +$list === [
 +'H' => Suit::Hearts,
 +'D' => Suit::Diamonds,
 +'C' => Suit::Clubs,
 +'S' => Suit::Spades,
 +]; // true
 +</code>
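
For experimentation, the keyed listing can be approximated by hand. This is only a sketch under assumptions: it uses the PHP 8.1 enum syntax, where the built-in listing method is ''%%cases()%%'' and no automatic ''%%list()%%'' exists, so ''%%list()%%'' is written out here as an ordinary static method.

<code php>
<?php
declare(strict_types=1);

enum Suit: string {
    case Hearts = 'H';
    case Diamonds = 'D';
    case Clubs = 'C';
    case Spades = 'S';

    // Hand-written approximation of the generated method described above.
    public static function list(): array {
        $list = [];
        // cases() returns the Cases in declaration order.
        foreach (self::cases() as $case) {
            $list[$case->value] = $case;
        }
        return $list;
    }
}

$list = Suit::list();
var_dump(array_keys($list) === ['H', 'D', 'C', 'S']); // bool(true)
var_dump($list['C'] === Suit::Clubs);                 // bool(true)
</code>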
 +Primitive-backed Cases are not allowed to define a ''%%__toString()%%'' method, as that would create confusion with the primitive value itself. However, primitive-backed Cases are allowed to have other methods just like any other enum:
 +
 +<code php>
 +enum Suit: string {
 +  case Hearts = 'H';
 +  case Diamonds = 'D';
 +  case Clubs = 'C';
 +  case Spades = 'S' {
 +    public function color(): string { return 'Black'; }
 +  }
 +
 +  public function color(): string
 +  {
 +    // ...
 +  }
 +}
 +</code>
 ==== Associated Values ==== ==== Associated Values ====
  
-Enumerated Cases may optionally include associated values.  An associated value is one that is associated with an instance of a Case.  If a Case has associated values, it will //**not**// be implemented as a singleton.  Each instance of the Case will then be its own object instance, so will not === another instance.+Enumerated Cases may optionally include associated values. An associated value is one that is associated with an instance of a Case. If a Case has associated values, it will **not** be implemented as a singleton. Each instance of the Case will then be its own object instance, so will not === another instance.
 + 
 +Associated values are mutually exclusive with Primitive-Equivalent Cases.
  
 Associated values are defined using constructor property promotion. Associated values are defined using constructor property promotion.
Line 200: Line 253:
 $my_walk === $next_walk; // FALSE! $my_walk === $next_walk; // FALSE!
 </code> </code>
 +Enum Cases may not implement a full constructor. However, they may list parameters that will be auto-promoted to properties using constructor promotion. The visibility modifier is required. Cases may not implement properties other than promoted properties.
  
-Enum Cases may not implement a full constructor.  However, they may list parameters that will be auto-promoted to properties using constructor promotion.  The visibility modifier is required.  Cases may not implement properties other than promoted properties.+An Enum Case that supports Associated Values is called an Associable Case. An Enum Case that does not have Associated Values is called a Unit Case. An Enumerated Type may consist of any combination of Associable and Unit Cases, but no Primitive-Equivalent Cases.
  
-An Enum Case that supports Associated Values is called an Associable Case.  An Enum Case that does not have Associated Values is called a Unit Case.  An Enumerated Type may consist of any combination of Associable and Unit Cases.+The Enum Type itself may not define associated values. Only a Case may do so.
  
-The Enum Type itself may not define associated values.  Only a Case may do so.+Associated values are always read-only, both internally to the class and externally. Therefore, making them public does not pose a risk of 3rd party code modifying them inadvertently. They may, however, have attributes associated with them like any other property.
  
-Associated values are always read-only, both internally to the class and externally.  Therefore, making them public does not pose a risk of 3rd party code modifying them inadvertently.  They may, however, have attributes associated with them like any other property.+On an Associable Case enumeration, the ''%%values()%%'' method is not available and will throw a ''%%TypeError%%''. Since Associable Cases are technically unbounded, the method makes no logical sense.
  
 Use cases that would require more complete class functionality (arbitrary properties, custom constructors, mutable properties, etc.) should be implemented using traditional classes instead. Use cases that would require more complete class functionality (arbitrary properties, custom constructors, mutable properties, etc.) should be implemented using traditional classes instead.
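
The identity and read-only rules for associated values can be approximated with an ordinary class. The sketch below is not the proposed enum syntax: it assumes PHP 8.1 ''%%readonly%%'' promoted properties, and ''%%Point%%'' is a hypothetical name chosen for illustration.

<code php>
<?php
declare(strict_types=1);

// Hypothetical stand-in for an Associable Case with two associated values.
final class Point {
    public function __construct(
        public readonly int $x,
        public readonly int $y,
    ) {}
}

$a = new Point(3, 4);
$b = new Point(3, 4);

var_dump($a === $b); // bool(false): separate instances, not a singleton
var_dump($a == $b);  // bool(true): same class and same property values

try {
    $a->x = 9;       // readonly properties reject modification
} catch (Error $e) {
    echo "Error\n";  // prints "Error"
}
</code>

Because the values cannot change after construction, exposing them as public properties is safe in the sense described above.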
Line 213: Line 267:
 ==== Match expressions ==== ==== Match expressions ====
  
-When dealing with Unit Cases, ''%%match%%'' expressions offer a natural and convenient way to branch logic depending on the enum value.  Since every instance of a Unit Case is a singleton, it will always pass an identity check.  Therefore:+When dealing with Unit Cases, ''%%match%%'' expressions offer a natural and convenient way to branch logic depending on the enum value. Since every instance of a Unit Case is a singleton, it will always pass an identity check. Therefore:
  
 <code php> <code php>
Line 225: Line 279:
 } }
 </code> </code>
- +That is not true when dealing with Associable Cases. Therefore, an alternate version of ''%%match%%'' is included. When ''%%match%%'' is suffixed with ''%%type%%'', it will perform an ''%%instanceof%%'' check instead of an identity check.
-That is not true when dealing with Associable Cases.  Therefore, an alternate version of ''%%match%%'' is included.  When ''%%match%%'' is suffixed with ''%%type%%'', it will perform an ''%%instanceof%%'' check instead of an identity check.+
  
 <code php> <code php>
Line 236: Line 289:
 } }
 </code> </code>
- +(Ilija, your thoughts on this?)
-[Ilija, your thoughts on this?]+
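
Until a type-matching form exists, a similar effect is available in PHP 8.0 and later with ''%%match (true)%%'' and explicit ''%%instanceof%%'' arms. The sketch below is an approximation, not the proposed syntax; it additionally assumes PHP 8.1 for ''%%readonly%%'', and the ''%%Distance%%'', ''%%Kilometers%%'', ''%%Miles%%'', and ''%%describe%%'' names are hypothetical.

<code php>
<?php
declare(strict_types=1);

// Hypothetical stand-ins for two Associable Cases of one Enum Type.
abstract class Distance {}

final class Kilometers extends Distance {
    public function __construct(public readonly int $num) {}
}

final class Miles extends Distance {
    public function __construct(public readonly int $num) {}
}

function describe(Distance $distance): string {
    // match (true) selects the first arm whose condition is identical to true,
    // so each arm acts as an instanceof check rather than an identity check.
    return match (true) {
        $distance instanceof Kilometers => $distance->num . ' km',
        $distance instanceof Miles => $distance->num . ' mi',
    };
}

echo describe(new Kilometers(5)), "\n"; // prints "5 km"
echo describe(new Miles(3)), "\n";      // prints "3 mi"
</code>

As with any ''%%match%%'', a value that satisfies none of the arms raises an ''%%UnhandledMatchError%%'', so adding a new subclass without a matching arm fails loudly at runtime.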
  
 ==== Examples ==== ==== Examples ====
Line 255: Line 307:
     }     }
   };   };
-    +
   // This is an Associable Case.   // This is an Associable Case.
   case Some(private mixed $value) {   case Some(private mixed $value) {
Line 269: Line 321:
   public function value(): mixed {   public function value(): mixed {
     // Still need to sort out match() for this to make sense.     // Still need to sort out match() for this to make sense.
-    return match enum ($this) {+    return match type ($this) {
         Optional::None => throw new Exception(),         Optional::None => throw new Exception(),
         Optional::Some => $this->value,         Optional::Some => $this->value,
Line 276: Line 328:
 } }
 </code> </code>
- +=== State machine ===
-=== State machine ====+
  
 Enums make it straightforward to express finite state machines. Enums make it straightforward to express finite state machines.
  
 +<code php>
 enum OvenStatus { enum OvenStatus {
  
Line 296: Line 348:
   };   };
 } }
- +</code> 
-In this example, the oven can be in one of three states (Off, On, and Idling, meaning the flame is not on but it will turn back on when it detects it needs to).  However, it can never go from Off to Idle or Idle to Off; it must go through On state first.  That means no tests need to be written or code paths defined for going from Off to Idle, because it's literally impossible to even describe that state.+In this example, the oven can be in one of three states (Off, On, and Idling, meaning the flame is not on, but it will turn back on when it detects it needs to). However, it can never go from Off to Idle or Idle to Off; it must go through On state first. That means no tests need to be written or code paths defined for going from Off to Idle, because it’s literally impossible to even describe that state.
  
 (Additional methods are of course likely in a real implementation.) (Additional methods are of course likely in a real implementation.)
Line 315: Line 367:
 $p->z = 9;   // throws an Error of some kind, TBD. $p->z = 9;   // throws an Error of some kind, TBD.
 </code> </code>
- 
 This is not a specific design goal of the implementation, but a potentially useful side effect. This is not a specific design goal of the implementation, but a potentially useful side effect.
- 
  
 ===== Backward Incompatible Changes ===== ===== Backward Incompatible Changes =====
  
-"enum" becomes a language keyword, with the usual potential for naming conflicts with existing global constants.+“enum” and “type” become language keywords, with the usual potential for naming conflicts with existing global constants.
- +
- +
-===== Proposed PHP Version(s) ===== +
- +
-Next PHP 8.x. +
- +
- +
-===== RFC Impact ===== +
- +
-===== Open Issues ===== +
- +
-We're still not sure what to do with ''%%match%%''.  The above handling may well change. +
- +
-Details of how the object-ness of Enum Cases get exposed are still unclear.  That will probably get determined by implementation necessity. +
- +
-===== Unaffected PHP Functionality ===== +
- +
-No existing functionality should be affected, other than the new "enum" keyword.+
  
 ===== Future Scope ===== ===== Future Scope =====
- 
-==== Case enumeration ==== 
- 
-In some languages, it is possible to enumerate all possible values of an Enum Type.  For now that functionality is not implemented, but it may be in the future.  It would be limited to the case where the Enum Type contains only Unit Values.  (That limitation exists in other languages as well.) 
  
 ==== Pattern matching ==== ==== Pattern matching ====
  
-Most languages that have an equivalent of associated values also support pattern matching as a way to extract values from the Enum Case.  Pattern matching allows for a single ''%%match%%'' branch to match on, for example, "any Foo::Bar instance where one of its two parameters is the number 5, and the other is extracted out into a variable to be used on the right."  While a powerful feature in its own right, we believe that at this time it is not an MVP for useful Enumerations.  It also has a large number of potential gotchas and complications all on its own, making it worthy of its own stand-alone RFC and development effort.+Most languages that have an equivalent of associated values also support pattern matching as a way to extract values from the Enum Case. Pattern matching allows for a single ''%%match%%'' branch to match on, for example, “any Foo::Bar instance where one of its two parameters is the number 5, and the other is extracted out into a variable to be used on the right.” While a powerful feature in its own right, we believe that at this time it is not an MVP for useful Enumerations. It also has a large number of potential gotchas and complications all on its own, making it worthy of its own stand-alone RFC and development effort.
- +
-For now, matching against the Enum Case and accessing properties directly (something not supported in most ADT-supporting languages) is "good enough" and has mostly self-evident semantics based on existing PHP patterns. +
- +
-===== Proposed Voting Choices ===== +
- +
-This is a simple yes/no vote to include Enumerations.  2/3 required to pass. +
- +
-===== Patches and Tests ===== +
-Links to any external patches and tests go here. +
- +
-If there is no patch, make it clear who will create a patch, or whether a volunteer to help with implementation is needed.+
  
-Make it clear if the patch is intended to be the final patch, or is just a prototype.+For now, matching against the Enum Case and accessing properties directly (something not supported in most ADT-supporting languages) is “good enough” and has mostly self-evident semantics based on existing PHP patterns.
  
-For changes affecting the core language, you should also provide a patch for the language specification.+===== Voting =====
  
-===== Implementation ===== +This is a simple yes/no vote to include Enumerations. 2/3 required to pass.
-After the project is implemented, this section should contain  +
-  - the version(s) it was merged into +
-  - link to the git commit(s) +
-  - a link to the PHP manual entry for the feature +
-  - a link to the language specification section (if any)+
  
 ===== References ===== ===== References =====
  
-[[https://github.com/Crell/enum-comparison|Survey of enumerations supported by various languages, conducted by Larry]]+[Survey of enumerations supported by various languages, conducted by Larry](https://github.com/Crell/enum-comparison)
  
-===== Rejected Features ===== 
-Keep this updated with features that were discussed on the mail lists. 
rfc/enumerations_and_adts.1600552651.txt.gz · Last modified: 2020/09/19 21:57 by crell