====== PHP RFC: EnumSet ======

  * Version: 0.9
  * Date: 2021-03-14
  * Author: Bob Weinand, bobwei9@hotmail.com
  * Status: Draft
  * First Published at: http://wiki.php.net/rfc/enumset

===== Introduction =====

PHP needs first-class support for combined enum values.

The current iteration of enums only allows a one-dimensional choice of enum values. It lacks the possibility to combine enums natively.

Today we usually define integer constants as powers of two and apply bit operations to them whenever we need a set of flags. Sadly, this is not type-safe, it makes the complete set of accepted values hard to see, and it is hard to debug (dump such a value and see something like ''int(81926)'').

While it certainly is possible to emulate aggregations of enum values with an array, an array guarantees neither type safety nor uniqueness, and it is not trivial to manipulate.

This RFC aims at making enums usable as a well-typed, easy-to-manipulate and debuggable unique set of multiple well-defined finite choices.

===== Proposal =====

Introduce a new ''class EnumSet implements Traversable''. It is an immutable ordered collection of zero or more unique instances of its generic ''UnitEnum''.

==== Overall Semantics ====

  * ''EnumSet'' overloads three operators: ''&'', ''|'' and ''~''.
  * It is implemented as a generic class ''EnumSet''.
  * It also is the parent class of all ''UnitEnum'' classes.
  * Its constructor creates an ''EnumSet'' with all passed enum values.
  * An (explicit) ''(array)'' cast on an ''EnumSet'' instance returns all contained values.
  * An (explicit) ''(bool)'' cast on an ''EnumSet'' instance returns ''true'' unless the set is empty, in which case it returns ''false''.
  * The ''cases'' static method is promoted to ''EnumSet''. It also returns an ''EnumSet'' containing all enum values instead of an array.
  * Two ''EnumSet'' instances are weakly equal (''=='') only if their contents are the same. The order is ignored for equivalence checking.
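For contrast with the semantics listed above, the integer-flag idiom mentioned in the Introduction can be sketched as follows. The ''PERM_*'' constants and the ''canRead()'' helper are hypothetical names chosen for illustration; they are not part of PHP or of this RFC.

```php
<?php
declare(strict_types=1);

// Hypothetical flag constants, each a distinct power of two.
const PERM_READ  = 1 << 0; // 1
const PERM_WRITE = 1 << 1; // 2
const PERM_EXEC  = 1 << 2; // 4

// Hypothetical helper: the parameter type is just int, so any integer is accepted.
function canRead(int $flags): bool
{
    return ($flags & PERM_READ) !== 0;
}

$rw = PERM_READ | PERM_WRITE;

var_dump($rw);               // int(3): the dump does not say which flags are set
var_dump(canRead($rw));      // bool(true)
var_dump(canRead(1 << 10));  // bool(false): an undefined flag passes the type check silently
```

The last call illustrates the missing type safety: nothing ties the integer argument to the set of defined flags.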
//Note:// All examples assume the following enum:

  enum Perm {
      case Read;
      case Write;
      case Exec;
  }

==== Parent class to enums ====

The definition of any enum class which implements ''UnitEnum'' (thus not including possible future ADTs) will be changed to ''final class MyEnum extends EnumSet<MyEnum> implements UnitEnum''.

This allows an ''EnumSet'' consisting of a single value to be trivially identical to that value. Given this, we can:

  * trivially pass a single enum value to a parameter expecting an ''EnumSet''
  * compare the result of a set operation on ''EnumSet'' to an enum value without extra hops
  * combine simple enum values and obtain an ''EnumSet'' without applying special semantics to the individual enum classes

==== Constructor ====

The constructor converts an array of enum values back to an ''EnumSet''. The order of the values in the array is preserved. Later duplicate values are ignored. The keys of the array entries are ignored.

The constructor signature is ''public function %%__construct%%(array $enums = [])''. More precisely, the array must consist only of enum instances this ''EnumSet'' can contain. I.e. if we had proper array generics, the first parameter would be ''array<MyEnum>''.

Constructing an ''EnumSet<MyEnum>'' with parameters that are not instances of ''MyEnum'' throws a ''TypeError''.

Using the constructor via ''new EnumSet<MyEnum>'' is the recommended way to get an empty ''EnumSet'' for a given enum class.

==== Set operations ====

Three operators are overloaded to allow for all necessary fundamental set operations: ''&'', ''|'' and ''~''.

  * binary //or//: ''$enumSetA | $enumSetB'' \\ The new set will contain all elements contained in either operand. The order is determined by first concatenating both ''EnumSet'' instances, then removing later duplicates.
  // Every UnitEnum also extends EnumSet, so this combines two EnumSet instances, each representing a single value.
  $rx = Perm::Read | Perm::Exec;
  $rw = Perm::Read | Perm::Write;
  var_dump($rx | $rw); // Perm::Read | Perm::Exec | Perm::Write

  * binary //and//: ''$enumSetA & $enumSetB'' \\ The new set will contain all elements contained in both operands, in the order in which they appear in the first operand.

  $rwx = Perm::Read | Perm::Exec | Perm::Write;
  var_dump($rwx & (Perm::Write | Perm::Exec)); // Perm::Exec | Perm::Write

  * unary //inverse//: ''~$enumSet'' \\ The new set will contain all elements of the enum, in definition order, except those present in its operand.

  $rw = Perm::Read | Perm::Write;
  var_dump(~$rw); // Perm::Exec
  var_dump(~Perm::Write); // Perm::Read | Perm::Exec

Naturally, these behaviours also extend to the assignment operators ''|='' and ''&=''.

Applying a binary operation to incompatible ''EnumSet'' instances will throw a ''TypeError''.

==== Bool cast ====

Checking whether an ''EnumSet'' is empty will be a common use case, in particular when checking whether a specific enum value is contained in an ''EnumSet''. To make this check trivial, an ''EnumSet'' can be cast to bool:

  * ''false'' if empty
  * ''true'' otherwise

  $rw = Perm::Read | Perm::Write;
  if ($rw & Perm::Read) {
      echo "We can read!";
  }

==== Array cast, equivalence and Traversable ====

''EnumSet'' implements ''Traversable''. The order of iteration is deterministic and depends on the order in which values were added to the ''EnumSet''. The keys of this iterator are consecutive integers starting at zero.

''EnumSet'' instances can be cast to array like any other object. This is equivalent to applying ''iterator_to_array()'' here. This is not special or different to ''(array)'' casts of other objects.
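The ordering, de-duplication and iteration rules described so far can be approximated in userland on PHP 8.1. The following is a minimal sketch under stated assumptions: ''UserlandEnumSet'' and its method names are hypothetical stand-ins, not the proposed internal class, and methods replace the operators because userland classes cannot overload ''&'', ''|'' and ''~''.

```php
<?php
declare(strict_types=1);

// Example enum, mirroring the Perm enum used throughout this RFC.
enum Perm
{
    case Read;
    case Write;
    case Exec;
}

// Hypothetical userland stand-in; NOT the proposed internal EnumSet class.
final class UserlandEnumSet implements IteratorAggregate
{
    /** @var list<UnitEnum> unique cases in insertion order */
    private array $items = [];

    public function __construct(private string $enumClass, array $enums = [])
    {
        foreach ($enums as $enum) { // array keys are ignored
            if (!$enum instanceof $enumClass) {
                throw new TypeError("Expected an instance of $enumClass");
            }
            if (!in_array($enum, $this->items, true)) {
                $this->items[] = $enum; // later duplicates are dropped
            }
        }
    }

    // Counterpart of "|": concatenate, then drop later duplicates.
    public function union(self $other): self
    {
        return new self($this->enumClass, [...$this->items, ...$other->items]);
    }

    // Counterpart of "&": keep the order of the first operand.
    public function intersect(self $other): self
    {
        $kept = array_filter($this->items, fn (UnitEnum $e) => in_array($e, $other->items, true));
        return new self($this->enumClass, $kept);
    }

    // Counterpart of "~": all cases in definition order, minus the contained ones.
    public function complement(): self
    {
        $class = $this->enumClass;
        $rest = array_filter($class::cases(), fn (UnitEnum $e) => !in_array($e, $this->items, true));
        return new self($class, $rest);
    }

    // Counterpart of the (bool) cast, inverted.
    public function isEmpty(): bool
    {
        return $this->items === [];
    }

    public function getIterator(): ArrayIterator
    {
        return new ArrayIterator($this->items); // keys are 0, 1, 2, ...
    }
}

$rx = new UserlandEnumSet(Perm::class, [Perm::Read, Perm::Exec]);
$rw = new UserlandEnumSet(Perm::class, [Perm::Read, Perm::Write, Perm::Read]);

// Helper for display: map a set to the list of its case names.
$names = fn (UserlandEnumSet $s) => array_map(fn (UnitEnum $e) => $e->name, iterator_to_array($s));

echo implode(' | ', $names($rx->union($rw))), "\n";     // Read | Exec | Write
echo implode(' | ', $names($rx->intersect($rw))), "\n"; // Read
echo implode(' | ', $names($rw->complement())), "\n";   // Exec
```

In this sketch a single enum case has to be wrapped in a set before it can take part in an operation, which is exactly the boxing step the parent-class design of this RFC avoids.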
Because ''EnumSet'' is so close to arrays in behavior, the weak comparison (''=='') semantics of ''EnumSet'' are also identical to those of arrays: two ''EnumSet'' instances are weakly equal if their contents are the same, regardless of the ordering.

  $rx = Perm::Read | Perm::Exec;
  $array = [];
  foreach ($rx as $key => $value) {
      var_dump($key); // int(0), then int(1)
      var_dump($value); // Perm::Read, then Perm::Exec
      $array[$key] = $value;
  }
  var_dump($array === (array) $rx); // bool(true)
  var_dump(new EnumSet((array) $rx) == $rx); // bool(true)
  // Parentheses are required: == binds more tightly than |
  var_dump((Perm::Read | Perm::Exec) == (Perm::Exec | Perm::Read)); // bool(true)

==== cases method ====

While ''~(new EnumSet<MyEnum>)'' is a valid way to retrieve the full set of enum values, there should be a proper way to do so. Luckily there already is a function returning all the enum values: ''cases''. We just need to make it return an ''EnumSet'' instead.

Its signature thus is ''public static function cases(): EnumSet<static>''. The order of the returned ''EnumSet'' will be the order of definition of the individual enum cases.

The old behavior of getting an array from it is trivially restored by applying an array cast: ''(array) MyEnum::cases()''. This explicit cast should usually be unnecessary, as ''EnumSet'' implements ''Traversable'' for easy looping anyway.

==== Generic class ====

''EnumSet'' is implemented as a generic class, so that we can check against an ''EnumSet<MyEnum>'' type. It will internally be implemented as a monomorphized generic class.

As this is the first implementation of a generic class, this entails some further semantics:

  * ''new EnumSet'' is invalid and will throw an ''Error'' (as opposed to ''new EnumSet<MyEnum>'')
  * ''EnumSet<MyEnum> instanceof EnumSet'' is true
  * This also implies that there is a real (or virtual) parent class to the generic class having its types applied.
  * This in particular means that both ''new ReflectionClass("EnumSet")'' and ''new ReflectionClass("EnumSet<MyEnum>")'' are valid.
  * The ''new ReflectionClass("EnumSet")'' instance will use the broadest type possible (in accordance with LSP). Concretely, the ''cases'' method will have a return type of ''EnumSet<UnitEnum>''.
  * ''EnumSet'' is internally implemented as a class alias of ''EnumSet<UnitEnum>''.

The proposed implementation being monomorphized should not prevent us from switching to a truly generic implementation in the future; the external behaviour of ''EnumSet'' is invariant to this.

===== Examples =====

More examples ...

==== Serializing and unserializing file permissions ====

  enum FilePerm: int {
      case OTHER_EXEC = 0001;
      case OTHER_WRITE = 0002;
      case OTHER_READ = 0004;
      case GROUP_EXEC = 0010;
      case GROUP_WRITE = 0020;
      case GROUP_READ = 0040;
      case OWNER_EXEC = 0100;
      case OWNER_WRITE = 0200;
      case OWNER_READ = 0400;
      // Fold a set of permissions into its integer bit mask.
      static function toInt(EnumSet<FilePerm> $perms): int {
          $bits = 0;
          foreach ($perms as $perm) {
              $bits |= $perm->value;
          }
          return $bits;
      }
      // Collect every case whose bit is set in $bits.
      static function fromInt(int $bits): EnumSet<FilePerm> {
          $perms = new EnumSet<FilePerm>;
          foreach (self::cases() as $perm) {
              if ($perm->value & $bits) {
                  $perms |= $perm;
              }
          }
          return $perms;
      }
  }
  $mode = stat($someFile)["mode"]; // e.g. 0644
  $perms = FilePerm::fromInt($mode); // OTHER_READ | GROUP_READ | OWNER_WRITE | OWNER_READ
  $perms &= FilePerm::OWNER_READ | FilePerm::OWNER_WRITE | FilePerm::OWNER_EXEC; // dismiss all but owner permissions
  chmod($someFile, FilePerm::toInt($perms)); // saving 0600

===== FAQ =====

==== How does it compare to current approaches? ====

PHP has a lot of functions which expect a ''$flags'' parameter. The flags usually are loosely defined constants, typically sharing a fixed prefix.

Example: ''json_encode''. There are currently 15 flags, each a distinct integer being a power of two, prefixed with ''JSON_''. If we designed this function on top of this RFC, we would have an enum with cases for every option, to be combined at will:

  enum Json {
      case FORCE_OBJECT;
      case HEX_QUOT;
      case THROW_ON_ERROR;
      case ...
  }
  json_encode($json, Json::FORCE_OBJECT | Json::THROW_ON_ERROR)

The call to ''json_encode'' looks similar to current usage, but now we have a self-contained enum of the options which can be applied. Any bad option is easily spotted in code and gives a nice error message at runtime.

==== Why internal? ====

It is easy to argue that this can be done in userland. While that certainly is true, a lot of the ergonomics are lost:

  * No trivial emptiness check (needs an extra method)
  * No trivial conversions between ''array'' and ''EnumSet''
  * Operations require a method
  * Boxing and unboxing is necessary (it would be impossible to pass an enum value directly to a function expecting ''EnumSet'')
  * Ugly class generation via ''eval()'' if we want proper typing of the ''EnumSet''

Overall, having enum set operations first class gives the user so much more flexibility that it warrants an internal implementation.

===== Backward Incompatible Changes =====

There is no impact on backward compatibility apart from reserving the ''EnumSet'' class name.

===== Proposed PHP Version(s) =====

To be included in PHP 8.1. (Later inclusion may have BC implications.)

===== Proposed Voting Choices =====

Include ''EnumSet'' in PHP 8.1?

  * Yes
  * No

The vote requires a 2/3 majority.

===== Patches and Tests =====

TBD.

===== Implementation =====

TBD.