PHP RFC: UUID
- Version: 1.0
- Date: 2017-05-25
- Author: Richard Fussenegger, php@fleshgrinder.com
- Status: Declined
- First Published at: http://wiki.php.net/rfc/uuid
Introduction
Universally Unique Identifiers (UUIDs, also known as Globally Unique Identifiers [GUIDs]) are 128-bit integers designed to be unique across space and time. PHP currently provides only the uniqid function, which has many flaws, as is apparent from the many warnings on its manual page. UUIDs are the natural answer to that problem. They are also gaining traction due to emerging technologies such as streaming platforms (e.g. Kafka) and event sourcing applications, where uniqueness per record is of paramount importance. Depending on a central (locking) authority increases the complexity and decreases the throughput of such systems.
UUIDs are defined and standardized in RFC 4122, but were in use in many systems long before. The algorithms involved are well understood and have been battle-tested for almost 30 years in ubiquitous software such as Microsoft’s Windows operating system. UUIDs are mainly used to assign identifiers to entities without requiring a central authority, which makes them particularly useful in distributed systems. They also allow very high allocation rates: up to 10 million per second per machine, if necessary. Please refer to the Wikipedia article for more details about UUIDs, their flaws, and their collision probabilities.
Most high-level programming languages support UUIDs out of the box. The following widely used languages and other software do so:
- Java (5+, and automatically all of its dialects, e.g. Clojure, Groovy, Scala, Kotlin, …)
- Python (2.5+)
- Ruby (1.9+)
- .NET (C#, VB, F#, C, C++, …)
- D
- Objective-C (iOS SDK)
- Swift
- ActionScript (3+)
- ColdFusion
- Cocoa/Carbon
- Wolfram Language
- SQL (MySQL, PostgreSQL, Oracle, SQL Server, …)
- Boost (C++)
- Windows (C, C++) and console tool
- Linux (C, C++) and device
- BSD (C, C++)
- …
Proposal
Provide a UUID implementation as part of the PHP standard module that ensures easy and standards-compliant construction, generation, comparison, and formatting of UUIDs according to RFC 4122, in the form of an immutable value object.
Why should this be part of PHP’s standard module?
UUIDs are a universally applicable feature based on a very stable standard. A programming language’s standard module should provide the most basic and recurring building blocks, so that developers can get things done quickly and without much hassle. Reinventing the wheel, or searching the Internet for basic things (finding the right Composer package or Stack Overflow answer and copy-pasting it), achieves exactly the opposite. Another important aspect is compatibility: generated UUIDs should be interchangeable between PHP applications, and equally with software beyond PHP (e.g. databases).
The UUID reference implementation in PHP (ramsey/uuid) has over 10 million downloads on Packagist (including the numbers of its predecessor rhumsaa/uuid). Other, less well-known packages account for another 1 million downloads overall. These impressive numbers illustrate how important and ubiquitous UUIDs are in today’s software. Note that the aim of this proposal is not to undermine the quality of these libraries, or to attack their success in any other way. The functionality provided by ramsey/uuid exceeds what this proposal’s implementation provides by orders of magnitude. The goal is to provide a very stable and standards-compliant implementation that covers the most basic requirements.
Why C?
There is no inherent reason why this should be implemented in C. One could argue that it is faster, which it probably is, but this is a weak argument. This RFC would propose the inclusion of UUIDs implemented in PHP if it were possible to ship PHP code as part of PHP’s standard module. However, a C API is included that allows other PHP modules to utilize UUIDs.
Why not PECL UUID?
The PECL UUID package is a procedural wrapper around Linux’s libuuid that could be moved to core instead of this implementation. It has several drawbacks compared to this implementation, besides the obvious cross-operating-system issue, which would be solvable via conditional compilation and linking against the UUID library of each respective operating system. Its procedural nature means that generated values must be re-parsed upon every action, users cannot rely on type safety because they are dealing with primitive strings, and it works outside of the rest of PHP’s ecosystem: rather than reusing existing functionality, it re-implements everything through the inclusion of new external dependencies.
Why a Class?
UUIDs are basically random data. There is no way for an application to distinguish between an arbitrary string of 16 bytes (since strings in PHP are arbitrary bytes too) and a UUID. This problem can be minimized by implementing UUIDs as a class. Code that constrains a type to a UUID has the guarantee that the underlying string is exactly 16 bytes long. The developer who constrains a type to a UUID has the guarantee that the other developer passing a value at least had to have a look at the UUID class. Of course, nothing prevents the other developer from creating UUIDs that are highly predictable, as it is impossible to ensure that she does not do that.
Implementation
The implementation consists of two new final classes in the global namespace. The supported generation algorithms are versions 3, 4, and 5. Version 3 is provided for backwards compatibility with existing UUID implementations. RFC 4122 recommends version 5 over version 3 because MD5 has a higher collision probability than SHA1, even when the SHA1 digest is truncated, as is the case for version 5 UUIDs. Versions 1 and 2 are not supported due to privacy/security concerns. Please refer to the RFC and the Wikipedia article for further technical details.
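As a rough illustration of what version 4 generation involves, here is a minimal userland sketch. It is not the proposed C implementation; the function name sketch_uuid_v4 is hypothetical, and the sketch assumes only PHP's random_bytes plus the version and variant bit layout from RFC 4122.

```php
<?php
// Hypothetical helper for illustration only; not part of the proposed API.
function sketch_uuid_v4(): string
{
    // 16 bytes from the CSPRNG; throws an exception if no entropy source is available.
    $bytes = random_bytes(16);

    // Version: the upper four bits of octet 6 are set to 0100 (version 4).
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);

    // Variant: the upper two bits of octet 8 are set to 10 (RFC 4122).
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    return $bytes; // raw 16-byte binary form
}
```

The remaining 122 bits stay random, which is why the quality of the underlying random source matters for this version.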
The fully documented UUID class can be found at GitHub.

    final class UUID {
        public const VARIANT_NCS = 0;
        public const VARIANT_RFC4122 = 1;
        public const VARIANT_MICROSOFT = 2;
        public const VARIANT_FUTURE_RESERVED = 4;

        public const VERSION_1_TIME_BASED = 1;
        public const VERSION_2_DCE_SECURITY = 2;
        public const VERSION_3_NAME_BASED_MD5 = 3;
        public const VERSION_4_RANDOM = 4;
        public const VERSION_5_NAME_BASED_SHA1 = 5;

        private $binary;

        private function __construct() {}

        /** @throws \InvalidArgumentException */
        public static function fromBinary(string $input): UUID {}

        /** @throws \UUIDParseException */
        public static function parse(string $input): UUID {}

        public static function v3(UUID $namespace, string $name): UUID {}

        /** @throws Exception */
        public static function v4(): UUID {}

        public static function v5(UUID $namespace, string $name): UUID {}

        public static function NamespaceDNS(): UUID {}
        public static function NamespaceOID(): UUID {}
        public static function NamespaceURL(): UUID {}
        public static function NamespaceX500(): UUID {}
        public static function Nil(): UUID {}

        /** @throws \Error */
        public function __set($_, $__): void {}

        /** @throws \UnexpectedValueException */
        public function __wakeup(): void {}

        public function getVariant(): int {}
        public function getVersion(): int {}
        public function isNil(): bool {}

        public function toBinary(): string {}
        public function toHex(): string {}
        public function toString(): string {}

        /** @throws \Error */
        private function __clone() {}
    }
The fully documented UUIDParseException class can be found at GitHub.

    final class UUIDParseException extends Exception {
        private $input;
        private $position;

        public function __construct(
            string $reason,
            string $input,
            int $position = 0,
            ?Throwable $previous = null
        ) {}

        public function getInput(): string {}
        public function getPosition(): int {}
    }
The named fromBinary constructor can be utilized to construct an instance from any kind of UUID, including non-standards-compliant ones. The only validation performed is a length check that ensures that the given string is exactly 16 bytes long. An InvalidArgumentException is thrown if that is not the case.
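The length-only validation described above can be sketched in userland as follows. The function name sketch_from_binary is hypothetical and returns the raw bytes rather than a UUID instance, since the proposed class does not exist in released PHP versions.

```php
<?php
// Hypothetical stand-in for the described fromBinary check; returns the bytes unchanged.
function sketch_from_binary(string $input): string
{
    // strlen() counts bytes, which is what matters for the binary form.
    if (strlen($input) !== 16) {
        throw new InvalidArgumentException(
            sprintf('Expected exactly 16 bytes, got %d', strlen($input))
        );
    }

    // No version or variant check: any 16 bytes are accepted.
    return $input;
}
```

Under this sketch, a value such as str_repeat("\xff", 16) is accepted even though it is not a valid RFC 4122 UUID, which matches the "any kind of UUID" behaviour described above.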
The named parse constructor tries to parse the given input string from one of the several known string representations of UUIDs. Those representations are:
- String
- Hexadecimal
- URNs
- Microsoft
Leading whitespace (spaces and tabs \t) and opening braces ({) are ignored, as are trailing whitespace (spaces and tabs \t) and closing braces (}). Hyphens (-) are always ignored, regardless of position. The method follows the robustness principle and is not meant for validation. The hexadecimal digits a through f are parsed case-insensitively.
A UUIDParseException is thrown if parsing of the input string fails.
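The lenient parsing rules above can be approximated in a few lines of userland PHP. This is only a sketch: the function name sketch_parse_uuid is hypothetical, the handling of a "urn:uuid:" prefix is an assumption based on the URN form defined in RFC 4122, and an InvalidArgumentException stands in for the proposed UUIDParseException.

```php
<?php
// Hypothetical lenient parser; returns the 16-byte binary form.
function sketch_parse_uuid(string $input): string
{
    // Ignore leading spaces, tabs, and opening braces,
    // and trailing spaces, tabs, and closing braces.
    $s = rtrim(ltrim($input, " \t{"), " \t}");

    // Assumption: the URN form carries a "urn:uuid:" prefix, matched case-insensitively.
    if (strncasecmp($s, 'urn:uuid:', 9) === 0) {
        $s = substr($s, 9);
    }

    // Hyphens are ignored regardless of position.
    $hex = str_replace('-', '', $s);

    // Exactly 32 hexadecimal digits must remain; a-f may be upper or lower case.
    if (preg_match('/\A[0-9a-fA-F]{32}\z/', $hex) !== 1) {
        throw new InvalidArgumentException('Not a recognizable UUID: ' . $input);
    }

    return hex2bin($hex);
}
```

With these rules, "6ba7b810-9dad-11d1-80b4-00c04fd430c8", "{6BA7B810-9DAD-11D1-80B4-00C04FD430C8}", and "urn:uuid:6ba7b810-9dad-11d1-80b4-00c04fd430c8" all yield the same 16 bytes.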
The named NamespaceDNS, NamespaceOID, NamespaceURL, NamespaceX500, and Nil constructors provide shortcuts for the predefined special UUIDs from RFC 4122.
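To show how the predefined namespaces feed into name-based generation (versions 3 and 5), here is a minimal userland sketch following the algorithm in RFC 4122. The function name sketch_uuid_name_based is hypothetical and works on raw 16-byte strings instead of UUID instances; the DNS namespace value is the one listed in RFC 4122.

```php
<?php
// Hypothetical helper: builds a version 3 (MD5) or version 5 (SHA1) UUID in binary form.
function sketch_uuid_name_based(string $namespace, string $name, int $version): string
{
    if (strlen($namespace) !== 16) {
        throw new InvalidArgumentException('Namespace must be a 16-byte binary UUID');
    }

    // Hash the namespace bytes followed by the name, requesting raw binary output.
    if ($version === 3) {
        $hash = md5($namespace . $name, true);
    } elseif ($version === 5) {
        $hash = sha1($namespace . $name, true);
    } else {
        throw new InvalidArgumentException('Only versions 3 and 5 are name-based');
    }

    // SHA1 yields 20 bytes and is truncated to 16; MD5 already yields 16.
    $bytes = substr($hash, 0, 16);

    // Stamp the version into the upper four bits of octet 6
    // and the RFC 4122 variant into the upper two bits of octet 8.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | ($version << 4));
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    return $bytes;
}

// The DNS namespace UUID from RFC 4122, in binary form.
$dnsNamespace = hex2bin('6ba7b8109dad11d180b400c04fd430c8');
$v5 = sketch_uuid_name_based($dnsNamespace, 'php.net', 5);
```

Because the result depends only on the namespace and the name, calling the function twice with the same arguments yields the same UUID.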
The getVariant method returns the UUID variant of the current instance. The returned integer corresponds to one of the following class constants:
UUID::VARIANT_NCS
UUID::VARIANT_RFC4122
UUID::VARIANT_MICROSOFT
UUID::VARIANT_FUTURE_RESERVED
The getVersion method returns the UUID version of the current instance. The returned integer is in the range [0, 15], and the values in [1, 5] correspond to the following class constants:
UUID::VERSION_1_TIME_BASED
UUID::VERSION_2_DCE_SECURITY
UUID::VERSION_3_NAME_BASED_MD5
UUID::VERSION_4_RANDOM
UUID::VERSION_5_NAME_BASED_SHA1
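The variant and version are encoded in fixed bit positions of the 16-byte value, so both can be read with simple bit masks. The following sketch assumes the layout from RFC 4122; the function names are hypothetical, and the variant is returned as a descriptive string rather than one of the class constants above.

```php
<?php
// Hypothetical helpers operating on the 16-byte binary form of a UUID.

// The version is the upper four bits of octet 6, hence a value from 0 to 15.
function sketch_uuid_version(string $bytes): int
{
    return ord($bytes[6]) >> 4;
}

// The variant is encoded in the most significant bits of octet 8.
function sketch_uuid_variant(string $bytes): string
{
    $octet = ord($bytes[8]);

    if (($octet & 0x80) === 0x00) {
        return 'NCS';             // 0xxx xxxx
    }
    if (($octet & 0xc0) === 0x80) {
        return 'RFC4122';         // 10xx xxxx
    }
    if (($octet & 0xe0) === 0xc0) {
        return 'MICROSOFT';       // 110x xxxx
    }

    return 'FUTURE_RESERVED';     // 111x xxxx
}
```

For the nil UUID (16 zero bytes) these helpers report version 0 and the NCS variant, which illustrates why the version range is wider than the five named constants.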
The remaining methods toBinary, toHex, and toString convert the current UUID instance to the respective string representation. Both the toHex and toString methods always format the hexadecimal digits a through f as lower case characters, in accordance with RFC 4122.
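The hexadecimal and string forms can be derived from the binary form as sketched below. The function names are hypothetical; the 8-4-4-4-12 grouping is the canonical string layout from RFC 4122, and bin2hex already emits lower case digits.

```php
<?php
// Hypothetical helpers operating on the 16-byte binary form of a UUID.

// 32 lower case hexadecimal digits without separators.
function sketch_uuid_to_hex(string $bytes): string
{
    return bin2hex($bytes);
}

// Canonical 8-4-4-4-12 representation with hyphens.
function sketch_uuid_to_string(string $bytes): string
{
    $hex = bin2hex($bytes);

    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}
```

Both helpers produce lower case output regardless of how the UUID was originally written, since the case information is not part of the binary form.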
Construction via the new keyword, cloning, and setting of dynamic properties are disabled. Disabling dynamic properties is necessary to uphold the promise of a truly immutable object, and to ensure that PHP’s comparison operations always yield the correct result. Cloning is disabled because it makes no sense to clone an immutable object; it has to be disabled explicitly because cloning in PHP is opt-out rather than opt-in. The constructor is likewise disabled to uphold the immutability promise, but also to ensure future compatibility with regard to possible new features.
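The same restrictions can be expressed in userland PHP, which may help to see how they interact. The class SketchUuid below is a hypothetical stand-in, not the proposed UUID class: it has a private constructor, a private __clone, and a __set that rejects every write.

```php
<?php
// Hypothetical immutable value object mirroring the restrictions described above.
final class SketchUuid
{
    private $binary;

    // Private: instances can only be created through named constructors.
    private function __construct(string $binary)
    {
        $this->binary = $binary;
    }

    public static function fromBinary(string $binary): self
    {
        if (strlen($binary) !== 16) {
            throw new InvalidArgumentException('Expected exactly 16 bytes');
        }

        return new self($binary);
    }

    // Invoked for writes to inaccessible or undefined properties from outside the class.
    public function __set($name, $value): void
    {
        throw new Error('Cannot set properties on an immutable object');
    }

    // Private: "clone $uuid" outside of the class raises an Error.
    private function __clone()
    {
    }

    public function toBinary(): string
    {
        return $this->binary;
    }
}

$a = SketchUuid::fromBinary(str_repeat("\x00", 16));
$b = SketchUuid::fromBinary(str_repeat("\x00", 16));

var_dump($a == $b);  // bool(true): same class and equal property values
var_dump($a === $b); // bool(false): two distinct instances
```

Because no dynamic property can ever be attached, the loose comparison only ever sees the single private property holding the bytes.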
No Magic!
The implementation does not use any of the special capabilities that are available to PHP internal code. No special object structure to store the bytes, instead a simple private property is utilized. No operator overloading, instead the standard PHP comparison for objects is used. No custom serialization logic, instead the standard PHP serialization is used.
No External Dependencies!
The implementation does not rely on anything other than PHP itself, the MD5, random, and SHA1 standard sub-modules, and the SPL exceptions InvalidArgumentException and UnexpectedValueException. The UUID algorithms are implemented by reusing existing PHP functionality; there is no dependency on particular system headers or other external libraries. This means that any improvement of the random sub-module directly improves this implementation as well. The same is true for bug fixes in MD5 or SHA1 (very unlikely, but still).
Backward Incompatible Changes
Both UUID and UUIDParseException become globally defined classes, which might collide with user-defined classes of the same name in the global namespace. However, the risk of such collisions is considered to be very low, since PHP users should not use the global namespace.
Proposed PHP Version(s)
7.3
RFC Impact
To SAPIs
None
To Existing Extensions
None
To Opcache
None
New Constants
None
php.ini Defaults
None
Open Issues
None
Unaffected PHP Functionality
Nothing is affected.
Future Scope
- Addition of a toURN method, if we have a proper URL value object.
Proposed Voting Choices
Simple 50%+1 majority vote that ends on September 20, 2017.
Patches and Tests
Implementation
After the project is implemented, this section should contain
- the version(s) it was merged to
- a link to the git commit(s)
- a link to the PHP manual entry for the feature
References
Rejected Features
- Doxygen Documentation (corresponding declined RFC)
- Namespaces (corresponding withdrawn RFC)