rfc:uuid

PHP RFC: UUID

  • Version: 1.0
  • Date: 2017-05-25
  • Author: Richard Fussenegger, php@fleshgrinder.com
  • Status: Under Discussion
  • First Published at: http://wiki.php.net/rfc/uuid

Introduction

Universally Unique Identifiers (UUIDs, also known as Globally Unique Identifiers [GUIDs]) are 128 bit integers that guarantee uniqueness across space and time. PHP currently provides the uniqid function only, however, there are many flaws to it; as is apparent from the many warnings on the manual page. UUIDs are the natural answer to that problem. UUIDs are also gaining more attraction due to emerging technologies like streaming platforms (e.g. Kafka), or event sourcing applications, since uniqueness per record is of paramount importance. Depending on a central (locking) authority increases complexity and decreases throughput of such systems.

UUIDs are defined and standardized in RFC 4122, but where effectively used long before in many systems. The algorithms that are involved are well understood and battle tested through ubiquitous software, like Microsoft’s Windows operating system, since almost 30 years. UUIDs are mainly used to assign identifiers to entities without requiring a central authority. They are thus particularly useful in distributed systems. They also allow very high allocation rates; up to 10 million per second per machine, if necessary. Please refer to the Wikipedia article for more details about UUIDs, their flaws, as well as collision probabilities.

Most high-level programming languages provide support for UUIDs out-of-the-box. The following is a list of widely used languages and other software that provides support for UUIDs out-of-the-box:

  • Java (5+ and automatically all of its dialect, e.g. Clojure, Groovy, Scala, Kotlin, …)
  • Python (2.5+)
  • Ruby (1.9+)
  • .NET (C#, VB, F#, C, C++, …)
  • D
  • Objective-C (iOS SDK)
  • Swift
  • ActionScript (3+)
  • ColdFusion
  • Cocoa/Carbon
  • Wolfram Language
  • SQL (MySQL, PostgreSQL, Oracle, SQL Server, …)
  • Boost (C++)
  • Windows (C, C++) and console tool
  • Linux (C, C++) and device
  • BSD (C, C++)

Proposal

Provide a UUID implementation as part of the PHP standard module, that ensures easy and standards compliant construction, generation, comparison, and formatting of UUIDs according to RFC 4122 as an immutable value object.

Why should this be part of PHP’s standard module?

UUIDs are a universally applicable feature based on a very stable standard. A programming language’s standard module should provide the most basic and repeating building blocks, to ensure that developers can achieve things fast without much hassle. Reinventing the wheel, or searching the Internet (finding the right Composer packages or StackOverflow answer and copy-pasting) for basic things results in exactly the opposite. Another important aspect to consider is to ensure compatibility of generated UUIDs across PHP software but equally beyond the boundaries of them (e.g. databases).

The UUID reference implementation in PHP (ramsey/uuid) has over 10 million downloads on Packagist (including the numbers of its predecessor rhumsaa/uuid). Other less well known packages count another 1 million downloads overall. These impressive numbers illustrate how important and ubiquitous UUIDs are in today’s software. Note that the aim of this proposal is not to undermine the quality of these libraries, or attack their success in any other way. The functionality provided by ramsey/uuid outperforms what this proposal’s implementation provides by magnitudes. The goal is it to provide a very stable and standards compliant implementation that accomplishes the most basic requirements.

Why C?

There is no reason why this should be implemented in C. One could argue that it is faster, which it probably is, but this is a weak argument. This RFC would propose the inclusion of UUIDs implemented in PHP if shipping of PHP code as part of the standard module of PHP would be possible. However, there is a C API included that allows other PHP modules to utilize UUIDs.

Fully documented C API can be found at GitHub.

PHPAPI extern zend_class_entry *php_ce_UUID;
PHPAPI extern zend_class_entry *php_ce_UUIDParseException;
 
PHPAPI typedef uint8_t php_uuid[16];
PHPAPI typedef char php_uuid_hex[33];
PHPAPI typedef char php_uuid_string[37];
 
PHPAPI const uint8_t PHP_UUID_LEN = 16;
PHPAPI const uint8_t PHP_UUID_HEX_LEN = 32;
PHPAPI const uint8_t PHP_UUID_STRING_LEN = 36;
 
PHPAPI extern const php_uuid PHP_UUID_NAMESPACE_DNS;
PHPAPI extern const php_uuid PHP_UUID_NAMESPACE_OID;
PHPAPI extern const php_uuid PHP_UUID_NAMESPACE_URL;
PHPAPI extern const php_uuid PHP_UUID_NAMESPACE_X500;
PHPAPI extern const php_uuid PHP_UUID_NIL;
 
PHPAPI typedef enum php_uuid_variant {
  PHP_UUID_VARIANT_NCS             = 0, /* 0b0xx */
  PHP_UUID_VARIANT_RFC4122         = 1, /* 0b10x */
  PHP_UUID_VARIANT_MICROSOFT       = 2, /* 0b110 */
  PHP_UUID_VARIANT_FUTURE_RESERVED = 3, /* 0b111 */
} php_uuid_variant;
 
PHPAPI const uint8_t PHP_UUID_VERSION_1_TIME_BASED      = 1;
PHPAPI const uint8_t PHP_UUID_VERSION_2_DCE_SECURITY    = 2;
PHPAPI const uint8_t PHP_UUID_VERSION_3_NAME_BASED_MD5  = 3;
PHPAPI const uint8_t PHP_UUID_VERSION_4_RANDOM          = 4;
PHPAPI const uint8_t PHP_UUID_VERSION_5_NAME_BASED_SHA1 = 5;
 
PHPAPI int php_uuid_parse(php_uuid *uuid, const char *input, const size_t input_len, const zend_bool throw);
#define php_uuid_parse_silent(uuid, input, input_len) php_uuid_parse(uuid, input, input_len, 0)
#define php_uuid_parse_throw(uuid, input, input_len) php_uuid_parse(uuid, input, input_len, 1)
 
PHPAPI void php_uuid_create_v3(php_uuid *uuid, const php_uuid *namespace, const char *name, const size_t name_len);
 
PHPAPI int php_uuid_create_v4(php_uuid *uuid, const zend_bool throw);
#define php_uuid_create_v4_silent(uuid) php_uuid_create_v4(uuid, 0)
#define php_uuid_create_v4_throw(uuid) php_uuid_create_v4(uuid, 1)
 
PHPAPI void php_uuid_create_v5(php_uuid *uuid, const php_uuid *namespace, const char *name, const size_t name_len);
 
PHPAPI zend_always_inline const php_uuid_variant php_uuid_get_variant(const php_uuid uuid);
 
PHPAPI zend_always_inline const uint8_t php_uuid_get_version(const php_uuid uuid);
 
PHPAPI zend_always_inline const int php_uuid_is_nil(const php_uuid uuid);
 
PHPAPI zend_always_inline void php_uuid_to_hex(php_uuid_hex *buffer, const php_uuid uuid);
 
PHPAPI zend_always_inline void php_uuid_to_string(php_uuid_string *buffer, const php_uuid uuid);

Why not PECL UUID?

The PECL UUID package is a procedural wrapper around Linux’s libuuid that could be moved to core instead of this implementation. There are several drawbacks when comparing the two implementations, besides the obvious cross operating system issue, which would be solvable via conditional compilation and linking against the UUID library of each respective operating system. The procedural nature means that generated values must be re-parsed upon every action, users cannot rely on type-safety because they are dealing with primitive strings, and it works outside of the rest of PHP’s ecosystem by not reusing existing functionality but rather re-implementing everything through the inclusion of new external dependencies.

Why a Class?

UUIDs are basically random data. There is no way for an application to distinguish between a string of 16 bytes (since strings in PHP are random bytes too) and a UUID. This problem can be minimized through the implementation of UUIDs as a class. The code that constraints a type to a UUID has the guarantee that the string is of exactly 16 bytes. The developer that constraints a type to a UUID has the guarantee that the other developer passing a value at least had to have a look at the UUID class. Of course, there is nothing that prevents the other developer from creating UUIDs that are highly predictable, as it is impossible to ensure that she does not do that.

Implementation

The implementation consists of two new final classes in the global namespace. Supported generation algorithms are version 3, 4, and 5. Version 3 is provided for backwards compatibility with existing UUID implementations, RFC 4122 recommends version 5 over version 3 because of MD5’s higher collision probability compared to SHA1; even if truncated, as is the case for version 5 UUIDs. Versions 1 and 2 are not supported due to privacy/security concerns. Please refer to the RFC and Wikipedia article for further details on all these technical details.

Fully documented UUID class can be found at GitHub.

final class UUID {
  public const VARIANT_NCS             = 0;
  public const VARIANT_RFC4122         = 1;
  public const VARIANT_MICROSOFT       = 2;
  public const VARIANT_FUTURE_RESERVED = 4;
 
  public const VERSION_1_TIME_BASED      = 1;
  public const VERSION_2_DCE_SECURITY    = 2;
  public const VERSION_3_NAME_BASED_MD5  = 3;
  public const VERSION_4_RANDOM          = 4;
  public const VERSION_5_NAME_BASED_SHA1 = 5;
 
  private $binary;
 
  private function __construct() {}
 
  /** @throws \InvalidArgumentException */
  public static function fromBinary(string $input): UUID {}
 
  /** @throws \UUIDParseException */
  public static function parse(string $input): UUID {}
 
  public static function v3(UUID $namespace, string $name): UUID {}
 
  /** @throws Exception */
  public static function v4(): UUID {}
 
  public static function v5(UUID $namespace, string $name): UUID {}
 
  public static function NamespaceDNS(): UUID {}
  public static function NamespaceOID(): UUID {}
  public static function NamespaceURL(): UUID {}
  public static function NamespaceX500(): UUID {}
  public static function Nil(): UUID {}
 
  /** @throws \Error */
  public function __set($_, $__): void {}
 
  /** @throws \UnexpectedValueException */
  public function __wakeup(): void {}
 
  public function getVariant(): int {}
  public function getVersion(): int {}
 
  public function isNil(): bool {}
 
  public function toBinary(): string {}
  public function toHex(): string {}
  public function toString(): string {}
 
  /** @throws \Error */
  private function __clone() {}
}

Fully documented UUIDParseException class can be found at GitHub.

final class UUIDParseException extends Exception {
  private $input;
  private $position;
 
  public function __construct(
    string $reason,
    string $input,
    int $position = 0,
    ?Throwable $previous = null
  ) {}
 
  public function getInput(): string {}
  public function getPosition(): int {}
}

The named fromBinary constructor can be utilized to construct an instance from any kind of UUID, including non-standards compliant ones. The only performed validation is a length check that ensures that the given string is exactly 16 bytes long. An InvalidArgumentException is thrown if that is not the case.

The named parse constructor tries to parse the given string input from one of the several known string representation forms of UUIDs. Those representations are:

  • String
  • Hexadecimal
  • URNs
  • Microsoft

Leading whitespace (spaces and tabs \t) and opening braces ({) are ignored, so are trailing whitespace (spaces and tabs \t) and closing braces (}). Hyphens (-), regardless of position, are always ignored. The method follows the robustness principle and is not meant for validation. The hexadecimal digits a through f are case insensitively parsed.

A UUIDParseException is thrown is parsing of the input string fails.

The named NamespaceDNS, NamespaceOID, NamespaceURL, NamespaceX500, and Nil constructors provide shortcuts for the predefined special UUIDs from RFC 4122.

The getVariant method returns the UUID variant of the current instance. The returned integer corresponds to one of the following class constants:

  • UUID::VARIANT_NCS
  • UUID::VARIANT_RFC4122
  • UUID::VARIANT_MICROSOFT
  • UUID::VARIANT_FUTURE_RESERVED

The getVersion method returns the UUID version of the current instance. The returned integer is in the [0, 15], and the values [1, 5] correspond to the following class constants:

  • UUID::VERSION_1_TIME_BASED
  • UUID::VERSION_2_DCE_SECURITY
  • UUID::VERSION_3_NAME_BASED_MD5
  • UUID::VERSION_4_RANDOM
  • UUID::VERSION_5_NAME_BASED_SHA1

The remaining method toBinary, toHex, and toString convert the current UUID instance to their respective string representation form. Both the toHex and toString methods always format the hexadecimal digits a through f as lower case characters, in accordance with RFC 4122.

Construction via the new keyword, cloning, and setting of dynamic properties are disabled. The latter is necessary to uphold the promise of a truly immutable object, and to ensure that comparison operations of PHP always yield the desired correct result. Cloning is disabled because it makes no sense to clone an immutable object, this is necessary because cloning in PHP is opt-out and not opt-in. The constructor is disabled once more to uphold the immutability promise, but also to ensure future compatibility in regards to possible new features.

No Magic!

The implementation does not use any of the special capabilities that are available to PHP internal code. No special object structure to store the bytes, instead a simple private property is utilized. No operator overloading, instead the standard PHP comparison for objects is used. No custom serialization logic, instead the standard PHP serialization is used.

No External Dependencies!

The implementation does not rely on anything other than PHP itself, the MD5, random, and SHA1 standard sub-modules, and the SPL Exceptions InvalidArgumentException and UnexpectedValueException. The UUID algorithms are implemented by reusing existing PHP functionality, there is no dependency on particular system headers or other external libraries. This means that any improvement of the random sub-module directly improves this implementation as well. The same is true for bug fixes in MD5 or SHA1 (very unlikely, but still).

Backward Incompatible Changes

Both UUID and UUIDParsException are now globally defined classes, which might collide with user defined classes of the same name in the global namespace. However, the risk of the introduction of them is considered to be very low, since the global namespace should not be used by PHP users.

Proposed PHP Version(s)

Next PHP 7.x

RFC Impact

To SAPIs

None

To Existing Extensions

None

To Opcache

None

New Constants

None

php.ini Defaults

None

Open Issues

Namespace

The discussion about namespace in PHP core is on ongoing dispute. This highly self-contained component could easily be provided from an internal namespace and thus lay the grounds for other things to come. This would require additional discussion, but possible namespaces in alphabetical order are (UUID is the class):

  • PHP\UUID
  • PHP\Core\UUID
  • PHP\Lang\UUID
  • PHP\Standard\UUID
  • PHP\UUIDs\UUID

Other variations, based on the premise that the reserved PHP namespace should be used only for things that are directly tied to the language itself (e.g. AST):

  • Core\UUID
  • Lang\UUID
  • Standard\UUID
  • Std\UUID
  • UUIDs\UUID

Doxygen Documentation

The provided implementation uses Doxygen documentation for the C APIs. There were concerns about this, because PHP is currently undocumented and no standard way of providing documentation is in use so far. A discussion on php-internals was started to address this issue separately.

Unaffected PHP Functionality

Nothing is affected.

Future Scope

Proposed Voting Choices

Simple 50%+1 majority vote.

Patches and Tests

Implementation

After the project is implemented, this section should contain

  • the version(s) it was merged to
  • a link to the git commit(s)
  • a link to the PHP manual entry for the feature

References

Rejected Features

None so far.

rfc/uuid.txt · Last modified: 2017/05/29 20:47 by fleshgrinder