rfc:rng_extension

This is an old revision of the document!


PHP RFC: Random Extension 5.x

Introduction

PHP comes with a number of useful PRNGs (Pseudo Random Number Generator).

Mersenne Twister (a.k.a MT19937), the default RNG included in PHP, has a very long period of 2^19937-1 and generates high quality random numbers. To take advantage of this, you can do the following

<?php
// Generate a 31-bit random number
echo mt_rand() . "\n";
echo mt_rand() . "\n";
 
// Generate a number in range
echo mt_rand(5, 15);

It is also possible to specify a seed value so that the same result is always reproduced. This can be useful for testing functions that use random numbers, or for recreating game situations.

<?php
mt_srand(1234);
echo mt_rand(); // result: 411284887
 
$array = range(0, 9);
mt_srand(1234);
shuffle($array); // result: list{7, 1, 8, 0, 4, 2, 9, 6, 3, 5}
 
mt_srand(1234);
str_shuffle('foobar'); // result: afroob

However, at present, Mersenne Twister is facing several challenges.

The first one is that the state is global. The state of Mersenne Twister is implicitly stored in a global area of PHP, and there is no way for the user to access it. This means that code like the following can easily be unrepeatable The shuffle(), str_shuffle(), and array_rand() functions implicitly use the internal Mersenne Twister state to advance the random number table.

<?php
 
function foo(): void {
    // It was added later.
    $foo = str_suffle('foobar');
}
 
mt_srand(1234);
foo();
mt_rand(1, 100); // result: 76 -> 65

This problem can be avoided if you have full ownership of the code, but if you have external packages in place, such as Composer, it is difficult to keep it.

One solution is to implement the PRNG in userland. The following code implements XorShift128+ PRNG in Pure PHP.

class XorShift128Plus
{
    /* constants */
    protected const MASK_S5 = 0x07ffffffffffffff;
    protected const MASK_S18 = 0x00003fffffffffff;
    protected const MASK_S27 = 0x0000001fffffffff;
    protected const MASK_S30 = 0x00000003ffffffff;
    protected const MASK_S31 = 0x00000001ffffffff;
    protected const MASK_LO = 0x00000000ffffffff;
 
    protected const ADD_HI = 0x9e3779b9;
    protected const ADD_LO = 0x7f4a7c15;
    protected const MUL1_HILO = 0x476d;
    protected const MUL1_HIHI = 0xbf58;
    protected const MUL1_LO = 0x1ce4e5b9;
    protected const MUL2_HIHI = 0x94d0;
    protected const MUL2_HILO = 0x49bb;
    protected const MUL2_LO = 0x133111eb;
 
    /* states */
    protected int $s0;
    protected int $s1;
 
    public function __construct(int $seed)
    {
        $s = $seed;
        $this->s0 = $this->splitmix64($s);
        $this->s1 = $this->splitmix64($s);
    }
 
    public function generate(): int
    {
        $s1 = $this->s0;
        $s0 = $this->s1;
 
        $s0h = ($s0 >> 32) & self::MASK_LO;
        $s0l = $s0 & self::MASK_LO;
        $s1h = ($s1 >> 32) & self::MASK_LO;
        $s1l = $s1 & self::MASK_LO;
        $zl = $s0l + $s1l;
        $zh = $s0h + $s1h + ($zl >> 32);
        $z = ($zh << 32) | ($zl & self::MASK_LO);
 
        $this->s0 = $s0;
        $s1 ^= $s1 << 23;
        $this->s1 = $s1 ^ $s0 ^ (($s1 >> 18) & self::MASK_S18) ^ (($s0 >> 5) & self::MASK_S5);
 
        return $z;
    }
 
    protected function splitmix64(int &$s): int
    {
        $zl = $s & self::MASK_LO;
        $zh = ($s >> 32) & self::MASK_LO;
        $carry = $zl + self::ADD_LO;
        $z = $s = (($zh + self::ADD_HI + ($carry >> 32)) << 32) | ($carry & self::MASK_LO);
 
        $z ^= ($z >> 30) & self::MASK_S30;
        $zl = $z & self::MASK_LO;
        $zh = ($z >> 32) & self::MASK_LO;
        $lo = self::MUL1_LO * $zl;
        $zll = $zl & 0xffff;
        $zlh = $zl >> 16;
        $mul1l = $zll * self::MUL1_HILO;
        $mul1h = $zll * self::MUL1_HIHI + $zlh * self::MUL1_HILO + (($mul1l >> 16) & 0xffff);
        $mul1 = (($mul1h & 0xffff) << 16) | ($mul1l & 0xffff);
        $mul2 = ((self::MUL1_LO * $zh) & self::MASK_LO);
        $carry = (($lo >> 32) & self::MASK_LO);
        $hi = $mul1 + $mul2 + $carry;
        $z = ($hi << 32) | ($lo & self::MASK_LO);
 
        $z ^= ($z >> 27) & self::MASK_S27;
        $zl = $z & self::MASK_LO;
        $zh = ($z >> 32) & self::MASK_LO;
        $lo = self::MUL2_LO * $zl;
 
        $zll = $zl & 0xffff;
        $zlh = $zl >> 16;
        $mul1l = $zll * self::MUL2_HILO;
        $mul1h = $zll * self::MUL2_HIHI + $zlh * self::MUL2_HILO + (($mul1l >> 16) & 0xffff);
        $mul1 = (($mul1h & 0xffff) << 16) | ($mul1l & 0xffff);
 
        $mul2 = (self::MUL2_LO * $zh) & self::MASK_LO;
        $carry = ($lo >> 32) & self::MASK_LO;
        $hi = $mul1 + $mul2 + $carry;
        $z = ($hi << 32) | ($lo & self::MASK_LO);
 
        return $z ^ (($z >> 31) & self::MASK_S31);
    }
}
 
$xs128pp = new \XorShift128Plus(1234);
 
// Benchmarking
for ($i = 0; $i < 1000000000; $i++) {
    $xs128pp->generate();
}

There are also third-party PRNG implementations. The following libraries provide XorShift and Mersenne Twister as PRNG algorithms.

https://github.com/savvot/random

However, they all have some performance issues in common. The following is the speed of the various implementations when running under PHP 8.1.

XorShift128+ (iter:1000000000) MtRand (savvot/random) (iter: 10000000) Native MT (iter: 10000000)
PHP 8.1 0m3.218s 0m4.161s 0m0.160s
PHP 8.1 with JIT 0m1.836s (64M buffer) 0m2.184s (64M buffer) 0m0.184s (64M buffer)

JIT can help significantly in improving speed, but the difference is still significant compared to native implementations. PHP has not language features suitable for numeric computations such as unsigned integers, and PRNG implementations require a variety of workarounds.

Next, Mersenne Twister (MT19937) can only generate 32-bit integers (31-bit in user space). In recent years, most server environments running PHP have a 64-bit architecture. However, the Mersenne Twister implementation is only 32 bits wide internally and 31 bits wide in userland.

And next, PRNG / CSPRNG implementation in PHP is disjointed. Each of these implementations is implemented separately in a single ext/standard huge extension. The correspondence between them is as follows

extension header source
Combined LCG standard php_lcg.h lcg.c
libc rand* standard php_rand.h rand.c
MT19937 standard php_mt_rand.h mt_rand.c
CSPRNG standard php_random.h random.c

*libc rand is already an alias for MT19937

The current situation is difficult to use for extension developers who want to use PHPAPI, and it is not very good from a maintenance point of view. Implementing a new RNG in PHP, for example, would require even more files.

Proposal

Separate the RNG implementation into ext/random and implement various RNG classes with class-level scope.

First, all RNG implementations in ext/standard will be migrated to the new ext/random. To maintain compatibility of the extensions, the various ext/standard header files will be kept as reading ext/random header files. Since ext/random will always be bundled, there will be no compatibility impact on userland.

For example, ext/standard/php_rand.h would look like this

#include "ext/random/php_random.h"

This will allow the extension to maintain compatibility.

The next step is to change the internal PHP implementation to handle random numbers in object scope. This will allow the classes and methods shown in the following stub to be used.

<?php
 
/** @generate-class-entries */
 
namespace {
    function lcg_value(): float {}
 
    function mt_srand(int $seed = 0, int $mode = MT_RAND_MT19937): void {}
 
    /** @alias mt_srand */
    function srand(int $seed = 0, int $mode = MT_RAND_MT19937): void {}
 
    function rand(int $min = UNKNOWN, int $max = UNKNOWN): int {}
 
    function mt_rand(int $min = UNKNOWN, int $max = UNKNOWN): int {}
 
    function mt_getrandmax(): int {}
 
    /** @alias mt_getrandmax */
    function getrandmax(): int {}
 
    /** @refcount 1 */
    function random_bytes(int $length): string {}
 
    function random_int(int $min, int $max): int {}
}
 
namespace Random\Engine
{
    class CombinedLCG implements Random\SeedableEngine, Random\SerializableEngine
    {
        public function __construct(int|null $seed = null) {}
 
        public function generate(): string {}
 
        public function __serialize(): array {}
 
        public function __unserialize(array $data): void {}
 
        public function __debugInfo(): array {}
    }
 
    class MersenneTwister implements Random\SeedableEngine, Random\SerializableEngine
    {
        public function __construct(int|null $seed = null, int $mode = MT_RAND_MT19937) {}
 
        /** @implementation-alias Random\Engine\CombinedLCG::generate */
        public function generate(): string {}
 
        /** @implementation-alias Random\Engine\CombinedLCG::__serialize */
        public function __serialize(): array {}
 
        /** @implementation-alias Random\Engine\CombinedLCG::__unserialize */
        public function __unserialize(array $data): void {}
 
        /** @implementation-alias Random\Engine\CombinedLCG::__debugInfo */
        public function __debugInfo(): array {}
    }
 
    class PCG64 implements Random\SeedableEngine, Random\SerializableEngine
    {
        public function __construct(string|int|null $seed = null) {}
 
        /** @implementation-alias Random\Engine\CombinedLCG::generate */
        public function generate(): string {}
 
        public function jump(int $advance): void {}
 
        /** @implementation-alias Random\Engine\CombinedLCG::__serialize */
        public function __serialize(): array {}
 
        /** @implementation-alias Random\Engine\CombinedLCG::__unserialize */
        public function __unserialize(array $data): void {}
 
        /** @implementation-alias Random\Engine\CombinedLCG::__debugInfo */
        public function __debugInfo(): array {}
    }
 
    /** @not-serializable */
    class Secure implements Random\CryptoSafeEngine
    {
        /** @implementation-alias Random\Engine\CombinedLCG::generate */
        public function generate(): string {}
    }
}
 
namespace Random
{
    interface Engine
    {
        public function generate(): string;
    }
 
    interface CryptoSafeEngine extends Engine
    {
    }
 
    interface SeedableEngine extends Engine
    {
    }
 
    interface SerializableEngine extends Engine
    {
        public function __serialize(): array {}
 
        public function __unserialize(array $data): void {}
 
        public function __debugInfo(): array {}
    }
 
    final class Randomizer
    {
        public readonly Engine $engine;
 
        public function __construct(?Engine $engine = null) {}
 
        public function getInt(int $min = UNKNOWN, int $max = UNKNOWN): int {}
 
        public function getBytes(int $length): string {}
 
        public function shuffleArray(array $array): array {}
 
        public function shuffleString(string $string): string {}
 
        public function __serialize(): array {}
 
        public function __unserialize(array $data): void {}
    }
}

The Random\Engine provides an interface to implement a random number generator. This interface is also available from userland and allows for the implementation of RNGs in PHP. This interface has a single generate(): string method, which should return a random number value as a binary string.

implemented of:

  • Random\Engine\CombinedLCG
  • Random\Engine\MersenneTwister
  • Random\Engine\PCG64
  • Random\Engine\Secure

Random\CryptoSafeEngine is an interface implemented by the CSPRNG (Cryptographically Secure Random Number Generator).

implemented of:

  • Random\Engine\Secure

Random\SeedableEngine is an interface implemented by RNG that guarantees reproducibility by providing a user-defined seed value.

implemented of:

  • Random\Engine\CombinedLCG
  • Random\Engine\PCG64
  • Random\Engine\MersenneTwister

Random\Engine\CombinedLCG, Random\Engine\MersenneTwister and Random\Engine\PCG64 each provide a native implementation of the RNG of the same name. native implementations of the RNGs of the same name. If null is passed as the seed value argument in the constructor argument, the seed value is generated using a mechanism equivalent to the random_bytes() function.

$mt = new \Random\Engine\MersenneTwister(1234); // OK
$pcg64 = new \Random\Engine\PCG64(1234); // OK
 
$mt = new \Random\Engine\MersenneTwister(); // OK
$pcg64 = new \Random\Engine\PCG64(); // OK

Random\Engine\CombinedLCG provides a simple implementation of Combined LCG. The width that this algorithm can generate is 32-bit, and the quality of the random numbers is not very high.

Random\Engine\MersenneTwister provides an implementation of Mersenne Twister that can be used in PHP. The width this algorithm can generate is 32-bit and the quality of the random numbers is high. The constructor has the same signature as mt_srand(). This means that you can still use PHP's broken Mersenne Twister by specifying MT_RAND_PHP as the second argument.

$mt = new \Random\Engine\MersenneTwister(1234, MT_RAND_MT19937);
$mt = new \Random\Engine\MersenneTwister(1234, MT_RAND_PHP);

Random\Engine\PCG64 is an implementation of the Permuted congruential generator. It is based on PCG64 (pcg_state_setseq_128 XSL-RR-RR) which is a commonly used PRNG, such as Rust and Python (NumPy). This algorithm has a jump() method that can arbitrarily advance the state of the RNG. The size of the state of this RNG is 128-bits, so it cannot be fully seeded by simply passing a number as the seed value. If a numeric value is passed, the RNG internally uses the PCG64 mechanism to generate 128-bit internal state. However, it is possible to seed with completely user-defined data by passing a 128-bit string to the constructor.

Random\Engine\Secure provides the functionality of CSPRNG. It is almost exactly the same as the random_int() and random_bytes() functions.

The Random\Randomizer class provides various methods for manipulating values based on the class that implements the Random\Engine interface. It throws a RuntimeException if the Randomizer fails to generate a random number internally.

The Random\Randomizer class provides various methods for manipulating values based on the class that implements the Random\Engine interface. It throws a \RuntimeException if the Randomizer fails to generate a random number internally.

Random\Randomizer::getInt() has the same signature as mt_rand() and returns a number in the range if the argument is specified, or a 1-bit left-shifted value of the RNG-generated value if it is not.

Random\Randomizer::getBytes() has the same signature as random_bytes() and generates a binary string of the specified length. If the binary string produced by the RNG is shorter than the specified length, it will repeat internally to produce the specified length. However, if the generated length exceeds the specified length, the value is simply discarded. In other words, if you were to run getBytes(7) with a 64-bit wide (8-byte) RNG, the last byte would be discarded.

Examples of these uses are as follows:

// Use different RNGs for different environments.
$rng = $is_production
    ? new Random\Engine\Secure()
    : new Random\Engine\PCG64(1234);
 
$randomizer = new Random\Randomizer($rng);
$randomizer->shuffleString('foobar');
// Safely migrate the existing mt_rand() state.
 
// before
mt_srand(1234, MT_RAND_PHP);
foobar();
$result = str_shuffle('foobar');
 
mt_srand(1234, MT_RAND_MT19937);
foobar();
$next = mt_rand();
 
// after
$randomizer = new Random\Randomizer(new Random\Engine\MersenneTwister(1234, MT_RAND_PHP));
foobar();
$result = $randomizer->shuffleString('foobar');
 
$randommizer = new Random\Randomizer(new Random\Engine\MersenneTwister(1234, MT_RAND_MT19937));
foobar();
$next = $randomizer->getInt();

PRNG shootout

When introducing a new RNG algorithm, the selection of the algorithm is very important. The following table shows the RNG algorithms that I considered and their characteristics.

Generate size State size Performance Issues Implemented applications
MT19937 32-bit 32-bit x 624 Normal Some statistical test failed PHP, Python, Ruby
XorShift128+ 64-bit 64-bit x 2 Excellent Failure BigCrush V8, Unity
Xoshiro256++ 64-bit 64-bit x 4 Excellent, SIMD-frendly Currently none Rust
Xoshiro256* * 64-bit 64-bit x 4 Excellent Currently none Rust, .NET 6.0
PCG64 (XSL-RR-RR) 64-bit 128-bit x 2 Good Currently none Rust, NumPy

MT19937 and XorShift128+ are already widely used, but they have failed several statistical tests and are not recommended for new use. So I adopted a more modern PRNG called PCG64 which do not have any statistical test problems.

PCG64 (pcg_state_setseq_128 XSL-RR-RR) looked like a good fit since it uses 64-bit wide values. PCG64 uses 128-bit integers, which cannot be used natively in 32-bit environments and needs to be emulated, but I think this is not a problem since most environments are now using 64-bit architectures.

I also considered Xoshiro256** but chose PCG because all of the issues raised against PCG64 appeared to have been resolved appropriately. The issues raised and the inventor's discussion of them can be found at

It is interesting to note that these algorithms have been heavily criticized by each other. Both opinions were respectable, which made the selection process very difficult.

I considered implementing both but adding unnecessary choices would have caused confusion for the users so the idea was dropped. If anyone thinks one is needed it can be added through PHP extensions.

Future Scope

These are not within the scope of this RFC, but are worth considering in the future:

  • Remove old header files for compatibility (php_lcg.h, php_rand.h, php_mt_rand.h, php_random.h)
  • Deprecate lcg_value(), mt_srand(), srand()

Backward Incompatible Changes

The following names have been reserved and will no longer be available

  • “Random”
  • “Random\Engine”
  • “Random\Engine\CombinedLCG”
  • “Random\Engine\MersenneTwister”
  • “Random\Engine\PCG64”
  • “Random\Engine\Secure”
  • “Random\Randomizer”

Proposed PHP Version(s)

8.2

RFC Impact

To SAPIs

none

To Existing Extensions

In the future, it may be necessary to change the included header files to point to ext/random/php_random.h. However, compatibility will be maintained for now.

To Opcache

none

New Constants

none

php.ini Defaults

none

Open Issues

none

Vote

Voting opens 2022-MM-DD and 2021-MM-DD at 00:00:00 EDT. 2/3 required to accept.

Add Random extension
Real name Yes No
asgrim (asgrim)  
beberlei (beberlei)  
bmajdak (bmajdak)  
brzuchal (brzuchal)  
bwoebi (bwoebi)  
crell (crell)  
dharman (dharman)  
jbnahan (jbnahan)  
kalle (kalle)  
kguest (kguest)  
kocsismate (kocsismate)  
lbarnaud (lbarnaud)  
lufei (lufei)  
marandall (marandall)  
nicolasgrekas (nicolasgrekas)  
ocramius (ocramius)  
pierrick (pierrick)  
sergey (sergey)  
svpernova09 (svpernova09)  
timwolla (timwolla)  
twosee (twosee)  
Final result: 21 0
This poll has been closed.

Patches and Tests

rfc/rng_extension.1646287028.txt.gz · Last modified: 2022/03/03 05:57 by zeriyoshi