This is an old revision of the document!
Introduction
There are several problems with the current implementation of PHP's random functionality, so some improvements.
Problems
There are four main problems
- Global state
- Mersenne Twister
- Randomness
- Internals
Global state
Mersenne Twister can be seeded with any value using mt_srand(). This allows users to obtain consistent results. However, the Mersenne Twister state is global to the PHP language runtime and is easily unrepeatable when call functions (shuffle(), str_shuffle(), array_rand()) that use random numbers, following:
<?php function foo(): void { // It was added later. $foo = str_suffle('foobar'); } mt_srand(1234); foo(); mt_rand(1, 100); // result: 76 -> 65
This problem can be avoided if you have full ownership of the code, but if you have external packages in place, such as Composer, it is difficult to keep it. Also, by using Generator and Fiber introduced in PHP 8.1, the current state can be easily lost. Output results are approximately unpredictable.
Given the above, mt_srand() and srand(), which should provide reproducible values, already do not meet that requirement.
An additional problem is that Swoole, a PHP extension, copies global random state to child processes due to its structure, making random number-related operations unsafe unless they are reseeded.
https://wiki.swoole.com/#/getting_started/notice?id=mt_rand%e9%9a%8f%e6%9c%ba%e6%95%b0
Mersenne Twister
Mersenne Twister is an excellent pseudo random number generator. But, it is already old and no longer suitable for the current needs.
It has a very long period of 2^19937 - 1. In general, a long period is a good thing, but nevertheless it fails several statistical tests (BigCrush and Crush).
Also, the size that Mersenne Twister can generate is limited to 32-bit. This is not compatible with the current situation where many execution environments are 64-bit and zend_long has a length of 64-bit.
Randomness
PHP's built-in functions (shuffle(), str_shuffle(), array_rand()) use Mersenne Twister as the default random number source. This is inappropriate if you are need for cryptographically secure random numbers. If a similar function that meets that requirement is needed, the user will need to implement a new function using random_int() or similar functions.
Internals
The implementation of random numbers in PHP is scattered within the standard module for historical reasons.
The following are different header files, but some are interdependent, which can be very confusing to extension developers.
extension | header | source | |
---|---|---|---|
Combined LCG | standard | php_lcg.h | lcg.c |
libc rand* | standard | php_rand.h | rand.c |
MT19937 | standard | php_mt_rand.h | mt_rand.c |
CSPRNG | standard | php_random.h | random.c |
Userland approach
Think about how the above problems could be solved in userland.
Implement a random number generator in PHP. Here I will consider an already existing implementation (https://github.com/savvot/random) and our implementation of XorShift128+.
class XorShift128Plus { /* constants */ protected const MASK_S5 = 0x07ffffffffffffff; protected const MASK_S18 = 0x00003fffffffffff; protected const MASK_S27 = 0x0000001fffffffff; protected const MASK_S30 = 0x00000003ffffffff; protected const MASK_S31 = 0x00000001ffffffff; protected const MASK_LO = 0x00000000ffffffff; protected const ADD_HI = 0x9e3779b9; protected const ADD_LO = 0x7f4a7c15; protected const MUL1_HILO = 0x476d; protected const MUL1_HIHI = 0xbf58; protected const MUL1_LO = 0x1ce4e5b9; protected const MUL2_HIHI = 0x94d0; protected const MUL2_HILO = 0x49bb; protected const MUL2_LO = 0x133111eb; /* states */ protected int $s0; protected int $s1; public function __construct(int $seed) { $s = $seed; $this->s0 = $this->splitmix64($s); $this->s1 = $this->splitmix64($s); } public function generate(): int { $s1 = $this->s0; $s0 = $this->s1; $s0h = ($s0 >> 32) & self::MASK_LO; $s0l = $s0 & self::MASK_LO; $s1h = ($s1 >> 32) & self::MASK_LO; $s1l = $s1 & self::MASK_LO; $zl = $s0l + $s1l; $zh = $s0h + $s1h + ($zl >> 32); $z = ($zh << 32) | ($zl & self::MASK_LO); $this->s0 = $s0; $s1 ^= $s1 << 23; $this->s1 = $s1 ^ $s0 ^ (($s1 >> 18) & self::MASK_S18) ^ (($s0 >> 5) & self::MASK_S5); return $z; } protected function splitmix64(int &$s): int { $zl = $s & self::MASK_LO; $zh = ($s >> 32) & self::MASK_LO; $carry = $zl + self::ADD_LO; $z = $s = (($zh + self::ADD_HI + ($carry >> 32)) << 32) | ($carry & self::MASK_LO); $z ^= ($z >> 30) & self::MASK_S30; $zl = $z & self::MASK_LO; $zh = ($z >> 32) & self::MASK_LO; $lo = self::MUL1_LO * $zl; $zll = $zl & 0xffff; $zlh = $zl >> 16; $mul1l = $zll * self::MUL1_HILO; $mul1h = $zll * self::MUL1_HIHI + $zlh * self::MUL1_HILO + (($mul1l >> 16) & 0xffff); $mul1 = (($mul1h & 0xffff) << 16) | ($mul1l & 0xffff); $mul2 = ((self::MUL1_LO * $zh) & self::MASK_LO); $carry = (($lo >> 32) & self::MASK_LO); $hi = $mul1 + $mul2 + $carry; $z = ($hi << 32) | ($lo & self::MASK_LO); $z ^= ($z >> 27) & self::MASK_S27; $zl = $z & self::MASK_LO; $zh = ($z >> 32) & self::MASK_LO; $lo = self::MUL2_LO * $zl; $zll = $zl & 0xffff; $zlh = $zl >> 16; $mul1l = $zll * self::MUL2_HILO; $mul1h = $zll * self::MUL2_HIHI + $zlh * self::MUL2_HILO + (($mul1l >> 16) & 0xffff); $mul1 = (($mul1h & 0xffff) << 16) | ($mul1l & 0xffff); $mul2 = (self::MUL2_LO * $zh) & self::MASK_LO; $carry = ($lo >> 32) & self::MASK_LO; $hi = $mul1 + $mul2 + $carry; $z = ($hi << 32) | ($lo & self::MASK_LO); return $z ^ (($z >> 31) & self::MASK_S31); } } $xs128pp = new \XorShift128Plus(1234); // Benchmarking for ($i = 0; $i < 1000000000; $i++) { $xs128pp->generate(); }
Compare the speed of these implementations with the PHP's mt_rand().
XorShift128+ (iter:1000000000) | MtRand (savvot/random) (iter: 10000000) | Native MT (iter: 10000000) | |
---|---|---|---|
PHP 8.1 | 0m3.218s | 0m4.161s | 0m0.160s |
PHP 8.1 with JIT | 0m1.836s (64M buffer) | 0m2.184s (64M buffer) | 0m0.184s (64M buffer) |
Even with JIT enabled, the native implementation is not far behind. This is due to PHP's inability to handle unsigned integers well.
JIT can help significantly in improving speed, but the difference is still significant compared to native implementations.
PHP has no language features suitable for numeric computations such as unsigned integers, and PRNG implementations require a variety of workarounds.
Nikita also says this case: https://externals.io/message/115918#115959
Proposal
User merits
Users will be able to use the OO API to:
Cryptographically secure random operations
The following code is cryptographically insecure because Mersenne Twister is used internally.
$items = range(1, 10); shuffle($items);
The OO API can be used to safely perform the same process.
$engine = new Random\Engine\Secure(); $randomizer = new Random\Randomizer($engine); $items = range(1, 10); $items = $randomizer->shuffleArray($items);
Choose RNG by environment
Depending on the environment, the appropriate RNG can be selected.
$rng = $is_production ? new Random\Engine\Secure() : new Random\Engine\PCG64(1234); $randomizer = new Random\Randomizer($rng); $randomizer->shuffleString('foobar');
Fixed random number sequence
Processes that continue to generate random numbers until certain requirements are met may make it difficult to measure the processing load.
$required_result = mt_rand(1, 100); while (($generated = mt_rand(1, 100)) !== $required_result) { echo "retry\n"; } echo "done\n";
interface and dynamic injections, allowing for the fixed sequences at test time.
$engine = new class () implements Random\Engine { public function generate(): string { // Result must be a string. return pack('V', 1); } }; $randomizer = new Random\Randomizer($engine); $required_result = $randomizer->getInt(1, 100); while (($generated = $randomizer->getInt(1, 100)) !== $required_result) { echo "retry\n"; } echo "done\n";
Approach
Merge and create a new random extension and bundled PHP.
The following headers are identical ext/random/php_random.h
- php_lcg.h
- php_rand.h
- php_mt_rand.h
- php_random.h
#include “ext/random/php_random.h”
Implement the following new interfaces and classes.
interface Random\Engine
Interface to provide random number generator engine.
It has a single generate(): string method that generates random numbers as a binary string. If you implement a random number generator in PHP, you must use the pack() function appropriately.
interface Random\CryptoSafeEngine
A marker interface to indicate that the implemented random number generator is cryptographically secure.
interface SerializableEngine
An interface indicating that the implemented random number generator is serializable.
The following methods must be implemented:
- _serialize(): array
- _unserialize(array $data): void
class Random\Engine\CombinedLCG
Generate random numbers using the CombinedLCG algorithm.
By passing a value to the constructor, it can be seeded with any value. If omitted or null, the seed value is generated by CSPRNG.
The following interface are implemented:
- Random\Engine
- Random\SerializableEngine
The following methods are implemented:
- _construct(int|null $seed = null)
- generate(): string
- _serialize(): array
- _unserialize(array $data): void
class Random\Engine\MersenneTwister
Generate random numbers using the MT19937 (a.k.a Mersenne Twister) algorithm.
By passing a value to the constructor, it can be seeded with any value. If omitted or null, the seed value is generated by CSPRNG. The second argument passing MT_RAND_PHP, allows the use of PHP's broken Mersenne Twister.
The following interface are implemented:
- Random\Engine
- Random\SerializableEngine
The following methods are implemented:
- _construct(int|null $seed = null, int $mode = MT_RAND_MT19937)
- generate(): string
- _serialize(): array
- _unserialize(array $data): void
class Random\Engine\PCG64
Generate random numbers using the PCG64 (Permuted Congruential Generator, pcg_state_setseq_128 XSL-RR-RR) algorithm.
By passing a value to the constructor, it can be seeded with any value. If omitted or null, the seed value is generated by CSPRNG. A string can also be passed as the seed value, and a string is required to seed with 64-bit or higher values. The string must be 128-bit.
The following interface are implemented:
- Random\Engine
- Random\SerializableEngine
The following methods are implemented:
- _construct(string|int|null $seed = null, string|int|null $sequence = null)
- generate(): string
- _serialize(): array
- _unserialize(array $data): void
class Random\Engine\Secure
Generate random numbers using the CSPRNG, by host provided solutions. Internally, php_random_bytes() is used. Not reproducible.
The following interface are implemented:
- Random\Engine
- Random\CryptoSafeEngine
The following methods are implemented:
- _construct()
- generate(): string
final class Random\Randomizer
A single class for processing with random numbers using the engine.
The following methods are implemented:
- _construct(Random\Engine $engine)
- getInt(): int - replaces mt_rand()
- getInt(int $min, int $max) - replaces mt_rand()
- getBytes(int length): string - replaces random_bytes()
- shuffleArray(array $array): array - replaces shuffle()
- shuffleString(string $string): string = replaces str_shuffle()
- _serialize(): array
- _unserialize(array $data): void
This class is serializable, but will fail if the given Engine is not serializable.
PRNG shootout
Since MT19937 has the aforementioned problems, an alternative algorithm must be chosen.
When introducing a new RNG algorithm, the selection of the algorithm is very important. The following table shows the RNG algorithms that I considered and their characteristics.
Generate size | State size | Performance | Issues | Implemented applications | |
---|---|---|---|---|---|
MT19937 | 32-bit | 32-bit x 624 | Normal | Some statistical test failed | PHP, Python, Ruby |
XorShift128+ | 64-bit | 64-bit x 2 | Excellent | Failure BigCrush | V8, Unity |
Xoshiro256++ | 64-bit | 64-bit x 4 | Excellent, SIMD-frendly | Currently none | Rust |
Xoshiro256* * | 64-bit | 64-bit x 4 | Excellent | Currently none | Rust, .NET 6.0 |
PCG64 (XSL-RR-RR) | 64-bit | 128-bit x 2 | Good | Currently none | Rust, NumPy |
MT19937 and XorShift128+ are already widely used, but they have failed several statistical tests and are not recommended for new use. So I adopted a more modern PRNG called PCG64 which do not have any statistical test problems.
PCG64 is the only implementation of a reproducible RNG, except for MT19937 for compatibility and the special uses User and Secure.
PCG64 (pcg_state_setseq_128 XSL-RR-RR) looked like a good fit since it uses 64-bit wide values. PCG64 uses 128-bit integers, which cannot be used natively in 32-bit environments and needs to be emulated, but I think this is not a problem since most environments are now using 64-bit architectures.
I also considered Xoshiro256** but chose PCG because all of the issues raised against PCG64 appeared to have been resolved appropriately. The issues raised and the inventor's discussion of them can be found at
It is interesting to note that these algorithms have been heavily criticized by each other. Both opinions were respectable, which made the selection process very difficult.
I considered implementing both but adding unnecessary choices would have caused confusion for the users so the idea was dropped. If anyone thinks one is needed it can be added through PHP extensions.
Future Scope
These are not within the scope of this RFC, but are worth considering in the future:
- Remove old header files for compatibility (php_lcg.h, php_rand.h, php_mt_rand.h, php_random.h)
- Deprecate lcg_value(), mt_srand(), srand()
Backward Incompatible Changes
The following names have been reserved and will no longer be available
- “Random”
- “Random\Engine”
- “Random\CryptoSafeEngine”
- “Random\SerializableEngine”
- “Random\Engine\CombinedLCG”
- “Random\Engine\MersenneTwister”
- “Random\Engine\PCG64”
- “Random\Engine\Secure”
- “Random\Randomizer”
Proposed PHP Version(s)
8.2
RFC Impact
To SAPIs
none
To Existing Extensions
In the future, it may be necessary to change the included header files to point to ext/random/php_random.h. However, compatibility will be maintained for now.
To Opcache
none
New Constants
none
php.ini Defaults
none
Open Issues
none
Vote
Voting opens 2022-MM-DD and 2021-MM-DD at 00:00:00 EDT. 2/3 required to accept.