
This is an old revision of the document!


PHP RFC: Random Extension 4.0

Introduction

PHP implements several useful RNGs. However, they are currently only available in the global scope.

Mersenne Twister, PHP's default RNG, provides a function mt_srand() to initialize with a user-specified seed value, but the scope is global, which may cause unintended user behavior.

When a user calls mt_srand(), they would expect it to affect only the result of mt_rand(). However, the following functions share the same global state, so they are affected by the seed and, in turn, implicitly affect the result of mt_rand():

  • shuffle()
  • str_shuffle()
  • array_rand()

For example, in the following code, the second mt_rand() call does not reproduce the first result even though the same seed is used. This is because shuffle() uses the global MT RNG internally, which advances its state.

mt_srand(1234);
$next = mt_rand();
 
mt_srand(1234);
$arr = range(0, 9);
shuffle($arr);
$next2 = mt_rand();
 
die("next: {$next}, next2: {$next2}"); // next: 411284887, next2: 1848274264

This behavior is unintuitive and often leads to unintended results, although it is rarely a problem for typical web applications.

However, in more complex applications that must produce repeatable results (such as games), it can be a real problem.

There is also the issue of state management difficulties with Fiber, which was added in PHP 8.1. Nikita had this to say:

https://externals.io/message/115918#115959

In addition, the Mersenne Twister can only generate 32-bit values. In recent years, many of the environments where PHP runs have migrated to 64-bit platforms. In order to generate more secure values, the language should provide an RNG that can generate 64-bit wide values.

Proposal

Implement the XorShift128Plus algorithm, a new RNG that generates 64-bit wide random numbers, together with a random extension that provides object-scoped RNGs, and bundle the extension with PHP. XorShift128Plus is a fast, high-quality RNG that has been proven in major web browsers. Most major hardware architectures are now 64-bit, so it makes sense to use this RNG.

In addition to the new algorithm, the following classes will be added to fix the global scope issue.

  • class Random\NumberGenerator\XorShift128Plus
  • class Random\NumberGenerator\MersenneTwister
  • class Random\NumberGenerator\CombinedLCG
  • class Random\NumberGenerator\Secure (aka php_random_bytes())

These classes will hold independent RNG state and will not affect the global scope.

RNGs other than XorShift128Plus are based on the RNGs currently implemented in PHP.
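To make the idea of object-scoped state concrete, here is a minimal userland sketch of an xorshift128+ generator for 64-bit PHP. The class name ScopedXorShift128Plus and its two-integer constructor are hypothetical and are not the API proposed here. The shift constants (23, 18, 5) follow the commonly published reference variant; whether the extension uses the same constants is an assumption.

```php
<?php
// Hypothetical userland sketch for 64-bit PHP; not the extension's implementation.
final class ScopedXorShift128Plus
{
    private int $s0;
    private int $s1;

    public function __construct(int $s0, int $s1)
    {
        // An all-zero state would make xorshift generators emit zeros forever.
        if ($s0 === 0 && $s1 === 0) {
            throw new ValueError('State must not be all zero.');
        }
        $this->s0 = $s0;
        $this->s1 = $s1;
    }

    public function generate(): int
    {
        $s1 = $this->s0;
        $s0 = $this->s1;
        $result = self::add64($s0, $s1);
        $this->s0 = $s0;
        $s1 ^= $s1 << 23;
        $this->s1 = $s1 ^ $s0 ^ self::shr($s1, 18) ^ self::shr($s0, 5);

        return $result;
    }

    // Logical (unsigned) right shift; PHP's >> is an arithmetic shift.
    private static function shr(int $x, int $n): int
    {
        return ($x >> $n) & (PHP_INT_MAX >> ($n - 1));
    }

    // Addition modulo 2^64; a plain + would overflow into a float.
    private static function add64(int $a, int $b): int
    {
        $lo = ($a & 0xFFFFFFFF) + ($b & 0xFFFFFFFF);
        $hi = (($a >> 32) & 0xFFFFFFFF) + (($b >> 32) & 0xFFFFFFFF) + ($lo >> 32);

        return (($hi & 0xFFFFFFFF) << 32) | ($lo & 0xFFFFFFFF);
    }
}

$a = new ScopedXorShift128Plus(1, 2);
$b = new ScopedXorShift128Plus(1, 2);

$first = $a->generate();

// Touching the global Mersenne Twister does not disturb either object.
mt_srand(1234);
$arr = range(0, 9);
shuffle($arr);

$second = $a->generate();
$other = $b->generate(); // same as $first, because $b has its own state
```

Because each object carries its own two state words, the sequence depends only on the constructor arguments and on how many times generate() has been called on that object.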

XorShift128Plus is special in that its constructor also accepts a string. Normally, XorShift128Plus takes a 64-bit integer as its seed and uses the SplitMix64 algorithm to expand it into the 128-bit internal state, but passing a string lets the user define all 128 bits of the internal state directly.

$xorshift128plus = new \Random\NumberGenerator\XorShift128Plus(1234); // OK (use SplitMix64 internally)
$xorshift128plus = new \Random\NumberGenerator\XorShift128Plus(\random_bytes(16)); // OK (fully user-defined state)
$xorshift128plus = new \Random\NumberGenerator\XorShift128Plus('foobar'); // NG (throws ValueError)

To avoid misuse, however, the string must be exactly 128 bits long (16 bytes). Otherwise, a ValueError is thrown.
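The length check on the state string could look roughly like the following userland sketch. The function name parseXorShift128PlusState is hypothetical, and the little-endian byte order used to split the string into two 64-bit words is an assumption; the RFC text does not specify how the bytes map to the internal state. It requires 64-bit PHP because of the 'P' format code.

```php
<?php
// Hypothetical helper, not part of the proposed API.
// Splits a 16-byte state string into two 64-bit words.
function parseXorShift128PlusState(string $state): array
{
    // strlen() counts bytes, so this enforces exactly 128 bits.
    if (strlen($state) !== 16) {
        throw new ValueError('State string must be exactly 16 bytes (128 bits).');
    }

    // 'P' reads an unsigned 64-bit little-endian integer (byte order assumed).
    // Values above PHP_INT_MAX come back as negative ints with the same bits.
    return array_values(unpack('P2', $state));
}

$words = parseXorShift128PlusState(random_bytes(16)); // two ints

try {
    parseXorShift128PlusState('foobar'); // only 6 bytes
    $rejected = false;
} catch (ValueError $e) {
    $rejected = true;
}
```

Rejecting short strings outright, instead of padding them, avoids silently running the generator with a mostly-zero state.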

An interface Random\NumberGenerator is also added and is implemented by the classes above.

This interface has only a single generate() method, which makes it possible to switch between RNG implementations depending on the situation and allows alternative implementations to be written in userland PHP. This is useful, for example, when running tests.

final class FixedForUnitTest implements \Random\NumberGenerator
{
    private int $count = 0;
 
    public function generate(): int
    {
        return ++$this->count;
    }
}
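As a rough illustration of why such a fixed generator helps in tests, the sketch below injects it into a function under test. Since the proposed interface does not exist in current PHP releases, NumberGeneratorLike is a hypothetical stand-in declared locally, and pickReward() is an invented example function.

```php
<?php
// Hypothetical stand-in for the proposed Random\NumberGenerator interface.
interface NumberGeneratorLike
{
    public function generate(): int;
}

final class FixedForUnitTest implements NumberGeneratorLike
{
    private int $count = 0;

    public function generate(): int
    {
        return ++$this->count;
    }
}

// Invented code under test: picks a reward with whichever generator is injected.
function pickReward(NumberGeneratorLike $rng, array $rewards): string
{
    // Clear the sign bit so the modulo result is never negative.
    // Plain modulo is slightly biased, which is acceptable for this sketch.
    $index = ($rng->generate() & PHP_INT_MAX) % count($rewards);

    return $rewards[$index];
}

$rng = new FixedForUnitTest();
$rewards = ['gold', 'silver', 'bronze'];

$picked = [
    pickReward($rng, $rewards), // generate() returns 1, index 1
    pickReward($rng, $rewards), // generate() returns 2, index 2
    pickReward($rng, $rewards), // generate() returns 3, index 0
];
```

The test can then assert on an exact sequence of outcomes, which is not possible when the code reaches for the global RNG directly.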

However, the width of the random number that can be generated by a userland implementation depends on the size of the int in PHP on that platform. This means that you can only generate up to 32 bits in a 32-bit environment, and up to 64 bits in a 64-bit environment. This likewise means that 32-bit RNGs cannot be implemented in userland in a 64-bit environment.

// on 64-bit machine
 
final class UserMersenneTwister extends \Random\NumberGenerator\MersenneTwister
{
    // uses 64-bit internally, if generated: 1 (zend_long), bits: 0000000000000000000000000000000000000000000000000000000000000001 (64-bit)
    // normally MersenneTwister bits: 00000000000000000000000000000001 (32-bit)
    public function generate(): int
    {
        return parent::generate() - 1;
    }
}

I don't think this is a problem, as most requests to generate random numbers in userland are likely to return a fixed value or reproduce a specific scenario.

A Random\Randomizer class will be added to manipulate data using these RNGs.

This class provides the following methods:

  • __construct(\Random\NumberGenerator $generator = null) [defaults to XorShift128Plus if null]
  • getInt(int $min, int $max): int [replacement for mt_rand() / rand()]
  • getBytes(int $length): string [replacement for random_bytes()]
  • shuffleArray(array $array): array [replacement for shuffle()]
  • shuffleString(string $string): string [replacement for str_shuffle()]
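As a sketch of how a method such as shuffleArray() can be built on top of a bounded integer source, the following Fisher-Yates shuffle takes a callable in place of the proposed getInt(). The function name shuffleArraySketch and the callable parameter are hypothetical; the extension's actual implementation may differ.

```php
<?php
// Hypothetical sketch: $getInt stands in for Randomizer::getInt(int $min, int $max).
function shuffleArraySketch(array $array, callable $getInt): array
{
    // Like shuffle(), discard the original keys.
    $values = array_values($array);

    // Fisher-Yates: walk from the end, swapping each slot with a random earlier one.
    for ($i = count($values) - 1; $i > 0; $i--) {
        $j = $getInt(0, $i);
        [$values[$i], $values[$j]] = [$values[$j], $values[$i]];
    }

    return $values;
}

// With a real source of randomness:
$shuffled = shuffleArraySketch(range(1, 10), 'random_int');

// With a deterministic stand-in that always returns the lower bound:
$fixed = shuffleArraySketch([1, 2, 3, 4], fn (int $min, int $max): int => $min);
```

Passing the integer source in as an argument is what keeps the shuffle independent of any global state.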

A method equivalent to array_rand() is not included at this time because its implementation is complex and it can easily be implemented in userland if necessary.
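A single-key userland equivalent might look like the sketch below. The function name randomKeySketch is hypothetical, and the callable again stands in for the proposed getInt() method; picking several keys at once, as array_rand() allows, is left out.

```php
<?php
// Hypothetical sketch: $getInt stands in for Randomizer::getInt(int $min, int $max).
function randomKeySketch(array $array, callable $getInt): int|string
{
    if ($array === []) {
        throw new ValueError('Array must not be empty.');
    }

    $keys = array_keys($array);

    return $keys[$getInt(0, count($keys) - 1)];
}

$colors = ['r' => 'red', 'g' => 'green', 'b' => 'blue'];

// With a real source of randomness:
$key = randomKeySketch($colors, 'random_int');

// With a deterministic stand-in that always returns the upper bound:
$last = randomKeySketch($colors, fn (int $min, int $max): int => $max);
```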

The stubs of functionality provided by this extension are as follows:

https://github.com/colopl/php-src/blob/upstream_rfc/scoped_rng_for_pr/ext/random/random.stub.php

Examples of these uses are as follows:

// Use different RNGs for different environments.
$rng = $is_production
    ? new Random\NumberGenerator\Secure()
    : new Random\NumberGenerator\XorShift128Plus(1234);
 
$randomizer = new Random\Randomizer($rng);
$randomizer->shuffleString('foobar');

// Safely migrate the existing mt_rand() state.
 
// before
mt_srand(1234, MT_RAND_PHP);
foobar();
$result = str_shuffle('foobar');
 
// after
$randomizer = new Random\Randomizer(new Random\NumberGenerator\MersenneTwister(1234, MT_RAND_PHP));
foobar();
$result = $randomizer->shuffleString('foobar');

As a side effect of this RFC, several PHP functions are moved to the new ext/random extension.

This is because ext/standard/random.c already reserves the name RANDOM, so the new extension could not otherwise use it. In addition, all RNG-related implementations are moved to the new random extension in order to standardize the RNG implementation. The following functions are moved:

  • lcg_value()
  • srand()
  • rand()
  • mt_srand()
  • mt_rand()
  • random_int()
  • random_bytes()

The following internal APIs will also be moved to the ext/random extension:

  • php_random_int_throw()
  • php_random_int_silent()
  • php_combined_lcg()
  • php_mt_srand()
  • php_mt_rand()
  • php_mt_rand_range()
  • php_mt_rand_common()
  • php_srand()
  • php_rand()
  • php_random_bytes()
  • php_random_int()

All of these features are available to extensions by including a single header, ext/random/php_random.h.

The following header files are kept for extension compatibility. Each of them simply includes ext/random/php_random.h.

  • ext/standard/php_lcg.h
  • ext/standard/php_rand.h
  • ext/standard/php_mt_rand.h
  • ext/standard/php_random.h

Future Scope

These are not within the scope of this RFC, but are worth considering in the future:

  • Remove old header files for compatibility (php_lcg.h, php_rand.h, php_mt_rand.h, php_random.h)
  • Deprecate lcg_value(), mt_srand(), srand()

Backward Incompatible Changes

The following names are reserved and will no longer be available to userland code:

  • “Random”
  • “Random\NumberGenerator”
  • “Random\NumberGenerator\XorShift128Plus”
  • “Random\NumberGenerator\MersenneTwister”
  • “Random\NumberGenerator\CombinedLCG”
  • “Random\NumberGenerator\Secure”

Proposed PHP Version(s)

8.2

RFC Impact

To SAPIs

none

To Existing Extensions

In the future, it may be necessary to change the included header files to point to ext/random/php_random.h. However, compatibility will be maintained for now.

To Opcache

none

New Constants

none

php.ini Defaults

none

Open Issues

It is not possible to reproduce 32-bit Mersenne Twister in userland in a 64-bit environment

https://github.com/php/php-src/pull/8094#pullrequestreview-881660425

Currently, the output width of NumberGenerator::generate() is implicitly assumed to be that of zend_long (the size of an int in PHP). This means that, in a 64-bit environment, it is not possible to implement an RNG in userland with an output width other than 64 bits.

However, this RFC assumes that userland RNG implementations are often only used to reproduce certain scenarios in tests, and I personally think that this is sufficient.

Should generate() really return a number?

Tim says NumberGenerator::generate() should return a string instead of an int.

https://externals.io/message/117026#117032

While it is true that returning a string allows for more flexibility in the range of RNG generation, it poses a convenience problem. In particular, it makes it difficult to implement a userland RNG to reproduce a particular scenario.
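The trade-off can be sketched as follows. ByteGeneratorLike, FixedBytesForTest, and bytesToInt() are hypothetical names used only for illustration, and the 4-byte little-endian encoding is an assumption rather than anything proposed in the discussion.

```php
<?php
// Hypothetical interface: generate() returns raw bytes, so the output width is explicit.
interface ByteGeneratorLike
{
    public function generate(): string;
}

final class FixedBytesForTest implements ByteGeneratorLike
{
    private int $count = 0;

    public function generate(): string
    {
        // 'V' packs an unsigned 32-bit little-endian integer: always 4 bytes,
        // regardless of the platform's int size.
        return pack('V', ++$this->count);
    }
}

// Consumer side: decode the bytes back into an int before using them.
function bytesToInt(string $bytes): int
{
    return unpack('V', $bytes)[1];
}

$generator = new FixedBytesForTest();

$raw = $generator->generate();
$width = strlen($raw);                          // 4 bytes, i.e. a 32-bit generator
$firstValue = bytesToInt($raw);                 // 1
$secondValue = bytesToInt($generator->generate()); // 2
```

A string return type lets a 32-bit generator state its width even on a 64-bit platform, but every fixture now needs a pack() call where the int-returning version needed only a plain return.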

Is adopting XorShift128Plus a good choice?

As mentioned in the Internals ML, there are a few known issues with XorShift128+.

https://prng.di.unimi.it/

https://externals.io/message/117026#117030

A better candidate for the RNG to add may need to be considered.

Vote

Voting opens 2022-MM-DD and closes 2021-MM-DD at 00:00:00 EDT. 2/3 required to accept.

Add Random extension

Real name                       Yes  No
asgrim (asgrim)                 ✓
beberlei (beberlei)             ✓
bmajdak (bmajdak)               ✓
brzuchal (brzuchal)             ✓
bwoebi (bwoebi)                 ✓
crell (crell)                   ✓
dharman (dharman)               ✓
jbnahan (jbnahan)               ✓
kalle (kalle)                   ✓
kguest (kguest)                 ✓
kocsismate (kocsismate)         ✓
lbarnaud (lbarnaud)             ✓
lufei (lufei)                   ✓
marandall (marandall)           ✓
nicolasgrekas (nicolasgrekas)   ✓
ocramius (ocramius)             ✓
pierrick (pierrick)             ✓
sergey (sergey)                 ✓
svpernova09 (svpernova09)       ✓
timwolla (timwolla)             ✓
twosee (twosee)                 ✓
Final result:                   21   0

This poll has been closed.

Patches and Tests

rfc/rng_extension.1644925391.txt.gz · Last modified: 2022/02/15 11:43 by zeriyoshi