rfc:rng_extension

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
rfc:rng_extension [2021/05/25 15:40]
zeriyoshi str_shuffle has drop-in replacement API
rfc:rng_extension [2022/08/01 16:52] (current)
timwolla Errata
Line 1: Line 1:
-====== PHP RFC: Add Random class ====== +====== PHP RFC: Random Extension 5.x ====== 
-  * Version: 0.3 +  * Version: 5.x 
-  * Date: 2021-05-18 +  * Date: 2022-02-24 
-  * Author: Go Kudo <zeriyoshi@gmail.com> +  * Author: Go Kudo <zeriyoshi@gmail.com> <g-kudo@colopl.co.jp
-  * Status: Draft+  * Status: Implemented 
 +  * Implementation: https://github.com/php/php-src/pull/8094
   * First Published at: http://wiki.php.net/rfc/object_scope_prng   * First Published at: http://wiki.php.net/rfc/object_scope_prng
  
 ===== Introduction ===== ===== Introduction =====
-PHP is currently having problems with RNG reproducibility. 
  
-PHP'RNG has been unified into an implementation using the Mersenne twisterwith the rand() and srand() functions becoming aliases for mt_rand() and mt_srand() respectively in PHP 7.1.+There are several problems with the current implementation of PHP'random functionalityso some proposed improvements.
  
-But, these functions still store the state in the global state of PHP and are not easily reproducibleLook at the following example.+==== Problems ==== 
 + 
 +There are four main problems 
 + 
 +  * Global state 
 +  * Mersenne Twister 
 +  * Randomness 
 +  * Internals 
 + 
 +=== Global state === 
 + 
 +Mersenne Twister state is implicitly stored in global area of PHPand there is no way for the user to access it, so adding any randomization functions between the seeding and the intended usage would break the code. 
 + 
 +Let's say you have the following code.
  
 <code php> <code php>
-echo foo(1234, function (): void {}) . PHP_EOL; // Result: 1480009472 +<?php
-echo foo(1234, function (): void { mt_rand(); }) . PHP_EOL; // Result: 1747253290+
  
-function foo(int $seed, callable $bar): int +function foo(): void 
-    mt_srand($seed); +    // do nothing;
-    $result = mt_rand(); +
-    $bar(); +
-    $result += mt_rand(); +
-    return $result;+
 } }
 +
 +mt_srand(1234);
 +foo();
 +mt_rand(1, 100); // result: 76
 </code> </code>
  
-As mentioned above, the reproducibility of random numbers can easily be lost if additional processing is added later.+Then at some point in time the function was edited like below.
  
-In addition, the fiber extension was introduced in PHP 8.1. This makes it more difficult to keep track of the execution order. However, this problem has existed since the inception of Generator.+<code php> 
 +<?php
  
-There is also the problem of functions that implicitly use the state stored in PHP's global state. shuffle()str_shuffle(), and array_rand() functions implicitly advance the state of a random number. This means that the following code is not reproducible, but it is difficult for the user to notice this.+function foo(): void { 
 +    str_shuffle('abc'); // added randomization 
 +}
  
-<code php> 
 mt_srand(1234); mt_srand(1234);
-echo mt_rand() . PHP_EOL; // Result411284887+foo(); 
 +mt_rand(1, 100); // result65 
 +</code>
  
-mt_srand(1234); +As you can see, the result of mt_rand has changed from 76 to 65 because str_shuffle() changed the state of Mersenne Twister internally. 
-str_shuffle('foobar'); + 
-echo mt_rand() . PHP_EOL; // Result1314500282+Maintaining such code can be difficult when your code utilizes external packages. 
 +Also, by using Generator and Fiber introduced in PHP 8.1, the current state can be easily lost. 
 + 
 +Given the above, mt_srand() and srand(), can not provide reproducible values in a consistent manner.  
 + 
 +Another problem which may occur is when using extensions like Swoole, which copy global random state to child processes due to its structure, making random number-related operations unsafe unless they are reseeded. 
 + 
 +https://wiki.swoole.com/#/getting_started/notice?id=mt_rand%e9%9a%8f%e6%9c%ba%e6%95%b0 
 + 
 +=== Mersenne Twister === 
 + 
 +Mersenne Twister is an excellent pseudo random number generator. But, it is old and no longer suitable for the current needs. 
 + 
 +It has a very long period of 2^19937 - 1. In general, a long period is a good thing, but nevertheless it fails several statistical tests (BigCrush and Crush). 
 + 
 +Also, the size that Mersenne Twister can generate is limited to 32-bit. This is not compatible with the current situation where many execution environments are 64-bit and zend_long has a length of 64-bit. 
 + 
 +=== Randomness === 
 + 
 +PHP's built-in functions (<php>shuffle()</php>, <php>str_shuffle()</php>, <php>array_rand()</php>) use Mersenne Twister as the default random number source. This is inappropriate if you need cryptographically secure random numbers. If a similar function that meets that requirement is needed, the user will need to implement a new function using <php>random_int()</php> or similar functions. 
 + 
 +=== Internals === 
 + 
 +The implementation of random numbers in PHP is scattered within the standard module for historical reasons. 
 + 
 +The following are different header files, but some are interdependent, which can be very confusing to extension developers. 
 + 
 +|                            ^ extension ^ header              ^ source      ^ 
 +^ Combined LCG  | standard    | php_lcg.h           | lcg.c           | 
 +^ libc rand*           | standard    | php_rand.h        | rand.c        | 
 +^ MT19937          | standard    | php_mt_rand.h | mt_rand.c | 
 +^ CSPRNG             | standard    | php_random.h   | random.c   | 
 + 
 + 
 +==== Userland approach ==== 
 + 
 +Think about how the above problems could be solved in userland. 
 + 
 +Implement a random number generator in PHP. Here I will consider an already existing implementation (https://github.com/savvot/randomand our implementation of XorShift128+. 
 + 
 +<code php> 
 +class XorShift128Plus 
 +
 +    /* constants */ 
 +    protected const MASK_S5 = 0x07ffffffffffffff; 
 +    protected const MASK_S18 = 0x00003fffffffffff; 
 +    protected const MASK_S27 = 0x0000001fffffffff; 
 +    protected const MASK_S30 = 0x00000003ffffffff; 
 +    protected const MASK_S31 = 0x00000001ffffffff; 
 +    protected const MASK_LO = 0x00000000ffffffff; 
 +  
 +    protected const ADD_HI = 0x9e3779b9; 
 +    protected const ADD_LO = 0x7f4a7c15; 
 +    protected const MUL1_HILO = 0x476d; 
 +    protected const MUL1_HIHI = 0xbf58; 
 +    protected const MUL1_LO = 0x1ce4e5b9; 
 +    protected const MUL2_HIHI = 0x94d0; 
 +    protected const MUL2_HILO = 0x49bb; 
 +    protected const MUL2_LO = 0x133111eb; 
 +  
 +    /* states */ 
 +    protected int $s0; 
 +    protected int $s1; 
 +  
 +    public function __construct(int $seed) 
 +    { 
 +        $s = $seed; 
 +        $this->s0 = $this->splitmix64($s); 
 +        $this->s1 = $this->splitmix64($s); 
 +    } 
 +  
 +    public function generate()int 
 +    { 
 +        $s1 = $this->s0; 
 +        $s0 = $this->s1; 
 +  
 +        $s0h = ($s0 >> 32) & self::MASK_LO; 
 +        $s0l = $s0 & self::MASK_LO; 
 +        $s1h = ($s1 >> 32) & self::MASK_LO; 
 +        $s1l = $s1 & self::MASK_LO; 
 +        $zl = $s0l + $s1l; 
 +        $zh = $s0h + $s1h + ($zl >> 32); 
 +        $z = ($zh << 32) | ($zl & self::MASK_LO); 
 +  
 +        $this->s0 = $s0; 
 +        $s1 ^= $s1 << 23; 
 +        $this->s1 = $s1 ^ $s0 ^ (($s1 >> 18) & self::MASK_S18) ^ (($s0 >> 5) & self::MASK_S5); 
 +  
 +        return $z; 
 +    } 
 +  
 +    protected function splitmix64(int &$s): int 
 +    { 
 +        $zl = $s & self::MASK_LO; 
 +        $zh = ($s >> 32) & self::MASK_LO; 
 +        $carry = $zl + self::ADD_LO; 
 +        $z = $s = (($zh + self::ADD_HI + ($carry >> 32)) << 32) | ($carry & self::MASK_LO); 
 +  
 +        $z ^= ($z >> 30) & self::MASK_S30; 
 +        $zl = $z & self::MASK_LO; 
 +        $zh = ($z >> 32) & self::MASK_LO; 
 +        $lo = self::MUL1_LO * $zl; 
 +        $zll = $zl & 0xffff; 
 +        $zlh = $zl >> 16; 
 +        $mul1l = $zll * self::MUL1_HILO; 
 +        $mul1h = $zll * self::MUL1_HIHI + $zlh * self::MUL1_HILO + (($mul1l >> 16) & 0xffff); 
 +        $mul1 = (($mul1h & 0xffff) << 16) | ($mul1l & 0xffff); 
 +        $mul2 = ((self::MUL1_LO * $zh) & self::MASK_LO); 
 +        $carry = (($lo >> 32) & self::MASK_LO); 
 +        $hi = $mul1 + $mul2 + $carry; 
 +        $z = ($hi << 32) | ($lo & self::MASK_LO); 
 +  
 +        $z ^= ($z >> 27) & self::MASK_S27; 
 +        $zl = $z & self::MASK_LO; 
 +        $zh = ($z >> 32) & self::MASK_LO; 
 +        $lo = self::MUL2_LO * $zl; 
 +  
 +        $zll = $zl & 0xffff; 
 +        $zlh = $zl >> 16; 
 +        $mul1l = $zll * self::MUL2_HILO; 
 +        $mul1h = $zll * self::MUL2_HIHI + $zlh * self::MUL2_HILO + (($mul1l >> 16) & 0xffff); 
 +        $mul1 = (($mul1h & 0xffff) << 16) | ($mul1l & 0xffff); 
 +  
 +        $mul2 = (self::MUL2_LO * $zh) & self::MASK_LO; 
 +        $carry = ($lo >> 32) & self::MASK_LO; 
 +        $hi = $mul1 + $mul2 + $carry; 
 +        $z = ($hi << 32) | ($lo & self::MASK_LO); 
 +  
 +        return $z ^ (($z >> 31) & self::MASK_S31); 
 +    } 
 +
 +  
 +$xs128pp = new \XorShift128Plus(1234); 
 +  
 +// Benchmarking 
 +for ($i = 0; $i < 1000000000; $i++) { 
 +    $xs128pp->generate(); 
 +}
 </code> </code>
 +
 +Compare the speed of these implementations with the PHP's mt_rand().
 +
 +|                                ^ PHP - XorShift128+ (iter:1000000000) ^ PHP - MtRand (savvot/random) (iter: 10000000) ^ Native - MT (iter: 10000000) ^
 +^ PHP 8.1                | 0m3.218s                                            | 0m4.161s                                                           | 0m0.160s                                 |
 +^ PHP 8.1 with JIT  | 0m1.836s (64M buffer)                    | 0m2.184s (64M buffer)                                    | 0m0.184s (64M buffer)         |
 +
 +Native implementation is much faster than userland ones, even with JIT enabled.
 +
 +More about this can be read here: https://externals.io/message/115918#115959
  
 ===== Proposal ===== ===== Proposal =====
-Implements Random class. 
  
-This class implement to ext/standardalways bundled PHP core.+Create a single Randomizer class which provides various randomization methods (like get int/bytesshuffle string/arrays). This class will take an Engine interface in the constructor which can be swapped based on users needs. Some essential RNG engines will be prepackaged for convenience but an Interface will also be provided so that algorithms can be easily added.
  
-The PHP code that represents the implementation is as follows:+I believe this proposal has the following benefits.
  
-<code php> +=== Swapping RNG Based on Environment ===
-const RANDOM_XORSHIFT128PLUS 'xorshift128plus'; // fast, lightweight, default +
-const RANDOM_MT19937 'mt19937'; // for compatibility +
-const RANDOM_SECURE 'secure'; // required Cryptographically-Secure PRNG +
-const RANDOM_USER 'user'; // userland implementation+
  
-class Random +The appropriate RNG can be selected depending on the environment.
-+
-    public function __construct(string $algo = RANDOM_XORSHIFT128PLUS, ?int $seed = null);+
  
-    // For user. +For examplesay you want to use PRNG with a seed in development, but would like to use CSPRNG in production. This would be easily achievable with the following code.
-    public static function getNonBiasedMax(string $algo): int; +
-    public function getInt(int $min = PHP_INT_MINint $max = PHP_INT_MAX): int; +
-    public function getBytes(int $length): string; +
-    public function shuffleArray(array $array): array; +
-    public function shuffleString(string $string): string;+
  
-    // For serialize / unserialize. (but, NOT always available.+<code php> 
-    public function __serialize(): array+$rng = $is_production 
-    public function __unserialize(array $data): void;+    ? new Random\Engine\Secure() 
 +    : new Random\Engine\PCG64(1234); 
 +  
 +$randomizer = new Random\Randomizer($rng); 
 +$randomizer->shuffleString('foobar'); 
 +</code>
  
-    // MUST override in RANDOM_USER+=== Fixed Random Number Sequence === 
-    protected function next(): int;+ 
 +Processes that continue to generate random numbers until certain requirements are met may make it difficult to measure the processing load
 + 
 +<code php> 
 +$required_result = mt_rand(1, 100)
 +while (($generated = mt_rand(1, 100)) !== $required_result) { 
 +    echo "retry\n";
 } }
 +
 +echo "done\n";
 </code> </code>
  
-This single class is used to provide the processing using PRNG.+Interface and dynamic injections, allowing for the fixed sequences at test time.
  
-This class switches the PRNG implementation to be used by the constructor argument $algo. It is just like the password_hash() function.+<code php> 
 +$engine = new class () implements Random\Engine { 
 +    public function generate(): string 
 +    { 
 +        // Result must be a string. 
 +        return pack('V', 1); 
 +    } 
 +}; 
 +$randomizer = new Random\Randomizer($engine);
  
-Also, the static method getNonBiasedMax() allows the user to get the non-biased RNG range.+$required_result = $randomizer->getInt(1, 100)
 +while (($generated = $randomizer->getInt(1, 100)) !== $required_result) { 
 +    echo "retry\n"; 
 +}
  
-This allows us to rewrite the first example as follows:+echo "done\n"; 
 +</code> 
 + 
 +=== Cryptographically Secure Random Operations === 
 + 
 +Shuffling strings and arrays using CSPRNG (or any other RNG besides Mersenne Twister) was only achievable by implementing it in userland. This can now be done without writing userland code.
  
 <code php> <code php>
-// example 1 +$engine = new Random\Engine\Secure(); 
-echo foo(1234, function (): void {}) . PHP_EOL// Result: 1480009472 +$randomizer = new Random\Randomizer($engine);
-echo foo(1234, function (): void { mt_rand(); }. PHP_EOL// Result: 1480009472+
  
-function foo(int $seed, callable $bar): int { +$items range(110); 
-    $random new Random(RANDOM_MT19937$seed); +$items = $randomizer->shuffleArray($items); 
-    $max = Random::getNonBiasedMax(RANDOM_MT19937); +</code>
-    $result = $random->getInt(0, $max); +
-    $bar(); +
-    $result += $random->getInt(0, $max); +
-    return $result; +
-}+
  
-// example 2 +=== State safe ===
-$random new Random(RANDOM_MT19937, 1234); +
-$max Random::getNonBiasedMax(RANDOM_MT19937); +
-echo $random->getInt(0, $max) . PHP_EOL; // Result: 411284887+
  
-$random = new Random(RANDOM_MT199371234); +Since the scope is limited to the engine instance, unintentional state changes caused by things such as external packages and Fiber are completely prevented.  
-$max = Random::getNonBiasedMax(RANDOM_MT19937); + 
-str_shuffle('foobar'); +==== Approach ==== 
-echo $random->getInt(0, $max) . PHP_EOL; // Result: 411284887+ 
 +Implement the following new interfaces and classes. 
 + 
 +=== interface Random\Engine === 
 + 
 +Interface to provide random number generator engine. 
 + 
 +It has a single <php>generate(): string</php> method that generates random numbers as a binary string. This string must be non-empty and attempting to return an empty will result in a RuntimeException.  
 + 
 +If you implement a random number generator in PHPthe generated numbers must be converted to binary using the <php>pack()</php> function, and the values must be little-endian. 
 + 
 +Engine::generate() always returns the result as a string, so it is independent of the bit and endianness in the execution environment and always returns the same result for the same seed and sequence. 
 + 
 +However, if a string of 64-bit or greater is returned, it may be truncated for internal processing reasons. This currently applies only to user-defined classes. 
 + 
 +=== interface Random\CryptoSafeEngine === 
 + 
 +A marker interface to indicate that the implemented random number generator is cryptographically secure. 
 + 
 +=== interface Random\SerializableEngine === 
 + 
 +An interface indicating that the implemented random number generator is serializable. 
 + 
 +The following methods must be implemented: 
 + 
 +  * <php>__serialize(): array</php> 
 +  * <php>__unserialize(array $data): void</php> 
 + 
 +==class Random\Engine\CombinedLCG === 
 + 
 +Generate random numbers using the CombinedLCG algorithm. 
 + 
 +By passing a value to the constructor, it can be seeded with any value. If omitted or null, the seed value is generated by CSPRNG. 
 + 
 +The following interfaces are implemented: 
 + 
 +  * <php>Random\Engine</php> 
 +  * <php>Random\SerializableEngine</php> 
 + 
 +The following methods are implemented: 
 + 
 +  * <php>__construct(int|null $seed = null)</php> 
 +  * <php>generate(): string</php> 
 +  * <php>__serialize(): array</php> 
 +  * <php>__unserialize(array $data): void</php> 
 + 
 +The values generated by calling CombinedLCG::generate() with the same seed will always return the same sequence of results, e.g.: 
 + 
 +<code php> 
 +$seed = 1234
 + 
 +$engine = new \Random\Engine\CombinedLCG($seed); 
 +var_dump(bin2hex($engine->generate())); // "fc6ff102" 
 +var_dump(bin2hex($engine->generate())); // "40e0ce05" 
 + 
 +// same seed results in same sequence of results. 
 +$engine = new \Random\Engine\CombinedLCG($seed); 
 +var_dump(bin2hex($engine->generate())); // "fc6ff102" 
 +var_dump(bin2hex($engine->generate())); // "40e0ce05"
 </code> </code>
  
-Similarly, several C APIs have been added to the PHP core. This can be used to add non-standard PRNGs.+=== class Random\Engine\MersenneTwister ==
  
-<code c+Generate random numbers using the MT19937 (a.k.a Mersenne Twister) algorithm. 
-// Note: The detailed implementation is tentative. + 
-typedef struct _php_random_class_algo { +By passing a value to the constructor, it can be seeded with any value. If omitted or null, the seed value is generated by CSPRNG. 
-    int64_t max; +The second argument, passing MT_RAND_PHP, allows the use of PHP's broken Mersenne Twister. 
-    int64_t (*next)(void *state); + 
-    void (*seed)(void *stateconst zend_long *seed)// allows NULL. +The following interfaces are implemented: 
-    int (*serialize)(void *state, zval *data); // allows NULL. + 
-    int (*unserialize)(void *state, zval *data); // allows NULL. +  * <php>Random\Engine</php
-    void *state; +  * <php>Random\SerializableEngine</php> 
-} php_random_class_algo;+ 
 +The following methods are implemented: 
 + 
 +  <php>__construct(int|null $seed = null, int $mode = MT_RAND_MT19937)</php> 
 +  * <php>generate(): string</php> 
 +  <php>__serialize(): array</php> 
 +  * <php>__unserialize(array $data): void</php> 
 + 
 +The values generated by calling MersenneTwister::generate() with the same seed will always return the same sequence of resultse.g.: 
 + 
 +<code php> 
 +$seed = 1234
 + 
 +$engine = new \Random\Engine\MersenneTwister($seed)
 +var_dump(bin2hex($engine->generate())); // "2f6b0731" 
 +var_dump(bin2hex($engine->generate())); // "d3e2667f"
  
-int php_random_class_algo_register(const char *ident, const php_random_class_algo *algo); +// same seed results in same sequence of results. 
-void php_random_class_algo_unregister(const char *ident);+$engine = new \Random\Engine\MersenneTwister($seed); 
 +var_dump(bin2hex($engine->generate())); // "2f6b0731" 
 +var_dump(bin2hex($engine->generate())); // "d3e2667f"
 </code> </code>
  
-In php_random_class_algo, the implementation of serialize / unserialize is optional. If the RNG you implement does not support it, you can specify NULL as a member of the structure, In that case, an exception will be thrown during serialization.+=== class Random\Engine\PCG64 ===
  
-Alsofor RNGs that do not (or cannotuse seed values, the function pointer for seed is optional. If this is passed to a null RNG, an exception will be thrown if a seed value is passed.+Generate random numbers using the PCG64 (Permuted Congruential Generatorpcg_oneseq_128algorithm.
  
-This class also supports the RNG implementation of userlandThis is useful for tests that want to fix the expected calculation cost.+By passing a value to the constructor, it can be seeded with any valueIf omitted or null, the seed value is generated by CSPRNG. 
 +A string can also be passed as the seed value, and a string is required to seed with 64-bit or higher values. The string must be 128-bit. 
 + 
 +The following interfaces are implemented: 
 + 
 +  * <php>Random\Engine</php> 
 +  * <php>Random\SerializableEngine</php> 
 + 
 +The following methods are implemented: 
 + 
 +  * <php>__construct(string|int|null $seed = null)</php> 
 +  * <php>generate(): string</php> 
 +  * <php>jump(int $advance): void</php> 
 +  * <php>__serialize(): array</php> 
 +  * <php>__unserialize(array $data): void</php> 
 + 
 +PCG64::jump() can be used to advance the state an arbitrary number of times. 
 + 
 +The values generated by calling PCG64::generate() with the same seed will always return the same sequence of results, e.g.:
  
 <code php> <code php>
-class FixedNumberForTest extends Random +$seed 1234;
-+
-    protected int $current 0;+
  
-    public function __construct() +$engine = new \Random\Engine\PCG64($seed); 
-    { +var_dump(bin2hex($engine->generate())); // "ecfbe5990a319380" 
-        parent::__construct(RANDOM_USER, null); +var_dump(bin2hex($engine->generate())); // "4f6b4a5b53b10e3f"
-    }+
  
-    protected function next(): int +// same seed results in same sequence of results. 
-    { +$engine = new \Random\Engine\PCG64($seed); 
-        return ++$this->current+var_dump(bin2hex($engine->generate()))// "ecfbe5990a319380" 
-    } +var_dump(bin2hex($engine->generate())); // "4f6b4a5b53b10e3f"
-}+
 </code> </code>
  
-===== Backward Incompatible Changes ===== +=== class Random\Engine\Secure ===
-The class name Random is reserved and will not be available in userland.+
  
-===== Proposed PHP Version(s===== +Secure::generate() cannot be seeded, and are non-reproducible because they are based on lower-layer true random numbers.
-8.1+
  
-===== RFC Impact ===== +Random number generated by this class is guaranteed to be CSPRNG.
-==== To SAPIs ==== +
-none+
  
-==== To Existing Extensions ==== +The following interfaces are implemented:
-none+
  
-==== To Opcache ==== +  * <php>Random\Engine</php> 
-none+  * <php>Random\CryptoSafeEngine</php>
  
-==== New Constants ==== +The following methods are implemented:
-  * RANDOM_XORSHIFT128PLUS +
-  * RANDOM_MT19937 +
-  * RANDOM_SECURE +
-  * RANDOM_USER+
  
-==== php.ini Defaults ==== +  * <php>__construct()</php> 
-none+  * <php>generate(): string</php>
  
-===== Open Issues =====+The sequence generated by Secure::generate() is not reproducible in any way.
  
-=== When $seed is null, what is used for the seed value? === +=== final class Random\Randomizer ===
-Depends on the implementation of algo, but basically it is using internal php_random_int(). +
-It is similar to mt_srand() from PHP 8.1.+
  
-- https://github.com/php/php-src/commit/53ee3f7f897f7ee33a4c45210014648043386e13+A single class for processing with random numbers using the engine.
  
-=== Why cancelled RNG Extension? === +The following methods are implemented:
-As a result of discussions during the draft, the functions became a single class and no longer need to be separated. +
-The functionality for random numbers is now included in ext/standard and will conform to this. (e.g. rand.c random.c)+
  
-=== Why not take an object oriented approach? === +  * <php>__construct(Random\Engine $engine new Random\Engine\Secure())</php> 
-This is because it is overly complex and difficult to useSee my previous proposal and the discussion in the internals ML for more details.+  * <php>getInt(): int // replaces mt_rand()</php> 
 +  * <php>getInt(int $minint $max) // replaces mt_rand() and random_int()</php> 
 +  * <php>getBytes(int length): string // replaces random_bytes()</php> 
 +  * <php>shuffleArray(array $array): array // replaces shuffle()</php> 
 +  * <php>shuffleString(string $string): string // replaces str_shuffle()</php> 
 +  * <php>__serialize(): array</php> 
 +  * <php>__unserialize(array $data): void</php>
  
-  * https://wiki.php.net/rfc/object_scope_prng +The engine in the constructor is optional, and if it is omitted, Secure is automatically used.
-  * https://externals.io/message/114378 +
-  * https://externals.io/message/112819+
  
-=== Why XorShift128+ as the default algorithm? === +This class is serializablebut will fail if the given Engine is not serializable.
-This algorithm is capable of generating 64-bit random numbers, is used by major browsers, and is well validated. On the other hand, MT19937, currently used by PHP, can only generate 32-bit random numbers.+
  
-=== Why keep the MT19937 implementation? === +The Randomizer calls the $this->engine->generate() method to generate valuesBut, Engine implemented by native is not limited to this and uses faster methods to generate values from the Engine.
-This is for compatibilityIt facilitates quick and easy migration.+
  
-=== What algorithm does RANDOM_SECURE use exactly? === +Randomizer methods are fully reproducible in any environment as long as the result of Engine::generate() is consistent. This means that it is independent of the endianness and bit depth of the execution environment, guaranteeing future compatibility.
-It uses php_random_bytes() internally. This API is guaranteed to be a CSPRNG under any circumstances.+
  
-=== Why support CSPRNG? Isn't random_int() good enough? === +However, if Randomizer::getInt() with no arguments is executed in a 32-bit environment using an Engine that generates values above 32-bita RuntimeException will be thrown to avoid incompatibility.
-The goal is to be able to migrate all RNG provided functions to this class in the future. +
-In other wordsto be able to write code without using any of the following functions:+
  
 +The engines implement a specific well-defined random number generator.
 +For a given seed it is guaranteed that they return the same sequence as the reference implementation.
 +For the Randomizer it is considered a breaking change if the observable behavior of the methods changes. 
 +For a given seeded engine and identical method parameters the following must hold:
 +
 +  * The number of calls to the Engine::generate() method remains the same.
 +  * The return value remains the same for a given result retrieved from Engine::generate().
 +
 +Any changes to the Randomizer that violate these guarantees require a separate RFC.
 +
 +===== PRNG shootout =====
 +
 +Since MT19937 has the aforementioned problems, an alternative algorithm must be chosen.
 +
 +When introducing a new RNG algorithm, the selection of the algorithm is very important. The following table shows the RNG algorithms that I considered and their characteristics.
 +
 +|                                      ^ Generate size  ^ State size     ^ Performance                          ^ Issues                                   ^ Implemented applications   ^
 +^ MT19937                   | 32-bit                | 32-bit x 624  | Normal                                    | Some statistical test failed  | PHP, Python, Ruby                 |
 +^ XorShift128+             | 64-bit                | 64-bit x 2       | Excellent                                | Failure BigCrush                     | V8, SpiderMonkey, JavaScriptCore |
 +^ Xoshiro256++            | 64-bit                | 64-bit x 4       | Excellent, SIMD-frendly        | Currently none                       | Rust                                        |
 +^ Xoshiro256* *             | 64-bit                | 64-bit x 4       | Excellent                                | Currently none                       | Rust, .NET 6.0                       |
 +^ PCG64 (XSL-RR)        | 64-bit                 | 128-bit x 2    | Good                                       | Currently none                       | Rust, NumPy                          |
 +
 +MT19937 and XorShift128+ are already widely used, but they have failed several statistical tests and are not recommended for new use.
 +So I adopted a more modern PRNG called PCG64 which does not have any statistical test problems.
 +
 +PCG64 is the only implementation of a reproducible RNG, except for MT19937 for compatibility and the special uses User and Secure.
 +
 +PCG64 (pcg_state_oneseq_128 XSL-RR) looked like a good fit since it uses 64-bit wide values.
 +PCG64 uses 128-bit integers, which cannot be used natively in 32-bit environments and needs to be emulated, 
 +but I think this is not a problem since most environments are now using 64-bit architectures.
 +
 +I also considered Xoshiro256** but chose PCG because all of the issues raised against PCG64 appeared to have been resolved appropriately.
 +The issues raised and the inventor's discussion of them can be found at
 +
 +  * https://pcg.di.unimi.it/pcg.php
 +  * https://www.pcg-random.org/posts/on-vignas-pcg-critique.html
 +
 +It is interesting to note that these algorithms have been heavily criticized by each other.
 +Both opinions were respectable, which made the selection process very difficult.
 +
 +I considered implementing both but adding unnecessary choices would have caused confusion for the users so the idea was dropped. If anyone thinks one is needed it can be added through PHP extensions.
 +
 +===== Internal Changes =====
 +
 +As a side effect of this RFC, the following PHP functions have been moved to the new ext/random extension.
 +
 +  * lcg_value()
   * srand()   * srand()
   * rand()   * rand()
   * mt_srand()   * mt_srand()
   * mt_rand()   * mt_rand()
-  * shuffle() 
-  * str_shuffle() 
-  * array_rand() 
   * random_int()   * random_int()
   * random_bytes()   * random_bytes()
  
-In order to use these functions properly, you need to understand PHP core. For many users, this can be difficult.+The following internal APIs will also be moved to the ext/random extension:
  
-=== Why isn't there a drop-in replacement API? === +  * php_random_int_throw() 
-There is no API that can simply replace the following functions:+  * php_random_int_silent() 
 +  * php_combined_lcg() 
 +  * php_mt_srand() 
 +  * php_mt_rand() 
 +  * php_mt_rand_range() 
 +  * php_mt_rand_common() 
 +  * php_srand() 
 +  * php_rand() 
 +  * php_random_bytes() 
 +  * php_random_int()
  
-  * shuffle() +This is because ext/standard/random.c reserves the name RANDOM and cannot be used by the extension. In addition, all RNG-related implementations will be moved to the new random extension in order to standardize the RNG implementation.
-  * array_rand()+
  
-The approach of these functions is not compatible with recent implementations. +The following header files are left in for extension compatibility
-shuffle() uses pass-by-reference, and array_rand() is too complex.+
  
-=== Why stop deprecation for some functions? === +  * ext/standard/php_lcg.h 
-The following functions have been removed from deprecation:+  * ext/standard/php_rand.h 
 +  * ext/standard/php_mt_rand.h 
 +  * ext/standard/php_random.h
  
-  * srand() +The contents all include ext/random/php_random.h.
-  * rand() +
-  * mt_srand() +
-  * mt_rand()+
  
-This is because it is still too early and inappropriate include it in one RFC.+<code c> 
 +#include "ext/random/php_random.h" 
 +</code>
  
-=== What will be the concrete C implementation? === +===== Future Scope ===== 
-Please waitIf the discussion in ML is goodI' ll start the implementation.+ 
 +These are not within the scope of this RFC, but are worth considering in the future: 
 + 
 +  * Remove old header files for compatibility (php_lcg.h, php_rand.h, php_mt_rand.h, php_random.h) 
 +  * Deprecate lcg_value(), mt_srand(), srand() 
 + 
 +===== Backward Incompatible Changes ===== 
 + 
 +The following names have been reserved and will no longer be available 
 + 
 +  * Random 
 +  * Random\Engine 
 +  * Random\CryptoSafeEngine 
 +  * Random\SerializableEngine 
 +  * Random\Engine\CombinedLCG 
 +  * Random\Engine\MersenneTwister 
 +  * Random\Engine\PCG64 
 +  * Random\Engine\Secure 
 +  * Random\Randomizer 
 + 
 +===== Proposed PHP Version(s) ===== 
 +8.
 + 
 +===== RFC Impact ===== 
 +==== To SAPIs ==== 
 +none 
 + 
 +==== To Existing Extensions ==== 
 +In the futureit may be necessary to change the included header files to point to ext/random/php_random.h. However, compatibility will be maintained for now. 
 + 
 +==== To Opcache ==== 
 +none 
 + 
 +==== New Constants ==== 
 +none 
 + 
 +==== php.ini Defaults ==== 
 +none 
 + 
 +===== Open Issues ===== 
 +none
  
 ===== Vote ===== ===== Vote =====
-Voting opens 2021-MM-DD and 2021-MM-DD at 00:00:00 EDT. 2/3 required to accept.+Voting opens 2022-06-14 and 2022-06-28 at 00:00:00 UTC. 2/3 required to accept.
  
-<doodle title="Add Random class" auth="zeriyoshi" voteType="single" closed="true"> +<doodle title="Add Random extension" auth="zeriyoshi" voteType="single" closed="true">
    * Yes    * Yes
    * No    * No
Line 247: Line 586:
  
 ===== Patches and Tests ===== ===== Patches and Tests =====
-TBD+  * https://github.com/php/php-src/pull/8094 
 + 
 +===== Errata ===== 
 + 
 +==== Follow Up RFC: Random Extension Improvement ==== 
 + 
 +The [[rfc:random_extension_improvement|Random Extension Improvement]] follow-up RFC made several adjustments before the initial implementation was merged.  
 + 
 +==== Split of Randomizer::getInt() into ::getInt() and ::nextInt() ==== 
 + 
 +The parameter-less variant of <php>Randomizer::getInt()</php> was split into <php>Randomizer::nextInt()</php>. See pull request GH-9057: https://github.com/php/php-src/pull/9057
rfc/rng_extension.1621957202.txt.gz · Last modified: 2021/05/25 15:40 by zeriyoshi