PHP RFC: Deprecations for PHP 8.3

Introduction

The RFC proposes to deprecate the listed functionality in PHP 8.3 and remove it in PHP 9.

The following list provides a short overview of the functionality targeted for deprecation; a more detailed explanation is provided in the Proposal section:

  • Passing negative $widths to mb_strimwidth()
  • The NumberFormatter::TYPE_CURRENCY constant
  • Unnecessary crypt() related constants
  • MT_RAND_PHP
  • Global Mersenne Twister
  • Calling ldap_connect with 2 parameters

Proposal

Each feature proposed for deprecation is voted separately and requires a 2/3 majority. All votes refer to deprecation in PHP 8.3 and removal in PHP 9.0.

Passing negative $widths to mb_strimwidth()

Regarding the $width argument of mb_strimwidth(), the PHP manual states: “Negative widths count from the end of the string.” In other words, if $width is -2, then either two halfwidth characters or one fullwidth character should be trimmed from the end of the string.

This feature was introduced in git revision 70187267b4 (January 2016), and was contributed by Francois Laupretre francois@php.net.

Although I have not seen anything written by F. Laupretre to explain what the anticipated usage of the feature was, it seems to have very limited utility. mb_strimwidth() is typically used to trim strings down to a length which can be printed in a terminal without wrapping. It seems very unusual that anyone would want to set the terminal width of a string to “its current value less N”. (One possibility is that the feature was introduced for consistency with other standard library functions which accept a negative argument to indicate “count back from the end of a string”.)

From the time the feature was merged until now, it has always had a bug when combined with a non-zero $from argument. The implementation does arithmetic which combines codepoint counts with terminal width counts, with erroneous results. (This is just like the proverbial “apples and oranges”.) If there are any fullwidth characters in the prefix which is skipped because of non-zero $from, then mb_strimwidth() will not trim the requested width from the end of the string.

It is notable that in the roughly 7 years since the feature was introduced (January 2016), no user has noticed and reported this bug. My guess is that almost no one uses the negative width feature, which would explain why the bug went unnoticed.

To implement the feature correctly, without the bug mentioned above, an extra pass over the prefix of the input string identified by $from would be needed to determine its terminal width. This operation requires O(n) time.

When preparing this RFC, I searched the Internet for existing open-source software which uses mb_strimwidth() and reviewed over 100 such projects. None of them used negative $width arguments. However, if readers are aware of existing projects which rely on the negative $width feature, please add that information here.

Deprecate passing negative $widths to mb_strimwidth()
Real name Yes No
alcaeus (alcaeus)  
alec (alec)  
ashnazg (ashnazg)  
brzuchal (brzuchal)  
bwoebi (bwoebi)  
colinodell (colinodell)  
crell (crell)  
derick (derick)  
dharman (dharman)  
galvao (galvao)  
girgias (girgias)  
heiglandreas (heiglandreas)  
kalle (kalle)  
kocsismate (kocsismate)  
mauricio (mauricio)  
mbeccati (mbeccati)  
mcmic (mcmic)  
nicolasgrekas (nicolasgrekas)  
nikic (nikic)  
ocramius (ocramius)  
petk (petk)  
pierrick (pierrick)  
ramsey (ramsey)  
sebastian (sebastian)  
sergey (sergey)  
theodorejb (theodorejb)  
trowski (trowski)  
Final result: 27 0
This poll has been closed.

The NumberFormatter::TYPE_CURRENCY constant

This constant has been unused since it was added to PHP (in PHP 5.3 or before). It was likely meant to make NumberFormatter::format() and NumberFormatter::parse() call NumberFormatter::formatCurrency() and NumberFormatter::parseCurrency() respectively. However, this was never implemented, likely because there is no way to pass the necessary currency argument.

Deprecate and remove the NumberFormatter::TYPE_CURRENCY constant
Real name Yes No
alcaeus (alcaeus)  
alec (alec)  
ashnazg (ashnazg)  
brzuchal (brzuchal)  
bwoebi (bwoebi)  
colinodell (colinodell)  
crell (crell)  
derick (derick)  
dharman (dharman)  
galvao (galvao)  
girgias (girgias)  
heiglandreas (heiglandreas)  
kalle (kalle)  
kocsismate (kocsismate)  
levim (levim)  
mauricio (mauricio)  
mbeccati (mbeccati)  
mcmic (mcmic)  
nicolasgrekas (nicolasgrekas)  
nikic (nikic)  
ocramius (ocramius)  
petk (petk)  
pierrick (pierrick)  
ramsey (ramsey)  
sebastian (sebastian)  
sergey (sergey)  
theodorejb (theodorejb)  
timwolla (timwolla)  
trowski (trowski)  
Final result: 29 0
This poll has been closed.

Unnecessary crypt() related constants

Several constants store information about the current system related to the crypt() function:

  • CRYPT_STD_DES: whether standard DES salts are supported
  • CRYPT_BLOWFISH: whether Blowfish salts are supported
  • CRYPT_EXT_DES: whether extended DES salts are supported
  • CRYPT_MD5: whether MD5 salts are supported
  • CRYPT_SHA256: whether SHA-256 salts are supported
  • CRYPT_SHA512: whether SHA-512 salts are supported

However, these constants have always been 1 for a long time (at least 6 years), because the above salts are supported on all systems. Therefore, they are unnecessary, and they give the false impression that the capabilities in question are conditionally available.

Deprecate and remove the CRYPT_* constants?
Real name Yes No
alcaeus (alcaeus)  
alec (alec)  
ashnazg (ashnazg)  
brzuchal (brzuchal)  
bukka (bukka)  
bwoebi (bwoebi)  
crell (crell)  
derick (derick)  
dharman (dharman)  
galvao (galvao)  
kalle (kalle)  
kocsismate (kocsismate)  
mauricio (mauricio)  
mbeccati (mbeccati)  
mcmic (mcmic)  
nicolasgrekas (nicolasgrekas)  
ocramius (ocramius)  
petk (petk)  
pollita (pollita)  
ramsey (ramsey)  
sergey (sergey)  
stas (stas)  
theodorejb (theodorejb)  
timwolla (timwolla)  
Final result: 14 10
This poll has been closed.

MT_RAND_PHP

Authors: Tim Düsterhus timwolla@php.net, Go Kudo zeriyoshi@php.net

The implementation of Mt19937 (“Mersenne Twister”) in PHP versions before 7.1 contains two bugs:

  • mt_rand() without parameters returns different numbers than the reference implementation, due to a typo in a variable name.
  • mt_rand($min, $max) with a restricted range uses a broken scaling algorithm based on floating point arithmetic. As doubles have only 53 bits of precision, this introduces a bias if a range larger than 53 bits is requested.
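The precision limit behind the second bug can be checked directly. The following sketch assumes a 64-bit PHP build. The function float_scale() is a hypothetical scaler written in the style described above, not a copy of PHP's source code, and the 8-value toy generator only illustrates in general terms how multiplying by a fraction maps inputs unevenly onto a range.

```php
<?php
// Doubles have a 53-bit significand: above 2**53 neighbouring integers collapse.
$a = 2 ** 53;
$b = $a + 1;
var_dump($a === $b);                  // bool(false): distinct as integers
var_dump((float) $a === (float) $b);  // bool(true): identical as doubles

// Hypothetical float-based scaler: maps $n from [0, $nMax] into [$min, $max]
// by multiplying the range size with a fraction.
function float_scale(int $n, int $min, int $max, int $nMax): int
{
    return $min + (int) (($max - $min + 1.0) * ($n / ($nMax + 1.0)));
}

// Toy generator with 8 equally likely outputs, scaled into the 5 values 0..4.
$counts = array_fill(0, 5, 0);
for ($n = 0; $n <= 7; $n++) {
    $counts[float_scale($n, 0, 4, 7)]++;
}
print_r($counts); // some values are hit twice, others only once
```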

Both of these issues were fixed in PHP 7.1 by the RNG fixes and changes RFC that also aliased rand() to mt_rand(). The MT_RAND_PHP constant was added to allow developers that rely on a specific sequence for a given seed to opt into the old implementation with the non-standard Mt19937 algorithm and the biased scaler.

The object-oriented Mt19937 engine that was introduced in the Random Extension 5.x RFC in PHP 8.2 also supports the MT_RAND_PHP option to allow developers a smooth migration from the global instance of Mt19937 as used with mt_rand() to the object-oriented Random\Randomizer.

Supporting the biased scaler with the object-oriented API requires a special case for Mt19937 in the internal implementation. It furthermore required a fix (PR 9197), because the biased scaler in PHP 8.1 and earlier exhibits undefined behavior that was only caught by the newly added tests in PHP 8.2. Thus the behavior for certain inputs changed between PHP 8.1 and PHP 8.2 and could not be relied on even in older versions, as the results depend on the compiler used.

The only purpose of the constant and its broken mode is backwards compatibility, but this cannot be achieved, because the bad scaler is broken by design and relies on undefined behavior. As such it fails at its sole purpose. The bad scaling is also opaque to the developer, as it silently returns biased results for certain inputs.

To clean up the special cases in the internal implementation and to clean up the API for the developer, MT_RAND_PHP should be deprecated.

Deprecate and remove the broken pre-PHP 7.1 Mt19937 implementation (MT_RAND_PHP)?
Real name Yes No
alcaeus (alcaeus)  
alec (alec)  
ashnazg (ashnazg)  
brzuchal (brzuchal)  
bwoebi (bwoebi)  
colinodell (colinodell)  
crell (crell)  
dharman (dharman)  
galvao (galvao)  
girgias (girgias)  
heiglandreas (heiglandreas)  
kalle (kalle)  
kocsismate (kocsismate)  
mauricio (mauricio)  
mbeccati (mbeccati)  
mcmic (mcmic)  
nicolasgrekas (nicolasgrekas)  
nielsdos (nielsdos)  
nikic (nikic)  
ocramius (ocramius)  
petk (petk)  
ramsey (ramsey)  
sebastian (sebastian)  
sergey (sergey)  
theodorejb (theodorejb)  
timwolla (timwolla)  
trowski (trowski)  
zeriyoshi (zeriyoshi)  
Final result: 28 0
This poll has been closed.

Global Mersenne Twister

Authors: Tim Düsterhus timwolla@php.net, Go Kudo zeriyoshi@php.net

Before PHP 8.2, PHP provided two kinds of random number generators: a seedable random number generator using the Mt19937 (“Mersenne Twister”) algorithm and a cryptographically secure random number generator (CSPRNG). The former stores its state in an implicit global variable and the latter is not seedable. This made it hard to achieve reproducible results for testing, as the use of an implicit global variable makes it hard to determine which operations modify the Mt19937 state and thus change future values.

To fix this, the Random Extension 5.x RFC for PHP 8.2 introduced object-based random number algorithms (“Engines”) that store their entire state within an object. Generating numbers with one object does not affect the sequence of another object.
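The isolation between engine objects can be demonstrated with a short sketch. It assumes PHP 8.2 or later; the seed 1234 and the variable names are arbitrary choices for illustration.

```php
<?php
use Random\Engine\Mt19937;
use Random\Randomizer;

// Two independent generators, both seeded with the same value.
$a = new Randomizer(new Mt19937(1234));
$b = new Randomizer(new Mt19937(1234));

$first = $a->getInt(1, 1000);

// Drawing from $b any number of times leaves the state of $a untouched.
for ($i = 0; $i < 10; $i++) {
    $b->getInt(1, 1000);
}

$second = $a->getInt(1, 1000);

// A fresh generator with the same seed reproduces exactly what $a returned.
$c = new Randomizer(new Mt19937(1234));
var_dump([$first, $second] === [$c->getInt(1, 1000), $c->getInt(1, 1000)]); // bool(true)
```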

The API of the random extension that relates to the generation of random integers currently looks like the following:

  • Global Mt19937
    • mt_rand()
    • rand()
  • CSPRNG
    • random_int()
  • \Random\Randomizer
    • ->getInt()

Thus there are three functions returning a random integer (mt_rand(), rand(), random_int()) and the object-based and pluggable Randomizer::getInt().

The functions using the global Mt19937 instance are the worst choice:

  • They are not cryptographically secure.
  • They are not properly reproducible, because the state could be changed unpredictably by any function call, e.g. when a library is updated and adds additional calls to mt_rand().
  • Any function that calls mt_srand() with a fixed seed, for whatever reason, will ruin randomness for the remainder of the request.
  • PHP's Mt19937 implementation only supports passing a single 32 bit integer as the initial seed. Thus there are only ~4 billion possible sequences of random integers generated by Mt19937. If a random seed is used, collisions are expected with 50% probability after roughly 2**16 seeds (as per the birthday paradox). In other words: After roughly 80000 requests without explicit seeding it is likely that a seed and thus a sequence is reused.
  • Both the quality of the returned randomness as well as the generation performance of Mt19937 is worse than that of the Xoshiro256StarStar and PcgOneseq128XslRr64 engines that are provided in the object-oriented API.
  • They are functions with overloaded signatures, which are problematic for the reasons outlined in the “Deprecate functions with overloaded signatures” RFC.
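The seed collision estimate in the list above can be reproduced with the usual birthday approximation. This is a worked sketch assuming uniformly random 32-bit seeds; the function name collision_probability() is hypothetical.

```php
<?php
// Number of distinct 32-bit seeds.
$seeds = 2 ** 32;

// Birthday approximation: number of random seeds after which a collision
// has occurred with 50% probability, sqrt(2 * N * ln 2).
$draws = sqrt(2 * $seeds * log(2));

// Approximate collision probability after $n random seeds:
// 1 - exp(-n(n - 1) / (2N)).
function collision_probability(int $n, int $seeds): float
{
    return 1.0 - exp(-$n * ($n - 1) / (2 * $seeds));
}

printf("50%% collision chance after about %d seeds\n", round($draws));
printf("collision probability after 80000 seeds: %.2f\n", collision_probability(80000, $seeds));
```

The 50% threshold lands slightly above 2**16, which is consistent with the figure of roughly 80000 requests given above.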

But at the same time, the functions using the global Mt19937 instance are the obvious choice for less experienced developers: the function names are the shortest ones and they appear to do the right thing at first glance. Due to their age they have also made their way into various tutorials and StackOverflow answers and thus into code bases, possibly causing security issues where the use case would have required cryptographically secure randomness. As an example, a GitHub search reveals that UUIDv4 implementations based on a highly-voted 2010 StackOverflow answer that uses mt_rand() are not uncommon. As per above, UUID collisions are expected after 80000 requests if nothing else uses randomness within the request.
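For comparison, a UUIDv4 can be built on the CSPRNG instead of mt_rand(). The following is a minimal sketch, not code taken from the StackOverflow answer mentioned above; the function name uuid_v4() is hypothetical, and the version and variant bits follow the RFC 4122 version 4 layout.

```php
<?php
// Hypothetical helper: random (version 4) UUID backed by the CSPRNG.
function uuid_v4(): string
{
    $bytes = random_bytes(16);

    // Set the version field to 4 (upper nibble of byte 6).
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the variant field to binary 10xx (upper bits of byte 8).
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // 32 hex digits, split into 8 groups of 4 and formatted as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo uuid_v4(), "\n";
```

All 122 random bits come from random_bytes(), so the result does not depend on a 32-bit seed.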

The limitations of the global Mt19937 are effectively unfixable by modifying the existing functions, because developers expect to be able to retrieve a specific sequence with a given seed (even when considering that any function call could unpredictably change the state).

To clean up the API and to guide developers to better alternatives, the global Mt19937 should be deprecated and then removed. The function-based API will then provide just the random_int() function which is the “secure by default” choice based on the CSPRNG. If reproducible sequences are required, the Xoshiro256StarStar and PcgOneseq128XslRr64 engines of the object-oriented API provide high-quality randomness, great performance and much larger initial seeds.
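The two recommended paths could look as follows. This is a sketch assuming PHP 8.2 or later; the label 'my-test-fixture' and the way the seed is derived from it via SHA-256 are illustrative assumptions, not a recommendation made by this RFC.

```php
<?php
use Random\Engine\Xoshiro256StarStar;
use Random\Randomizer;

// Secure by default: not seedable, backed by the CSPRNG.
$token = random_int(100000, 999999);

// Reproducible: Xoshiro256StarStar accepts a 256-bit (32 byte) string seed,
// derived here from a hypothetical label.
$seed = hash('sha256', 'my-test-fixture', true); // 32 raw bytes

$r1 = new Randomizer(new Xoshiro256StarStar($seed));
$r2 = new Randomizer(new Xoshiro256StarStar($seed));

// The same seed yields the same sequence in both objects.
var_dump($r1->getInt(1, 6) === $r2->getInt(1, 6)); // bool(true)
```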

If reproducing existing Mt19937 sequences is required, the object-oriented API provides a drop-in replacement for the global Mt19937 using the \Random\Engine\Mt19937 class.

The following functions shall be deprecated:

A userland replacement can be written by leveraging $GLOBALS to store an \Random\Randomizer object with a \Random\Engine\Mt19937 engine (https://3v4l.org/B18he):

<?php
 
// Replacement for mt_srand(): stores a seeded Randomizer in $GLOBALS.
function my_mt_srand(?int $seed = null) {
	$GLOBALS['my_mt_rand'] = new \Random\Randomizer(new \Random\Engine\Mt19937($seed));
}
// Replacement for mt_rand(): creates a randomly seeded instance on first use.
function my_mt_rand($min = null, $max = null) {
	if (!isset($GLOBALS['my_mt_rand'])) {
		$GLOBALS['my_mt_rand'] = new \Random\Randomizer(new \Random\Engine\Mt19937());
	}
 
	// Called without arguments: return the next value of the engine.
	if ($min === null && $max === null) {
		return $GLOBALS['my_mt_rand']->nextInt();
	}
 
	return $GLOBALS['my_mt_rand']->getInt($min, $max);
}
 
// Seed both implementations identically and compare their output.
mt_srand(1234);
my_mt_srand(1234);
 
var_dump(mt_rand());
var_dump(my_mt_rand());
var_dump(mt_rand(1, 1000));
var_dump(my_mt_rand(1, 1000));
Deprecate the global Mt19937?
Real name Yes No
alcaeus (alcaeus)  
alec (alec)  
ashnazg (ashnazg)  
brzuchal (brzuchal)  
bukka (bukka)  
bwoebi (bwoebi)  
colinodell (colinodell)  
crell (crell)  
derick (derick)  
dharman (dharman)  
galvao (galvao)  
kalle (kalle)  
kocsismate (kocsismate)  
mauricio (mauricio)  
mbeccati (mbeccati)  
mcmic (mcmic)  
nicolasgrekas (nicolasgrekas)  
nielsdos (nielsdos)  
nikic (nikic)  
ocramius (ocramius)  
petk (petk)  
pollita (pollita)  
ramsey (ramsey)  
sergey (sergey)  
stas (stas)  
theodorejb (theodorejb)  
timwolla (timwolla)  
trowski (trowski)  
zeriyoshi (zeriyoshi)  
Final result: 13 16
This poll has been closed.

In addition to the 6 functions related to integer generation, the global Mt19937 is also used for the following functions:

  • array_rand()
  • shuffle()
  • str_shuffle()

The proposed deprecation and removal of mt_srand() will affect these functions, as they will no longer be seedable, thus a decision needs to be taken with regard to the behavior of these functions, if the global Mt19937 is removed. There are two options:

  • Deprecate and remove array_rand(), shuffle(), str_shuffle(). A drop-in replacement is available by using \Random\Randomizer.
  • Migrate array_rand(), shuffle(), str_shuffle() to use the CSPRNG internally. The functions will still be available, but no longer seedable. This also adds new failure cases, because the CSPRNG might fail.
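The \Random\Randomizer replacement mentioned in the first option could be used as follows. This is a sketch assuming PHP 8.2 or later, with an arbitrary seed and sample data. Unlike shuffle(), shuffleArray() returns a new array instead of modifying its argument, and pickArrayKeys() always returns an array of keys.

```php
<?php
use Random\Engine\Mt19937;
use Random\Randomizer;

// Seedable variant; calling new Randomizer() without an engine
// uses the CSPRNG-backed default engine instead.
$randomizer = new Randomizer(new Mt19937(1234));

$items = ['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4];

$shuffledList  = $randomizer->shuffleArray([1, 2, 3, 4, 5]); // instead of shuffle()
$shuffledBytes = $randomizer->shuffleBytes('abcdef');        // instead of str_shuffle()
$pickedKeys    = $randomizer->pickArrayKeys($items, 2);      // instead of array_rand($items, 2)

var_dump($shuffledList, $shuffledBytes, $pickedKeys);
```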
What to do with the non-integer functions using the global Mt19937 if the previous vote passes?
Real name Deprecate together with mt_srand() Convert to CSPRNG
alcaeus (alcaeus)  
ashnazg (ashnazg)  
bukka (bukka)  
bwoebi (bwoebi)  
crell (crell)  
derick (derick)  
galvao (galvao)  
girgias (girgias)  
heiglandreas (heiglandreas)  
kalle (kalle)  
kocsismate (kocsismate)  
mauricio (mauricio)  
mbeccati (mbeccati)  
nicolasgrekas (nicolasgrekas)  
nielsdos (nielsdos)  
ocramius (ocramius)  
petk (petk)  
ramsey (ramsey)  
sergey (sergey)  
stas (stas)  
thekid (thekid)  
theodorejb (theodorejb)  
timwolla (timwolla)  
trowski (trowski)  
zeriyoshi (zeriyoshi)  
Final result: 1 24
This poll has been closed.

Deprecate calling ldap_connect with 2 parameters

Author: Andreas Heigl heiglandreas@php.net

Currently there are three ways one can call ldap_connect.

  • With a string $uri.
  • With a string $host and an int $port.
  • With even more parameters for those that compiled PHP with OracleLDAP.

The third way of calling it is not even documented, as it is a very niche edge case that would only confuse most people.

The second way of calling the function is based on the underlying ldap_open function, which has been deprecated for some years. Internally we have already moved to the underlying ldap_initialize function, which requires passing an LDAP-URI. For that we already convert the passed host and port into an LDAP-URI of the form ldap://$host:$port.

This already illustrates one of the issues with this way of calling the function: it is not possible to use ldaps as the scheme when calling ldap_connect this way, as it will always use ldap as the scheme, no matter which port is passed.

A second reason to deprecate calling ldap_connect with two parameters is that it does not allow one to pass multiple LDAP servers, which is possible using the LDAP-URI.

The impact of the BC break should be rather small as there are not many users actually using the LDAP-extension and there is a clear and easy migration path for those that use it: Instead of calling

ldap_connect($host, $port)

one calls

ldap_connect("ldap://$host:" . ($port ?? 389))

Also, most users should not be affected at all, as they use third-party libraries such as Laminas\Ldap or Symfony\Ldap that already pass only an LDAP-URI when calling ldap_connect.

The documentation at https://www.php.net/ldap_connect also explicitly states (since 2018) that using host and port is considered deprecated.

Named parameters also only support ldap_connect(uri: 'ldap://example.com'). Calling ldap_connect(host: 'example.com', port: 389) will throw an error.
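Code that currently holds a separate host and port can assemble the URI in one place. The following is a minimal sketch: the helper name ldap_uri() and its $tls parameter are hypothetical, the defaults are the standard ports 389 (LDAP) and 636 (LDAPS), and IPv6 literals are not handled.

```php
<?php
// Hypothetical helper: build an LDAP-URI from the former $host/$port pair.
function ldap_uri(string $host, ?int $port = null, bool $tls = false): string
{
    $scheme = $tls ? 'ldaps' : 'ldap';
    // Fall back to the default port of the chosen scheme.
    $port ??= $tls ? 636 : 389;

    return sprintf('%s://%s:%d', $scheme, $host, $port);
}

echo ldap_uri('ldap.example.com'), "\n";
echo ldap_uri('ldap.example.com', null, true), "\n";

// Requires ext/ldap:
// $connection = ldap_connect(ldap_uri('ldap.example.com'));
```

Unlike the two-parameter call, this form lets the caller choose the ldaps scheme explicitly.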

A PR that implements the deprecation is already open. With it, each call to ldap_connect with 2 parameters will emit a deprecation message in the PHP 8.3 releases, so that people have enough time to adapt their code before support for two parameters is actually removed in PHP 9.

Deprecate and remove calling ldap_connect with 2 parameters $host and $port
Real name Yes No
alcaeus (alcaeus)  
alec (alec)  
ashnazg (ashnazg)  
brzuchal (brzuchal)  
bwoebi (bwoebi)  
colinodell (colinodell)  
crell (crell)  
derick (derick)  
dharman (dharman)  
galvao (galvao)  
girgias (girgias)  
heiglandreas (heiglandreas)  
kalle (kalle)  
kocsismate (kocsismate)  
levim (levim)  
mauricio (mauricio)  
mbeccati (mbeccati)  
mcmic (mcmic)  
nicolasgrekas (nicolasgrekas)  
ocramius (ocramius)  
petk (petk)  
pollita (pollita)  
ramsey (ramsey)  
sergey (sergey)  
theodorejb (theodorejb)  
trowski (trowski)  
Final result: 26 0
This poll has been closed.

Backward Incompatible Changes

For PHP 8.3, additional deprecation notices will be emitted. The actual removal of the affected functionality will happen no earlier than PHP 9.0. For the Global Mersenne Twister specifically, the removal will be left to an additional later vote, so that it can be deferred based on the remaining usage.

rfc/deprecations_php_8_3.txt · Last modified: 2023/07/17 04:03 by girgias