rfc:restrict_globals_usage

PHP RFC: Restrict $GLOBALS usage

Introduction

The $GLOBALS variable currently provides a direct reference to PHP's internal symbol table. Supporting this requires significant technical complexity, affects performance of all array operations in PHP, but is only rarely used. This RFC restricts supported usages of $GLOBALS to disallow the problematic cases, while allowing most code to continue working as-is.

First, some technical background on how $GLOBALS currently works is necessary. Consider this simple example:

$a = 1;
$GLOBALS['a'] = 2;
var_dump($a); // int(2)

The variable $a is stored inside a compiled-variable (CV) call frame slot on the virtual machine stack, which allows it to be accessed efficiently. In order to allow modification of the variable through $GLOBALS, the $GLOBALS array stores array elements of type INDIRECT, which contain a pointer to the CV slot.

As such, array operations on $GLOBALS need to check whether the accessed element is INDIRECT and perform a de-indirection operation. However, as any array could potentially be the $GLOBALS array, this check has to be performed for essentially all array operations on all arrays. This imposes an implementation and performance cost to account for a rarely used edge-case.

Additionally, the $GLOBALS array is excluded from the usual by-value behavior of PHP arrays:

$a = 1;
$globals = $GLOBALS; // Ostensibly by-value copy
$globals['a'] = 2;
var_dump($a); // int(2)

According to normal PHP semantics, $globals should be a copy of $GLOBALS and modifications of $globals should not have any impact on the global symbol table.

Finally, there currently is a mismatch between handling of integer keys between $GLOBALS and normal PHP arrays:

${1} = 1;
$GLOBALS[1] = 2;
var_dump(${1}); // int(1)

Normal PHP arrays will canonicalize integral string keys to integers, while symbol tables canonicalize integer keys to strings. As $GLOBALS interfaces between these two worlds, it cannot satisfy the rules of either.

An area where INDIRECT elements present only in $GLOBALS are particularly problematic are standard library functions. While array_* functions generally contain the necessary extra code to correctly handle $GLOBALS, this does not extend to the broader standard library. Functions that do not explicitly account for $GLOBALS will either silently misbehave, cause assertion failures, or crash. Functions from 3rd-party extensions almost certainly do not handle $GLOBALS.

Proposal

The core idea of this proposal is to move $GLOBALS from being a “real” variable with non-standard semantics, towards being a syntactical variable with two semantics:

  • Accesses of the form $GLOBALS[$var] will refer to the global variable $$var, and support all the usual variable operations, including writes. $GLOBALS[$var] = $value remains supported. A good way to think of this is that $GLOBALS[$var] works the same way as a variable-variable $$var, just accessing the global instead of the local scope.
  • Accesses of the form $GLOBALS (without a direct array dereference) will return the a read-only copy of the global symbol table.

This means that all operations in the following code will continue to work as they do now:

// Continues to work:
$GLOBALS['x'] = 1;
$GLOBALS['x']++;
isset($GLOBALS['x']);
unset($GLOBALS['x']);
// ...anything else using $GLOBALS['x'].

Read-only usage of $GLOBALS will also continue to work:

// Continues to work:
foreach ($GLOBALS as $var => $value) {
    echo "$var => $value\n";
}

In this case the only difference is that there will no longer be a recursive “GLOBALS” key, which currently needs to be filtered out from most uses of $GLOBALS.

What is no longer supported are writes to $GLOBALS taken as a whole. All of the following will generate a compile-time error:

// Generates compile-time error:
$GLOBALS = [];
$GLOBALS += [];
$GLOBALS =& $x;
$x =& $GLOBALS;
unset($GLOBALS);
// ...and any other write/read-write operation on $GLOBALS

Passing $GLOBALS by reference will trigger a runtime Error exception, as by-reference passing can generally only be established at runtime:

// Generates run-time Error exception:
by_ref($GLOBALS);

As $GLOBALS is now a read-only copy of the global symbol table, the previously incorrect behavior of this code is fixed:

// This no longer modifies $a. The previous behavior violated by-value semantics.
$globals = $GLOBALS;
$globals['a'] = 1;

The read-only copy will also use correct key canonicalization, as such the behavior of this code is fixed:

${1} = 1;
$GLOBALS[1] = 2;
var_dump(${1}); // int(2)

Impact on internals

From an implementation perspective, these changes mean that INDIRECT elements no longer have to be considered when working on ordinary PHP arrays (though many places didn't do so in the first place). However, INDIRECT elements still need to be considered when working with certain special hashtables. In particular, internal symbol tables, as well as object property tables may still contain INDIRECT elements. However, these special hashtables will never escape into ordinary PHP arrays.

Apart from manual checks for IS_INDIRECT, the use of the following APIs is no longer necessary:

  • *_IND() HT iteration macros. Suffix-free macros can be used instead.
  • *_ind() HT functions. Suffix-free functions can be used instead.
  • zend_array_count(). Use of zend_hash_num_elements() is now safe.

Backward Incompatible Changes

Indirect modification of $GLOBALS will no longer be supported, which is a backwards-incompatible change.

In the top 2k composer packages I found 23 cases that use $GLOBALS without directly dereferecing it (full list: https://gist.github.com/nikic/9fd95866f9811b349b947f63214ad7a9). Based on a cursory inspection, there are only two instances where $GLOBALS is not used in a read-only way:

  • By-ref passing of $GLOBALS in phpseclib: This code is doing something very peculiar, and I wasn't able to figure out what the purpose of the safe_serialize() function is. Possibly very old versions of PHP caused infinite recursion when serializing $GLOBALS?
  • $GLOBALS = array() in phpstorm-stubs: This is not real code, so the usage is not problematic.

As such, the impact of this change is expected to be fairly low. Which isn't to say non-existent: bwoebi has shared an example from his codebase that would be affected:

extract($GLOBALS, EXTR_REFS);
// ...
$GLOBALS += get_defined_vars();

Both of these lines constitute indirect modification and will no longer work. It is possible to rewrite them in terms of explicit loops:

foreach ($GLOBALS as $var => $_) $$var =& $GLOBALS[$var];
// ...
foreach (get_defined_vars() as $var => $value) $GLOBALS[$var] = $value;

Vote

Yes/No. Voting started 2020-12-23 and closes 2021-01-06.

Restrict $GLOBALS usage as specified?
Real name Yes No
as (as)  
asgrim (asgrim)  
ashnazg (ashnazg)  
beberlei (beberlei)  
brzuchal (brzuchal)  
bwoebi (bwoebi)  
cmb (cmb)  
crell (crell)  
cschneid (cschneid)  
derick (derick)  
dharman (dharman)  
dmitry (dmitry)  
doubaokun (doubaokun)  
duncan3dc (duncan3dc)  
ehrhardt (ehrhardt)  
galvao (galvao)  
girgias (girgias)  
ilutov (ilutov)  
jasny (jasny)  
jhdxr (jhdxr)  
kalle (kalle)  
kelunik (kelunik)  
kocsismate (kocsismate)  
lcobucci (lcobucci)  
levim (levim)  
marandall (marandall)  
mariano (mariano)  
mauricio (mauricio)  
mcmic (mcmic)  
nicolasgrekas (nicolasgrekas)  
nikic (nikic)  
ocramius (ocramius)  
patrickallaert (patrickallaert)  
petk (petk)  
pmjones (pmjones)  
pollita (pollita)  
ramsey (ramsey)  
reinders (reinders)  
reywob (reywob)  
sebastian (sebastian)  
sergey (sergey)  
sirsnyder (sirsnyder)  
svpernova09 (svpernova09)  
tandre (tandre)  
theodorejb (theodorejb)  
twosee (twosee)  
villfa (villfa)  
wyrihaximus (wyrihaximus)  
Final result: 48 0
This poll has been closed.
rfc/restrict_globals_usage.txt · Last modified: 2021/01/06 11:47 by nikic