[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*
PHP RFC: On-demand Name Mangling
- Version: 1.4
- Created Date: 2016-01-01
- Updated Date: 2019-07-16
- Author: Bishop Bettini bishop@php.net
- Status: Under Discussion
- First Published at: http://wiki.php.net/rfc/on_demand_name_mangling
Introduction
PHP marshals external key-value pairs into super-globals by mangling some disallowed characters to underscores:
- “
.
” and “_
” - If there is no “
]
”, then the first “[
” also becomes an “_
” -- all other left brackets remain intact - Other characters are left as is
As seen here, both for both $_ENV
and $_GET
:
# the shell environment variable "a.b" becomes "a_b" inside $_ENV $ /usr/bin/env "a.b=foo" php -r 'echo $_ENV["a_b"];' foo # a "[" also mangles to an underscore $ /usr/bin/env "a[b=foo" php -r 'echo $_ENV["a_b"];' foo # same mangling rules for $_REQUEST # Note how $ is ignored $ cat mangle.phpt --TEST-- How does $_REQUEST handle HTML form variables with unusual names? --GET-- a.b=dot&a$b=dollar&a%20b=space&a[b=bracket --FILE-- <?php print_r($_GET); ?> --EXPECTF-- Array ( [a_b] => bracket [a$b] => dollar ) $ pear run-tests --cgi=/usr/bin/php-cgi mangle.phpt Running 1 tests PASS How does $_REQUEST handle HTML form variables with unusual names?[mangle.phpt] TOTAL TIME: 00:00 1 PASSED TESTS 0 SKIPPED TESTS
Mangling has the undesirable consequence that many external variables may map to one PHP variable. For example, three separate HTML form elements named a.b
, a_b
and a[b
will all resolve to a_b
in the corresponding super-global, with the value from a[b
winning (because it was last). This leads to user confusion and userland work arounds, not to mention bug reports: #34882 and #42055 for example.
Automatic name mangling supported register_globals
and its kin like import_request_variables()
, but those features ended in August 2014. Name mangling isn't required for super-global marshaling, because the associative array nature of super-globals can accommodate any string variable name. So do we need automatic name mangling? Consider this hypothetical new test:
--TEST-- Name mangling logic removed from engine, placed in polyfill --GET-- a.b=dot&a_b=underscore&a$b=dollar&a%20b=space&a[b=bracket --FILE-- <?php print_r(get_defined_vars()); php_mangle_superglobals(); print_r(get_defined_vars()); ?> --EXPECTF-- Array ( [_GET] => Array ( [a.b] => dot [a_b] => underscore [a$b] => dollar [a b] => space [a[b] => bracket ) ) Array ( [_GET] => Array ( [a_b] => bracket [a$b] => dollar ) )
In this new implementation, the engine no longer mangles marshaled superglobals at startup. Instead, the ability to mangle names has moved to an optional, userland-provided polyfill function php_mangle_superglobals()
.
In the example above, an a_b
key was externally supplied. The call to php_mangle_superglobals
clobbered the original value of a_b
with the value of the last seen mangle-equivalent key (a[b
).
Importantly, the user made this mangling happen: the engine did not do it automatically.
The polyfill algorithm is simple:
- find all superglobal keys that violate the PHP unquoted variable name regex 1)
- for each, create a new mangled key linked to the corresponding value
Applications requiring name mangling may call the polyfill during their bootstrap phase to emulate prior engine behavior.
Proposal
This RFC proposes to remove automatic name mangling, with backward compatibility maintained through a userspace polyfill function that mangles super-globals on-demand:
- Upon acceptance:
- Update documentation that name mangling is deprecated and will be removed in 8.0
- Release a userland polyfill that implements the historic mangling behavior
- Polyfill shall be available via composer (but not PEAR)
- Next major release (currently 8.0):
- Remove all name mangling code in super-global marshaling functions
Discussion
These questions were raised in the mailing list discussion.
Should a notice be raised if the engine mangles a superglobal?
Before version 1.3, this RFC proposed raising an E_DEPRECATED
message (once per startup) when the engine mangled a name, so that developers were made aware of future changes. However, Rouven Weßling asked:
If I have a well behaved application that doesn’t rely on name mangling or have included the polyfill, how can I prevent a log message from being emitted when a user appends (unused) parameters to the query string that require mangling?
and Nikita Popov commented:
Even if it's only a single deprecation warning instead of multiple, it's still a deprecation warning that I, as the application author, have absolutely no control over. For me, a deprecation warning indicates that there is some code I must change to make that warning *go away*.
Sure, it's informative. But it's enough to be informative about this *once*, rather than every time a user makes an odd-ish request.
Given that (a) an application could get spammed by malicious users2), and (b) that documentation suffices to notify users of this change, then the RFC changed as of 1.3 to only document the removal of name mangling as of the next major version.
Should an INI configuration control mangling?
Nikita Popov suggested (and Stanislav Malyshev seconded) a counter-proposal to use an INI setting:
I would favor the introduction of a new ini setting. E.g. mangle_names=0 disables name mangling, while mangle_names=1 throws a deprecation warning on startup and enables name mangling. mangle_names=0 should be the default. That is essentially disable name mangling, but leave an escape hatch for those people who rely on it (for whatever reason).
An INI setting to disable mangling must be engine-wide (e.g., PHP_INI_SYSTEM
or PHP_INI_PERDIR
) as its historical effect occurs before userland code runs. Engine-wide settings are tricky because they force conditions across all instances of PHP running in a given SAPI process. In a hosted environment where many unrelated sites share the same engine configuration, it's possible that one site might require mangling while another site requires no-mangling. These two sites could not co-exist unless the site operator allows per directory configuration, which they may not. Thus, an INI setting would introduce operational problems for some definable sub-set of users.
It's still possible to provide an “escape hatch” for applications requiring name mangling: the polyfill described earlier. Applications need only include the polyfill code and add it to their bootstrapping. The polyfill would be available via Composer, and the polyfill would populate all the mangled variables as before.
The polyfill approach is considered superior to the INI approach for three reasons:
- Userland can maintain BC independent of system INI settings (which they may not control)
- The engine is completely cleaned of all mangling behavior (which means less code to fuss over)
- No additional weight of configuration values (which is a complaint point for many)
Should ''extract()'' automatically mangle names?
Early versions of this proposal (< v1.2) proposed using extract
to mangle names. Rowan Collins and others pointed out this was an unnecessary complication: preg_match
could also accomplish the goal. Thus, all references to extract
in this RFC have been removed.
However, extract()
should have the option to emit mangled names with a new constant (EXTR_MANGLE
). extract()
should also be fixed to export variables with any variable name, because they are all technically valid with the quoted variable syntax (${'foo.bar'}
). These will be handled as function fixes and not with this RFC.
Backward Incompatible Changes
This proposal introduces backward incompatible changes: any userland code relying on mangled names would have to either (a) change to using original external variable names or (b) re-mangle the super-globals with a polyfill.
The polyfill could be accomplished with code like:
function php_mangle_name($name) { $name = preg_replace('/[^a-zA-Z0-9_\x7f-\xff]/', '_', $name); return preg_replace('/^[0-9]/', '_', $name); } function php_mangle_superglobals() { if (version_compare(PHP_VERSION, '8.0.0', '<')) { return; } foreach ($_ENV as $var => &$val) { $mangled = php_mangle_name($var); if ($mangled !== $var) { $_ENV[$mangled] =& $val; } } // similar loops for $_GET, $_POST // similar logic for $_COOKIE and $_FILES }
To reduce the burden on userland, this polyfill library could be made available via Composer:
$ composer require php/mangle-superglobals ^1.0 $ cat app/bootstrap.php <?php require __DIR__ . '/vendor/autoload.php'; php_mangle_superglobals(); // ...
Proposed PHP Version(s)
PHP 8.0.
RFC Impact
To SAPIs
No impact.
To Existing Extensions
No impact.
To Opcache
No impact.
New Constants
None.
php.ini Defaults
None.
Open Issues
None.
Proposed Voting Choices
A simple yes/no voting option with a 2/3 majority required: “Remove name mangling in PHP 8.0?”
Patches and Tests
None yet. Implementations will follow vote.
Implementation
TODO: After the project is implemented, this section should contain
- the version(s) it was merged to
- a link to the git commit(s)
- a link to the PHP manual entry for the feature
Rejected Features
None so far.
max_input_vars
configuration option behaves similarly with the once-per-startup deprecation message proposed prior to version 1.3. The difference is the max_input_vars
message could be squelched by increasing the limit, whereas the proposed mangling message could never be squelched by user code