rfc:on_demand_name_mangling

rfc:on_demand_name_mangling [2016/01/02 03:38] – Fixed incorrect constant (typo) bishop
rfc:on_demand_name_mangling [2019/07/16 12:25] (current) – Settled on formal polyfill name, php_mangle_superglobal bishop
Line 1: Line 1:
 ====== PHP RFC: On-demand Name Mangling ====== ====== PHP RFC: On-demand Name Mangling ======
-  * Version: 1.1 +  * Version: 1.4 
-  * Date: 2016-01-01 +  * Created Date: 2016-01-01 
-  * Author: Bishop Bettini <bishop@php.net> +  * Updated Date: 2019-07-16 
 +  * Author: Bishop Bettini <bishop@php.net>
   * Status: Under Discussion   * Status: Under Discussion
-  * First Published at: http://wiki.php.net/rfc/remove_name_mangling+  * First Published at: http://wiki.php.net/rfc/on_demand_name_mangling
  
 ===== Introduction ===== ===== Introduction =====
-PHP marshals external key-value pairs into super-globals by mangling some disallowed characters to underscores: +PHP marshals external key-value pairs into super-globals by [[https://github.com/php/php-src/blob/master/main/php_variables.c#L93|mangling]] some disallowed characters to underscores:
 + 
 +  * "''.''" and "'' ''" become an "''_''" 
 +  * If there is no "'']''", then the first "''[''" also becomes an "''_''" -- all other left brackets remain intact 
 +  * Other characters are left as is 
 + 
 +As seen here, for both ''$_ENV'' and ''$_GET'':
  
 <code> <code>
 # the shell environment variable "a.b" becomes "a_b" inside $_ENV # the shell environment variable "a.b" becomes "a_b" inside $_ENV
-$ /usr/bin/env "a.b=foo" php -d variables_order=E -r 'echo $_ENV["a_b"];'+$ /usr/bin/env "a.b=foo" php -r 'echo $_ENV["a_b"];'
 foo foo
  
 # a "[" also mangles to an underscore # a "[" also mangles to an underscore
-$ /usr/bin/env "a[b=foo" php -d variables_order=E -r 'echo $_ENV["a_b"];'+$ /usr/bin/env "a[b=foo" php -r 'echo $_ENV["a_b"];'
 foo foo
  
 # same mangling rules for $_REQUEST # same mangling rules for $_REQUEST
-curiously "$" does not mangle, even though it's not a valid PHP variable name +Note how "$" is ignored
 $ cat mangle.phpt $ cat mangle.phpt
 --TEST-- --TEST--
Line 43: Line 50:
 </code> </code>
  
-Mangling has the undesirable consequence that //many// external variables may map to //one// PHP variable. For example, three separate HTML form elements named ''a.b'', ''a_b'' and ''a b'' will all resolve to ''a_b'' in the corresponding super-global, with the last seen value winning. This leads to user confusion and userland workarounds, not to mention bug reports: [[https://bugs.php.net/bug.php?id=34882|#34882]] and [[https://bugs.php.net/bug.php?id=42055|#42055]] for example. +Mangling has the undesirable consequence that //many// external variables may map to //one// PHP variable. For example, three separate HTML form elements named ''a.b'', ''a_b'' and ''a[b'' will all resolve to ''a_b'' in the corresponding super-global, with the value from ''a[b'' winning (because it was last). This leads to user confusion and userland workarounds, not to mention bug reports: [[https://bugs.php.net/bug.php?id=34882|#34882]] and [[https://bugs.php.net/bug.php?id=42055|#42055]] for example.
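
The many-to-one collision described above can be reproduced in userland with a minimal sketch. The helper name ''sketch_engine_mangle()'' is hypothetical, and the function only approximates the three rules listed in the Introduction; it is not the engine's actual code in ''php_variables.c'', which handles further cases.

<code php>
<?php
// Hypothetical helper: approximates the three mangling rules from the Introduction.
function sketch_engine_mangle(string $name): string
{
    // Rule 1: "." and " " become "_"
    $mangled = strtr($name, ['.' => '_', ' ' => '_']);

    // Rule 2: if there is no "]", the first "[" also becomes "_"
    if (strpos($mangled, ']') === false) {
        $pos = strpos($mangled, '[');
        if ($pos !== false) {
            $mangled = substr_replace($mangled, '_', $pos, 1);
        }
    }

    // Rule 3: every other character is left as is
    return $mangled;
}

// Three distinct external names collapse onto one key; the last value wins.
$input  = ['a.b' => 'dot', 'a_b' => 'underscore', 'a[b' => 'bracket'];
$result = [];
foreach ($input as $name => $value) {
    $result[sketch_engine_mangle($name)] = $value;
}
print_r($result); // a single key, a_b, holding "bracket"
</code>

Because every assignment lands on the same key, the earlier values are silently lost, which is the behavior the bug reports describe.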
  
-Automatic name mangling supported ''[[http://php.net/manual/en/ini.core.php#ini.register-globals|register_globals]]'' and ''[[http://php.net/manual/en/function.import-request-variables.php|import_request_variables()]]'', but those features ended in August 2014. Name mangling isn't required for super-global marshaling, because the associative array nature of super-globals can accommodate any string variable name. So do we need automatic name mangling? Consider this hypothetical new test:+Automatic name mangling supported ''[[http://php.net/manual/en/ini.core.php#ini.register-globals|register_globals]]'' and its kin like ''[[http://php.net/manual/en/function.import-request-variables.php|import_request_variables()]]'', but those features ended in August 2014. Name mangling isn't required for super-global marshaling, because the associative array nature of super-globals can accommodate any string variable name. So do we need automatic name mangling? Consider this hypothetical new test:
  
 <code> <code>
 --TEST-- --TEST--
-Name mangling logic moved to extract()+Name mangling logic removed from engine, placed in polyfill
 --GET-- --GET--
-a.b=dot&a$b=dollar&a%20b=space&a[b=bracket+a.b=dot&a_b=underscore&a$b=dollar&a%20b=space&a[b=bracket
 --FILE-- --FILE--
 <?php <?php
-extract($_GET, EXTR_MANGLE);+print_r(get_defined_vars()); 
 +php_mangle_superglobals();
 print_r(get_defined_vars()); print_r(get_defined_vars());
 ?> ?>
Line 63: Line 71:
         (         (
             [a.b] => dot             [a.b] => dot
 +            [a_b] => underscore
             [a$b] => dollar             [a$b] => dollar
             [a b] => space             [a b] => space
             [a[b] => bracket             [a[b] => bracket
         )         )
- +
-    [a_b] => bracket+Array 
 +( 
 +    [_GET] => Array 
 +        ( 
 +            [a_b] => bracket 
 +            [a$b] => dollar 
 +        )
 ) )
 </code> </code>
  
-In this new implementation, marshaled superglobals are no longer mangled.  Instead, the //ability// to mangle names has moved to ''extract()''. This has the happy side effect of fixing ''extract()'' bug reports like [[https://bugs.php.net/bug.php?id=70344|#70344]].((''extract()'' should be able to extract non-conformant keys anyway, because they're accessible with the ''${'foo.bar'}'' syntax. That, however, is out of scope for this RFC.)) +In this new implementation, the engine no longer mangles marshaled superglobals at startup.  Instead, the //ability// to mangle names has moved to an optional, userland-provided polyfill function ''php_mangle_superglobals()''. 
 + 
 +In the example above, an ''a_b'' key was externally supplied. The call to ''php_mangle_superglobals'' clobbered the original value of ''a_b'' with the value of the //last// seen mangle-equivalent key (''a[b''). 
 + 
 +Importantly, the user made this mangling happen: the engine did not do it automatically. 
 + 
 +The polyfill algorithm is simple: 
 + 
 +  * find all superglobal keys that violate the PHP unquoted variable name regex ((Unquoted variable names must match the regex ''[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'')) 
 +  * for each, create a new mangled key linked to the corresponding value 
 + 
 +Applications requiring name mangling may call the polyfill during their bootstrap phase to emulate prior engine behavior.
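
The two steps of the polyfill algorithm can be sketched against a plain array. This is a minimal illustration only: ''sketch_mangle_keys()'' is a hypothetical name, the value is copied rather than linked by reference, and the original keys are left in place. Note that under the variable-name regex a ''$'' also counts as a disallowed character, so ''a$b'' receives a mangled alias here.

<code php>
<?php
// Hypothetical helper, not part of the RFC: applies the two steps to any array.
function sketch_mangle_keys(array &$vars): void
{
    // Regex for unquoted variable names, anchored to match the whole key
    $valid = '/^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$/';

    foreach (array_keys($vars) as $key) {
        $key = (string) $key;
        // Step 1: skip keys that are already valid variable names
        if (preg_match($valid, $key)) {
            continue;
        }
        // Step 2: replace disallowed characters, then a leading digit, with "_"
        $mangled = preg_replace('/[^a-zA-Z0-9_\x7f-\xff]/', '_', $key);
        $mangled = preg_replace('/^[0-9]/', '_', $mangled);
        // The original key is kept; a later key overwrites an earlier alias
        $vars[$mangled] = $vars[$key];
    }
}

$vars = [
    'a.b' => 'dot',
    'a_b' => 'underscore',
    'a$b' => 'dollar',
    'a b' => 'space',
    'a[b' => 'bracket',
];
sketch_mangle_keys($vars);
echo $vars['a_b'], "\n"; // value of the last mangle-equivalent key
</code>

Iterating over ''array_keys()'' rather than the array itself avoids revisiting keys that the loop adds.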
  
 ===== Proposal ===== ===== Proposal =====
-This RFC proposes to phase out automatic name mangling, replacing it with on-demand mangling in ''extract()''. +This RFC proposes to remove automatic name mangling, with backward compatibility maintained through a userspace polyfill function that mangles super-globals on-demand: 
  
-  * Next minor release (currently 7.1): +  * Upon acceptance:
-    * Emit an ''E_DEPRECATED'' warning the first time a variable is mangled. The warning indicates that name mangling on import will be removed in the next major PHP version. +    * Update documentation that name mangling is deprecated and will be removed in 8.0
 +    * Release a userland polyfill that implements the historic mangling behavior 
 +    * Polyfill shall be available via composer (but not PEAR)
   * Next major release (currently 8.0):   * Next major release (currently 8.0):
-    * Remove all name mangling code in super-global marshalling functions +    * Remove all name mangling code in super-global marshaling functions
-    * Update ''extract()'' to mangle names, subject to the following additional rules: +
-      * Add a new constant, ''EXTR_MANGLE'', which converts any character outside the variable documented regex ((Unquoted variable names must match the regex ''[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'')) to an ''_''  +
-      * If a prefix is given by any of the ''EXTR_PREFIX_*'' constants, prepend that to any resulting mangled name +
-      * Honor ''EXTR_OVERWRITE'' and ''EXTR_SKIP'' using the mangled name as the check+
  
 ==== Discussion ==== ==== Discussion ====
Line 90: Line 114:
 These questions were raised in the mailing list discussion. These questions were raised in the mailing list discussion.
  
-=== Should multiple ''E_DEPRECATED'' be emitted? ===+=== Should a notice be raised if the engine mangles a superglobal? === 
 + 
 +Before version 1.3, this RFC proposed raising an ''E_DEPRECATED'' message (once per startup) when the engine mangled a name, so that developers were made aware of future changes. However, Rouven Weßling asked: 
 + 
 +> If I have a well behaved application that doesn’t rely on name mangling or have included the polyfill, how can I prevent a log message from being emitted when a user appends (unused) parameters to the query string that require mangling? 
 + 
 +and Nikita Popov commented:
  
-No, because we do not know how many instances of mangling may be present and we do not want to flood application logs.+> Even if it's only a single deprecation warning instead of multiple, it's still a deprecation warning that I, as the application author, have absolutely no control over. For me, a deprecation warning indicates that there is some code I must change to make that warning *go away*. 
 +> Sure, it's informative. But it's enough to be informative about this *once*, rather than every time a user makes an odd-ish request.
  
-The message intends to provide //some// warning to application developers when there is //known// use of name mangling. As such, a single warning when the mangler runs is sufficient to meet this intent. +Given that (a) an application could get spammed by malicious users((The ''max_input_vars'' configuration option behaves similarly to the once-per-startup deprecation message proposed prior to version 1.3. The difference is that the ''max_input_vars'' message could be squelched by increasing the limit, whereas the proposed mangling message could never be squelched by user code.)) and (b) documentation suffices to notify users of this change, the RFC changed as of 1.3 to only document the removal of name mangling as of the next major version.
  
 === Should an INI configuration control mangling? === === Should an INI configuration control mangling? ===
  
-Nikita Popov suggested:+Nikita Popov suggested (and Stanislav Malyshev seconded) a counter-proposal to use an INI setting:
  
 >  I would favor the introduction of a new ini setting. E.g. mangle_names=0 disables name mangling, while mangle_names=1 throws a deprecation warning on startup and enables name mangling. mangle_names=0 should be the default. That is essentially disable name mangling, but leave an escape hatch for those people who rely on it (for whatever reason). >  I would favor the introduction of a new ini setting. E.g. mangle_names=0 disables name mangling, while mangle_names=1 throws a deprecation warning on startup and enables name mangling. mangle_names=0 should be the default. That is essentially disable name mangling, but leave an escape hatch for those people who rely on it (for whatever reason).
  
-An INI setting to disable mangling must be engine-wide (i.e., ''PHP_INI_SYSTEM'') as its historical effect occurs before userland code runs. Engine-wide settings are tricky because they force conditions across all instances of PHP running in a given SAPI process.  In a hosted environment where many unrelated sites share the same engine configuration, it's possible that one site might require mangling while another site requires no-mangling.  These two sites could not co-exist. Thus, an INI setting would introduce operational problems for users.+An INI setting to disable mangling must be engine-wide (e.g., ''PHP_INI_SYSTEM'' or ''PHP_INI_PERDIR'') as its historical effect occurs before userland code runs. Engine-wide settings are tricky because they force conditions across all instances of PHP running in a given SAPI process.  In a hosted environment where many unrelated sites share the same engine configuration, it's possible that one site might require mangling while another site requires no-mangling.  These two sites could not co-exist unless the site operator allows per directory configuration, which they may not. Thus, an INI setting would introduce operational problems for some definable sub-set of users.
  
-However, there is an "escape hatch": userland code can emulate engine super-global mangling using the mangle-aware +It's still possible to provide an "escape hatch" for applications requiring name mangling: the polyfill described earlier. Applications need only include the polyfill code and add it to their bootstrapping. The polyfill would be available via Composer, and the polyfill would populate all the mangled variables as before. 
-''extract()''. An implementation is given in the Backward Compatibility section.+ 
 +The polyfill approach is considered superior to the INI approach for three reasons: 
 + 
 +  * Userland can maintain BC independent of system INI settings (which they may not control) 
 +  * The engine is completely cleaned of all mangling behavior (which means less code to fuss over) 
 +  * No additional weight of configuration values (which is a complaint point for many)
  
 === Should ''extract()'' automatically mangle names? === === Should ''extract()'' automatically mangle names? ===
  
-No, because this would introduce new, unnecessary BC breakage. Instead, ''extract()'' should have the option to emit mangled names. +Early versions of this proposal (< v1.2) proposed using ''extract'' to mangle names. Rowan Collins and others pointed out this was an unnecessary complication: ''preg_match'' could also accomplish the goal. Thus, all references to ''extract'' in this RFC have been removed. 
 + 
 +However, ''extract()'' should have the option to emit mangled names with a new constant (''EXTR_MANGLE'').  ''extract()'' should also be fixed to export variables with any variable name, because they are all technically valid with the quoted variable syntax (''${'foo.bar'}'').  These will be handled as function fixes and not with this RFC.
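
The gap between ''extract()'' and the quoted variable syntax can be seen in a short sketch. The function name ''sketch_quoted_names()'' is hypothetical; the sketch relies on current ''extract()'' behavior, which skips keys that are not valid unquoted variable names and returns the number of variables it imported.

<code php>
<?php
// Hypothetical demonstration of the behavior discussed above.
function sketch_quoted_names(): array
{
    // extract() silently skips 'foo.bar' and imports only 'ok'
    $imported = extract(['foo.bar' => 'skipped', 'ok' => 'kept']);

    // The quoted syntax can still create and read such a variable
    ${'foo.bar'} = 'assigned directly';

    return [$imported, $ok, ${'foo.bar'}];
}

print_r(sketch_quoted_names());
</code>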
  
 ===== Backward Incompatible Changes ===== ===== Backward Incompatible Changes =====
-This proposal introduces backward incompatible changes: any userland code relying on mangled names would have to either (a) change to using original variable names or (b) re-mangle the super-globals with a polyfill. The latter case could be accomplished with code like:+This proposal introduces backward incompatible changes: any userland code relying on mangled names would have to either (a) change to using original external variable names or (b) re-mangle the super-globals with a polyfill.
  
-<code php> +The polyfill could be accomplished with code like:
-$mangler = function () { +
-    // mangle names like before +
-    extract($_ENV, EXTR_MANGLE);+
  
-    // push them into env +<code php> 
-    foreach (get_defined_vars() as $var => $val) { +function php_mangle_name($name) { 
-        if (! array_key_exists($var, $_ENV)) { +    $name = preg_replace('/[^a-zA-Z0-9_\x7f-\xff]/', '_', $name); 
-            $_ENV[$var] = $val;+    return preg_replace('/^[0-9]/', '_', $name); 
 +}
 +function php_mangle_superglobals() {
 +    if (version_compare(PHP_VERSION, '8.0.0', '<')) { 
 +        return; 
 +    } 
 +    foreach ($_ENV as $var => &$val) { 
 +        $mangled = php_mangle_name($var); 
 +        if ($mangled !== $var) { 
 +            $_ENV[$mangled] = $val;
         }         }
     }     }
-}; +    // similar loops for $_GET, $_POST 
-$mangler();+    // similar logic for $_COOKIE and $_FILES 
 +}
 </code> </code>
  
-Similar algorithms could be applied to the other super-globals. +To reduce the burden on userland, this polyfill library could be made available via Composer:
- +
-To reduce the burden on userland, polyfill library could be made available to simplify this:+
  
 <code> <code>
-$ composer require php/mangle-polyfill ^1.0 +$ composer require php/mangle-superglobals ^1.0 
-$ cat example.php +$ cat app/bootstrap.php 
-<?php mangle_superglobals();+<?php 
 +require __DIR__ . '/vendor/autoload.php'; 
 + 
 +php_mangle_superglobals();
  
 +// ...
 </code> </code>
  
 ===== Proposed PHP Version(s) ===== ===== Proposed PHP Version(s) =====
-PHP 7.1 (for notice of impending BC break) and PHP 8.0 (for actual implementation and corresponding BC break).+PHP 8.0.
  
 ===== RFC Impact ===== ===== RFC Impact =====
Line 160: Line 207:
  
 ===== Open Issues ===== ===== Open Issues =====
-None so far.+None.
  
 ===== Proposed Voting Choices ===== ===== Proposed Voting Choices =====
-A simple yes/no voting option with a 2/3 majority required.+A simple yes/no voting option with a 2/3 majority required: "Remove name mangling in PHP 8.0?"
  
 ===== Patches and Tests ===== ===== Patches and Tests =====
rfc/on_demand_name_mangling.1451705881.txt.gz · Last modified: 2017/09/22 13:28 (external edit)