rfc:on_demand_name_mangling

Differences

This shows you the differences between two versions of the page.

rfc:on_demand_name_mangling [2016/01/04 16:15] – Example implementation need not run in anything less than proposed implementation version. bishoprfc:on_demand_name_mangling [2019/07/16 12:25] (current) – Settled on formal polyfill name, php_mangle_superglobal bishop
Line 1: Line 1:
 ====== PHP RFC: On-demand Name Mangling ====== ====== PHP RFC: On-demand Name Mangling ======
-  * Version: 1.2+  * Version: 1.4
   * Created Date: 2016-01-01   * Created Date: 2016-01-01
-  * Updated Date: 2016-01-04 +  * Updated Date: 2019-07-16 
-  * Author: Bishop Bettinibishop@php.net+  * Author: Bishop Bettini <bishop@php.net>
   * Status: Under Discussion   * Status: Under Discussion
-  * First Published at: http://wiki.php.net/rfc/remove_name_mangling+  * First Published at: http://wiki.php.net/rfc/on_demand_name_mangling
  
 ===== Introduction ===== ===== Introduction =====
-PHP marshals external key-value pairs into super-globals by mangling some disallowed characters to underscores:+PHP marshals external key-value pairs into super-globals by [[https://github.com/php/php-src/blob/master/main/php_variables.c#L93|mangling]] some disallowed characters to underscores
 + 
 +  * "''.''" and "'' ''" become an "''_''" 
 +  * If there is no "'']''", then the first "''[''" also becomes an "''_''" -- all other left brackets remain intact 
 +  * Other characters are left as is 
 + 
 +As seen here, for both ''$_ENV'' and ''$_GET'':
  
 <code> <code>
 # the shell environment variable "a.b" becomes "a_b" inside $_ENV # the shell environment variable "a.b" becomes "a_b" inside $_ENV
-$ /usr/bin/env "a.b=foo" php -d variables_order=E -r 'echo $_ENV["a_b"];'+$ /usr/bin/env "a.b=foo" php -r 'echo $_ENV["a_b"];'
 foo foo
  
 # a "[" also mangles to an underscore # a "[" also mangles to an underscore
-$ /usr/bin/env "a[b=foo" php -d variables_order=E -r 'echo $_ENV["a_b"];'+$ /usr/bin/env "a[b=foo" php -r 'echo $_ENV["a_b"];'
 foo foo
  
 # same mangling rules for $_REQUEST # same mangling rules for $_REQUEST
-curiously "$" does not mangle, even though it's not a valid PHP variable name+Note how is ignored
 $ cat mangle.phpt $ cat mangle.phpt
 --TEST-- --TEST--
Line 44: Line 50:
 </code> </code>
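
As a rough illustration, the mangling rules listed in the Introduction can be sketched in userland for a single key. The function name ''sketch_mangle_key'' is hypothetical and is not part of this RFC or the engine; it follows the rules as worded in this document, not the exact C implementation in ''php_variables.c''.

<code php>
<?php
// Hypothetical helper (not in the RFC): applies the documented rules to one key.
function sketch_mangle_key(string $key): string
{
    // Rule 1: "." and " " become "_".
    $mangled = strtr($key, ['.' => '_', ' ' => '_']);

    // Rule 2: if there is no "]", only the first "[" becomes "_".
    if (strpos($mangled, ']') === false) {
        $pos = strpos($mangled, '[');
        if ($pos !== false) {
            $mangled = substr_replace($mangled, '_', $pos, 1);
        }
    }

    // Rule 3: every other character (for example "$") is left as is.
    return $mangled;
}

foreach (['a.b', 'a b', 'a[b', 'a$b'] as $key) {
    echo $key, ' => ', sketch_mangle_key($key), "\n";
}
</code>

Under these assumptions the first three keys all map to ''a_b'', while ''a$b'' is returned unchanged.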
  
-Mangling has the undesirable consequence that //many// external variables may map to //one// PHP variable. For example, three separate HTML form elements named ''a.b'', ''a_b'' and ''a b'' will all resolve to ''a_b'' in the corresponding super-global, with the last seen value winning. This leads to user confusion and userland work arounds, not to mention bug reports: [[https://bugs.php.net/bug.php?id=34882|#34882]] and [[https://bugs.php.net/bug.php?id=42055|#42055]] for example.+Mangling has the undesirable consequence that //many// external variables may map to //one// PHP variable. For example, three separate HTML form elements named ''a.b'', ''a_b'' and ''a[b'' will all resolve to ''a_b'' in the corresponding super-global, with the value from ''a[b'' winning (because it was last). This leads to user confusion and userland work arounds, not to mention bug reports: [[https://bugs.php.net/bug.php?id=34882|#34882]] and [[https://bugs.php.net/bug.php?id=42055|#42055]] for example.
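
The many-to-one collapse described above can be reproduced with a small sketch. The helper ''sketch_collapse'' is hypothetical, and it simplifies the engine's behavior by replacing every ''.'', space and ''['' with an underscore; it is only meant to show why the last value seen wins.

<code php>
<?php
// Hypothetical helper (not in the RFC): folds name/value pairs into one array
// the way a mangling importer would, so later pairs overwrite earlier ones.
function sketch_collapse(array $pairs): array
{
    $result = [];
    foreach ($pairs as [$name, $value]) {
        // Simplification: every ".", " " and "[" is replaced, not just the first "[".
        $result[str_replace(['.', ' ', '['], '_', $name)] = $value;
    }
    return $result;
}

print_r(sketch_collapse([
    ['a.b', 'dot'],
    ['a_b', 'underscore'],
    ['a[b', 'bracket'],
]));
</code>

Three distinct form fields go in, but only one key, ''a_b'', comes out, holding the value of the field that was processed last.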
  
 Automatic name mangling supported ''[[http://php.net/manual/en/ini.core.php#ini.register-globals|register_globals]]'' and its kin like ''[[http://php.net/manual/en/function.import-request-variables.php|import_request_variables()]]'', but those features ended in August 2014. Name mangling isn't required for super-global marshaling, because the associative array nature of super-globals can accommodate any string variable name. So do we need automatic name mangling? Consider this hypothetical new test: Automatic name mangling supported ''[[http://php.net/manual/en/ini.core.php#ini.register-globals|register_globals]]'' and its kin like ''[[http://php.net/manual/en/function.import-request-variables.php|import_request_variables()]]'', but those features ended in August 2014. Name mangling isn't required for super-global marshaling, because the associative array nature of super-globals can accommodate any string variable name. So do we need automatic name mangling? Consider this hypothetical new test:
Line 50: Line 56:
 <code> <code>
 --TEST-- --TEST--
-Name mangling logic moved to extract()+Name mangling logic removed from engine, placed in polyfill
 --GET-- --GET--
-a.b=dot&a$b=dollar&a%20b=space&a[b=bracket+a.b=dot&a_b=underscore&a$b=dollar&a%20b=space&a[b=bracket
 --FILE-- --FILE--
 <?php <?php
 print_r(get_defined_vars()); print_r(get_defined_vars());
-mangle_superglobals();+php_mangle_superglobals();
 print_r(get_defined_vars()); print_r(get_defined_vars());
 ?> ?>
Line 65: Line 71:
         (         (
             [a.b] => dot             [a.b] => dot
 +            [a_b] => underscore
             [a$b] => dollar             [a$b] => dollar
             [a b] => space             [a b] => space
Line 74: Line 81:
     [_GET] => Array     [_GET] => Array
         (         (
-            [a.b] => dot+            [a_b] => bracket
             [a$b] => dollar             [a$b] => dollar
-            [a b] => space 
-            [a[b] => bracket 
         )         )
- 
-    [a_b] => bracket 
 ) )
 </code> </code>
  
-In this new implementation, the engine no longer mangles marshaled superglobals at startup.  Instead, the //ability// to mangle names has moved to an optional, userland-provided polyfill function ''mangle_superglobals()'' The polyfill algorithm is simple:+In this new implementation, the engine no longer mangles marshaled superglobals at startup.  Instead, the //ability// to mangle names has moved to an optional, userland-provided polyfill function ''php_mangle_superglobals()''. 
 + 
 +In the example above, an ''a_b'' key was externally supplied. The call to ''php_mangle_superglobals'' clobbered the original value of ''a_b'' with the value of the //last// seen mangle-equivalent key (''a[b''). 
 + 
 +Importantly, the user made this mangling happen: the engine did not do it automatically. 
 + 
 +The polyfill algorithm is simple:
  
   * find all superglobal keys that violate the PHP unquoted variable name regex ((Unquoted variable names must match the regex ''[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*''))   * find all superglobal keys that violate the PHP unquoted variable name regex ((Unquoted variable names must match the regex ''[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*''))
-  * for each, create a new mangled key linked to the corresponding value.+  * for each, create a new mangled key linked to the corresponding value
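
The first step of the algorithm above, finding the keys that violate the unquoted variable name regex, might look like the following sketch. The function name ''sketch_keys_needing_mangling'' is an assumption made for illustration; the pattern is the one quoted in the footnote, anchored at both ends.

<code php>
<?php
// Hypothetical helper (not in the RFC): lists keys that are not valid
// unquoted PHP variable names and would therefore need a mangled alias.
function sketch_keys_needing_mangling(array $superglobal): array
{
    $valid = '/^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*\z/';

    // Integer-like keys are cast back to strings before matching.
    $keys = array_map('strval', array_keys($superglobal));

    return array_values(array_filter(
        $keys,
        fn(string $key): bool => preg_match($valid, $key) !== 1
    ));
}

print_r(sketch_keys_needing_mangling([
    'a.b' => 'dot',
    'a_b' => 'underscore',
    'a$b' => 'dollar',
    'a b' => 'space',
    'a[b' => 'bracket',
]));
</code>

Only ''a_b'' passes the regex here, so the other four keys are reported as candidates for the second step.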
  
 Applications requiring name mangling may call the polyfill during their bootstrap phase to emulate prior engine behavior. Applications requiring name mangling may call the polyfill during their bootstrap phase to emulate prior engine behavior.
  
 ===== Proposal ===== ===== Proposal =====
-This RFC proposes to phase out automatic name mangling, replacing it with on-demand mangling in ''extract()''+This RFC proposes to remove automatic name mangling, with backward compatibility maintained through a userspace polyfill function that mangles super-globals on-demand: 
  
-  * Next minor release (currently 7.1)+  * Upon acceptance
-    * Emit an ''E_DEPRECATED'' warning the first time a variable is mangled. The warning indicates that name mangling on import will be removed in the next major PHP version.+    * Update documentation that name mangling is deprecated and will be removed in 8.
 +    * Release a userland polyfill that implements the historic mangling behavior 
 +    * Polyfill shall be available via composer (but not PEAR)
   * Next major release (currently 8.0):   * Next major release (currently 8.0):
-    * Remove all name mangling code in super-global marshalling functions +    * Remove all name mangling code in super-global marshaling functions
-    * Release a userland polyfill implementing ''mangle_superglobals'' that is available as a composable package+
  
 ==== Discussion ==== ==== Discussion ====
Line 104: Line 114:
 These questions were raised in the mailing list discussion. These questions were raised in the mailing list discussion.
  
-=== Should multiple ''E_DEPRECATED'' be emitted? ===+=== Should a notice be raised if the engine mangles a superglobal? ===
  
-No, because we do not know how many instances of mangling may be present and we do not want to flood application logs The proposed single message intends to provide //some// warning to application developers when there is //known// use of name mangling. This notice is similar to the behavior of the ''datetime.timezone'' INI option: at most once per startup.+Before version 1.3, this RFC proposed raising an ''E_DEPRECATED'' message (once per startup) when the engine mangled a name, so that developers were made aware of future changes. However, Rouven Weßling asked: 
 + 
 +> If I have a well behaved application that doesn’t rely on name mangling or have included the polyfill, how can I prevent a log message from being emitted when a user appends (unused) parameters to the query string that require mangling? 
 + 
 +and Nikita Popov commented: 
 + 
 +> Even if it's only a single deprecation warning instead of multiple, it's still a deprecation warning that I, as the application author, have absolutely no control over. For me, a deprecation warning indicates that there is some code I must change to make that warning *go away*. 
 +> Sure, it's informative. But it's enough to be informative about this *once*, rather than every time a user makes an odd-ish request. 
 + 
 +Given that (a) an application's logs could be spammed by malicious users((The ''max_input_vars'' configuration option behaves similarly to the once-per-startup deprecation message proposed prior to version 1.3. The difference is that the ''max_input_vars'' message can be squelched by increasing the limit, whereas the proposed mangling message could never be squelched by user code.)), and (b) documentation suffices to notify users of this change, the RFC changed as of version 1.3 to only document the removal of name mangling in the next major version.
  
 === Should an INI configuration control mangling? === === Should an INI configuration control mangling? ===
Line 116: Line 135:
 An INI setting to disable mangling must be engine-wide (e.g., ''PHP_INI_SYSTEM'' or ''PHP_INI_PERDIR'') as its historical effect occurs before userland code runs. Engine-wide settings are tricky because they force conditions across all instances of PHP running in a given SAPI process.  In a hosted environment where many unrelated sites share the same engine configuration, it's possible that one site might require mangling while another site requires no-mangling.  These two sites could not co-exist unless the site operator allows per directory configuration, which they may not. Thus, an INI setting would introduce operational problems for some definable sub-set of users. An INI setting to disable mangling must be engine-wide (e.g., ''PHP_INI_SYSTEM'' or ''PHP_INI_PERDIR'') as its historical effect occurs before userland code runs. Engine-wide settings are tricky because they force conditions across all instances of PHP running in a given SAPI process.  In a hosted environment where many unrelated sites share the same engine configuration, it's possible that one site might require mangling while another site requires no-mangling.  These two sites could not co-exist unless the site operator allows per directory configuration, which they may not. Thus, an INI setting would introduce operational problems for some definable sub-set of users.
  
-It's still possible to provide an "escape hatch" for applications requiring name mangling: the polyfill described eariler. Applications need only include the polyfill code and add it to their bootstrapping. The polyfill would be available via Composer, and the polyfill would populate all the mangled variables as before.+It's still possible to provide an "escape hatch" for applications requiring name mangling: the polyfill described earlier. Applications need only include the polyfill code and add it to their bootstrapping. The polyfill would be available via Composer, and the polyfill would populate all the mangled variables as before.
  
 The polyfill approach is considered superior to the INI approach for three reasons: The polyfill approach is considered superior to the INI approach for three reasons:
Line 126: Line 145:
 === Should ''extract()'' automatically mangle names? === === Should ''extract()'' automatically mangle names? ===
  
-Early versions of this proposal (< v1.2) proposed using extract to mangle names. Rowan Collins and others pointed out this was an unnecessary complication: ''preg_match'' could also accomplish the goal.  Thus, all references to ''extract'' in this RFC have been removed.+Early versions of this proposal (< v1.2) proposed using ''extract'' to mangle names. Rowan Collins and others pointed out this was an unnecessary complication: ''preg_match'' could also accomplish the goal.  Thus, all references to ''extract'' in this RFC have been removed.
  
 However, ''extract()'' should have the option to emit mangled names with a new constant (''EXTR_MANGLE'').  ''extract()'' should also be fixed to export variables with any variable name, because they are all technically valid with the quoted variable syntax (''${'foo.bar'}'').  These will be handled as function fixes and not with this RFC. However, ''extract()'' should have the option to emit mangled names with a new constant (''EXTR_MANGLE'').  ''extract()'' should also be fixed to export variables with any variable name, because they are all technically valid with the quoted variable syntax (''${'foo.bar'}'').  These will be handled as function fixes and not with this RFC.
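
To illustrate the point about quoted variable syntax, the sketch below assigns to a variable whose name contains a dot and then shows how current ''extract()'' treats such a key. The function name ''sketch_extract_demo'' is hypothetical, and ''EXTR_MANGLE'' is deliberately not used because it is only a proposal.

<code php>
<?php
// Hypothetical demo (not in the RFC): quoted variable syntax versus extract().
function sketch_extract_demo(): array
{
    // Any string works as a variable name with the quoted syntax.
    ${'foo.bar'} = 'quoted';

    // extract() returns the number of variables it actually imported;
    // keys that are not valid unquoted names are skipped.
    $imported = extract(['a.b' => 'dot', 'ok' => 'fine']);

    return [${'foo.bar'}, $imported, isset(${'a.b'})];
}

var_dump(sketch_extract_demo());
</code>

In this sketch only ''ok'' is imported, so ''extract()'' reports one variable and ''${'a.b'}'' stays undefined, which is the gap the proposed function fixes would address.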
Line 136: Line 155:
  
 <code php> <code php>
-function mangle_name($name) {+function php_mangle_name($name) {
     $name = preg_replace('/[^a-zA-Z0-9_\x7f-\xff]/', '_', $name);     $name = preg_replace('/[^a-zA-Z0-9_\x7f-\xff]/', '_', $name);
     return preg_replace('/^[0-9]/', '_', $name);     return preg_replace('/^[0-9]/', '_', $name);
 } }
-function mangle_superglobals() {+function php_mangle_superglobals() {
     if (version_compare(PHP_VERSION, '8.0.0', '<')) {     if (version_compare(PHP_VERSION, '8.0.0', '<')) {
         return;         return;
     }     }
     foreach ($_ENV as $var => &$val) {     foreach ($_ENV as $var => &$val) {
-        $mangled = mangle_name($var);+        $mangled = php_mangle_name($var);
         if ($mangled !== $var) {         if ($mangled !== $var) {
             $_ENV[$mangled] =& $val;             $_ENV[$mangled] =& $val;
Line 159: Line 178:
 <code> <code>
 $ composer require php/mangle-superglobals ^1.0 $ composer require php/mangle-superglobals ^1.0
-$ cat app/boostrap.php+$ cat app/bootstrap.php
 <?php <?php
 require __DIR__ . '/vendor/autoload.php'; require __DIR__ . '/vendor/autoload.php';
  
-mangle_superglobals();+php_mangle_superglobals();
  
 // ... // ...
Line 169: Line 188:
  
 ===== Proposed PHP Version(s) ===== ===== Proposed PHP Version(s) =====
-PHP 7.1 (for notice of impending BC break) and PHP 8.0 (for actual implementation and corresponding BC break).+PHP 8.0.
  
 ===== RFC Impact ===== ===== RFC Impact =====
Line 188: Line 207:
  
 ===== Open Issues ===== ===== Open Issues =====
-None so far.+None.
  
 ===== Proposed Voting Choices ===== ===== Proposed Voting Choices =====
-A simple yes/no voting option with a 2/3 majority required.+A simple yes/no voting option with a 2/3 majority required: "Remove name mangling in PHP 8.0?"
  
 ===== Patches and Tests ===== ===== Patches and Tests =====
rfc/on_demand_name_mangling.1451924116.txt.gz · Last modified: 2017/09/22 13:28 (external edit)