rfc:saner-inc-dec-operators
Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
rfc:saner-inc-dec-operators [2022/12/02 02:37] – Created first draft girgias | rfc:saner-inc-dec-operators [2023/07/17 14:52] (current) – Implemented girgias | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Path to Saner Increment/ | ====== PHP RFC: Path to Saner Increment/ | ||
- | * Version: 0.1 | + | * Version: 0.3 |
* Date: 2022-11-21 | * Date: 2022-11-21 | ||
* Author: George Peter Banyard, < | * Author: George Peter Banyard, < | ||
- | * Status: | + | * Status: |
- | * Target Version: PHP 8.3 | + | * Target Version: PHP 8.3, PHP 8.(3+x), and PHP 9.0 |
- | * Implementation: | + | * Implementation: |
* First Published at: [[http:// | * First Published at: [[http:// | ||
===== Introduction ===== | ===== Introduction ===== | ||
- | PHP's increment and decrement operators can have some surprising behaviours when used with types other than int and float. Various previous attempts [https:// | + | PHP's increment and decrement operators can have some surprising behaviours when used with types other than int and float. Various previous attempts |
+ | (([[rfc:alpanumeric_decrement|PHP RFC: Alphanumeric Decrement]])) | ||
+ | (([[rfc:increment_decrement_fixes|PHP RFC: Increment/ | ||
+ | have been made to improve the behaviour of these operators, but none have been implemented. | ||
+ | The goal of this RFC is to normalize the behaviour of '' | ||
- | We will first detail the current behaviour of the two operators, | + | Therefore, we will first look at the behaviour of arithmetic operators with various types, then detail the current behaviour of the increment and decrement |
- | ==== Current behaviour ==== | + | ==== Behaviour of arithmetic operators |
- | If the value is of type '' | + | Arithmetic operators perform a numeric |
- | If the a value of type '' | + | < |
+ | In this context if either operand is a float (or not interpretable as an int), both operands are interpreted as floats, | ||
+ | </ | ||
- | If the value is of type '' | ||
- | If the value is of type '' | + | The following types (other than int and float) are considered interpretable as int/float: |
- | If the value is of type '' | + | * '' |
+ | * '' | ||
+ | * '' | ||
- | If the value is of type '' | + | < |
+ | var_dump(null + 1); // int(1) | ||
+ | var_dump(null - 1); // int(-1) | ||
- | * If the string is numeric, the is cast to the corresponding type ('' | + | var_dump(false + 1); // int(1) |
- | * -- TODO REWRITE THIS to better indicate that the "1" is part of the PERL increment | + | var_dump(false |
- | * If the string is empty, the value is changed to the string ''" | + | |
- | * Else, the string is a non-empty non-numeric string. No action is performed on the value if the decrement operator is used. In contrast, if the increment operator is used, a PERL alphanumeric string increment is performed. | + | |
+ | var_dump(true + 1); // int(2) | ||
+ | var_dump(true - 1); // int(0) | ||
- | Note the behaviour around the empty string is identical in PHP 7 and PHP 8 and was not affected by the changes around | + | var_dump(" |
+ | var_dump(" | ||
+ | var_dump(" | ||
+ | var_dump(" | ||
+ | </ | ||
+ | |||
+ | Resources, non-numeric strings, arrays (except when adding two arrays together), and objects that are instances of userland classes throw a '' | ||
+ | |||
+ | Object values that are instances of an internal class that overload the arithmetic operator (by implementing the '' | ||
+ | If an internal class implements a custom '' | ||
+ | Otherwise, a '' | ||
+ | |||
+ | One example of an internal class that implements a '' | ||
+ | < | ||
+ | $o = gmp_init(36); | ||
+ | var_dump($o + 1); | ||
+ | /* | ||
+ | object(GMP)# | ||
+ | [" | ||
+ | string(2) " | ||
+ | } | ||
+ | */ | ||
+ | </ | ||
+ | |||
+ | The only examples of an internal class that does not implement a '' | ||
+ | < | ||
+ | $o = tidy_parse_string("< | ||
+ | var_dump($o + 1); // int(1) | ||
+ | </ | ||
+ | |||
+ | |||
+ | Note: the empty string has **// | ||
+ | |||
+ | Note: If an internal class implements a custom '' | ||
+ | < | ||
+ | $o = curl_init(); | ||
+ | var_dump((int) $o); // e.g. int(1) | ||
+ | var_dump($o + 1); // Fatal error: Uncaught TypeError: Unsupported operand types: CurlHandle + int | ||
+ | </ | ||
+ | |||
+ | ==== Current behaviour of the increment and decrement operators ==== | ||
+ | |||
+ | The current behaviour of these operators is rather complex and depends on which operator is used with which type. First, we will describe the common behaviour between both operators: | ||
+ | |||
+ | * the value is of type '' | ||
+ | * the value is of type '' | ||
+ | * the value is of type '' | ||
+ | * the value is of type '' | ||
+ | |||
+ | < | ||
+ | $int = 10; | ||
+ | var_dump(++$int); | ||
+ | $int = 10; | ||
+ | var_dump(--$int); | ||
+ | |||
+ | $float = 5.7; | ||
+ | var_dump(++$float); | ||
+ | $float = 5.7; | ||
+ | var_dump(--$float); | ||
+ | |||
+ | $false = false; | ||
+ | var_dump(++$false); | ||
+ | var_dump(--$false); | ||
+ | $true = true; | ||
+ | var_dump(++$true); | ||
+ | var_dump(--$true); | ||
+ | |||
+ | $stringInt = " | ||
+ | var_dump(++$stringInt); | ||
+ | var_dump(--$stringInt); | ||
+ | $stringFloat = " | ||
+ | var_dump(++$stringFloat); | ||
+ | var_dump(--$stringFloat); | ||
+ | </ | ||
+ | |||
+ | Object values that are instances of an internal class that overload the arithmetic operator (by implementing the '' | ||
+ | < | ||
+ | $o = gmp_init(36); | ||
+ | var_dump(++$o); | ||
+ | /* | ||
+ | object(GMP)# | ||
+ | [" | ||
+ | string(2) " | ||
+ | } | ||
+ | */ | ||
+ | |||
+ | $o = tidy_parse_string("< | ||
+ | var_dump(++$o); | ||
+ | </ | ||
+ | |||
+ | For non-numeric '' | ||
+ | |||
+ | === Current behaviour of the decrement operator with values of type null and non-numeric string === | ||
+ | |||
+ | If the value is of type '' | ||
+ | |||
+ | If the value is a non-numeric '' | ||
+ | |||
+ | < | ||
+ | $n = null; | ||
+ | --$n; | ||
+ | var_dump($n); | ||
+ | |||
+ | $s = " | ||
+ | --$s; | ||
+ | var_dump($s); | ||
+ | |||
+ | $e = ""; | ||
+ | --$e; | ||
+ | var_dump($e); | ||
+ | </ | ||
+ | |||
+ | === Current behaviour of the increment operator with values of type null and non-numeric string === | ||
+ | |||
+ | If the value is of type '' | ||
+ | |||
+ | If the value is a non-numeric '' | ||
+ | |||
+ | < | ||
+ | $n = null; | ||
+ | ++$n; | ||
+ | var_dump($n); | ||
+ | |||
+ | $s = " | ||
+ | ++$s; | ||
+ | var_dump($s); | ||
+ | |||
+ | $e = ""; | ||
+ | ++$e; | ||
+ | var_dump($e); | ||
+ | </ | ||
+ | |||
+ | Note: this means that the behaviour around the empty string | ||
+ | |||
+ | < | ||
+ | <?php | ||
+ | |||
+ | $s1 = $s2 = ""; | ||
+ | var_dump(++$s1, | ||
+ | /* this results in | ||
+ | string(1) " | ||
+ | int(2) | ||
+ | int(-1) | ||
+ | int(-2) | ||
+ | */ | ||
+ | </ | ||
+ | |||
+ | === Details about the PERL String increment feature === | ||
+ | |||
+ | If the string to increment is the empty string, return the string ''" | ||
+ | |||
+ | Otherwise, the last byte of the string is inspected: | ||
+ | * If it is in-between " | ||
+ | * If it is " | ||
+ | * Otherwise, do nothing. | ||
+ | |||
+ | If, and only if, a carry value is still held after the first byte of the string has been inspected, the string is prepended with the character " | ||
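
The byte-by-byte rules above can be sketched in userland PHP. This is a minimal illustration, not the engine's implementation: ''perl_increment_sketch()'' is a hypothetical helper name, it assumes the string has already been determined to be non-numeric (the real operator converts numeric strings to int/float first), and it only treats ASCII letters and digits as incrementable.

```php
<?php
// Hypothetical helper, not part of PHP: a sketch of the alphanumeric
// increment rules, working on raw bytes from the end of the string.
function perl_increment_sketch(string $s): string
{
    if ($s === '') {
        return '1'; // the empty string is special-cased
    }
    $carry = '';
    for ($i = strlen($s) - 1; $i >= 0; $i--) {
        $byte = ord($s[$i]);
        if ($byte === 0x7A) {           // 'z' wraps to 'a' and carries
            $s[$i] = 'a';
            $carry = 'a';
        } elseif ($byte === 0x5A) {     // 'Z' wraps to 'A' and carries
            $s[$i] = 'A';
            $carry = 'A';
        } elseif ($byte === 0x39) {     // '9' wraps to '0' and carries
            $s[$i] = '0';
            $carry = '1';
        } elseif (($byte >= 0x61 && $byte < 0x7A)
            || ($byte >= 0x41 && $byte < 0x5A)
            || ($byte >= 0x30 && $byte < 0x39)) {
            $s[$i] = chr($byte + 1);    // plain increment, nothing left to carry
            return $s;
        } else {
            return $s;                  // any other byte stops the walk and drops the carry
        }
    }
    // A carry survived past the first byte: prepend the matching character.
    return $carry . $s;
}

var_dump(perl_increment_sketch("Az"));  // string(2) "Ba"
var_dump(perl_increment_sketch("Zz"));  // string(3) "AAa"
var_dump(perl_increment_sketch("9z"));  // string(3) "10a"
var_dump(perl_increment_sketch("C Z")); // string(3) "C A"
```

The character that is prepended depends on the kind of byte that produced the last carry, which is why ''"Zz"'' grows an ''"A"'' while ''"9z"'' grows a ''"1"''.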
+ | |||
+ | Here are a couple examples demonstrating these rules: | ||
+ | <PHP> | ||
+ | <?php | ||
+ | |||
+ | // Empty string | ||
+ | $s = ""; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // String increments are unaware of being " | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // Carrying values of different cases/ | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // Carrying values until the beginning of the string | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // Trailing whitespace | ||
+ | $s = "Z "; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // Leading whitespace | ||
+ | $s = " Z"; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // Whitespace in-between | ||
+ | $s = "C Z"; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // Non-ASCII characters | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // With period | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // With multiple period | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | </ | ||
+ | |||
+ | The behaviour is slightly different than that of [[https:// | ||
+ | |||
+ | <code raku> | ||
+ | sub var_dump(Str $v) { | ||
+ | say ' | ||
+ | } | ||
+ | |||
+ | # Empty string | ||
+ | my $s = ""; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # String increments are unaware of being " | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # Carrying values of different cases/ | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # Carrying values until the beginning of the string | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # Trailing whitespace | ||
+ | $s = "Z "; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # Leading whitespace | ||
+ | $s = " Z"; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # Whitespace | ||
+ | $s = "C Z"; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # Non-ASCII characters | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # With period | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # With multiple period | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | </ | ||
+ | |||
+ | However, the biggest problem is with strings that can be interpreted as a number in scientific notation, because they will never be interpreted as an alphanumeric string to be incremented using the PERL increment feature, but converted to float first: | ||
+ | <PHP> | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | var_dump(++$s); | ||
+ | </ | ||
+ | |||
+ | While Raku also supports arithmetic operations with strings that represent numbers in scientific notation, it does not perform any type juggling at all for the increment and decrement operators (therefore having the same behaviour as currently for boolean and its corresponding '' | ||
+ | |||
+ | Therefore the above snippet in Raku gives a consistent result: | ||
+ | <code raku> | ||
+ | sub var_dump(Str $v) { | ||
+ | say ' | ||
+ | } | ||
+ | |||
+ | my $s = " | ||
+ | var_dump(++$s); | ||
+ | var_dump(++$s); | ||
+ | </ | ||
+ | |||
+ | ===== Summary of behavioural differences ===== | ||
+ | |||
+ | | | ||
+ | ^ '' | ||
+ | ^ '' | ||
+ | ^ '' | ||
+ | ^ ''""'' | ||
+ | ^ ''" | ||
+ | ^ Tidy Object | '' | ||
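
As a compact illustration of the mismatch summarised above, the following sketch contrasts arithmetic with the increment/decrement operators for ''null'' and ''bool''. The variable names are arbitrary, and the commented results assume PHP 7.x or 8.x semantics as described in this RFC (on PHP 8.3 some of these statements additionally emit a warning, and the results are intended to change in the next major version).

```php
<?php
// Arithmetic treats null as 0 and false as 0.
var_dump(null - 1);   // int(-1)
var_dump(false + 1);  // int(1)

// Decrementing null has no effect, unlike null - 1.
$dec = null;
$dec--;
var_dump($dec);       // NULL

// Incrementing null does behave like null + 1.
$inc = null;
$inc++;
var_dump($inc);       // int(1)

// Incrementing a bool has no effect, unlike false + 1.
$flag = false;
$flag++;
var_dump($flag);      // bool(false)
```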
===== Proposal ===== | ===== Proposal ===== | ||
- | The proposal is to create a path which creates awareness around | + | The proposal is to create a path so that in the next major version |
- | * Emit an < | + | To achieve this, we propose |
- | * Emit an < | + | |
- | * Deprecate PERL alphanumeric string increments by emitting an < | + | |
- | * Deprecate using the decrement operator on an empty string, | + | |
- | * Add support for values of type '' | + | |
- | ==== Proposal Addendum ==== | + | * Add the < |
+ | * Add support to increment/ | ||
+ | < | ||
+ | $o = tidy_parse_string("< | ||
+ | var_dump(++$o); | ||
+ | </ | ||
- | As the behaviour around these operators | + | * to emit < |
+ | < | ||
+ | $n = null; | ||
+ | --$n; // Warning: Decrement on type null has no effect, this will change in the next major version | ||
+ | var_dump($n); | ||
- | * Emit an < | + | $false = false; |
- | * Deprecate using the increment operator on values | + | --$false; |
+ | var_dump($false); | ||
+ | ++$false; // Warning: Increment on type bool has no effect, | ||
+ | var_dump($false); | ||
+ | $true = true; | ||
+ | --$true; // Warning: Decrement on type bool has no effect, this will change in the next major version of PHP | ||
+ | var_dump($true); | ||
+ | ++$true; // Warning: Increment on type bool has no effect, this will change in the next major version of PHP | ||
+ | var_dump($true); | ||
+ | </ | ||
+ | |||
+ | |||
+ | * Deprecate using the decrement operator with non-numeric strings. | ||
+ | <PHP> | ||
+ | $empty = ""; | ||
+ | --$empty; // Deprecated: Decrement on empty string is deprecated as non-numeric | ||
+ | var_dump($empty); | ||
+ | |||
+ | $s = " | ||
+ | --$s; // Deprecated: Decrement on non-numeric string has no effect and is deprecated | ||
+ | var_dump($s); | ||
+ | </ | ||
+ | |||
+ | * Deprecate using the increment operator with strings that are not strictly alphanumeric. | ||
+ | <PHP> | ||
+ | $empty = ""; | ||
+ | ++$empty; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($empty); | ||
+ | |||
+ | $s = " | ||
+ | ++$s; // No Deprecation | ||
+ | var_dump($s); | ||
+ | |||
+ | $s = " | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | |||
+ | $s = "Z "; | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | |||
+ | $s = " Z"; | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | |||
+ | // Non-ASCII characters | ||
+ | $s = " | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | $s = " | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | $s = " | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | $s = " | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | |||
+ | $s = " | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | </ | ||
+ | |||
+ | In a follow-up minor version of PHP the following changes will take place: | ||
+ | * Deprecate using the increment operator with non-numeric strings. | ||
+ | <PHP> | ||
+ | $s = " | ||
+ | ++$s; // Deprecated: Increment on non-numeric string is deprecated | ||
+ | var_dump($s); | ||
+ | </ | ||
+ | |||
+ | In the next major version of PHP the following changes will take place: | ||
+ | * Values of type '' | ||
+ | * Non-numeric string values throw a '' | ||
+ | |||
+ | ==== Semantics of str_increment() and str_decrement() ==== | ||
+ | |||
+ | The signatures of the functions are: | ||
+ | <PHP> | ||
+ | function str_increment(string $string): string {} | ||
+ | function str_decrement(string $string): string {} | ||
+ | </ | ||
+ | |||
+ | If < | ||
+ | |||
+ | If decrementing < | ||
+ | |||
+ | As those functions would not perform any type juggling, strings that can be interpreted as numbers in scientific notation will not be implicitly converted to float. | ||
+ | |||
+ | <PHP> | ||
+ | $s = " | ||
+ | $s = str_increment($s); | ||
+ | var_dump($s); | ||
+ | $s = str_increment($s); | ||
+ | var_dump($s); | ||
+ | </ | ||
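
A short usage sketch may help here. It assumes PHP 8.3 or later, where ''str_increment()'' exists, and relies on the function throwing a ''ValueError'' for empty or non-alphanumeric input. The wrapper name ''next_label()'' is hypothetical and only serves to show one way of handling that error.

```php
<?php
// Requires PHP 8.3+. next_label() is a hypothetical wrapper, not part of PHP.
function next_label(string $label): ?string
{
    try {
        // No type juggling happens here, even for numeric-looking strings.
        return str_increment($label);
    } catch (ValueError $e) {
        // Empty or non-alphanumeric input is rejected instead of silently kept.
        return null;
    }
}

var_dump(next_label("Az"));  // string(2) "Ba"
var_dump(next_label("5e6")); // string(3) "5e7"
var_dump(next_label(""));    // NULL
var_dump(next_label("Z "));  // NULL
```

Returning ''null'' is just one possible design choice; callers that consider invalid input a programming error may prefer to let the ''ValueError'' propagate.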
+ | |||
+ | ==== Cost/ | ||
+ | |||
+ | PHP currently has 6 main and 4 operation-specific type juggling contexts. | ||
+ | The main 6 are documented in the userland manual on the [[https:// | ||
+ | * Numeric | ||
+ | * String | ||
+ | * Logical | ||
+ | * Integral and string | ||
+ | * Comparative | ||
+ | * Function | ||
+ | |||
+ | The 4 operation-specific contexts are: | ||
+ | * Increment/ | ||
+ | * String offsets | ||
+ | * Array offsets | ||
+ | * < | ||
+ | |||
+ | With the semantics proposed in this RFC the increment/ | ||
+ | |||
+ | The drawback of this approach is the deprecation, | ||
+ | However, the issues around strings that can be interpreted in scientific notation, the fact it only properly supports strings which are only comprised of the ASCII alphanumeric characters ('' | ||
+ | and adding support for string decrements was previously [[rfc: | ||
+ | makes us believe the current semantics of the string increment feature are unsound. | ||
+ | |||
+ | Therefore, we consider the value of reducing the semantic complexity of PHP higher than keeping support for this feature in its current form. | ||
+ | The introduction of the < | ||
+ | <PHP> | ||
+ | function str_increment_polyfill(string $s): string { | ||
+ | if (is_numeric($s)) { | ||
+ | $offset = stripos($s, ' | ||
+ | if ($offset !== false) { | ||
+ | /* Using increment operator would cast the string to float | ||
+ | * Therefore we manually increment it to convert it to an " | ||
+ | $c = $s[$offset]; | ||
+ | $c++; | ||
+ | $s[$offset] = $c; | ||
+ | $s++; | ||
+ | $s[$offset] = match ($s[$offset]) { | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | }; | ||
+ | return $s; | ||
+ | } | ||
+ | } | ||
+ | return ++$s; | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | ==== Impact of deprecating the PERL string increment feature on userland ==== | ||
+ | |||
+ | To determine the impact of this RFC on userland, the static analysis tool [[https:// | ||
+ | |||
+ | The only non-false-positive use cases using the PERL string increment feature are: | ||
+ | |||
+ | * Generating a list of valid Unicode (or ASCII) characters. The most popular project using this is HTMLPurifier, | ||
+ | * Generating sequential IDs. The main library doing this is amphp/amp, however a lot of other projects depend on this library. | ||
+ | * Incrementing a spreadsheet column. | ||
+ | |||
+ | In none of these cases would a deprecation notice be emitted in the first stage of this RFC. | ||
+ | As the first stage of this RFC also provides the < | ||
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
- | The backwards incompatible changes are the changes which introduce an < | + | Using the increment/decrement operators on the empty string. |
- | The changes that introduce an < | + | The string increment feature. |
+ | |||
+ | The changes that introduce an < | ||
+ | |||
+ | ===== Future Scope ===== | ||
+ | |||
+ | One possible future scope is to add support to both arithmetic operations and the increment/ | ||
+ | |||
+ | One other possible extension is to add a < | ||
===== Proposed PHP Version ===== | ===== Proposed PHP Version ===== | ||
- | Next minor version, i.e. PHP 8.3. | + | Next minor version, i.e. PHP 8.3.0, follow-up minor version, e.g. PHP 8.4.0, and next major version, i.e. PHP 9.0.0. |
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
Line 69: | Line 585: | ||
As per the voting RFC a yes/no vote with a 2/3 majority is needed for this proposal to be accepted. | As per the voting RFC a yes/no vote with a 2/3 majority is needed for this proposal to be accepted. | ||
- | Voting started on 2022-XX-XX and will end on 2022-XX-XX. | + | Voting started on 2023-06-28 and will end on 2023-07-12. |
<doodle title=" | <doodle title=" | ||
- | * Yes | ||
- | * No | ||
- | </ | ||
- | |||
- | The addendum to this proposal will also require a 2/3 majority to be accepted. | ||
- | |||
- | Voting started on 2022-XX-XX and will end on 2022-XX-XX. | ||
- | <doodle title=" | ||
* Yes | * Yes | ||
* No | * No | ||
Line 85: | Line 593: | ||
===== Implementation ===== | ===== Implementation ===== | ||
- | GitHub pull request: https:// | + | GitHub pull request: |
After the project is implemented, | After the project is implemented, | ||
- | * the version(s) it was merged into | + | * Version: PHP 8.3 |
- | * a link to the git commit(s) | + | * Implementation : |
* a link to the PHP manual entry for the feature | * a link to the PHP manual entry for the feature | ||
===== References ===== | ===== References ===== | ||
rfc/saner-inc-dec-operators.txt · Last modified: 2023/07/17 14:52 by girgias