rfc:saner-inc-dec-operators
Differences
This shows you the differences between two versions of the page.
rfc:saner-inc-dec-operators [2023/01/14 22:22] – Add examples and rewrite slightly to accomodate them girgias | rfc:saner-inc-dec-operators [2023/07/12 14:11] – Close vote girgias | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== PHP RFC: Path to Saner Increment/ | ====== PHP RFC: Path to Saner Increment/ | ||
- | * Version: 0.2 | + | * Version: 0.3 |
* Date: 2022-11-21 | * Date: 2022-11-21 | ||
* Author: George Peter Banyard, < | * Author: George Peter Banyard, < | ||
- | * Status: | + | * Status: |
- | * Target Version: PHP 8.3 and PHP 9.0 | + | * Target Version: PHP 8.3, PHP 8.(3+x), |
- | * Implementation: | + | * Implementation: |
* First Published at: [[http:// | * First Published at: [[http:// | ||
Line 68: | Line 68: | ||
</ | </ | ||
- | The only examples of an internal class that does not implements | + | The only examples of an internal class that does not implement |
<PHP> | <PHP> | ||
$o = tidy_parse_string("< | $o = tidy_parse_string("< | ||
Line 134: | Line 134: | ||
</ | </ | ||
- | For non-numeric '' | + | For non-numeric '' |
=== Current behaviour of the decrement operator with values of type null and non-numeric string === | === Current behaviour of the decrement operator with values of type null and non-numeric string === | ||
Line 190: | Line 190: | ||
*/ | */ | ||
</ | </ | ||
+ | |||
+ | === Details about the PERL String increment feature === | ||
+ | |||
+ | If the string to increment is the empty string, return the string ''" | ||
+ | |||
+ | Otherwise, the last byte of the string is inspected: | ||
+ | * If it is in-between " | ||
+ | * If it is " | ||
+ | * Otherwise, do nothing. | ||
+ | |||
+ | If, and only if, a carry value is still held after the first byte of the string has been inspected, the string is prepended with the character " | ||
+ | |||
+ | Here are a couple of examples demonstrating these rules: | ||
+ | <PHP> | ||
+ | <?php | ||
+ | |||
+ | // Empty string | ||
+ | $s = ""; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // String increments are unaware of being " | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // Carrying values of different cases/types | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // Carrying values until the beginning of the string | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // Trailing whitespace | ||
+ | $s = "Z "; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // Leading whitespace | ||
+ | $s = " Z"; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // Whitespace in-between | ||
+ | $s = "C Z"; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // Non-ASCII characters | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // With period | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | // With multiple periods | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | </ | ||
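The byte-wise rules listed above can be sketched in userland as follows. This is a minimal sketch, not the engine's implementation: ''perl_style_increment()'' is a hypothetical helper name, it deliberately skips the numeric-string check that ''++'' performs first, and the choice of the prepended byte (''1'', ''a'' or ''A'', matching the kind of the first byte) is an assumption based on the behaviour PHP exhibits today.

```php
<?php

/**
 * Hypothetical helper (not part of PHP): byte-wise string increment following
 * the rules described above. It performs no numeric-string check and no type
 * juggling, so it is NOT a drop-in replacement for the ++ operator.
 */
function perl_style_increment(string $s): string
{
    if ($s === '') {
        return '1'; // assumption: mirrors what ++ yields for "" today
    }

    $prepend = ''; // byte to prepend if a carry survives past the first byte

    for ($i = strlen($s) - 1; $i >= 0; $i--) {
        $byte = $s[$i];
        $code = ord($byte);

        if ($byte === 'z') {
            $s[$i] = 'a';      // wrap around and carry to the left
            $prepend = 'a';
        } elseif ($byte === 'Z') {
            $s[$i] = 'A';
            $prepend = 'A';
        } elseif ($byte === '9') {
            $s[$i] = '0';
            $prepend = '1';
        } elseif (
            ($code >= ord('a') && $code <= ord('y'))
            || ($code >= ord('A') && $code <= ord('Y'))
            || ($code >= ord('0') && $code <= ord('8'))
        ) {
            $s[$i] = chr($code + 1); // plain increment, no carry left
            return $s;
        } else {
            return $s; // not alphanumeric: stop, any pending carry is dropped
        }
    }

    // A carry is still held after inspecting the first byte.
    return $prepend . $s;
}

var_dump(perl_style_increment('Az')); // string(2) "Ba"
var_dump(perl_style_increment('zz')); // string(3) "aaa"
```

Because the helper stops at the first non-alphanumeric byte, a string such as ''"Z "'' is returned unchanged, while ''"C Z"'' becomes ''"C A"'' and the carry is lost at the space.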
+ | |||
+ | The behaviour is slightly different from that of [[https:// | ||
+ | |||
+ | <code raku> | ||
+ | sub var_dump(Str $v) { | ||
+ | say ' | ||
+ | } | ||
+ | |||
+ | # Empty string | ||
+ | my $s = ""; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # String increments are unaware of being " | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # Carrying values of different cases/types | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # Carrying values until the beginning of the string | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # Trailing whitespace | ||
+ | $s = "Z "; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # Leading whitespace | ||
+ | $s = " Z"; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # Whitespace in-between | ||
+ | $s = "C Z"; | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # Non-ASCII characters | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # With period | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | |||
+ | # With multiple periods | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | </ | ||
+ | |||
+ | However, the biggest problem is with strings that can be interpreted as a number in scientific notation, because they will never be interpreted as an alphanumeric string to be incremented using the PERL increment feature, but converted to float first: | ||
+ | <PHP> | ||
+ | $s = " | ||
+ | var_dump(++$s); | ||
+ | var_dump(++$s); | ||
+ | </ | ||
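As a self-contained illustration of this pitfall, the sketch below applies ''++'' twice and records the value after each step. The helper name ''increment_twice()'' and the input ''"5d9"'' are illustrative choices, not taken from the snippet above: the first increment is alphanumeric and produces a string that happens to be valid scientific notation, so the second increment goes through float conversion.

```php
<?php

/**
 * Hypothetical helper: applies ++ twice and returns the value after each step.
 */
function increment_twice(string $s): array
{
    $value = $s;
    ++$value;          // first step: alphanumeric string increment
    $first = $value;
    ++$value;          // second step: may be treated as a numeric string
    return [$first, $value];
}

[$first, $second] = increment_twice('5d9');
var_dump($first);  // string(3) "5e0"
var_dump($second); // float(6)
```

The type of the result changes from string to float between two consecutive increments, which is the inconsistency the paragraph above describes.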
+ | |||
+ | While Raku also supports arithmetic operations with strings that represent numbers in scientific notation, it does not perform any type juggling at all for the increment and decrement operators (therefore having the same behaviour as currently for boolean and its corresponding '' | ||
+ | |||
+ | Therefore the above snippet in Raku gives a consistent result: | ||
+ | <code raku> | ||
+ | sub var_dump(Str $v) { | ||
+ | say ' | ||
+ | } | ||
+ | |||
+ | my $s = " | ||
+ | var_dump(++$s); | ||
+ | var_dump(++$s); | ||
+ | </ | ||
+ | |||
+ | ===== Summary of behavioural differences ===== | ||
+ | |||
+ | | | ||
+ | ^ '' | ||
+ | ^ '' | ||
+ | ^ '' | ||
+ | ^ ''""'' | ||
+ | ^ ''" | ||
+ | ^ Tidy Object | '' | ||
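
To compare the operators against explicit arithmetic for a given value, a small script along the following lines can be used. This is a sketch under stated assumptions: ''compare_inc_dec()'' is a hypothetical helper, it is only meant for ''null'', ''bool'' and string inputs, and the warnings or deprecation notices emitted along the way depend on the PHP version in use.

```php
<?php

/**
 * Hypothetical helper: applies ++, --, "+ 1" and "- 1" to copies of $value
 * and collects the results so they can be compared side by side.
 * Intended for null, bool and string inputs only.
 */
function compare_inc_dec($value): array
{
    $inc = $value;
    $inc++;

    $dec = $value;
    $dec--;

    // Arithmetic on non-numeric strings throws a TypeError in PHP 8.
    try {
        $plus = $value + 1;
    } catch (TypeError $e) {
        $plus = 'TypeError';
    }
    try {
        $minus = $value - 1;
    } catch (TypeError $e) {
        $minus = 'TypeError';
    }

    return ['++' => $inc, '--' => $dec, '+ 1' => $plus, '- 1' => $minus];
}

var_export(compare_inc_dec(null));
```

For ''null'', the decrement leaves the value untouched while ''null - 1'' evaluates to ''-1''; for booleans, neither operator changes the value, whereas arithmetic treats ''true'' as ''1''.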
===== Proposal ===== | ===== Proposal ===== | ||
- | The proposal is to create a path so that in the next major version of PHP the increment and decrement operators behave identically to adding/ | + | The proposal is to create a path so that in the next major version of PHP the increment and decrement operators behave identically to adding/ |
To achieve this, we propose the following changes to be made in the next minor version of PHP: | To achieve this, we propose the following changes to be made in the next minor version of PHP: | ||
+ | * Add the < | ||
* Add support to increment/ | * Add support to increment/ | ||
<PHP> | <PHP> | ||
Line 203: | Line 394: | ||
</ | </ | ||
- | * to emit < | + | * to emit < |
<PHP> | <PHP> | ||
$n = null; | $n = null; | ||
Line 223: | Line 414: | ||
- | * Deprecate using those operators | + | * Deprecate using the decrement operator |
<PHP> | <PHP> | ||
$empty = ""; | $empty = ""; | ||
Line 232: | Line 423: | ||
--$s; // Deprecated: Decrement on non-numeric string has no effect and is deprecated | --$s; // Deprecated: Decrement on non-numeric string has no effect and is deprecated | ||
var_dump($s); | var_dump($s); | ||
+ | </ | ||
+ | * Deprecate using the increment operator with strings that are not strictly alphanumeric. | ||
+ | <PHP> | ||
$empty = ""; | $empty = ""; | ||
- | ++$empty // Deprecated: Increment on non-numeric | + | ++$empty; // Deprecated: Increment on non-alphanumeric | ||
var_dump($empty); | var_dump($empty); | ||
+ | $s = " | ||
+ | ++$s; // No Deprecation | ||
+ | var_dump($s); | ||
+ | |||
+ | $s = " | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | |||
+ | $s = "Z "; | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | |||
+ | $s = " Z"; | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | |||
+ | // Non-ASCII characters | ||
+ | $s = " | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | $s = " | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | $s = " | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | $s = " | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | |||
+ | $s = " | ||
+ | ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated | ||
+ | var_dump($s); | ||
+ | </ | ||
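
Code that wants to avoid the deprecation could check its input before incrementing. The following is a minimal sketch, assuming that "strictly alphanumeric" means a non-empty string consisting only of ASCII letters and digits; ''is_strictly_alphanumeric()'' is a hypothetical helper name, not a function provided by PHP.

```php
<?php

/**
 * Hypothetical helper (not part of PHP): returns true only for non-empty
 * strings made exclusively of ASCII letters and digits.
 */
function is_strictly_alphanumeric(string $s): bool
{
    // The D modifier stops "$" from matching before a trailing newline.
    return preg_match('/^[A-Za-z0-9]+$/D', $s) === 1;
}

var_dump(is_strictly_alphanumeric('Az9')); // bool(true)
var_dump(is_strictly_alphanumeric('Z '));  // bool(false)
```

A caller would then only apply ''++'' when the check passes and handle the remaining inputs explicitly.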
+ | |||
+ | In a follow-up minor version of PHP the following changes will take place: | ||
+ | * Deprecate using the increment operator with non-numeric strings. | ||
+ | <PHP> | ||
$s = " | $s = " | ||
++$s; // Deprecated: Increment on non-numeric string is deprecated | ++$s; // Deprecated: Increment on non-numeric string is deprecated | ||
Line 246: | Line 478: | ||
* Non-numeric string values throw a '' | * Non-numeric string values throw a '' | ||
+ | ==== Semantics of str_increment() and str_decrement() ==== | ||
+ | |||
+ | The signatures of the functions are: | ||
+ | <PHP> | ||
+ | function str_increment(string $string): string {} | ||
+ | function str_decrement(string $string): string {} | ||
+ | </ | ||
+ | |||
+ | If < | ||
+ | |||
+ | If decrementing < | ||
+ | |||
+ | As those functions would not perform any type juggling, strings that can be interpreted as numbers in scientific notation will not be implicitly converted to float. | ||
+ | |||
+ | <PHP> | ||
+ | $s = " | ||
+ | $s = str_increment($s); | ||
+ | var_dump($s); | ||
+ | $s = str_increment($s); | ||
+ | var_dump($s); | ||
+ | </ | ||
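
Since both functions signal invalid input with a ''ValueError'', callers that deal with untrusted strings may want a thin wrapper. The sketch below assumes a PHP version that ships ''str_increment()'' and ''str_decrement()'' (PHP 8.3 or later); ''try_step()'' is a hypothetical helper name.

```php
<?php

/**
 * Hypothetical wrapper (not part of PHP): calls the given string step function
 * and returns null instead of letting a ValueError escape.
 */
function try_step(callable $step, string $value): ?string
{
    try {
        return $step($value);
    } catch (ValueError $e) {
        return null; // e.g. empty, non-alphanumeric, or out-of-range input
    }
}
```

With such a wrapper, ''try_step('str_increment', 'Az')'' yields ''"Ba"'', while ''try_step('str_increment', %%''%%)'' and ''try_step('str_decrement', 'A')'' both yield ''null'' because the underlying call throws.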
==== Cost/ | ==== Cost/ | ||
- | PHP currently has 6 main and 3 operation specific type juggling contexts. | + | PHP currently has 6 main and 4 operation-specific type juggling contexts. |
- | The main 6 are documented in the userland manual on the type juggling page and are as follows: | + | The main 6 are documented in the userland manual on the [[https:// |
* Numeric | * Numeric | ||
* String | * String | ||
Line 258: | Line 511: | ||
* Function | * Function | ||
- | The 3 operation specific | + | The 4 operation-specific |
* Increment/ | * Increment/ | ||
* String offsets | * String offsets | ||
* Array offsets | * Array offsets | ||
- | | + | |
With the semantics proposed in this RFC the increment/ | With the semantics proposed in this RFC the increment/ | ||
- | The drawback of this approach is the deprecation, | + | The drawback of this approach is the deprecation, |
+ | However, | ||
+ | and adding support for string decrements | ||
+ | makes us believe | ||
+ | |||
+ | Therefore, we consider the value of reducing the semantic complexity of PHP higher than keeping support for this feature | ||
+ | The introduction of the < | ||
+ | < | ||
+ | function str_increment_polyfill(string $s): string { | ||
+ | if (is_numeric($s)) { | ||
+ | $offset = stripos($s, ' | ||
+ | if ($offset !== false) { | ||
+ | /* Using increment operator would cast the string to float | ||
+ | * Therefore we manually increment it to convert it to an " | ||
+ | $c = $s[$offset]; | ||
+ | $c++; | ||
+ | $s[$offset] = $c; | ||
+ | $s++; | ||
+ | $s[$offset] = match ($s[$offset]) { | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | }; | ||
+ | return $s; | ||
+ | } | ||
+ | } | ||
+ | return ++$s; | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | ==== Impact of deprecating the PERL string increment feature on userland ==== | ||
+ | |||
+ | To determine the impact of this RFC on userland, the static analysis tool [[https://www.exakat.io/en/|Exakat]] was used. We analyzed 2909 open source projects, including the top 1000 composer packages, plus various private enterprise code bases. ((Raw results of the analysis are available as a [[https:// | ||
+ | |||
+ | The only non-false-positive use cases using the PERL string increment feature are: | ||
+ | |||
+ | * Generating a list of valid Unicode (or ASCII) characters. The most popular project using this is HTMLPurifier, | ||
+ | * Generating sequential IDs. The main library doing this is amphp/amp; however, many other projects depend on this library. | ||
+ | * Incrementing a spreadsheet column. | ||
+ | |||
+ | In any of these cases, no deprecation notices would be emitted | ||
+ | As the first stage of this RFC also provides the < | ||
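
As an illustration of the spreadsheet-column use case listed above, the sketch below generates column labels and prefers ''str_increment()'' when it is available. The function name ''column_labels()'' is hypothetical, and the fallback to ''++'' is only there so the sketch also runs on versions without ''str_increment()''.

```php
<?php

/**
 * Hypothetical helper: returns the first $count spreadsheet-style column
 * labels (A, B, ..., Z, AA, AB, ...).
 */
function column_labels(int $count): array
{
    $labels = [];
    $label = 'A';

    for ($i = 0; $i < $count; $i++) {
        $labels[] = $label;

        if (function_exists('str_increment')) {
            $label = str_increment($label); // explicit, no type juggling
        } else {
            ++$label; // legacy string increment on older PHP versions
        }
    }

    return $labels;
}

echo implode(',', column_labels(28)), "\n"; // A,B,...,Z,AA,AB
```

Labels of this kind consist solely of uppercase ASCII letters, so they are never numeric strings and both branches produce the same sequence.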
===== Backward Incompatible Changes ===== | ===== Backward Incompatible Changes ===== | ||
Line 278: | Line 574: | ||
One possible future scope is to add support to both arithmetic operations and the increment/ | One possible future scope is to add support to both arithmetic operations and the increment/ | ||
+ | |||
+ | One other possible extension is to add a < | ||
===== Proposed PHP Version ===== | ===== Proposed PHP Version ===== | ||
- | Next minor version, i.e. PHP 8.3.0, and next major version, i.e. PHP 9.0.0. | + | Next minor version, i.e. PHP 8.3.0, follow-up minor version, e.g. PHP 8.4.0, and next major version, i.e. PHP 9.0.0. |
===== Proposed Voting Choices ===== | ===== Proposed Voting Choices ===== | ||
Line 287: | Line 585: | ||
As per the voting RFC, a yes/no vote with a 2/3 majority is needed for this proposal to be accepted.
- | Voting started on 2023-XX-XX and will end on 2023-XX-XX. | + | Voting started on 2023-06-28 and will end on 2023-07-12. |
<doodle title=" | <doodle title=" | ||
* Yes | * Yes | ||
Line 295: | Line 593: | ||
===== Implementation ===== | ===== Implementation ===== | ||
- | GitHub pull request: https:// | + | GitHub pull request: |
After the project is implemented, | After the project is implemented, |
rfc/saner-inc-dec-operators.txt · Last modified: 2023/07/17 14:52 by girgias