PHP RFC: Path to Saner Increment/Decrement operators
- Version: 0.3
- Date: 2022-11-21
- Author: George Peter Banyard, girgias@php.net
- Status: Implemented
- Target Version: PHP 8.3, PHP 8.(3+x), and PHP 9.0
- First Published at: http://wiki.php.net/rfc/saner-inc-dec-operators
Introduction
PHP's increment and decrement operators can have some surprising behaviours when used with types other than int and float. Various previous attempts 1)
2)
3)
have been made to improve the behaviour of these operators, but none have been implemented.
The goal of this RFC is to normalize the behaviour of $v++
and $v--
to be the same as $v += 1
and $v -= 1
, respectively.
Therefore, we will first look at the behaviour of arithmetic operators with various types, then detail the current behaviour of the increment and decrement operators, and finally propose various changes to fix the discrepancies.
Behaviour of arithmetic operators
Arithmetic operators perform a numeric type juggling, which is described in the userland manual as:
In this context if either operand is a float (or not interpretable as an int), both operands are interpreted as floats, and the result will be a float. Otherwise, the operands will be interpreted as ints, and the result will also be an int. As of PHP 8.0.0, if one of the operands cannot be interpreted a
TypeError
is thrown.
The following types (other than int and float) are considered interpretable as int/float:
null
, as0
bool
, wherefalse
is interpreted as0
, andtrue
as1
string
, if it is numeric the string is converted to int/float and the standard behaviour is used
var_dump(null + 1); // int(1) var_dump(null - 1); // int(-1) var_dump(false + 1); // int(1) var_dump(false - 1); // int(-1) var_dump(true + 1); // int(2) var_dump(true - 1); // int(0) var_dump("10" + 1); // int(11) var_dump("10" - 1); // int(9) var_dump("5.7" + 1); // float(6.7) var_dump("5.7" - 1); // float(4.7)
Resources, non-numeric strings, arrays (except when adding two arrays together), and objects that are instances of userland classes throw a TypeError
.
Object values that are instances of an internal class that overload the arithmetic operator (by implementing the do_operation
handler) will use the result of calling the handler.
If an internal class implements a custom cast_object
handler which supports a numeric _IS_NUMBER
cast, the object is cast and the standard int/float behaviour is used.
Otherwise, a TypeError
is thrown.
One example of an internal class that implements a do_operation
handler is the GMP class.
The only examples of an internal class that does not implement a do_operation
handler but implements an _IS_NUMBER
cast in php-src are in Tidy extension (and are of dubious nature):
$o = tidy_parse_string("<p>Hello world</p>"); var_dump($o + 1); // int(1)
Note: the empty string has never been considered numeric. (see: https://3v4l.org/uvLbV)
Note: If an internal class implements a custom cast_object
handler that supports an integer cast (via IS_LONG
) and/or a float cast (via IS_DOUBLE
), but not an _IS_NUMBER
cast, no casting occurs and a TypeError
is thrown.
Current behaviour of the increment and decrement operators
The current behaviour of these operators is rather complex and depends on which operator is used with which type. First, we will describe the common behaviour between both operators:
- the value is of type
int
orfloat
, the operation is performed - the value is of type
array
orresource
then aTypeError
is raised - the value is of type
bool
, no action is performed on the value - the value is of type
string
and is numeric, then a standard numeric type cast is performed, and theint
/float
behaviour is utilized.
$int = 10; var_dump(++$int); // int(11) $int = 10; var_dump(--$int); // int(9) $float = 5.7; var_dump(++$float); // float(6.7) $float = 5.7; var_dump(--$float); // float(4.7) $false = false; var_dump(++$false); // bool(false) var_dump(--$false); // bool(false) $true = true; var_dump(++$true); // bool(true) var_dump(--$true); // bool(true) $stringInt = "10"; var_dump(++$stringInt); // int(11) var_dump(--$stringInt); // int(9) $stringFloat = "5.7"; var_dump(++$stringFloat); // float(6.7) var_dump(--$stringFloat); // float(4.7)
Object values that are instances of an internal class that overload the arithmetic operator (by implementing the do_operation
handler) will use the result of calling the handler. Otherwise, a TypeError
is thrown.
$o = gmp_init(36); var_dump(++$o); /* object(GMP)#2 (1) { ["num"]=> string(2) "37" } */ $o = tidy_parse_string("<p>Hello world</p>"); var_dump(++$o); // Fatal error: Uncaught TypeError: Cannot increment tidy
For non-numeric string
values and values of type null
the behaviour is different between the increment and decrement operators.
Current behaviour of the decrement operator with values of type null and non-numeric string
If the value is of type null
, no action is performed.
If the value is a non-numeric string
, no action is performed, except if the value is the empty string, in which case the result of the operation is the integer -1
.
Current behaviour of the increment operator with values of type null and non-numeric string
If the value is of type null
, the result of the operation is the integer 1
.
If the value is a non-numeric string
a PERL alphanumeric string increment is performed.
$n = null; ++$n; var_dump($n); // int(1) $s = "foo"; ++$s; var_dump($s); // string(3) "fop" $e = ""; ++$e; var_dump($e); // string(1) "1"
Note: this means that the behaviour around the empty string differs between both operators. Because for ++
a PERL increment is used, the result is the string “1”
. This behaviour is identical in all versions of PHP.
<?php $s1 = $s2 = ""; var_dump(++$s1, ++$s1, --$s2, --$s2); /* this results in string(1) "1" int(2) int(-1) int(-2) */
Details about the PERL String increment feature
If the string to increment is the empty string, return the string “1”
.
Otherwise, the last byte of the string is inspected:
- If it is in-between “a” and “y”, “A” and “Y”, or “0” and “8”, the ASCII code point value is increased by one.
- If if is “z”, “Z”, or “9” replace it by “a”, “A”, and “0” respectively, then inspect the previous byte while holding a carry value of 1.
- Otherwise, do nothing.
If, and only if, a carry value is held after having inspected the first byte of the string. The string is prepended the character “a”, “A”, or “1” depending on the value of the first byte (“z”, “Z”, and “9” respectively).
Here are a couple examples demonstrating these rules:
<?php // Empty string $s = ""; var_dump(++$s); // string(1) "1" // String increments are unaware of being "negative" $s = "-cc"; var_dump(++$s); // string(3) "-cd" $s = "cc"; var_dump(++$s); // string(2) "cd" // Carrying values of different cases/types $s = "Az"; var_dump(++$s); // string(2) "Ba" $s = "aZ"; var_dump(++$s); // string(2) "bA" $s = "A9"; var_dump(++$s); // string(2) "B0" $s = "a9"; var_dump(++$s); // string(2) "b0" // Carrying values until the beginning of the string $s = "Zz"; var_dump(++$s); // string(3) "AAa" $s = "zZ"; var_dump(++$s); // string(3) "aaA" $s = "9z"; var_dump(++$s); // string(3) "10a" $s = "9Z"; var_dump(++$s); // string(3) "10A" // Trailing whitespace $s = "Z "; var_dump(++$s); // string(2) "Z " // Leading whitespace $s = " Z"; var_dump(++$s); // string(2) " A" // Whitespace in-between $s = "C Z"; var_dump(++$s); // string(3) "C A" // Non-ASCII characters $s = "é"; var_dump(++$s); // string(2) "é" $s = "あいうえお"; var_dump(++$s); // string(15) "あいうえお" $s = "α"; var_dump(++$s); // string(2) "α" $s = "ω"; var_dump(++$s); // string(2) "ω" $s = "Α"; var_dump(++$s); // string(2) "Β" $s = "Ω"; var_dump(++$s); // string(2) "Ω" // With period $s = "foo1.txt"; var_dump(++$s); // string(8) "foo1.txu" $s = "1f.5"; var_dump(++$s); // string(4) "1f.6" // With multiple period $s = "foo.1.txt"; var_dump(++$s); // string(9) "foo.1.txu" $s = "1.f.5"; var_dump(++$s); // string(5) "1.f.6"
The behaviour is slightly different than that of Raku (a PERL successor). It performs the string increment prior to the first FULL STOP .
character, handles Unicode characters, performs the carry in a slightly differently way, and also does not do anything with empty strings.
sub var_dump(Str $v) { say 'string(' ~ $v.encode('UTF-8').bytes ~ ') "' ~ $v ~ "\"\n"; } # Empty string my $s = ""; var_dump(++$s); # String increments are unaware of being "negative" $s = "-cc"; var_dump(++$s); # string(3) "-cd" $s = "cc"; var_dump(++$s); # string(2) "cd" # Carrying values of different cases/types $s = "Az"; var_dump(++$s); # string(2) "Ba" $s = "aZ"; var_dump(++$s); # string(2) "bA" $s = "A9"; var_dump(++$s); # string(2) "B0" $s = "a9"; var_dump(++$s); # string(2) "b0" # Carrying values until the beginning of the string $s = "Zz"; var_dump(++$s); # string(3) "AAa" $s = "zZ"; var_dump(++$s); # string(3) "aaA" $s = "9z"; var_dump(++$s); # string(3) "10a" $s = "9Z"; var_dump(++$s); # string(3) "10A" # Trailing whitespace $s = "Z "; var_dump(++$s); # string(2) "Z " # Leading whitespace $s = " Z"; var_dump(++$s); # string(2) " A" # Whitespace in-between $s = "C Z"; var_dump(++$s); # string(4) "C AA" # Non-ASCII characters $s = "é"; var_dump(++$s); # string(2) "é" $s = "あいうえお"; var_dump(++$s); # string(15) "あいうえお" $s = "α"; var_dump(++$s); # string(2) "β" $s = "ω"; var_dump(++$s); # string(4) "αα" $s = "Α"; var_dump(++$s); # string(2) "Β" $s = "Ω"; var_dump(++$s); # string(4) "ΑΑ" # With period $s = "foo1.txt"; var_dump(++$s); # string(8) "foo2.txt" $s = "1f.5"; var_dump(++$s); # string(4) "1g.5" # With multiple period $s = "foo.1.txt"; var_dump(++$s); # string(9) "fop.2.txt" $s = "1.f.5"; var_dump(++$s); # string(5) "2.f.5"
However, the biggest problem is with strings that can be interpreted as a number in scientific notation, because they will never be interpreted as an alphanumeric string to be incremented using the PERL increment feature, but converted to float first:
While Raku also supports arithmetic operations with strings that represent number in scientific notation, it does not perform any type juggling at all for the increment and decrement operators (therefore having the same behaviour as currently for boolean and its corresponding null
type Nil
).
Therefore the above snippet in Raku gives a consistent result:
sub var_dump(Str $v) { say 'string(' ~ $v.encode('UTF-8').bytes ~ ') "' ~ $v ~ "\"\n"; } my $s = "5d9"; var_dump(++$s); // string(3) "5e0" var_dump(++$s); // string(3) "5e1"
Summary of behavioural differences
+1 | ++ | -1 | -- |
|
---|---|---|---|---|
null | 1 | 1 | -1 | null |
false | 1 | false | -1 | false |
true | 2 | true | 0 | true |
“” | TypeError | “1” | TypeError | -1 |
“foo” | TypeError | “fop” | TypeError | “foo” |
Tidy Object | 1 | TypeError | -1 | TypeError |
Proposal
The proposal is to create a path so that in the next major version of PHP the increment and decrement operators behave identically to adding/subtracting 1 respectively, while acknowledging that users rely on the PERL string increment feature.
To achieve this, we propose the following changes to be made in the next minor version of PHP:
- Add the
str_increment()
andstr_decrement()
functions which implement a symmetrical but more rigorous and strict behaviour than the current PERL string increment feature has which is described in the sub-section below. - Add support to increment/decrement objects that implement support for a
_IS_NUMBER
cast but do not implement ado_operation
handle
$o = tidy_parse_string("<p>Hello world</p>"); var_dump(++$o); // int(1)
- to emit
E_WARNING
s when the operators currently do not have any behaviour when they would if replaced with a proper addition/subtraction (i.e. when the value is of typebool
andnull
for the decrement operator).
$n = null; --$n; // Warning: Decrement on type null has no effect, this will change in the next major version of PHP var_dump($n); // NULL $false = false; --$false; // Warning: Decrement on type bool has no effect, this will change in the next major version of PHP var_dump($false); // bool(false) ++$false; // Warning: Increment on type bool has no effect, this will change in the next major version of PHP var_dump($false); // bool(false) $true = true; --$true; // Warning: Decrement on type bool has no effect, this will change in the next major version of PHP var_dump($true); // bool(true) ++$true; // Warning: Increment on type bool has no effect, this will change in the next major version of PHP var_dump($true); // bool(true)
- Deprecate using the decrement operator with non-numeric strings.
$empty = ""; --$empty // Deprecated: Decrement on empty string is deprecated as non-numeric var_dump($empty); // int(-1) $s = "foo"; --$s; // Deprecated: Decrement on non-numeric string has no effect and is deprecated var_dump($s); // string(3) "foo"
- Deprecate using the increment operator with strings that are not strictly alphanumeric.
$empty = ""; ++$empty // Deprecated: Increment on non-alphanumeric string is deprecated var_dump($empty); // string(1) "1" $s = "foo"; ++$s; // No Deprecation var_dump($s); // string(3) "fop" $s = "-cc"; ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated var_dump($s); // string(3) "-cd" $s = "Z "; ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated var_dump($s); // string(2) "Z " $s = " Z"; ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated var_dump($s); // string(2) " A" # Non-ASCII characters $s = "é"; ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated var_dump($s); # string(2) "é" $s = "あいうえお"; ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated var_dump($s); # string(15) "あいうえお" $s = "α"; ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated var_dump($s); # string(2) "α" $s = "1f.5"; ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated var_dump($s); # string(4) "1f.6" $s = "1.f.5"; ++$s; // Deprecated: Increment on non-alphanumeric string is deprecated var_dump($s); # string(5) "1.f.6"
In a follow-up minor version of PHP the following changes will take place:
- Deprecate using the increment operator with non-numeric strings.
$s = "foo"; ++$s; // Deprecated: Increment on non-numeric string is deprecated var_dump($s); // string(3) "fop"
In the next major version of PHP the following changes will take place:
- Values of type
bool
andnull
are first cast to integers - Non-numeric string values throw a
TypeError
Semantics of str_increment() and str_decrement()
The signature of the functions are:
function str_increment(string $string): string {} function str_decrement(string $string): string {}
If $string
is the empty string or not totally comprised of ASCII alphanumeric characters ([a-zA-Z0-9]
) then a ValueError is thrown.
If decrementing $string
would result in an underflow (e.g. “AA”
or “0”
) an out of range ValueError will be thrown. This follows Raku's behaviour.
As those functions would not be performing any type juggling strings that can be interpreted as numbers in scientific notation will not be implicitly converted to float.
Cost/Benefit
PHP currently has 6 main and 4 operation-specific type juggling contexts. The main 6 are documented in the userland manual on the type juggling page and are as follows:
- Numeric
- String
- Logical
- Integral and string
- Comparative
- Function
The 4 operation-specific contexts are:
- Increment/Decrement operators
- String offsets
- Array offsets
exit
language construct
With the semantics proposed in this RFC the increment/decrement operators would be folded into the numeric type juggling context which reduces the semantic complexity of the language and possibly the engine/optimizer implementation in the next major version.
The drawback of this approach is the deprecation, and thus removal, of the PERL increment feature.
However, the issues around strings that can be interpreted in scientific notation, the fact it only properly supports strings which are only comprised of the ASCII alphanumeric characters ([a-zA-Z0-9]
),
and adding support for string decrements was previously rejected unanimously,
makes us believe the current semantics of the string increment feature are unsound.
Therefore, we consider the value of reducing the semantic complexity of PHP higher than keeping support for this feature in its current form.
The introduction of the str_increment()
function provides a migration path for users relying on this feature that can easily be polyfilled in prior versions of PHP:
function str_increment_polyfill(string $s): string { if (is_numeric($s)) { $offset = stripos($s, 'e'); if ($offset !== false) { /* Using increment operator would cast the string to float * Therefore we manually increment it to convert it to an "f"/"F" that doesn't get affected */ $c = $s[$offset]; $c++; $s[$offset] = $c; $s++; $s[$offset] = match ($s[$offset]) { 'f' => 'e', 'F' => 'E', 'g' => 'f', 'G' => 'F', }; return $s; } } return ++$s; }
Impact of deprecating the PERL string increment feature on userland
To determine the impact of this RFC on userland, the static analysis tool Exakat was used. We analyzed 2909 open source projects, including the top 1000 composer packages, plus various private enterprise code bases. 4)
The only non-false-positive use cases using the PERL string increment feature are:
- Generating a list of valid unicode (or ASCII) characters. The most popular project using this is HTMLPurifier, which no longer does so as of this PR.
- Generating sequential IDs. The main library doing this is amphp/amp, however a lot of other projects depend on this library.
- Incrementing a spreadsheet column.
In any of these cases, no deprecation notices would be emitted in the first stage of this RFC.
As the first stage of this RFC also provides the str_increment()
function, which can be polyfilled, we believe there will be enough time to migrate all these usages to the new function prior to removal of this feature.
Backward Incompatible Changes
Using the increment/decrement operators on the empty string.
The string increment feature.
The changes that introduce an E_WARNING
diagnostic do not technically break backwards compatibility, however they might be elevated to an exception via a user set error handler which may reveal some unintended usages.
Future Scope
One possible future scope is to add support to both arithmetic operations and the increment/decrement operators to support objects that only implement an int or float cast instead of a numeric cast.
One other possible extension is to add a $step
argument to str_increment()
and str_decrement()
Proposed PHP Version
Next minor version, i.e. PHP 8.3.0, follow-up minor version, e.g. PHP 8.4.0, and next major version, i.e. PHP 9.0.0.
Proposed Voting Choices
As per the voting RFC a yes/no vote with a 2/3 majority is needed for this proposal to be accepted.
Voting started on 2023-06-28 and will end on 2023-07-12.
Implementation
GitHub pull request: https://github.com/php/php-src/pull/10358
After the project is implemented, this section should contain
- Version: PHP 8.3
- a link to the PHP manual entry for the feature