PHP RFC: Define proper semantics for range() function
- Version: 0.3
- Date: 2023-03-13
- Author: George Peter Banyard, girgias@php.net
- Status: Implemented (https://github.com/php/php-src/commit/798c40a739e8f1081a516679a367d76c3d0aabb9)
- Target Version: PHP 8.3
- Implementation: https://github.com/php/php-src/pull/10826
- First Published at: http://wiki.php.net/rfc/proper-range-semantics
Introduction
PHP's standard library implements the range()
function, which generates an array of values going from a $start
value to an $end
value.
By default values are generated by using a step of 1
but this behaviour can be changed by passing the $step
parameter.
In principle, the range()
function only works with integer, float, and string $start
and $end
values, but in reality this is not the case. Moreover, even within those expected types the behaviour can be quite strange.
Current Behaviour of range()
The current behaviour is quite complex, and it might be easier to just read the implementation, but it roughly goes as follows:
First, check if the $step
argument is negative; if it is multiply by -1
.
Then check the boundary arguments:
- If both start and end values are strings with at least one byte (e.g.
range('A', 'Z');
,range('AA', 'BB');
, orrange('15', '25');
):- If one of the inputs is a float numeric string, or the
$step
parameter is a float: go to the handle float input branch. - If one of the inputs is an integer numeric string: go to the generic handling branch.
- Otherwise: discard every byte after the first one and return an array of ASCII characters going from the start ASCII code point to the end ASCII code point.
- Otherwise (generic handling): cast start and end values to int and return an array of integers.
The generic case will accept any type.
Let us look at various examples to highlight the range of behaviour exhibited by range()
Examples
Example with expected values:
var_dump(range(1, 3)); array(3) { [0]=> int(1) [1]=> int(2) [2]=> int(3) } var_dump(range(1.0, 3.0)); array(3) { [0] float(1) [1] float(2) [2] float(3) } var_dump(range(1, 3, 1.5)); array(2) { [0] float(1) [1] float(2.5) } var_dump(range(1.0, 3.0, 1.5)); array(2) { [0] float(1) [1] float(2.5) } var_dump(range('10', '13')); array(4) { [0] int(10) [1] int(11) [2] int(12) [3] int(13) } var_dump(range('10.0', '13.0')); array(4) { [0] float(10) [1] float(11) [2] float(12) [3] float(13) } var_dump(range('10', '13', 1.5)); array(3) { [0] float(10) [1] float(11.5) [2] float(13) } var_dump(range('10.0', '13.0', 1.5)); array(3) { [0] float(10) [1] float(11.5) [2] float(13) } var_dump(range('A', 'E')); array(5) { [0] string(1) "A" [1] string(1) "B" [2] string(1) "C" [3] string(1) "D" [4] string(1) "E" } var_dump(range('1', '3')); array(3) { [0]=> int(1) [1]=> int(2) [2]=> int(3) }
Example showing how to produce a decreasing range:
var_dump(range('E', 'A')); array(5) { [0] string(1) "E" [1] string(1) "D" [2] string(1) "C" [3] string(1) "B" [4] string(1) "A" }
Example showing how negative steps for increasing ranges are multiplied by -1
:
var_dump(range(0, 10, -2)); array(6) { [0]=> int(0) [1]=> int(2) [2]=> int(4) [3]=> int(6) [4]=> int(8) [5]=> int(10) }
Example showing the ASCII code point range:
var_dump( range("!", "/") ); /* array(15) { [0]=> string(1) "!" [1]=> string(1) """ [2]=> string(1) "#" [3]=> string(1) "$" [4]=> string(1) "%" [5]=> string(1) "&" [6]=> string(1) "'" [7]=> string(1) "(" [8]=> string(1) ")" [9]=> string(1) "*" [10]=> string(1) "+" [11]=> string(1) "," [12]=> string(1) "-" [13]=> string(1) "." [14]=> string(1) "/" } */ var_dump(range('a', 'Z')); /* array(8) { [0]=> string(1) "a" [1]=> string(1) "`" [2]=> string(1) "_" [3]=> string(1) "^" [4]=> string(1) "]" [5]=> string(1) "\" [6]=> string(1) "[" [7]=> string(1) "Z" } */
Example showing how string inputs can get cast to int/float:
var_dump(range('', 'Z')); array(1) { [0]=> int(0) } var_dump(range('A', 'E', 1.0)); array(1) { [0]=> float(0) }
Examples with unexpected types:
/* null */ var_dump(range(null, 2)); array(3) { [0]=> int(0) [1]=> int(1) [2]=> int(2) } var_dump(range(null, 'e')); array(1) { [0]=> int(1) } /* Array */ var_dump(range([5], [8])); array(1) { [0]=> int(1) } /* Resources */ var_dump(range(STDIN, STDERR)); array(3) { [0]=> int(1) [1]=> int(2) [2]=> int(3) } /* Int/Float castable object */ $o1 = gmp_init(15); $o2 = gmp_init(20); var_dump(range($o1, $o2)); array(6) { [0]=> int(15) [1]=> int(16) [2]=> int(17) [3]=> int(18) [4]=> int(19) [5]=> int(20) } /* Int/Float non-castable object */ $o1 = new stdClass(); $o2 = new stdClass(); var_dump(range($o1, $o2)); /* Warning: Object of class stdClass could not be converted to int in /tmp/preview on line 13 Warning: Object of class stdClass could not be converted to int in /tmp/preview on line 13 array(1) { [0]=> int(1) } */
Issues surrounding usage of INF and NAN values
Infinite values are handled as part of the range boundary checks, or for the $step
parameter when checking that the step is less than the range being requested, and will throw ValueErrors.
However, NAN values are not specifically handled and result in nonsensical ranges:
$nan = fdiv(0,0); var_dump(range($nan, 5)); array(1) { [0]=> float(NAN) } var_dump(range(1, 5, $nan)); array(0) { }
Where using a NAN value as a step even breaks the expectation that range()
will return a non empty list.
Issues surrounding usage of string digits
If one of the boundary inputs is a string digit (e.g. “1”
) both inputs will be interpreted as numbers.
This doesn't pose too much of an issue if both inputs are string digits as it will generate a list of integers.
However, if the other input is a non-numeric string the expected behaviour of generating a list of ASCII characters is not upheld anymore:
var_dump( range("9", "A") ); array(10) { [0]=> int(9) [1]=> int(8) [2]=> int(7) [3]=> int(6) [4]=> int(5) [5]=> int(4) [6]=> int(3) [7]=> int(2) [8]=> int(1) [9]=> int(0) }
instead of the expected:
Proposal
The proposal is to adjust the semantics of range()
in various ways to throw exceptions outright or at least warn when passing unusable arguments to range()
.
The changes are as follows:
- If
$step
is a float but is compatible withint
(i.e.(float)(int)$step === $step
) interpret it as an integer. - Introduce and use a proper ZPP check for
int|float|string
$start
and$end
parameters; this will causeTypeError
s to be thrown when passing objects, resources, and arrays torange()
. It will also cause a deprecation warning to be emitted when passingnull
. - Throw value errors if
$start
,$end
, or$step
is a non-finite float (-INF, INF, NAN). - Throw a more descriptive
ValueError
when$step
is zero. - Throw a
ValueError
when passing a negative$step
for increasing ranges. - Emit an
E_WARNING
when$start
or$end
is the empty string, and cast the value to0
- Emit an
E_WARNING
when$start
or$end
has more than one byte if it is a non-numeric string. - Emit an
E_WARNING
when$start
or$end
is cast to an integer because the other boundary input is a number. (e.g.range(5, 'z');
) - Produce a list of characters if one of the boundary inputs is a string digit instead of casting the other input to int (e.g.
range('5', 'z');
) - Emit an
E_WARNING
when$step
is a float when trying to generate a range of characters, except if both boundary inputs are numeric strings (e.g.range('5', '9', 0.5);
does not produce a warning).
Therefore, the behaviour of some of the previous examples would result in the following behaviour:
var_dump(range('A', 'E', 1.0)); array(5) { [0]=> string(1) "A" [1]=> string(1) "B" [2]=> string(1) "C" [3]=> string(1) "D" [4]=> string(1) "E" } var_dump( range("9", "A") ); array(9) { [0]=> string(1) "9" [1]=> string(1) ":" [2]=> string(1) ";" [3]=> string(1) "<" [4]=> string(1) "=" [5]=> string(1) ">" [6]=> string(1) "?" [7]=> string(1) "@" [8]=> string(1) "A" } var_dump(range('', 'Z')); /* Warning: range(): Argument #1 ($start) must not be empty, casted to 0 Warning: range(): Argument #1 ($start) must be a string if argument #2 ($end) is a string, argument #2 ($end) converted to 0 */ var_dump(range(null, 2)); /* Deprecated: range(): Passing null to parameter #1 ($start) of type string|int|float is deprecated array(3) { [0]=> int(0) [1]=> int(1) [2]=> int(2) } */ var_dump(range(null, 'e')); /* Deprecated: range(): Passing null to parameter #1 ($start) of type string|int|float is deprecated in %s on line %d Warning: range(): Argument #1 ($start) must be a string if argument #2 ($end) is a string, argument #2 ($end) converted to 0 in %s on line %d array(1) { [0]=> int(1) } */ var_dump(range(0, 10, -2)); /* range(): Argument #3 ($step) must be greater than 0 for increasing ranges */
Impact Analysis
Using Nikita Popov's ''popular-package-analysis'' project and running a rough analysis of the usage of range()
on the top 1000 composer projects we get that out of around 450 calls to range()
- 154 calls are made with literal number arguments
- 18 calls are made with literal string arguments
- 140 calls have at least one argument be the result of a plus (
+
), minus (-
), or times (*
) operation. - 47 calls have at least one argument be a variable
- 66 calls have at least an argument that comes from a class property, class method, function, or array dimension.
The calls that are non-trivial were manually checked and seem all valid.
Backward Incompatible Changes
TypeError
s are thrown for incompatible types.
ValueError
s are thrown for INF, NAN, and negative step values for increasing ranges.
E_WARNING
s are emitted for various issues.
Calls to range()
that have integer boundaries but a float step that is compatible as an integer will now return an array of integers instead of an array of floats:
Proposed PHP Version
Next minor version, i.e. PHP 8.3.0.
Proposed Voting Choices
As per the voting RFC a yes/no vote with a 2/3 majority is needed for this proposal to be accepted.
Voting started on 2023-06-01 and will end on 2023-06-15.
Implementation
GitHub pull request: https://github.com/php/php-src/pull/10826
Implemented in PHP 8.3, as commit: https://github.com/php/php-src/commit/798c40a739e8f1081a516679a367d76c3d0aabb9