PHP RFC: Define proper semantics for range() function

Introduction

PHP's standard library implements the range() function, which generates an array of values going from a $start value to an $end value. By default, values are generated using a step of 1, but this behaviour can be changed by passing the $step parameter. In principle, range() only works with integer, float, and string $start and $end values, but in practice it accepts other types as well. Moreover, even within the expected types the behaviour can be quite surprising.

Current Behaviour of range()

The current behaviour is quite complex, and it might be easier to just read the implementation, but it roughly goes as follows:

First, check whether the $step argument is negative; if it is, multiply it by -1.

Then check the boundary arguments:

  • If both start and end values are strings with at least one byte (e.g. range('A', 'Z');, range('AA', 'BB');, or range('15', '25');):
    • If one of the inputs is a float numeric string, or the $step parameter is a float: go to the float handling branch.
    • If one of the inputs is an integer numeric string: go to the generic handling branch.
    • Otherwise: discard every byte after the first one and return an array of ASCII characters going from the start ASCII code point to the end ASCII code point.
  • If the start or end value is a float or the $step parameter is a float (e.g. range(10.5, 12);, range(1, 3, 1.5);, or range(1, 3, 1.0);): cast start and end values to float and return an array of floats.
  • Otherwise (generic handling): cast start and end values to int and return an array of integers.

The generic case will accept any type.
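
The dispatch described above can be approximated in userland PHP. This is only a rough sketch that follows the prose description, not the C implementation: the function name legacyRangeBranch() and the branch labels are invented for illustration, and numeric-string detection is approximated with is_numeric().

```php
<?php
// Hypothetical helper: reports which branch the current range() would take,
// following the description above (not the actual C source).
function legacyRangeBranch(mixed $start, mixed $end, int|float $step = 1): string
{
    if (is_string($start) && is_string($end) && $start !== '' && $end !== '') {
        // A numeric string counts as "float" if it does not evaluate to an int.
        $isFloatString = fn (string $s): bool => is_numeric($s) && is_float($s + 0);
        $isIntString   = fn (string $s): bool => is_numeric($s) && is_int($s + 0);

        if ($isFloatString($start) || $isFloatString($end) || is_float($step)) {
            return 'float';
        }
        if ($isIntString($start) || $isIntString($end)) {
            return 'int';
        }
        return 'char'; // only the first byte of each string is used
    }
    if (is_float($start) || is_float($end) || is_float($step)) {
        return 'float';
    }
    return 'int'; // generic handling: any type is cast to int
}

echo legacyRangeBranch('A', 'Z'), "\n";       // char
echo legacyRangeBranch('15', '25'), "\n";     // int
echo legacyRangeBranch('10.0', '13.0'), "\n"; // float
echo legacyRangeBranch('A', 'E', 1.0), "\n";  // float
echo legacyRangeBranch('9', 'A'), "\n";       // int
```

The last two calls correspond to the surprising cases discussed later: a float step or a single digit boundary is enough to leave the character branch.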

Let us look at various examples to highlight the range of behaviour exhibited by range().

Examples

Example with expected values:

var_dump(range(1, 3));
array(3) {
  [0]=>
  int(1)
  [1]=>
  int(2)
  [2]=>
  int(3)
}
 
var_dump(range(1.0, 3.0));
array(3) {
  [0]=>
  float(1)
  [1]=>
  float(2)
  [2]=>
  float(3)
}
 
var_dump(range(1, 3, 1.5));
array(2) {
  [0]=>
  float(1)
  [1]=>
  float(2.5)
}
 
 
var_dump(range(1.0, 3.0, 1.5));
array(2) {
  [0]=>
  float(1)
  [1]=>
  float(2.5)
}
 
var_dump(range('10', '13'));
array(4) {
  [0]=>
  int(10)
  [1]=>
  int(11)
  [2]=>
  int(12)
  [3]=>
  int(13)
}
 
var_dump(range('10.0', '13.0'));
array(4) {
  [0]=>
  float(10)
  [1]=>
  float(11)
  [2]=>
  float(12)
  [3]=>
  float(13)
}
 
var_dump(range('10', '13', 1.5));
array(3) {
  [0]=>
  float(10)
  [1]=>
  float(11.5)
  [2]=>
  float(13)
}
 
var_dump(range('10.0', '13.0', 1.5));
array(3) {
  [0]=>
  float(10)
  [1]=>
  float(11.5)
  [2]=>
  float(13)
}
 
var_dump(range('A', 'E'));
array(5) {
  [0]=>
  string(1) "A"
  [1]=>
  string(1) "B"
  [2]=>
  string(1) "C"
  [3]=>
  string(1) "D"
  [4]=>
  string(1) "E"
}
 
 
var_dump(range('1', '3'));
array(3) {
  [0]=>
  int(1)
  [1]=>
  int(2)
  [2]=>
  int(3)
}

Example showing how to produce a decreasing range:

var_dump(range('E', 'A'));
array(5) {
  [0]=>
  string(1) "E"
  [1]=>
  string(1) "D"
  [2]=>
  string(1) "C"
  [3]=>
  string(1) "B"
  [4]=>
  string(1) "A"
}

Example showing how negative steps for increasing ranges are multiplied by -1:

var_dump(range(0, 10, -2));
array(6) {
  [0]=>
  int(0)
  [1]=>
  int(2)
  [2]=>
  int(4)
  [3]=>
  int(6)
  [4]=>
  int(8)
  [5]=>
  int(10)
}

Example showing the ASCII code point range:

var_dump( range("!", "/") );
/*
array(15) {
  [0]=>
  string(1) "!"
  [1]=>
  string(1) """
  [2]=>
  string(1) "#"
  [3]=>
  string(1) "$"
  [4]=>
  string(1) "%"
  [5]=>
  string(1) "&"
  [6]=>
  string(1) "'"
  [7]=>
  string(1) "("
  [8]=>
  string(1) ")"
  [9]=>
  string(1) "*"
  [10]=>
  string(1) "+"
  [11]=>
  string(1) ","
  [12]=>
  string(1) "-"
  [13]=>
  string(1) "."
  [14]=>
  string(1) "/"
}
*/
 
var_dump(range('a', 'Z'));
/*
array(8) {
  [0]=>
  string(1) "a"
  [1]=>
  string(1) "`"
  [2]=>
  string(1) "_"
  [3]=>
  string(1) "^"
  [4]=>
  string(1) "]"
  [5]=>
  string(1) "\"
  [6]=>
  string(1) "["
  [7]=>
  string(1) "Z"
}
*/
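
The direction and length of a character range follow from the byte values of the boundaries. The short sketch below uses ord() to show why range('a', 'Z') is a decreasing range of eight elements; the variable names are illustrative only.

```php
<?php
// ord() returns the byte value of the first character of a string.
$start = ord('a'); // 97
$end   = ord('Z'); // 90

// 'a' sorts after 'Z' in ASCII, so the range runs downwards.
$count = abs($start - $end) + 1;

echo "$start -> $end: $count elements\n"; // 97 -> 90: 8 elements
```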

Example showing how string inputs can get cast to int/float:

var_dump(range('', 'Z'));
array(1) {
  [0]=>
  int(0)
}
 
var_dump(range('A', 'E', 1.0));
array(1) {
  [0]=>
  float(0)
}
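
Both results above are consistent with PHP's ordinary string-to-number casts, under which empty and non-numeric strings become 0. As a small illustration (assuming range() applies the same conversion as an explicit cast, which matches the outputs shown):

```php
<?php
// Empty and non-numeric strings become 0 under an explicit cast,
// so both boundaries collapse to the same value.
var_dump((int) '');    // int(0)
var_dump((int) 'Z');   // int(0)
var_dump((float) 'A'); // float(0)
var_dump((float) 'E'); // float(0)
```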

Examples with unexpected types:

/* null */
var_dump(range(null, 2));
array(3) {
  [0]=>
  int(0)
  [1]=>
  int(1)
  [2]=>
  int(2)
}
 
var_dump(range(null, 'e'));
array(1) {
  [0]=>
  int(0)
}
 
/* Array */
var_dump(range([5], [8]));
array(1) {
  [0]=>
  int(1)
}
 
/* Resources */
var_dump(range(STDIN, STDERR));
array(3) {
  [0]=>
  int(1)
  [1]=>
  int(2)
  [2]=>
  int(3)
}
 
/* Int/Float castable object */
$o1 = gmp_init(15);
$o2 = gmp_init(20);
var_dump(range($o1, $o2));
array(6) {
  [0]=>
  int(15)
  [1]=>
  int(16)
  [2]=>
  int(17)
  [3]=>
  int(18)
  [4]=>
  int(19)
  [5]=>
  int(20)
}
 
/* Int/Float non-castable object */
$o1 = new stdClass();
$o2 = new stdClass();
var_dump(range($o1, $o2));
/*
 
Warning: Object of class stdClass could not be converted to int in /tmp/preview on line 13
 
Warning: Object of class stdClass could not be converted to int in /tmp/preview on line 13
array(1) {
  [0]=>
  int(1)
}
*/
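
The null and array results can be reproduced with plain integer casts. This sketch assumes the generic branch converts its arguments the same way an (int) cast does, which is consistent with the outputs above but is not taken from the implementation.

```php
<?php
// null becomes 0; any non-empty array becomes 1, whatever it contains.
var_dump((int) null); // int(0)
var_dump((int) [5]);  // int(1)
var_dump((int) [8]);  // int(1)
var_dump((int) []);   // int(0)
```

Under that assumption, range([5], [8]) amounts to range(1, 1), which explains the single-element result.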

Issues surrounding usage of INF and NAN values

Infinite values are handled as part of the range boundary checks, or for the $step parameter when checking that the step is less than the range being requested, and will throw ValueErrors.

However, NAN values are not specifically handled and result in nonsensical ranges:

$nan = fdiv(0,0);
 
var_dump(range($nan, 5));
array(1) {
  [0]=>
  float(NAN)
}
 
var_dump(range(1, 5, $nan));
array(0) {
}

Using a NAN value as a step even breaks the expectation that range() will return a non-empty list.
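
A likely reason is that every comparison involving NAN is false, so checks written as comparisons cannot detect it. The sketch below demonstrates this and shows one way to guard against non-finite values in userland; assertFiniteNumber() and its message are hypothetical and not part of range().

```php
<?php
$nan = fdiv(0, 0);

// Comparisons involving NAN are false, including equality with itself.
var_dump($nan < 5);     // bool(false)
var_dump($nan > 5);     // bool(false)
var_dump($nan == $nan); // bool(false)

// Hypothetical guard: reject INF, -INF, and NAN explicitly.
function assertFiniteNumber(int|float $value, string $name): void
{
    if (is_float($value) && !is_finite($value)) {
        throw new ValueError("$name must be a finite number");
    }
}

assertFiniteNumber(5, '$end'); // passes silently
```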

Issues surrounding usage of string digits

If one of the boundary inputs is a string digit (e.g. “1”) both inputs will be interpreted as numbers. This doesn't pose too much of an issue if both inputs are string digits as it will generate a list of integers.

However, if the other input is a non-numeric string, the expected behaviour of generating a list of ASCII characters no longer holds:

var_dump( range("9", "A") );
array(10) {
  [0]=>
  int(9)
  [1]=>
  int(8)
  [2]=>
  int(7)
  [3]=>
  int(6)
  [4]=>
  int(5)
  [5]=>
  int(4)
  [6]=>
  int(3)
  [7]=>
  int(2)
  [8]=>
  int(1)
  [9]=>
  int(0)
}

instead of the expected:

var_dump( range("9", "A") );
array(9) {
  [0]=>
  string(1) "9"
  [1]=>
  string(1) ":"
  [2]=>
  string(1) ";"
  [3]=>
  string(1) "<"
  [4]=>
  string(1) "="
  [5]=>
  string(1) ">"
  [6]=>
  string(1) "?"
  [7]=>
  string(1) "@"
  [8]=>
  string(1) "A"
}
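
Code that needs the character list today can build it from byte values, because range() on two integers is not affected by the string handling. The following is a minimal sketch; charRange() is a hypothetical helper and its error message is made up.

```php
<?php
// Hypothetical helper: always builds a list of single-byte characters,
// even when a boundary is a digit such as "9".
function charRange(string $start, string $end): array
{
    if ($start === '' || $end === '') {
        throw new ValueError('Boundaries must not be empty');
    }
    // ord() uses only the first byte of each boundary.
    return array_map('chr', range(ord($start), ord($end)));
}

echo implode(' ', charRange('9', 'A')), "\n"; // 9 : ; < = > ? @ A
```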

Proposal

The proposal is to adjust the semantics of range() in various ways to throw exceptions outright or at least warn when passing unusable arguments to range().

The changes are as follows:

  • If $step is a float but is compatible with int (i.e. (float)(int)$step === $step), interpret it as an integer.
  • Introduce and use a proper ZPP check for int|float|string $start and $end parameters; this will cause TypeErrors to be thrown when passing objects, resources, and arrays to range(). It will also cause a deprecation warning to be emitted when passing null.
  • Throw a ValueError if $start, $end, or $step is a non-finite float (-INF, INF, NAN).
  • Throw a more descriptive ValueError when $step is zero.
  • Throw a ValueError when passing a negative $step for increasing ranges.
  • Emit an E_WARNING when $start or $end is the empty string, and cast the value to 0.
  • Emit an E_WARNING when $start or $end has more than one byte if it is a non-numeric string.
  • Emit an E_WARNING when $start or $end is cast to an integer because the other boundary input is a number (e.g. range(5, 'z');).
  • Produce a list of characters if one of the boundary inputs is a string digit instead of casting the other input to int (e.g. range('5', 'z');).
  • Emit an E_WARNING when $step is a float while generating a range of characters, unless both boundary inputs are numeric strings (e.g. range('5', '9', 0.5); does not produce a warning).
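
The $step rules in the list above can be modelled in userland as follows. This is a sketch under assumptions, not the proposed implementation: checkStep() is a hypothetical name, only the last exception message is taken from this RFC, and the magnitude guard before the cast is an addition to avoid casting floats that do not fit in an int.

```php
<?php
// Hypothetical userland model of the proposed $step checks.
function checkStep(int|float $start, int|float $end, int|float $step): int|float
{
    if (is_float($step)) {
        if (!is_finite($step)) {
            throw new ValueError('$step must be a finite number'); // placeholder message
        }
        // Integer-compatible floats are treated as ints.
        if (abs($step) < PHP_INT_MAX && (float) (int) $step === $step) {
            $step = (int) $step;
        }
    }
    if ($step == 0) {
        throw new ValueError('$step cannot be 0'); // placeholder message
    }
    if ($step < 0 && $start < $end) {
        throw new ValueError('Argument #3 ($step) must be greater than 0 for increasing ranges');
    }
    return $step;
}

var_dump(checkStep(1, 5, 2.0));   // int(2)
var_dump(checkStep(1, 5, 1.5));   // float(1.5)
var_dump(checkStep(10, 0, -2));   // int(-2)
```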

Therefore, some of the previous examples would behave as follows:

var_dump(range('A', 'E', 1.0));
array(5) {
  [0]=>
  string(1) "A"
  [1]=>
  string(1) "B"
  [2]=>
  string(1) "C"
  [3]=>
  string(1) "D"
  [4]=>
  string(1) "E"
}
 
var_dump( range("9", "A") );
array(9) {
  [0]=>
  string(1) "9"
  [1]=>
  string(1) ":"
  [2]=>
  string(1) ";"
  [3]=>
  string(1) "<"
  [4]=>
  string(1) "="
  [5]=>
  string(1) ">"
  [6]=>
  string(1) "?"
  [7]=>
  string(1) "@"
  [8]=>
  string(1) "A"
}
 
var_dump(range('', 'Z'));
/*
Warning: range(): Argument #1 ($start) must not be empty, casted to 0
 
Warning: range(): Argument #1 ($start) must be a string if argument #2 ($end) is a string, argument #2 ($end) converted to 0
*/
 
 
var_dump(range(null, 2));
/*
Deprecated: range(): Passing null to parameter #1 ($start) of type string|int|float is deprecated
array(3) {
  [0]=>
  int(0)
  [1]=>
  int(1)
  [2]=>
  int(2)
}
*/
 
var_dump(range(null, 'e'));
/*
Deprecated: range(): Passing null to parameter #1 ($start) of type string|int|float is deprecated in %s on line %d
 
Warning: range(): Argument #1 ($start) must be a string if argument #2 ($end) is a string, argument #2 ($end) converted to 0 in %s on line %d
array(1) {
  [0]=>
  int(0)
}
*/
 
var_dump(range(0, 10, -2));
/*
range(): Argument #3 ($step) must be greater than 0 for increasing ranges
*/
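
Code that relied on the sign of $step being ignored for increasing ranges can pass the absolute value instead. This small sketch should give the same result with both the current and the proposed behaviour, since a positive step is valid for an increasing range in either case.

```php
<?php
$step = -2;

// abs() makes the step positive before it reaches range().
$values = range(0, 10, abs($step));

echo implode(', ', $values), "\n"; // 0, 2, 4, 6, 8, 10
```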

Impact Analysis

Using Nikita Popov's popular-package-analysis project to run a rough analysis of range() usage in the top 1000 Composer projects, we find that out of around 450 calls to range():

  1. 154 calls are made with literal number arguments
  2. 18 calls are made with literal string arguments
  3. 140 calls have at least one argument be the result of a plus (+), minus (-), or times (*) operation.
  4. 47 calls have at least one argument be a variable
  5. 25 calls have an argument made from a function that returns a number (count(), min(), or max())
  6. 66 calls have at least one argument that comes from a class property, class method, function, or array dimension.

The non-trivial calls were checked manually and all seem valid.

Backward Incompatible Changes

TypeErrors are thrown for incompatible types.

ValueErrors are thrown for INF, NAN, and negative step values for increasing ranges.

E_WARNINGs are emitted for various issues.

Calls to range() that have integer boundaries but a float step that is compatible with int will now return an array of integers instead of an array of floats:

var_dump( range(1, 5, 2.0) );
/* New Behaviour */
array(3) {
  [0]=>
  int(1)
  [1]=>
  int(3)
  [2]=>
  int(5)
}
/* Current Behaviour */
array(3) {
  [0]=>
  float(1)
  [1]=>
  float(3)
  [2]=>
  float(5)
}
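
Callers that depend on receiving floats in this situation can convert explicitly. A minimal sketch, which builds the range with integers and then maps floatval() over it so the result does not depend on how a float step is interpreted:

```php
<?php
// Build the range with ints, then convert each element to float.
$floats = array_map('floatval', range(1, 5, 2));

var_dump($floats === [1.0, 3.0, 5.0]); // bool(true)
```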

Proposed PHP Version

Next minor version, i.e. PHP 8.3.0.

Proposed Voting Choices

As per the voting RFC, a yes/no vote with a 2/3 majority is needed for this proposal to be accepted.

Voting started on 2023-06-01 and will end on 2023-06-15.

Accept Saner range() semantics RFC?

Real name                            Yes  No
alcaeus (alcaeus)                    ✓
bwoebi (bwoebi)                      ✓
colinodell (colinodell)              ✓
crell (crell)                        ✓
cschneid (cschneid)                  ✓
dharman (dharman)                    ✓
galvao (galvao)                      ✓
girgias (girgias)                    ✓
heiglandreas (heiglandreas)          ✓
lufei (lufei)                        ✓
mcmic (mcmic)                        ✓
nicolasgrekas (nicolasgrekas)        ✓
nielsdos (nielsdos)                  ✓
ocramius (ocramius)                  ✓
petk (petk)                          ✓
pierrick (pierrick)                  ✓
santiagolizardo (santiagolizardo)    ✓
sergey (sergey)                      ✓
svpernova09 (svpernova09)            ✓
theodorejb (theodorejb)              ✓
Final result:                        20   0

This poll has been closed.

Implementation

References

rfc/proper-range-semantics.txt · Last modified: 2023/06/19 13:41 by girgias