rfc:negative-string-offsets

PHP RFC: Generalize support of negative string offsets

Introduction

In most PHP functions, providing a negative value as string offset means 'n positions counted backwards from the end of the string'. This mechanism is widely used but, unfortunately, these negative values are not supported everywhere this would make sense. So, as PHP developers cannot easily know whether a given string functions accepts negative values or not, they regularly need to refer to the documentation. If it does not, they need to insert a call to the substr() function, making their code less readable and slower.

An obvious example is strrpos() accepting negative offsets, while strpos() does not. The same with substr_count() rejecting negative offset or length, when substr() accepts them.

Proposal

This RFC proposes to generalize support for negative string offsets everywhere it makes sense.

In accordance with the existing behavior, a negative offset is a position counted backwards from the end of the string.

In the same spirit, the RFC also adds support for negative length arguments where it makes sense. A negative length (-x) means 'up to the position x counted backwards from the end of the string'.

The reference behavior for negative offset and length is the substr() function.

The main objective of this RFC is to improve the overall consistency of the language.

Feature requests solved by this RFC

Detailed changes

In the language

String access to individual characters using a '{}' or '[]' will be extended to negative values.

Examples :

$str='abcdef';
var_dump($str[-2]); // => string(1) "e"
 
$str{-3}='.';
var_dump($str);		// => string(6) "abc.ef"
 
var_dump(isset($str{-4}));	// => bool(true)
 
var_dump(isset($str{-10}));	// => bool(false)

In built-in functions

Function Add support for
strpos Negative offset
stripos Negative offset
substr_count Negative 'offset' and 'length' (same behavior as substr())
grapheme_strpos Negative offset
grapheme_stripos Negative offset
grapheme_extract Negative offset
iconv_strpos Negative offset
file_get_contents Negative offset (based on seek(SEEK_END), reserved to seekable streams)
mb_strimwidth Negative 'start' and 'width'
mb_ereg_search_setpos Negative 'position' when search string is defined
mb_strpos Negative offset
mb_stripos Negative offset

Notes

  • Nothing done for iconv_strrpos() because function does not accept an 'offset' arg (inconsistent with every other xxx_strrpos() functions but argument cannot be added without breaking BC).
  • file_get_contents() : 'maxlen' argument cannot support negative values because of stream filters.

Backward Incompatible Changes

This RFC extends the range of valid values. In most cases, negative values raise a warning message and offset is considered as zero. The new behavior considers such values as valid (as long as they don't exceed the string length).

While not negligible, I consider these BC breaks as minor because, everywhere the behavior is modified, negative values were considered as invalid and raised an error message. So, we are just suppressing error cases.

Proposed PHP Version(s)

7.1

RFC Impact

To SAPIs

None

To Existing Extensions

None

To Opcache

None

Open Issues

To do (waiting for RFC approval) :

  • Update documentation
  • Update language specifications

Unaffected PHP Functionality

mbstring functions remain compatible with their ASCII counterpart, relative to the mbstring.func_overload ini setting.

Future Scope

Recommend using '{}' vs '[]' for string offsets

It was suggested during the discussion that, since array access and string offsets are very different operations, the official documentation should recommend using the '{}' syntax for string offsets, instead of the ambiguous '[]' syntax (and potentially deprecate '[]' for string offsets in the future).

On the opposite side, it was also suggested that array access and string offsets are so closely-related concepts that we should recommend using '[]' in both cases and disable the alternate '{}' syntax for string offsets !

So, as the subject is controversial and very tangential to the subject of this RFC, it will be left for a future RFC.

Proposed Voting Choices

As this RFC adds support for negative string offsets in the language, it requires a 2/3 majority.

Generalize support of negative string offsets
Real name Yes No
ajf (ajf)  
alan_k (alan_k)  
bwoebi (bwoebi)  
cmb (cmb)  
colinodell (colinodell)  
daverandom (daverandom)  
davey (davey)  
dm (dm)  
jpauli (jpauli)  
kinncj (kinncj)  
leigh (leigh)  
lstrojny (lstrojny)  
mariano (mariano)  
mbeccati (mbeccati)  
mikemike (mikemike)  
nikic (nikic)  
ocramius (ocramius)  
oliviergarcia (oliviergarcia)  
patrickallaert (patrickallaert)  
pauloelr (pauloelr)  
rmf (rmf)  
stas (stas)  
thorstenr (thorstenr)  
tpunt (tpunt)  
trowski (trowski)  
willfitch (willfitch)  
yohgaki (yohgaki)  
yunosh (yunosh)  
Final result: 28 0
This poll has been closed.

Patches and Tests

Pull Request : https://github.com/php/php-src/pull/1431 (final patch)

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged to
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature

References

None

Rejected Features

Keep this updated with features that were discussed on the mail lists.

rfc/negative-string-offsets.txt · Last modified: 2017/09/22 13:28 by 127.0.0.1