This is an old revision of the document!


PHP RFC: Generalize support of negative string offsets

Introduction

In most PHP functions, a negative string offset means 'count n positions from the end of the string'. This mechanism is widely used but, unfortunately, negative values are not supported everywhere they would make sense. Developers therefore have to check the documentation to find out whether a given function accepts negative offsets. If it does not, they must insert a call to substr(), which makes their code slower and harder to read.

An obvious example is strrpos(), which accepts negative offsets, while strpos() does not. Likewise, substr_count() rejects a negative offset or length, whereas substr() accepts both.
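The substr() workaround mentioned above can be sketched as follows. This is only an illustration: the variable names are made up for the example, and the last call assumes a PHP build that already includes this proposal (7.1 is the proposed target).

```php
<?php
$haystack = 'abcabc';
$needle = 'b';
$tailLength = 3; // search only the last 3 characters (illustrative value)

// Workaround when strpos() rejects negative offsets:
// slice off the tail, search it, then translate the result back
// to a position in the full string.
$posInTail = strpos(substr($haystack, -$tailLength), $needle);
$workaround = ($posInTail === false)
    ? false
    : strlen($haystack) - $tailLength + $posInTail;

// With the proposed support, the negative offset is passed directly.
$direct = strpos($haystack, $needle, -$tailLength);

var_dump($workaround); // int(4)
var_dump($direct);     // int(4) on a build that includes this proposal
```

Both forms give the same position, but the second avoids the temporary substring and the manual index arithmetic.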

Proposal

This RFC proposes to generalize support for negative string offsets everywhere it makes sense. In the same spirit, it also adds support for negative lengths where it is missing.

The main objective is to improve the overall consistency of the language.

Feature requests solved by this RFC

Detailed changes

  • Negative string offsets in read mode ($xx = $str[-2] or $str{-2})
  • Negative string offsets in assignments ($str{-2} = 'x')
  • Negative string offsets in isset()/empty() (isset($str{-5}))
  • strpos() : negative offset
  • stripos() : negative offset
  • substr_count(): negative 'offset' and 'length' (same behavior as substr()).
  • grapheme_strpos() : negative offset
  • grapheme_stripos() : negative offset
  • grapheme_extract() : negative offset
  • iconv_strpos() : negative offset
  • file_get_contents(): negative offset (based on seek(SEEK_END), restricted to seekable streams)
  • mb_strimwidth() : negative 'start' and 'width'
  • mb_ereg_search_setpos(): negative 'position', accepted when the search string is defined
  • mb_strpos(): negative offset
  • mb_stripos(): negative offset
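A few of the changes listed above, shown as a short sketch. It assumes a PHP build that includes this proposal, uses only the square-bracket offset syntax, and leaves out the mbstring, intl and iconv functions so that it runs without extra extensions. The sample strings are arbitrary.

```php
<?php
$str = 'abcdef';

// Read mode: -2 is the second character from the end.
$char = $str[-2];                 // 'e'

// Assignment: work on a copy so $str stays intact.
$copy = $str;
$copy[-2] = 'x';                  // $copy is now 'abcdxf'

// isset(): true because -5 falls inside the 6-character string.
$exists = isset($str[-5]);        // true

// strpos()/stripos(): the search starts 3 characters from the end.
$pos = strpos('abcabc', 'b', -3);   // 4
$ipos = stripos('ABCABC', 'b', -3); // 4

// substr_count(): negative offset, then negative length,
// following the same rules as substr().
$inTail = substr_count('hello world hello', 'hello', -5);    // 1
$inHead = substr_count('hello hello hello', 'hello', 0, -6); // 2

var_dump($char, $copy, $exists, $pos, $ipos, $inTail, $inHead);
```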

Notes

  • Nothing is done for iconv_strrpos() because the function does not accept an 'offset' argument (this is inconsistent with every other xxx_strrpos() function, but the argument cannot be added without breaking BC).
  • file_get_contents() : 'maxlen' argument cannot support negative values because of stream filters.
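For the file_get_contents() item, a minimal sketch of reading the tail of a local file with a negative offset. It assumes a PHP build that includes this proposal and a regular, seekable file. The temporary file and its contents exist only for the example.

```php
<?php
// Create a small seekable file to read from (example data only).
$path = tempnam(sys_get_temp_dir(), 'rfc');
file_put_contents($path, '0123456789');

// A negative offset is resolved from the end of the stream,
// so -4 starts reading at the fourth byte from the end.
$tail = file_get_contents($path, false, null, -4);

unlink($path);

var_dump($tail); // string(4) "6789"
```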

Backward Incompatible Changes

This RFC extends the range of valid values. Currently, in most cases, a negative value raises a warning and the offset is treated as null. With the new behavior, such values are valid (as long as their absolute value does not exceed the string length).
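The validity rule can be illustrated with a small sketch. The resolveOffset() helper below is hypothetical and not part of the RFC or of PHP. It only shows how a negative offset maps to an absolute position for single-character access, and where the valid range ends. The isset() calls assume a PHP build that includes this proposal.

```php
<?php
// Hypothetical helper, not part of the RFC: maps an offset to an absolute
// character position, or returns null when it falls outside the string.
function resolveOffset(string $str, int $offset): ?int
{
    $len = strlen($str);
    $pos = $offset < 0 ? $len + $offset : $offset;

    return ($pos >= 0 && $pos < $len) ? $pos : null;
}

$str = 'abcdef';

var_dump(resolveOffset($str, -1)); // int(5): last character
var_dump(resolveOffset($str, -6)); // int(0): first character
var_dump(resolveOffset($str, -7)); // NULL: beyond the start of the string

// isset() reports the same boundary without raising any message.
var_dump(isset($str[-6])); // bool(true)
var_dump(isset($str[-7])); // bool(false)
```

Checking with isset() first is a convenient way to guard a negative offset that may be out of range.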

I consider these BC breaks minor because, wherever the behavior is modified, negative values were previously treated as invalid. We are therefore replacing error cases only.

Proposed PHP Version(s)

7.1

RFC Impact

To SAPIs

None

To Existing Extensions

None

To Opcache

None

Open Issues

  • Update documentation (waiting for RFC approval)

Unaffected PHP Functionality

mbstring functions remain compatible with their ASCII counterparts with respect to the mbstring.func_overload ini setting.

Future Scope

None

Proposed Voting Choices

As this RFC adds support for negative string offsets in the language, it requires a 2/3 majority.

Patches and Tests

Pull Request : https://github.com/php/php-src/pull/1431 (final patch)

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged to
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature

References

None

Rejected Features

Keep this updated with features that were discussed on the mail lists.

rfc/negative-string-offsets.1453562742.txt.gz · Last modified: 2017/09/22 13:28 (external edit)