Add cyclic string replacements

  • Version: 1.5
  • Creation date: 2015-01-05
  • Last modification date: 2015-01-10
  • Author: François Laupretre, francois@tekwire.net
  • Status: Under discussion

Introduction

This RFC improves the str_replace() and str_ireplace() functions.

The additional feature is named 'cyclic replace' in the rest of the document.

Proposal

The idea originated as a feature request back in 2006 (https://bugs.php.net/bug.php?id=38685) and was recently revived, refined, and improved by the internals community.

In the current str_[i]replace() implementation, combining a string search with an array replace is accepted but practically useless: the replace array is converted to the string 'Array', and an ordinary string/string replacement is then performed.

This RFC proposes that, in this case, the first occurrence of search is replaced with the first element of the replace array, the second occurrence with the second element, and so on. When the end of the replace array is reached, one of several behaviors can be selected (loop, repeat the last element, etc.; see the options parameter below).

This is what we call 'cyclic replace'. Purists will note that it is truly 'cyclic' only when the option to loop over the replace array is set, but I didn't find a better name.

Note that replacements are done in array order. Key values are ignored in replace arrays.
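The first-level behavior described above can be sketched in userland PHP. This is only an illustration of the proposed semantics with the default 'stop' behavior, not the code from the patch; cyclic_replace_sketch() is a hypothetical name.

```php
<?php
// Userland sketch of the proposed first-level behavior (string search,
// array replace). cyclic_replace_sketch() is a hypothetical helper; it is
// neither part of PHP nor taken from the patch.
function cyclic_replace_sketch(string $search, array $replace, string $subject): string
{
    if ($search === '' || $replace === []) {
        return $subject; // nothing to search for, or nothing to replace with
    }
    $result = '';
    $offset = 0;
    // array_values(): keys are ignored, only array order matters.
    foreach (array_values($replace) as $value) {
        $pos = strpos($subject, $search, $offset);
        if ($pos === false) {
            break; // no more occurrences of search
        }
        $result .= substr($subject, $offset, $pos - $offset) . $value;
        $offset = $pos + strlen($search);
    }
    // Occurrences left after the replace array is exhausted stay untouched.
    return $result . substr($subject, $offset);
}

echo cyclic_replace_sketch('?', ['b' => 'one', 'a' => 'two'], 'a=? b=? c=?'), "\n";
// prints "a=one b=two c=?"
```

The keys 'b' and 'a' play no role: 'one' is used first because it comes first in the array.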

So, the features this RFC brings to str_[i]replace() are:

When search is a string and replace an array, cyclic replace is performed. This is the 'first-level' case.

When search and replace are arrays, each element of the replace array can be a string or an array. If it is a string, we have the usual string/string behavior. If it is an array, cyclic replacement is performed. So, the 'array search' case can be seen as an implicit loop around the 'string search' case, providing exactly the same features. As this is implemented using recursion, search can contain any number of nested array levels, provided replace has the corresponding structure.

Note that, with multi-level arrays, the replace array can be shallower than the search array: as soon as a non-array replace value is found, it is used as the replacement string for the whole corresponding search subtree. The opposite case is very different: if the replace tree is deeper than the search tree by more than one level, array-to-string conversion occurs (with an E_NOTICE and the wonderful 'Array' result).
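The recursion over search and replace can be sketched as follows. It is a userland approximation under stated assumptions, not the patch itself: sketch_cyclic() and sketch_replace_tree() are hypothetical names, only the default 'stop' behavior is modelled, and a missing replace element is assumed to mean an empty string, as in today's str_replace().

```php
<?php
// Hypothetical userland sketch; none of these helpers exist in PHP or in the patch.

// Cyclic replacement of a string search, default "stop" behavior only.
function sketch_cyclic(string $search, array $replace, string $subject): string
{
    $values = array_values($replace);
    if ($search === '' || $values === []) {
        return $subject;
    }
    $n = 0;
    return preg_replace_callback(
        '/' . preg_quote($search, '/') . '/',
        function () use ($values, &$n) {
            return (string) $values[$n++];
        },
        $subject,
        count($values) // limit: stop once the replace array is exhausted
    );
}

// Walks the search tree. $replace either follows the same shape, or is a
// plain string that then applies to the whole search subtree.
function sketch_replace_tree($search, $replace, string $subject): string
{
    if (is_array($search)) {
        $replacements = is_array($replace) ? array_values($replace) : null;
        foreach (array_values($search) as $i => $item) {
            // Assumption: a missing replace element means '' (as str_replace() does today).
            $with = $replacements === null ? $replace : ($replacements[$i] ?? '');
            $subject = sketch_replace_tree($item, $with, $subject);
        }
        return $subject;
    }
    return is_array($replace)
        ? sketch_cyclic((string) $search, $replace, $subject)
        : str_replace((string) $search, (string) $replace, $subject);
}

echo sketch_replace_tree([['a', 'b'], '?'], ['_', ['1', '2']], 'a?b?c?'), "\n";
// prints "_1_2c?"
```

Here the sub-array ['a', 'b'] is paired with the plain string '_', so both 'a' and 'b' become '_', while '?' is paired with an array and is replaced cyclically.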

Recursion also extends 'non-cyclic' replacements, as the previous implementation supported only one-level arrays for the search and replace arguments.

Unrelated to cyclic replacement, but it makes for a cleaner design: subject arrays are now processed recursively too, returning exactly the same array structure and preserving keys. Only values are replaced.
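As an illustration of the recursive subject handling, here is a small userland sketch. sketch_replace_in_subject() is a hypothetical helper, and it uses a plain string/string replacement to keep the example short.

```php
<?php
// Hypothetical userland sketch of recursive subject handling: the array
// structure and keys are preserved, only leaf values are replaced.
function sketch_replace_in_subject(string $search, string $replace, $subject)
{
    if (is_array($subject)) {
        $out = [];
        foreach ($subject as $key => $value) {
            $out[$key] = sketch_replace_in_subject($search, $replace, $value);
        }
        return $out;
    }
    return str_replace($search, $replace, (string) $subject);
}

$subject = ['title' => 'a-b', 'tags' => [5 => 'a', 'x' => ['aa']]];
var_export(sketch_replace_in_subject('a', 'x', $subject));
```

The result has the same shape as the input: 'title' maps to 'x-b', and 'tags' keeps its keys 5 and 'x' at their original depth.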

Empty replace arrays are considered unexpected. When one is provided, an E_WARNING is raised and the input subject is returned as-is. If search is an array, this warning can be raised more than once during a single str_[i]replace() call, as it is raised each time an empty replace array is encountered.

Backward Incompatible Changes

C API

php_char_to_str_ex() and php_str_to_str_ex() (defined in ext/standard/php_string.h) take an additional options argument. This argument is not used at the moment, but it keeps their API consistent with the new php_str_to_array_ex() function.

Note that the API for php_char_to_str() and php_str_to_str() is not modified.

PHP API

In the str_[i]replace() functions, passing search as a string and replace as an array caused the replace array to be converted to a string, giving 'Array' and raising an E_NOTICE.

Now, this combination of argument types causes the search string to be replaced with elements from the replace array.

The behavior also changes each time an empty array is encountered as a replace value. Previously, as seen above, an E_NOTICE was raised and the array was converted to 'Array'. Now, an E_WARNING is raised and the subject is returned unchanged.

Support for multi-level arrays as search, replace, and subject brings the same kind of BC break, because previous implementations supported only one array level. If provided with multi-level arrays, they would have performed an array-to-string conversion.

All these BC breaks are similar and deal with array-to-string conversions in previous implementations. I think we can consider them very low impact: relying on an array-to-string conversion (with E_NOTICE) when calling these functions, while theoretically supported, can be considered broken and very improbable.

Proposed PHP Version(s)

PHP 7

RFC Impact

To SAPIs

None

To Existing Extensions

Extensions using one of the C functions with a modified API (see BC changes in C API above) need to be adapted by adding a final 0 argument to each call. Only two occurrences of such calls exist in the whole php-src tree (outside of string.c).

To Opcache

None

New Constants

C constants

These new C constants are defined in ext/standard/php_string.h:

  • PHP_STR_ARRAY_REPLACE_STOP
  • PHP_STR_ARRAY_REPLACE_FIRST
  • PHP_STR_ARRAY_REPLACE_LAST
  • PHP_STR_ARRAY_REPLACE_LOOP
  • PHP_STR_ARRAY_REPLACE_EMPTY
  • PHP_STR_ARRAY_REPLACE_MASK
  • PHP_STR_ARRAY_REPLACE_MAX
  • PHP_REPLACE_MASK

PHP constants

New constants are defined to control how replacements are done once a replace array is exhausted (when there are more occurrences of search in the subject than elements in the replace array). These constants are mutually exclusive (they cannot be combined):

  • STR_REPLACE_STOP: Stop replacing (at most count(replace) occurrences of search are replaced).
  • STR_REPLACE_FIRST: Remaining occurrences are replaced with the first element of the replace array.
  • STR_REPLACE_LAST: Remaining occurrences are replaced with the last element of the replace array.
  • STR_REPLACE_LOOP: Loop and restart replacements with the first element of the replace array. Looping occurs as many times as needed.
  • STR_REPLACE_EMPTY: Remaining occurrences are replaced with an empty string.
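To make the five behaviors concrete, here is a userland sketch that emulates them. The SKETCH_* constants stand in for the proposed STR_REPLACE_* constants (their numeric values are arbitrary, not taken from the patch), and sketch_pick() and sketch_replace_mode() are hypothetical helpers.

```php
<?php
// Hypothetical stand-ins for the proposed STR_REPLACE_* constants; the
// numeric values are arbitrary and not taken from the patch.
const SKETCH_STOP  = 0;
const SKETCH_FIRST = 1;
const SKETCH_LAST  = 2;
const SKETCH_LOOP  = 3;
const SKETCH_EMPTY = 4;

// Picks the replacement for the $n-th occurrence (0-based), or null to stop.
function sketch_pick(array $values, int $n, int $mode): ?string
{
    $count = count($values);
    if ($n < $count) {
        return (string) $values[$n];
    }
    switch ($mode) {
        case SKETCH_FIRST: return (string) $values[0];
        case SKETCH_LAST:  return (string) $values[$count - 1];
        case SKETCH_LOOP:  return (string) $values[$n % $count];
        case SKETCH_EMPTY: return '';
        default:           return null; // SKETCH_STOP
    }
}

function sketch_replace_mode(string $search, array $replace, string $subject, int $mode = SKETCH_STOP): string
{
    $values = array_values($replace);
    if ($search === '' || $values === []) {
        return $subject;
    }
    $result = '';
    $offset = 0;
    for ($n = 0; ($pos = strpos($subject, $search, $offset)) !== false; $n++) {
        $value = sketch_pick($values, $n, $mode);
        if ($value === null) {
            break; // replace array exhausted in "stop" mode
        }
        $result .= substr($subject, $offset, $pos - $offset) . $value;
        $offset = $pos + strlen($search);
    }
    return $result . substr($subject, $offset);
}

$modes = ['STOP' => SKETCH_STOP, 'FIRST' => SKETCH_FIRST, 'LAST' => SKETCH_LAST,
          'LOOP' => SKETCH_LOOP, 'EMPTY' => SKETCH_EMPTY];
foreach ($modes as $name => $mode) {
    printf("%-5s %s\n", $name, sketch_replace_mode('?', ['A', 'B'], '?-?-?-?', $mode));
}
// STOP  A-B-?-?
// FIRST A-B-A-A
// LAST  A-B-B-B
// LOOP  A-B-A-B
// EMPTY A-B--
```

With four occurrences and two replacement values, the first two results are identical in every mode; the modes differ only in how the third and fourth occurrences are handled.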

API changes

PHP API

An additional optional argument, named options in the documentation, is added at the end of the argument list for str_replace() and str_ireplace().

If set, its value must be one of the STR_REPLACE_xxx constants defined above.

When not set, the default value is STR_REPLACE_STOP.

C API

Adds the php_str_to_array() and php_str_to_array_ex() functions. These functions perform cyclic replacements on an input string.

Open Issues

None

Unaffected PHP Functionality

The C API for php_char_to_str() and php_str_to_str() is left unchanged.

Future Scope

Proposed:

  • Add (search=null, replace=array) syntax. Would mean that search=array_keys(replace). An array of (search => replace) elements would be, IMO, a more intuitive way to specify multiple replacements.
  • Add similar features (cyclic replacement and multi-level array recursion) to preg_replace() and preg_filter().
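For comparison, the (search => replace) map from the first item can already be approximated with the existing API by splitting the map into its keys and values. The sketch below does this; sketch_replace_map() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: emulates the proposed (search=null, replace=array)
// form with the existing str_replace() signature.
function sketch_replace_map(array $map, string $subject): string
{
    // Keys become the search array, values the replace array, in the same order.
    return str_replace(array_keys($map), array_values($map), $subject);
}

echo sketch_replace_map(['{user}' => 'Ada', '{lang}' => 'PHP'], '{user} writes {lang}'), "\n";
// prints "Ada writes PHP"
```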

Proposed Voting Choices

Required majority: probably 2/3.

Patches and Tests

Pull request against the current PHP7 branch: https://github.com/php/php-src/pull/980

Once the implementation is complete, this PR is intended to be the final patch.

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged to
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature

References

Rejected Features

(Keep this updated with features that were discussed on the mailing lists)
