PHP RFC: Add str_starts_with() and str_ends_with() functions

Introduction

str_starts_with checks whether a string begins with another string and returns a boolean (true or false) accordingly.
str_ends_with checks whether a string ends with another string and returns a boolean (true or false) accordingly.

Typically this functionality is accomplished by using existing string functions such as substr, strpos/strrpos, strncmp, or substr_compare (often combined with strlen). These bespoke userland implementations have various downsides, discussed further below.

The str_starts_with and str_ends_with functionality is so commonly needed that many major PHP frameworks support it, including Symfony, Laravel, Yii, FuelPHP, and Phalcon 1).

Checking the start and end of strings is a very common task that should be easy. It currently is not, which is why many frameworks have chosen to include it. This is also why other high-level programming languages, as diverse as JavaScript, Java, Haskell, and Matlab, have implemented this functionality. Checking the start and end of a string should not require pulling in a PHP framework or developing a potentially suboptimal (or worse, buggy) function in userland.

Downsides of Common Userland Approaches

Ad hoc userland implementations of this functionality are less intuitive than dedicated functions. This is especially true for new PHP developers and for developers who frequently switch between PHP and other languages, many of which include this functionality natively.
The implementation is also easy to get wrong (especially with the === comparison).
Additionally, there are performance issues with many userland implementations.
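
As an illustration of how the comparison can go wrong, the following minimal sketch contrasts a loose (==) with a strict (===) check on the result of strpos. The variable names and sample strings are made up for this example.

```php
<?php
$haystack = "framework";
$needle = "xyz"; // does not occur anywhere in $haystack

// strpos() returns false when the needle is absent.
$pos = strpos($haystack, $needle);

// Loose comparison: false == 0 is true, so this wrongly reports a match.
$loose = ($pos == 0);

// Strict comparison distinguishes "not found" (false) from "found at offset 0".
$strict = ($pos === 0);

var_dump($loose);  // bool(true), which is the wrong answer
var_dump($strict); // bool(false), which is the right answer
```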

Note: some implementations add “$needle === "" || ” and/or “strlen($needle) <= strlen($haystack) && ” guards to handle empty needle values and/or avoid warnings.
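
The effect of the empty-needle guard can be seen with the substr-based str_ends_with idiom listed further down. This is only a small demonstration with made-up values, not a recommended implementation.

```php
<?php
$haystack = "abc";
$needle = "";

// Unguarded: -strlen("") is 0, so substr() returns all of "abc",
// and "abc" === "" is false.
$unguarded = substr($haystack, -strlen($needle)) === $needle;

// Guarded: an empty needle is treated as matching every haystack.
$guarded = $needle === "" || substr($haystack, -strlen($needle)) === $needle;

var_dump($unguarded); // bool(false)
var_dump($guarded);   // bool(true)
```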

str_starts_with

substr($haystack, 0, strlen($needle)) === $needle

This is memory inefficient because it requires an unnecessary copy of part of the haystack.

strpos($haystack, $needle) === 0

This is potentially CPU inefficient because it will unnecessarily search along the whole haystack if it doesn't find the needle.

strncmp($haystack, $needle, strlen($needle)) === 0 // generic
strncmp($subject, "prefix", 6) === 0 // ad hoc

This is efficient but requires providing the needle length as a separate argument, which is either verbose (repeat “$needle”) or error prone (hard-coded number).
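
To make the trade-off concrete, here is a minimal sketch that wraps the generic strncmp form in a helper and shows how a miscounted hard-coded length silently changes the result. The helper name starts_with_strncmp is hypothetical and not part of this RFC.

```php
<?php
// Hypothetical userland helper; the name is an illustration only.
function starts_with_strncmp(string $haystack, string $needle): bool
{
    // strncmp() compares at most strlen($needle) bytes from the start of both strings.
    return strncmp($haystack, $needle, strlen($needle)) === 0;
}

var_dump(starts_with_strncmp("beginningMiddleEnd", "beg")); // bool(true)
var_dump(starts_with_strncmp("beginningMiddleEnd", "Beg")); // bool(false)

// Ad hoc form with a miscounted length: only "pref" is compared,
// so "prefab" is wrongly reported as starting with "prefix".
$miscounted = strncmp("prefab", "prefix", 4) === 0;
var_dump($miscounted); // bool(true)
```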

str_ends_with

substr($haystack, -strlen($needle)) === $needle

This is memory inefficient (see above).

strpos(strrev($haystack), strrev($needle)) === 0

This is doubly inefficient because it requires reversing both the haystack and the needle as well as applying strpos (see above).

strrpos($haystack, $needle) === strlen($haystack) - strlen($needle)

This is verbose and also potentially CPU inefficient.

substr_compare($haystack, $needle, -strlen($needle)) === 0 // generic
substr_compare($subject, "suffix", -6) === 0 // ad hoc

This is efficient but either verbose or error prone (see strncmp above).
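
A userland wrapper around the substr_compare form might look like the sketch below. It includes the empty-needle and length guards mentioned in the note above. The helper name ends_with_substr_compare is hypothetical.

```php
<?php
// Hypothetical userland helper; the name is an illustration only.
function ends_with_substr_compare(string $haystack, string $needle): bool
{
    // Empty-needle guard: -strlen("") is 0, which would compare the
    // whole haystack against "" instead of an empty suffix.
    if ($needle === "") {
        return true;
    }

    // Length guard: a needle longer than the haystack can never match.
    return strlen($needle) <= strlen($haystack)
        && substr_compare($haystack, $needle, -strlen($needle)) === 0;
}

var_dump(ends_with_substr_compare("beginningMiddleEnd", "End")); // bool(true)
var_dump(ends_with_substr_compare("beginningMiddleEnd", "end")); // bool(false)
var_dump(ends_with_substr_compare("a", ""));                     // bool(true)
var_dump(ends_with_substr_compare("", "a"));                     // bool(false)
```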

Proposal

Add two new basic functions, str_starts_with and str_ends_with:

str_starts_with ( string $haystack , string $needle ) : bool
str_ends_with ( string $haystack , string $needle ) : bool

str_starts_with() checks if $haystack begins with $needle. If $needle is longer than $haystack, it returns false; otherwise, it compares each character in $needle with the corresponding character in $haystack (aligning both strings at their start), returning false at the first mismatch and true if there is none.
str_ends_with() does the same but aligns both strings at their end.
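
The described behavior can be sketched in userland PHP as follows. This is an illustration of the semantics only, not the proposed C implementation. The function names sketch_starts_with and sketch_ends_with are hypothetical, and the comparison here is byte by byte, since PHP strings are byte sequences.

```php
<?php
// Hypothetical names; a sketch of the described semantics only.
function sketch_starts_with(string $haystack, string $needle): bool
{
    $needleLength = strlen($needle);
    if ($needleLength > strlen($haystack)) {
        return false; // a longer needle can never match
    }
    // Align both strings at their start and compare byte by byte.
    for ($i = 0; $i < $needleLength; $i++) {
        if ($haystack[$i] !== $needle[$i]) {
            return false;
        }
    }
    return true;
}

function sketch_ends_with(string $haystack, string $needle): bool
{
    $needleLength = strlen($needle);
    $offset = strlen($haystack) - $needleLength;
    if ($offset < 0) {
        return false; // a longer needle can never match
    }
    // Align both strings at their end and compare byte by byte.
    for ($i = 0; $i < $needleLength; $i++) {
        if ($haystack[$offset + $i] !== $needle[$i]) {
            return false;
        }
    }
    return true;
}
```

With an empty needle neither loop runs, so both sketches return true, which matches the empty-string behavior shown in the examples.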

Examples below:

$str = "beginningMiddleEnd";
if (str_starts_with($str, "beg")) echo "printed\n";
if (str_starts_with($str, "Beg")) echo "not printed\n";
if (str_ends_with($str, "End")) echo "printed\n";
if (str_ends_with($str, "end")) echo "not printed\n";
 
// empty strings:
if (str_starts_with("a", "")) echo "printed\n";
if (str_starts_with("", "")) echo "printed\n";
if (str_starts_with("", "a")) echo "not printed\n";
if (str_ends_with("a", "")) echo "printed\n";
if (str_ends_with("", "")) echo "printed\n";
if (str_ends_with("", "a")) echo "not printed\n";

Note: the behavior concerning empty strings follows what is described in the accepted str_contains RFC. It also matches the behavior common in other languages, including Java and Python.

Backward Incompatible Changes

This could break existing userland functions with the same names. However, see the corresponding section in the str_contains RFC for a discussion of how this concern may be mitigated and why it does not justify rejecting this RFC.
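
One pattern userland code commonly uses to avoid such a clash is to declare its own function only when no function of that name exists yet. The sketch below is offered as an illustration and is not a summary of the referenced discussion. The strncmp-based body is just one possible fallback.

```php
<?php
// Declare the fallback only if the engine (or another library)
// has not already provided a function with this name.
if (!function_exists('str_starts_with')) {
    function str_starts_with(string $haystack, string $needle): bool
    {
        return strncmp($haystack, $needle, strlen($needle)) === 0;
    }
}

var_dump(str_starts_with("beginningMiddleEnd", "beg")); // bool(true)
```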

Proposed PHP Version(s)

PHP 8

RFC Impact

  • To SAPIs: Will add the aforementioned functions to all PHP environments.
  • To Existing Extensions: None.
  • To Opcache: No effect.
  • New Constants: No new constants.
  • php.ini Defaults: No changed php.ini settings.

Votes

Voting closes 2020-05-04

Add str_starts_with and str_ends_with as described
Real name yes no
alcaeus (alcaeus)  
alec (alec)  
as (as)  
asgrim (asgrim)  
ashnazg (ashnazg)  
beberlei (beberlei)  
brzuchal (brzuchal)  
bwoebi (bwoebi)  
carusogabriel (carusogabriel)  
cmb (cmb)  
daverandom (daverandom)  
derick (derick)  
duncan3dc (duncan3dc)  
duodraco (duodraco)  
ekin (ekin)  
galvao (galvao)  
geekcom (geekcom)  
girgias (girgias)  
jasny (jasny)  
jbnahan (jbnahan)  
jhdxr (jhdxr)  
kalle (kalle)  
kelunik (kelunik)  
kguest (kguest)  
klaussilveira (klaussilveira)  
kocsismate (kocsismate)  
kriscraig (kriscraig)  
lcobucci (lcobucci)  
malukenho (malukenho)  
marandall (marandall)  
mariano (mariano)  
mcmic (mcmic)  
mfonda (mfonda)  
mike (mike)  
narf (narf)  
nicolasgrekas (nicolasgrekas)  
nikic (nikic)  
ocramius (ocramius)  
pajoye (pajoye)  
petk (petk)  
pmmaga (pmmaga)  
pollita (pollita)  
ramsey (ramsey)  
reywob (reywob)  
ruudboon (ruudboon)  
salathe (salathe)  
sergey (sergey)  
stas (stas)  
svpernova09 (svpernova09)  
tandre (tandre)  
thorstenr (thorstenr)  
tiffany (tiffany)  
trowski (trowski)  
yunosh (yunosh)  
zimt (zimt)  
Final result: 51 yes, 4 no
This poll has been closed.

Patches and Tests

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged to
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature

References

Rejected Features

1) Some of those links point to str_starts_with functionality, but the mentioned frameworks also provide str_ends_with functionality, often visible on the same web page.
rfc/add_str_starts_with_and_ends_with_functions.txt · Last modified: 2020/05/05 14:12 by nikic