
PHP RFC: Fix "foreach" behavior

Introduction

The behavior of the foreach statement in PHP is not exactly defined for some edge cases. In practice, it is driven by the implementation and is quite surprising.

Most of these edge cases are related to manipulation of the internal array pointer and to modification of the elements of an array while foreach iterates over it. The behavior may depend on the value of the reference counter, or on whether the value is a reference or not. I'll provide just a few examples to demonstrate the existing inconsistencies.

Result of current() is undefined

$ php -r '$a = [1,2,3]; foreach($a as $v) {echo $v . " - " . current($a) . "\n";}'
1 - 2
2 - 2
3 - 2

$ php -r '$a = [1,2,3]; $b = $a; foreach($a as $v) {echo $v . " - " . current($a) . "\n";}'
1 - 1
2 - 1
3 - 1

unset() may or may not exclude an element from iteration

$ php -r '$a = [1,2,3]; foreach($a as $v) {echo "$v\n"; unset($a[1]);}'
1
2
3

$ php -r '$a = [1,2,3]; $b = &$a; foreach($a as $v) {echo "$v\n"; unset($a[1]);}'
1
3

It's possible to construct many more inconsistent or strange examples.

Proposal

We propose a consistent behavior for the foreach statement and provide an efficient implementation.

The behavior changes affect only undefined edge cases and keep the conceptual behavior unchanged.

First, the foreach statement may iterate by value or by reference.

Second, the foreach statement may iterate over arrays, plain objects, or iterator objects.

Iteration over iterator objects (by value and by reference) is kept the same.

foreach() by value over arrays

foreach by value over an array never uses or modifies the internal array pointer. It also doesn't duplicate the array; instead it locks it by incrementing its reference counter. Any attempt to modify the array inside the loop therefore triggers copy-on-write, so we always iterate over the elements of the array originally passed to foreach, ignoring any changes made to it. The following examples demonstrate the changes in behavior (compared to the examples from the introduction).

The value of the internal pointer is unaffected

$ php -r '$a = [1,2,3]; foreach($a as $v) {echo $v . " - " . current($a) . "\n";}'
1 - 1
2 - 1
3 - 1

Modifications of the original array are ignored

$ php -r '$a = [1,2,3]; $b = &$a; foreach($a as $v) {echo "$v\n"; unset($a[1]);}'
1
2
3

foreach() by reference over arrays

In most cases this keeps the PHP5 behavior.

foreach by reference over an array modifies the internal pointer on each iteration. Exactly as in PHP5, it sets the internal pointer to the element following the current one.

$ php -r '$a = [1,2,3]; foreach($a as &$v) {echo $v . " - " . current($a) . "\n"; }'
1 - 2
2 - 3
3 - 

Modifying the internal array pointer through next() and related functions doesn't affect the foreach pointer. On the next iteration, the internal pointer is restored from the foreach pointer. This is also the PHP5 behavior.

$ php -r '$a = [1,2,3,4]; foreach($a as &$v) {echo "$v - "; next($a); var_dump(current($a));}'
1 - int(3)
2 - int(4)
3 - bool(false)
4 - bool(false)

Deleting the next element referred to by the foreach pointer causes it to be skipped (the same as in PHP5).

$ php -r '$a = [1,2,3]; foreach($a as &$v) {echo "$v\n"; unset($a[1]);}'
1
3

Adding new elements after the current foreach position includes them in the iteration (the same as in PHP5).

$ php -r '$a = [1,2]; foreach($a as &$v) {echo "$v\n"; $a[2]=3;}'
1
2
3

Adding new elements when the foreach pointer is already at the end of the array includes them in the iteration as well (this didn't work in PHP5).

$ php -r '$a = [1]; foreach($a as &$v) {echo "$v\n"; $a[1]=2;}'
1
2

Replacing the iterated array with another array continues the iteration over the new array, starting from its internal pointer (the same as in PHP5).

$ php -r '$a = [1,2]; $b = [3,4]; next($b); foreach($a as &$v) {echo "$v\n"; $a = $b;}'
1
4

If several foreach by reference statements iterate over the same array, each of them works according to the rules above, independently of the others (this didn't work in PHP5).

<?php
$a = [0, 1, 2, 3];
foreach ($a as &$x) {
	foreach ($a as &$y) {
		echo "$x - $y\n";
		if ($x == 0 && $y == 1) {
			unset($a[1]);
			unset($a[2]);
		}
	}
}
$ php test.php 
0 - 0
0 - 1
0 - 3
3 - 0
3 - 3

Modifying an array that is being iterated by reference, using internal functions such as array_pop(), array_push(), array_shift() and array_unshift(), works consistently. These functions preserve the current foreach position, or move it to the following element if the current one is deleted (this didn't work in PHP5).

$ php -r '$a=[1,2,3,4]; foreach($a as &$v) { echo "$v\n"; array_pop($a);}'
1
2

foreach() by value over plain objects

It behaves in the same way as foreach by reference over an array, but uses the object value instead of a reference. As a result, the object can be modified, but can't be replaced.

foreach() by reference over plain objects

It behaves in the same way as foreach by reference over an array.

Implementation Details

The existing FE_RESET/FE_FETCH opcodes are split into separate FE_RESET_R/FE_FETCH_R opcodes, used to implement foreach by value, and FE_RESET_RW/FE_FETCH_RW, used to implement foreach by reference. The suffix _R means that the array (or object) is used only for reading; the suffix _RW means that it may also be modified indirectly. A new FE_FREE opcode is introduced; it is used at the end of foreach loops instead of the FREE opcode.

Iteration by value over an array doesn't use or modify the internal array pointer. The position is kept in reserved space of the temporary variable used for iteration, and is accessible through the Z_FE_POS() macro.

Iteration by reference, or by value over a plain object, is implemented using special HashTableIterator structures.

typedef struct _HashTableIterator {
	HashTable    *ht;
	HashPosition  pos;
} HashTableIterator;

On entry to a foreach loop, the FE_RESET_R/RW opcode creates and initializes a new iterator and stores its index in reserved space of the temporary variable used for iteration. On exit, the FE_FREE opcode removes the corresponding iterator.

Iterators are allocated in a buffer, EG(ht_iterators), represented as a plain array. The more foreach by reference loops are nested, the bigger the buffer we need. We start with a small preallocated buffer, EG(ht_iterators_slots), and extend it on the heap if necessary. EG(ht_iterators_count) keeps the number of available slots for iterators, and EG(ht_iterators_used) the number of used slots.

struct _zend_executor_globals {
	...
	uint32_t           ht_iterators_count;     /* number of allocated slots */
	uint32_t           ht_iterators_used;      /* number of used slots */
	HashTableIterator *ht_iterators;
	HashTableIterator  ht_iterators_slots[16];
	...
};

Creating and deleting iterators, and accessing their positions, is implemented through a special API.

ZEND_API uint32_t     zend_hash_iterator_add(HashTable *ht);
ZEND_API HashPosition zend_hash_iterator_pos(uint32_t idx, HashTable *ht);
ZEND_API void         zend_hash_iterator_del(uint32_t idx);

Indirect modification of iterator positions is implemented through zend_hash_iterators_update(). It's called when a HashTable modification may affect an iterator position, for example when the element referred to by an iterator is deleted, or when an iterator is positioned at the end of the array and a new element is inserted.

ZEND_API void         zend_hash_iterators_update(HashTable *ht, HashPosition from, HashPosition to);

For more details, see the implementation of the zend_hash_iterators_*() functions in zend_hash.c.

Backward Incompatible Changes

Some rare cases where the foreach behavior was undefined may change. The implementation changes a few such PHPT tests. The list and explanation follow:

Additional Behavior Change

With the new implementation it's quite easy to stop using the internal array/object pointer even for *foreach by reference*. The reset/key/current/next/prev functions would then be completely independent from the state of the *foreach* iterator. This would change the output of a few examples above.

foreach (even foreach by reference) won't affect internal array pointer

$ php -r '$a = [1,2,3]; foreach($a as &$v) {echo $v . " - " . current($a) . "\n"; }'
1 - 1
2 - 1
3 - 1

Modifying the internal array pointer through next() and related functions doesn't affect the foreach pointer, and the internal pointer is no longer affected by the foreach pointer either.

$ php -r '$a = [1,2,3,4]; foreach($a as &$v) {echo "$v - "; next($a); var_dump(current($a));}'
1 - int(2)
2 - int(3)
3 - int(4)
4 - bool(false)

Proposed PHP Version(s)

PHP7

RFC Impact

To Performance

The new behavior eliminates array duplication, which should lead to better performance; on the other hand, some HashTable operations require additional checks. Overall, for WordPress 3.6 the proposed patch reduces the number of executed CPU instructions by ~1%, saving about 200 array duplications and destructions per request to the home page.

To Opcache

OPcache has to support the new opcodes. All necessary OPcache changes are provided with the proposed implementation.

Open Issues

Future Scope

This section details areas where the feature might be improved in future, but that are not currently proposed in this RFC.

Proposed Voting Choices

The vote is a straight Yes/No vote that requires a 2/3 majority.

Fix foreach behavior?
Real name Yes No
ajf (ajf)  
andi (andi)  
auroraeosrose (auroraeosrose)  
bwoebi (bwoebi)  
crodas (crodas)  
dmitry (dmitry)  
dragoonis (dragoonis)  
francois (francois)  
galvao (galvao)  
guilhermeblanco (guilhermeblanco)  
gwynne (gwynne)  
irker (irker)  
jedibc (jedibc)  
jpauli (jpauli)  
kalle (kalle)  
klaussilveira (klaussilveira)  
krakjoe (krakjoe)  
kriscraig (kriscraig)  
laruence (laruence)  
lcobucci (lcobucci)  
leigh (leigh)  
levim (levim)  
lstrojny (lstrojny)  
mbeccati (mbeccati)  
mike (mike)  
mrook (mrook)  
nikic (nikic)  
ramsey (ramsey)  
rasmus (rasmus)  
rdohms (rdohms)  
reeze (reeze)  
sebastian (sebastian)  
treffynnon (treffynnon)  
yohgaki (yohgaki)  
zeev (zeev)  
Final result: 34 1
This poll has been closed.

The second question (Yes/No, 50%+1 majority) is whether we should stop modifying the internal array/object pointer in foreach.

Stop using internal array/object pointer in foreach by reference?
Real name Yes No
ajf (ajf)  
auroraeosrose (auroraeosrose)  
bwoebi (bwoebi)  
crodas (crodas)  
dmitry (dmitry)  
dragoonis (dragoonis)  
francois (francois)  
galvao (galvao)  
guilhermeblanco (guilhermeblanco)  
gwynne (gwynne)  
irker (irker)  
jedibc (jedibc)  
jpauli (jpauli)  
kalle (kalle)  
kinncj (kinncj)  
klaussilveira (klaussilveira)  
krakjoe (krakjoe)  
kriscraig (kriscraig)  
laruence (laruence)  
lcobucci (lcobucci)  
leigh (leigh)  
levim (levim)  
lstrojny (lstrojny)  
mbeccati (mbeccati)  
mike (mike)  
mrook (mrook)  
nikic (nikic)  
ramsey (ramsey)  
rasmus (rasmus)  
rdohms (rdohms)  
reeze (reeze)  
sebastian (sebastian)  
treffynnon (treffynnon)  
yohgaki (yohgaki)  
zeev (zeev)  
Final result: 34 1
This poll has been closed.

The vote will end on February 12.

Patches and Tests

Pull request for master branch: https://github.com/php/php-src/pull/1034

The implementation of the additional idea is trivial: https://gist.github.com/dstogov/63b269207ba0aed8b776

Implementation

The RFC was implemented in PHP7 with two commits:

97fe15db4356f8fa1b3b8eb9bb1baa8141376077

4d2a575db2ac28c9acede4a85152bcec342c4a1d