rfc:php7_foreach

This is an old revision of the document!


PHP RFC: Fix "foreach" behavior

Introduction

The behavior of foreach statement in PHP for some edge cases is not exactly defined. Actually, it is implementation driven and quite weird.

Most these edge cases are related to manipulation with internal pointer and modification of elements of array, iterated by foreach. The behavior may depend on the value of reference-counter or the fact - if the value is a reference or not. I'll provide just few examples to demonstrate the existing inconsistencies.

Result of current() is undefined

$ php -r '$a = [1,2,3]; foreach($a as $v) {echo $v . " - " . current($a) . "\n";}'
1 - 2
2 - 2
3 - 2

$ php -r '$a = [1,2,3]; $b = $a; foreach($a as $v) {echo $v . " - " . current($a) . "\n";}'
1 - 1
2 - 1
3 - 1

unset() may exclude an element from iteration or not

$ php -r '$a = [1,2,3]; foreach($a as $v) {echo "$v\n"; unset($a[1]);}'
1
2
3

$ php -r '$a = [1,2,3]; $b = &$a; foreach($a as $v) {echo "$v\n"; unset($a[1]);}'
1
3

It's possible to write more inconsistent or strange examples...

Proposal

We propose consistent foreach statement behavior and provide efficient implementation.

The changes in beahvior affects only edge undefined cases and keeps the conceptual behavior unchanged.

At first, foreach statement may iterate:

  • by value - foreach ($a as $v)
  • by reference- foreach ($a as &$v)

At second, foreach statement may iterate over:

  • array - $x = [1,2,3]; foreach ($x as $v)
  • plain object - $x = new SomeClass(); foreach ($x as $v)
  • iterator object - $x = new SomeIteratorClass(); foreach ($x as $v)

Iteration over iterator objects (by value and by reference) is kept the same.

foreach() by value over arrays

foreach by value over array will never use or modify internal array pointer. It also won't duplicate array, it'll lock it instead (incrementing reference counter). This will lead to copy-on-write on attempt of array modification inside the loop. As result, we will always iterate over elements of array originally passed to foreach, ignoring any possible array changes. The following examples demonstrates the changes in behavior (in comparison to examples from introduction).

The value of internal pointer is unaffected

$ php -r '$a = [1,2,3]; foreach($a as $v) {echo $v . " - " . current($a) . "\n";}'
1 - 1
2 - 1
3 - 1

Modifications of the original array are ignored

$ php -r '$a = [1,2,3]; $b = &$a; foreach($a as $v) {echo "$v\n"; unset($a[1]);}'
1
2
3

foreach() by value over plain objects

TODO

foreach() by reference over arrays

In most cases it repeats the PHP5 behavior.

foreach by reference over array modifys internal pointer on each iteration. Exactly as it was implemented in PHP5, it sets the internal pointer to the following element.

$ php -r '$a = [1,2,3]; foreach($a as &$v) {echo $v . " - " . current($a) . "\n"; }'
1 - 2
2 - 3
3 - 

Modification of internal array pointer through next() and family doesn't affect foreach pointer. On next iteration internal pointer is restored to the value of foreach pointer. This is also the default PHP5 behavior.

$ php -r '$a = [1,2,3,4]; foreach($a as &$v) {echo "$v - "; next($a); var_dump(current($a));}'
1 - int(3)
2 - int(4)
3 - bool(false)
4 - bool(false)

Deletion of the next element referred by foreach pointer leads to skipping it (in the same way as as in PHP5).

$ sapi/cli/php -r '$a = [1,2,3]; foreach($a as &$v) {echo "$v\n"; unset($a[1]);}'
1
3

Adding new elements after the current foreach pointer adds them to iteration (the same as in PHP5)

$ php -r '$a = [1,2]; foreach($a as &$v) {echo "$v\n"; $a[2]=3;}'
1
2
3

Adding new elements after the current foreach pointer when we are already at the end adds them to iteration as well (this didn't work in PHP5)

$ php -r '$a = [1]; foreach($a as &$v) {echo "$v\n"; $a[1]=2;}'
1
2

Replacing iterated array with another array lead to continue iteration over the new array starting from its internal pointer (the same as in PHP5)

$ php -r '$a = [1,2]; $b = [3,4]; next($b); foreach($a as &$v) {echo "$v\n"; $a = $b;}'
1
4

In case we have several forech by reference statements over the same array each of them works according to the rules above, independently from the others. (It didn't work in PHP5)

<?php
$a = [0, 1, 2, 3];
foreach ($a as &$x) {
	foreach ($a as &$y) {
		echo "$x - $y\n";
		if ($x == 0 && $y == 1) {
			unset($a[1]);
			unset($a[2]);
		}
	}
}
$ php test.php 
0 - 0
0 - 1
0 - 3
3 - 0
3 - 3

foreach() by reference over plain objects

It beahves in the same way as foreach by reference over array.

Implementation Details

The existing FE_RESET/FE_FETCH opcodes are split into separate FE_RESET_R/FE_FETCH_R opcodes used to implement foreach by value and FE_RESET_RW/FE_FETCH_RW to implement foreach by reference. The suffix _R means that we use array (or object) only for reading, and suffix _RW that we also may indirectly modify it. A new FE_FREE opcode is introduced. It's used at the end of foreach by reference loops, instead of FREE opcode.

Iteration by value don't use or modify internal array (or object) pointer. The value of the pointer is kept in reserved space of temporary variable used for iteration. It's acceptable through Z_FE_POS() macro.

Iteration by reference implemented using special HashTableIterator structures.

typedef struct _HashTableIterator {
	HashTable    *ht;
	HashPosition  pos;
} HashTableIterator;

On entrance into foreach by reference FE_RESET_RW opcode creates and initializes a new iterator, increments the counter of such iterators for given HachTable and store its index in reserved space of temporary variable used for iteration. On exit, FE_FREE opcode removes corresponding iterator and decrements the iterators counter of corresponding hash table.

Iterators are actually allocated in a buffer - EG(ht_iterators), represented by plain array. The more nested foreach by reference iterators the bigger buffer we will need. We start with small preallocated buffer - EG(ht_iterators_slots), and then extend it if necessary in heap. EG(ht_iterators_count) keeps the number of available slots for iterators, EG(ht_iterators_used) - the number of used slots.

struct _zend_executor_globals {
	...
	uint32_t           ht_iterators_count;     /* number of allocatd slots */
	uint32_t           ht_iterators_used;      /* number of used slots */
	HashTableIterator *ht_iterators;
	HashTableIterator  ht_iterators_slots[16];
	...
}

Creation, deletion and accessing iterators position is implemented through special API.

ZEND_API uint32_t     zend_hash_iterator_add(HashTable *ht);
ZEND_API HashPosition zend_hash_iterator_pos(uint32_t idx, HashTable *ht);
ZEND_API void         zend_hash_iterator_del(uint32_t idx);

Indirect modification of iterators positions implemented through zend_hash_iterators_update(). It's called when HashTable modification may affects iterator position. For example when element referred by iterator is inserted, or when iterator is set at the end of the array and new element is inserted.

ZEND_API void         zend_hash_iterators_update(HashTable *ht, HashPosition pos);

Foe more details see zend_hash_iterators_*() functions implementation in zend_hash.c

Backward Incompatible Changes

Some rare cases where the foreach statement behavior was undefined may be changed. The implementation changes few such PHPT tests.

Proposed PHP Version(s)

PHP7

RFC Impact

To Performance

New behavior allows elimination of array duplication and this should lead to better performance, on the other hand some HashTable operations require additional check(s). Anyway, for Wordpress-3.6 the proposed patch reduces the number of executed CPU instructions by ~1%. It saves about 200 array duplications and destructions per each request to home page.

To Opcache

OPCache has to support new opcodes. All necessary OPCache changes are provided with the proposed implementation

Open Issues

  • complete RFC
  • complete implementation
  • performance optimization

Future Scope

This sections details areas where the feature might be improved in future, but that are not currently proposed in this RFC.

Proposed Voting Choices

The vote is a straight Yes/No vote, that requires a 2/3 majority

Patches and Tests

Pull request for master branch: https://github.com/php/php-src/pull/1034

It doesn't solve all the problems yet.

Implementation

After the project is implemented, this section should contain

  1. the version(s) it was merged to
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature

References

Links to external references, discussions or RFCs

rfc/php7_foreach.1422557124.txt.gz · Last modified: 2017/09/22 13:28 (external edit)