rfc:integer_semantics

PHP RFC: Integer Semantics

Introduction

This RFC improves cross-platform consistency in PHP for some operations dealing with integers, and makes PHP's behaviour more intuitive, as well as partly paving the way for Big Integer Support.

Proposal

This RFC proposes changes to certain behaviours related to integers:

  1. Instead of being undefined and platform-dependant, NaN and Infinity will always be zero when casted to integer
  2. Bitwise shifts by negative numbers of bits will be disallowed (throws E_WARNING and gives FALSE, like a division by zero)
  3. Left bitwise shifts by a number of bits beyond the bit width of an integer will always result in 0, even on CPUs which wrap around
  4. Right bitwise shifts by a number of bits beyond the bit width of an integer will always result in 0 or -1 (depending on sign), even on CPUs which wrap around

Examples

For all “was” results, these are only on my machine, as they would differ across platforms. All “now” results are the same on all platforms.

// Was: int(-9223372036854775808)
// Now: int(0)
var_dump((int)NAN);
// Was: int(-9223372036854775808)
// Now: int(0)
var_dump((int)INF);
// Was: int(4611686018427387904)
// Now: bool(false) and E_WARNING
var_dump(1 << -2);
// Was: int(8)
// Now: int(0)
var_dump(8 >> 64);

Rationale

This RFC tries to clean up some integer edge cases and make integers more consistent across platforms. As PHP is a high-level language, we ought to abstract away implementation differences, otherwise we make it difficult to write code that runs consistently across platforms, something which is an key requirement for PHP.

“Negative shifts” do not do what users would reasonably expect them to do: shift in the opposite direction. Rather, a negative shift is usually a shift by the 2's complement unsigned integer representation (in the case of -2, this would be 18446744073709551614 when integers are 64-bit). This is also reliant on undefined behaviour in C, and will give different results depending on the processor and integer size. For this reason, we now disallow such shifts.

On Intel CPUs, a bitwise shift by a number of bits that is greater than the bit width of an integer (e.g. » 65 on a 64-bit machine) will “wrap around” (e.g. » 65 is effectively » 1). To ensure cross-platform consistency, we ensure that such shifts will always result in zero (for left shifts), or zero or negative one (for right shifts, depending on the sign of the number being shifted). It is worth noting that shifts of a number of bits greater than the bit width of an integer is also undefined behaviour in C.

Making NaN and Infinity always become zero when casted to integer means more cross-platform consistency, and is also less surprising than what is currently produces, where NaN produces the minimum integer on my machine (-9223372036854775808).

The changes in this RFC are all backported from my Big Integer Support RFC. Alongside cleaning up some aspects of how integers work, this RFC partly paves the way for bigints. This RFC also helps clear up some undefined behaviour in the PHP specification by explicitly defining it.

Proposed PHP Version(s)

The next major release of PHP, currently PHP 7.

Unaffected PHP Functionality

Integer to float conversion is untouched. Despite some people's misconceptions, the bitwise shift operators do not operate on strings like the other bitwise operators do, so I have not affected how they deal with strings as they didn't in the first place (they cast to integer). This does not touch the behaviour of array key casting (except for Infinity and NaN), although this is something I would like to do

Proposed Voting Choices

This is a language change, so it requires a 2/3 majority. A straight Yes/No vote would be held.

Patches and Tests

A pull request is here: https://github.com/php/php-src/pull/781

The patch is working and has tests, but code review would be appreciated.

Implementation

Merged into master. It will be in PHP 7.

It is not yet in the manual.

Vote

As this is a language change, a 2/3 majority is required.

Voting started 2014-09-14 and ended on 2014-09-21.

Accept the integer semantics RFC and merge patch into master?
Real name Yes No
ab (ab)  
ajf (ajf)  
brianlmoon (brianlmoon)  
bwoebi (bwoebi)  
datibbaw (datibbaw)  
daverandom (daverandom)  
demon (demon)  
derick (derick)  
dm (dm)  
dragoonis (dragoonis)  
googleguy (googleguy)  
jedibc (jedibc)  
jpauli (jpauli)  
laruence (laruence)  
levim (levim)  
lstrojny (lstrojny)  
neeke (neeke)  
nikic (nikic)  
peehaa (peehaa)  
rdlowrey (rdlowrey)  
salathe (salathe)  
stas (stas)  
yumin1985 (yumin1985)  
zhangzhenyu (zhangzhenyu)  
Final result: 16 8
This poll has been closed.

References

Rejected Features

Keep this updated with features that were discussed on the mail lists.

Changelog

  • v0.2.3 - Removed “Open Questions/Possible Future Scope” to avoid confusion
  • v0.2.2 - Added examples
  • v0.2.1 - Open Questions/Possible Future Scope added
  • v0.2 - Dropped zend_parse_parameters change, fixed left shift too
  • v0.1.1 - Introduction added
  • v0.1 - Initial version
rfc/integer_semantics.txt · Last modified: 2014/09/21 01:50 by ajf