PHP RFC: Big Integer Support

Introduction

Since the beginning, PHP has had only two numeric types: integer and float. The former has been a platform-dependent C long, usually either 32-bit or 64-bit, and the latter has been a platform-dependent C double, usually an IEEE 754 double-precision floating-point number.

Both work relatively well, but beyond the maximum integer value on a specific platform, things get a bit messy. Typically, PHP will have integers overflow to floats, resulting in a loss of precision. Integer size is platform-specific, so code dealing with large integers won't act the same on a 32-bit machine versus a 64-bit machine.
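The overflow described above can be observed directly in PHP as it exists without this RFC. The following is a minimal sketch, not part of the patch; it only uses the built-in PHP_INT_SIZE and PHP_INT_MAX constants.

```php
<?php
// Integer width depends on the platform: 4 bytes on 32-bit builds, 8 on 64-bit.
$bits = PHP_INT_SIZE * 8;
printf("int is %d bits, PHP_INT_MAX = %d\n", $bits, PHP_INT_MAX);

// Without bigints, going one past the maximum silently yields a float.
$overflowed = PHP_INT_MAX + 1;
var_dump(is_int(PHP_INT_MAX));   // bool(true)
var_dump(is_float($overflowed)); // bool(true)

// On 64-bit builds the float can no longer tell neighbouring values apart,
// so this comparison is platform-dependent.
var_dump($overflowed == $overflowed + 1);
```

The last comparison is the kind of platform difference the RFC aims to remove: its result depends on whether the overflowed value still fits within the precision of a double.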

Some applications need to deal with very large integers beyond 32-bit or 64-bit, and for this they can resort to extensions like gmp. However, dealing with these so-called “big integers” or “bigints” is rather clumsy. You must write all your code to deal with them specifically, and you must create objects for them rather than simply using numeric literals, as you can for the built-in integer and float types.
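To give a feel for the special-case code that userland needs when the native integer is too small, here is a rough sketch of adding two large numbers held as decimal strings. This is not the gmp API; add_decimal_strings is a hypothetical helper written only for illustration, and it handles non-negative values only.

```php
<?php
// Hypothetical helper: adds two non-negative decimal integers given as strings.
function add_decimal_strings(string $a, string $b): string
{
    $i = strlen($a) - 1;
    $j = strlen($b) - 1;
    $carry = 0;
    $digits = '';
    // Walk both strings from the least significant digit, carrying as we go.
    while ($i >= 0 || $j >= 0 || $carry > 0) {
        $sum = $carry;
        if ($i >= 0) {
            $sum += (int) $a[$i];
            $i--;
        }
        if ($j >= 0) {
            $sum += (int) $b[$j];
            $j--;
        }
        $digits .= (string) ($sum % 10); // collected least significant first
        $carry = intdiv($sum, 10);
    }
    return strrev($digits);
}

echo add_decimal_strings('9223372036854775807', '1'), "\n"; // 9223372036854775808
```

With built-in bigints, the same calculation would simply be an ordinary addition of two integer values.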

Hence, this RFC proposes the addition of built-in bigint support to PHP. Now, you can do operations with integers of any size, so long as you have enough memory. While there are now two types internally (long and bigint), userland code will continue to see only “integers”, and the two types will be indistinguishable.

The advantages of doing this are numerous. Integers will always be consistent across platforms, with programmers not needing to worry about the size of a long (32-bit, 64-bit or otherwise) on their platform. Operations, too, will always be consistent. This will help the portability of PHP code and mean less time wasted by programmers dealing with platform differences, strengthening PHP's cross-platform guarantees. Dealing with extremely large data sets becomes easier, as you no longer need to anticipate whether your IDs will exceed 32 or 64 bits. Integer overflow is largely relegated to being an issue for internals programmers, as userland code will never have to deal with it, and there is no risk of a loss of precision, as integers will no longer overflow to floats. All this combined is likely to make for more robust, less buggy applications. Finally, being able to deal with large integers “natively” makes PHP more attractive for web developers needing to do large integer math, such as applications dealing with currency, or perhaps statistical applications.

Proposal

New type

To complement the existing internal IS_LONG and IS_DOUBLE types, a new IS_BIGINT type is introduced. IS_BIGINT is a reference-counted, copy-on-write type which is not garbage collected, much like a string. Behind the scenes, a bigint library (LibTomMath by default, although GMP can also be used) implements it, but the library is abstracted behind a new family of zend_bigint_* functions and the zend_bigint type, which is what allows the choice of libraries. As stated in the Introduction, no new userland type is added to PHP, and instead “integer” now covers two internal types: IS_LONG and IS_BIGINT. There should be no visible difference to userland code between these types. Internally, a new “fake type” is also added, namely IS_BIGINT_OR_LONG. This is used by a few functions dealing with conversions and casts, and is now the “type” that (integer) will cast to.

Type specifiers for zend_parse_parameters that previously yielded a long will continue to do so. The type specifiers i, for a bigint or a long, and I, for a bigint, are added, along with the corresponding Z_PARAM_BIGINT_OR_LONG(_EX) and Z_PARAM_BIGINT(_EX) FAST_ZPP macros.

Changes to operators for the sake of consistency

In order to make integer arithmetic consistent between longs and bigints, certain changes to existing operator behaviour will be made:

Standard library changes

Examples

Currently, if an integer gets too large in PHP, it becomes a float, accuracy is lost, and operations start behaving differently. Take this code for example:

$x = PHP_INT_MAX - 1;
var_dump($x);
$x++;
var_dump($x);
$x++;
var_dump($x);
$x++;
var_dump($x);

Under PHP 5.5 on a 64-bit machine, it produces the following result:

int(9223372036854775806)
int(9223372036854775807)
float(9.2233720368548E+18)
float(9.2233720368548E+18)

The trailing digits are lost, and incrementing suddenly does nothing!

However, the output would be different with this RFC:

int(9223372036854775806)
int(9223372036854775807)
int(9223372036854775808)
int(9223372036854775809)

No digits are lost, incrementing still works, and it's still an integer. Under the hood, it may technically be a different type (depending on the platform), but from the user's perspective, it's still an integer, and it functions exactly the same.

This means you can do arbitrarily large integer operations with full accuracy, so long as there is enough memory available. For example:

$ php -r 'var_dump(10 ** 100);'
int(10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
$ php -r 'var_dump((1 << 67) >> 63);'
int(16)
$ php -r 'var_dump(2 ** 3 ** 4);'
int(2417851639229258349412352)
$ php -r 'var_dump((10 ** 100) % 10);'
int(0)
$ php -r 'var_dump(123098209381029380128301298301298309812098213);'
int(123098209381029380128301298301298309812098213)

This works consistently across platforms. So, it is possible to handle 64-bit integers with full precision on a 32-bit machine with exactly the same code. Indeed, it does not matter how many bits are in the integer, so long as there is sufficient memory to store it. Every example above works on a 64-bit machine running OS X, but would function identically on a 32-bit Windows machine, a 64-bit Linux server, or any other platform.

Backward Incompatible Changes

As mentioned before, the shift left and shift right operators act differently, as does pow for very large exponents.

Longs will no longer overflow to float, but instead become bigints (which, so far as userland cares, are just integers). Code expecting large integer literals to be floats will now end up with bigints instead, which might cause problems. However, if a float is still desired, this can be fixed simply by appending .0.
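The literal case can be illustrated as follows. This is a small sketch run against PHP without this RFC; the comments note where the RFC would change the outcome.

```php
<?php
// Too large for a native int: today this literal is parsed as a float.
// Under the RFC it would be an integer (a bigint internally) instead.
$literal = 9223372036854775808;

// Appending .0 makes it a float literal regardless of the integer range.
$explicit = 9223372036854775808.0;

var_dump(is_float($explicit));    // bool(true)
var_dump($literal === $explicit); // true today, since both are the same float
```

Code that relies on the first form being a float is what would need the .0 suffix.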

Internals changes

Some internal APIs, mostly ones dealing with numbers, will necessarily change their signatures or behaviour:

  1. For example, is_numeric_string/_ex now takes a zend_bigint** parameter
  2. The cast_object object handler now has to deal with IS_BIGINT_OR_LONG and IS_BIGINT

Proposed PHP Version(s)

This is proposed for the next major version of PHP, currently PHP 7. The patch is based on PHP master (originally, phpng).

RFC Impact

Performance

The performance penalties are minor for normal integer and float arithmetic. While left shifts and right shifts now require overflow checks, generally bigints will just take the place of floats in existing overflow checks so the performance impact is minimal.
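As a userland analogy for those overflow checks: in PHP without this RFC, an integer addition that overflows yields a float, so the result type reveals where the engine took its overflow path. The helper below, add_overflows, is hypothetical and only illustrates that branch point; with the RFC, the same branch would produce a bigint rather than a float.

```php
<?php
// Hypothetical helper: reports whether $a + $b leaves the native integer range.
// In current PHP an overflowing integer addition produces a float, so the
// result type doubles as the overflow flag.
function add_overflows(int $a, int $b): bool
{
    return is_float($a + $b);
}

var_dump(add_overflows(1, 2));            // bool(false)
var_dump(add_overflows(PHP_INT_MAX, 1));  // bool(true)
var_dump(add_overflows(PHP_INT_MIN, -1)); // bool(true)
```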

Fatal errors

Unfortunately, bigints would introduce two new ways to cause fatal errors in PHP.

Firstly, if you do an operation resulting in an extremely large number, you might hit your request memory limit.

Secondly, when trying to calculate a value that would require more memory than size_t can describe, we have to bail out and throw an E_ERROR with the message “Result of integer operation would be too large to represent”.
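For a sense of scale, the storage needed for a power can be estimated from its bit length, which is roughly the exponent times the base-2 logarithm of the base. The function below is a hypothetical back-of-the-envelope estimate; it ignores any per-value overhead added by the underlying library or by the engine.

```php
<?php
// Hypothetical estimate: bytes needed to hold the magnitude of $base ** $exponent.
// bits ~= exponent * log2(base); library and zval overhead are not counted.
function estimated_bytes(int $base, int $exponent): float
{
    $bits = $exponent * log($base, 2);
    return ceil($bits / 8);
}

var_dump(estimated_bytes(10, 100));       // float(42)
var_dump(estimated_bytes(2, 1000000000)); // float(125000000)
```

So 10 ** 100 is tiny, while 2 ** 1000000000 already needs on the order of 125 MB for a single value, which is in the range of common memory_limit settings.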

Licensing and dependency issues

To avoid implementing the underlying arithmetic itself, PHP needs to add a dependency on a library implementing arbitrary-precision integers.

This patch supports two different libraries, LibTomMath (the default) and GMP, which you can choose between when compiling PHP.

A choice is allowed to avoid licensing issues with GMP: while it has better performance, it uses the GNU Lesser General Public License version 3, which may be unacceptable to some people. LibTomMath, by contrast, is very liberally licensed.

Arrays

Since HashTable has not been and will not be updated to directly support IS_BIGINT keys, indexing by an IS_BIGINT key must be handled somehow. The RFC proposes to simply convert the bigint to a string, thus $x[PHP_INT_MAX + 1] = 3; would be equivalent to $x[(string)(PHP_INT_MAX + 1)] = 3;. This is inconsistent with the behaviour of floats (which are blindly converted, wrapped and truncated by zend_dval_to_lval), but changing their behaviour might cause compatibility issues. If that became a problem, it could be addressed in a follow-up RFC.
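The resulting key can be approximated in PHP without this RFC by writing the decimal string out by hand. This is only a sketch: the literal string stands in for what (string)(PHP_INT_MAX + 1) would produce on a 64-bit build under the proposal.

```php
<?php
$x = [];

// An integer-like string within the native range is normalised to an int key.
$x['42'] = 'small';

// A decimal string beyond the native range stays a string key. Under the RFC,
// a bigint index would be converted to exactly this kind of string.
$x['9223372036854775808'] = 'big';

var_dump(array_keys($x));
```

The first key comes back as int(42) and the second as the string "9223372036854775808", so large indices would live alongside native integer keys as ordinary string keys.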

To SAPIs

This should have no impact on existing SAPIs.

To Existing Extensions

Any extensions which request numeric parameters as zvals rather than longs or doubles from zend_parse_parameters will need changes. Those dealing with numerical operations specifically will require deeper changes. Obviously, ext/standard will need some updating.

ext/gmp will be updated to handle bigints. However, due to behavioural and implementation differences between GMP objects and the bigint type, it won't just pass through to the built-in operator functions. With the addition of bigints, ext/gmp would quickly become irrelevant except for backwards-compatibility with existing applications, and might eventually be moved to PECL.

Extensions that use the number-related parts of the Zend API will need to be modified to deal with changes in signatures and behaviour. (See “Backward Incompatible Changes”.)

To Opcache

Both GMP and LibTomMath can only have one custom allocator, so I weighed the options and chose emalloc rather than malloc. I expect this to pose a problem for opcache: any bigints would be destroyed at the end of a request, so opcache would need to store bigints persistently. Hence, some sort of import/export mechanism could be added to zend_bigint. Strings would obviously work, but gmp also has its own serialisation format which would be more efficient, so that might be a better approach.

The patch has not yet been updated to support opcache.

New Constants

None.

php.ini Defaults

No changes.

Open Issues

The patch is unfinished. Many tests are still broken and most extensions will need some updating. It does not work with opcache.

However, there are no open questions.

TODO

Must be done

Optional, possibly future work

Unaffected PHP Functionality

As previously mentioned, the handling of array keys might need to be looked at. Otherwise, this RFC shouldn't affect the behaviour of other PHP functionality. Implementation-wise, anything that manipulates zvals directly and looks at their types needs updating for bigints.

Future Scope

None I can think of particularly.

Vote

As this is a language change (it affects the language specification), this requires a 2/3 majority. It is a straight Yes/No vote on accepting the RFC.

Voting started on 2015-02-15 and was to end 10 days later on 2015-02-25, but voting was cancelled the same day it started.

Big Integer Support RFC
Real name Yes No
ab (ab)  
ajf (ajf)  
bwoebi (bwoebi)  
derick (derick)  
kalle (kalle)  
laruence (laruence)  
lcobucci (lcobucci)  
leigh (leigh)  
olemarkus (olemarkus)  
pajoye (pajoye)  
rasmus (rasmus)  
thesaur (thesaur)  
Final result: 7 5
This poll has been closed.

Patches and Tests

A work-in-progress, unfinished pull request for php-src is here: https://github.com/php/php-src/pull/876

The branch itself is here: https://github.com/TazeTSchnitzel/php-src/tree/bigint

The LibTomMath backend (the default) is a work-in-progress. Use --enable-bigint-gmp to use the GMP backend.

Many tests are still broken and, as mentioned previously, I still need to deal with extensions and opcache. It is very much unfinished, but it does work to a degree.

See the TODO section in Open Issues (above) for unfinished areas.

There is also an incomplete language specification pull request here, which currently lacks updated tests: https://github.com/php/php-langspec/pull/112

Implementation

If/when this is implemented, this section would/will contain

  1. the version(s) it was merged to
  2. a link to the git commit(s)
  3. a link to the PHP manual entry for the feature

References

Inspiration

Discussion

General

Changelog