Since the beginning, PHP has had only two numeric types: integer, and float. The former has been a platform-dependent C long, usually either 32-bit or 64-bit, and the latter has been a platform-dependent C double, usually an IEEE 754 double-precision floating-point number.
Both work relatively well, but beyond the maximum integer value on a specific platform, things get a bit messy. Typically, PHP will have integers overflow to floats, resulting in a loss of precision. Integer size is platform-specific, so code dealing with large integers won't act the same on a 32-bit machine versus a 64-bit machine.
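The platform dependence described above can be sketched as follows. This is a minimal illustration of current behaviour on a stock PHP build without this RFC; the variable names are illustrative only.

```php
<?php
// Behaviour of current PHP (without this RFC): the same literal has a
// different type depending on the size of the platform's integer.
$value = 2147483648; // 2^31, one past the largest 32-bit signed integer

// On a 64-bit build this is an int; on a 32-bit build the literal has
// already overflowed to a float.
echo PHP_INT_SIZE * 8, "-bit integers: ", gettype($value), "\n";

// Going past PHP_INT_MAX overflows to float on every platform.
$overflowed = PHP_INT_MAX + 1;
var_dump(is_float($overflowed)); // bool(true) without this RFC
```

Under this RFC, both values would be expected to remain integers on every platform.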
Some applications need to deal with very large integers beyond 32-bit or 64-bit, and for this they can resort to extensions like gmp. However, dealing with these so-called “big integers” or “bigints” is rather clumsy. You must write all your code to deal with them specifically, and you must create objects for them rather than simply using numeric literals as you can for the built-in integer and float types.
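As a rough sketch of that clumsiness, this is what adding one to a value just beyond the 64-bit range looks like with ext/gmp today. It assumes ext/gmp may or may not be loaded, so the code is guarded; the variable names are illustrative.

```php
<?php
// Sketch of the status quo: big integers via ext/gmp.
// Guarded because ext/gmp is optional and may not be loaded.
if (extension_loaded('gmp')) {
    // The value is created from a string; an integer literal this large
    // would already have been turned into a float.
    $big = gmp_init('9223372036854775808');

    // Arithmetic goes through dedicated functions and returns GMP values.
    $sum = gmp_add($big, gmp_init('1'));

    // An explicit conversion is needed to get a printable result.
    echo gmp_strval($sum), "\n"; // 9223372036854775809
}
```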
Hence, this RFC proposes the addition of built-in bigint support to PHP. Now, you can do operations with integers of any size, so long as you have enough memory. While there are now two types internally (long and bigint), userland code will continue to see only “integers”, and the two types will be indistinguishable.
The advantages of doing this are numerous. Now integers will always be consistent across platforms, with programmers not needing to worry about the size of a long – 32-bit, 64-bit or otherwise – on their platform. Operations, too, will always be consistent. This will help the portability of PHP code and mean less time wasted by programmers dealing with platform differences, strengthening PHP's cross-platform guarantees. Dealing with extremely large data sets becomes easier, as you no longer need to anticipate if your IDs will exceed 32 or 64 bits. Integer overflow is largely relegated to being an issue for internals programmers, as userland code will never have to deal with it, and there is no risk of a loss of precision as they will no longer become floats. All this combined is likely to make for more robust, less buggy applications. Finally, being able to deal with large integers “natively” makes PHP more attractive for web developers needing to do large integer math, such as applications dealing with currency, or perhaps statistical applications.
To complement the existing internal IS_LONG and IS_DOUBLE types, a new IS_BIGINT type is introduced. IS_BIGINT is a reference-counted, copy-on-write type which is not garbage collected, much like a string. Behind the scenes, a bigint library (LibTomMath by default, but GMP can also be used) implements it, but it is abstracted by a new family of zend_bigint_* functions and the zend_bigint type, which allows the aforementioned choice of libraries. As stated in the Introduction, no new userland type is added to PHP, and instead “integer” now covers two internal types: IS_LONG and IS_BIGINT. There should be no visible difference to userland code between these types. Internally, a new “fake type” is also added, namely IS_BIGINT_OR_LONG. This is used by a few functions dealing with conversions and casts, and is now the “type” that (integer) will cast to.
Type specifiers for zend_parse_parameters that previously yielded a long will continue to do so. The type specifiers i, for a bigint or a long, and I, for a bigint, are added, along with the corresponding Z_PARAM_BIGINT_OR_LONG(_EX) and Z_PARAM_BIGINT(_EX) FAST_ZPP macros.
In order to make integer arithmetic consistent between longs and bigints, certain changes to existing operator behaviour will be made:
- Left shifts no longer overflow, so (1 << 67) >> 66 will result in 2.
- The pow (**) operator will now error when given an exponent that is too large, if it is dealing with an integer. This is because neither GMP nor LibTomMath can handle exponents beyond the size of an unsigned long. This restriction does not apply when either operand is a float.

- rand, srand and getrandmax are unchanged, because C's random number generator has no support for arbitrary-precision integers
- mt_rand, mt_srand and mt_getrandmax are unchanged, because PHP's random number generator always produces a fixed-size value
- intdiv supports big integers and will no longer return 0 for intdiv(PHP_INT_MIN, -1) (this is not a BC break assuming this RFC is accepted for PHP 7, because intdiv is a function introduced in PHP 7)
- abs, max and min gain big integer support
- pow gains big integer support as a result of the ** operator being updated
- array_sum and array_product are now implemented in the patch using add_function and mul_function, respectively. This means that they now support not only bigints, but also internal objects with operator overloading
- decbin, decoct, dechex TBD
- gettype, settype, var_dump, var_export, print_r, is_int/is_integer/is_long and debug_zval_dump gain bigint support

Currently, if an integer gets too large in PHP, it becomes a float, accuracy is lost, and operations start behaving differently. Take this code for example:
$x = PHP_INT_MAX - 1;
var_dump($x);
$x++;
var_dump($x);
$x++;
var_dump($x);
$x++;
var_dump($x);
Under PHP 5.5 on a 64-bit machine, it produces the following result:
int(9223372036854775806)
int(9223372036854775807)
float(9.2233720368548E+18)
float(9.2233720368548E+18)
The last six digits are lost, and incrementing suddenly does nothing!
However, the output would be different with this RFC:
int(9223372036854775806)
int(9223372036854775807)
int(9223372036854775808)
int(9223372036854775809)
No digits are lost, incrementing still works, and it's still an integer. Under the hood, it may technically be a different type (depending on the platform), but from the user's perspective, it's still an integer, and it functions exactly the same.
This means you can do arbitrarily large integer operations with full accuracy, so long as there is enough memory available. For example:
$ php -r 'var_dump(10 ** 100);'
int(10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
$ php -r 'var_dump((1 << 67) >> 63);'
int(16)
$ php -r 'var_dump(2 ** 3 ** 4);'
int(2417851639229258349412352)
$ php -r 'var_dump((10 ** 100) % 10);'
int(0)
$ php -r 'var_dump(123098209381029380128301298301298309812098213);'
int(123098209381029380128301298301298309812098213)
This works consistently across platforms. So, it is possible to handle 64-bit integers with full precision on a 32-bit machine with exactly the same code. Indeed, it does not matter how many bits are in the integer, so long as there is sufficient memory to store it. Every example above works on a 64-bit machine running OS X, and would function identically on a 32-bit Windows machine, a 64-bit Linux server, or any other platform.
As mentioned before, the shift left and shift right operators act differently, as does pow for very large exponents.
Longs will no longer overflow to float, but instead become bigints (which, so far as userland cares, are just integers). Code expecting large integer literals to be floats will now end up with bigints instead, which might cause problems. However, if a float is still desired, this can be fixed simply by appending .0 to the literal.
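A minimal sketch of that fix, runnable on current PHP without this RFC. The variable names are illustrative, and the comment about the RFC's behaviour is an expectation drawn from the text above, not something this snippet can demonstrate.

```php
<?php
// Without this RFC, a literal too large for an int becomes a float.
// With the RFC it would be an integer (bigint) instead.
$implicit = 18446744073709551616; // 2^64, too large for a 64-bit long

// Appending .0 makes the literal a float regardless of this RFC.
$explicit = 18446744073709551616.0;

var_dump(is_float($explicit)); // bool(true)
var_dump(gettype($implicit));  // string(6) "double" today; expected "integer" under the RFC
```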
Some internal APIs, mostly ones dealing with numbers, will necessarily change their signatures or behaviour:
- zend_bigint** parameter
- IS_BIGINT_OR_LONG and IS_BIGINT
This is proposed for the next PHP X, currently PHP 7. The patch is based on PHP master (originally, phpng).
The performance penalties are minor for normal integer and float arithmetic. While left shifts and right shifts now require overflow checks, generally bigints will just take the place of floats in existing overflow checks so the performance impact is minimal.
Unfortunately, bigints would introduce two new ways to cause fatal errors in PHP.
Firstly, if you do an operation resulting in an extremely large number, you might hit your request memory limit.
Secondly, when trying to calculate a value that would require more memory than size_t can describe, we have to bail out and throw an E_ERROR with the message “Result of integer operation would be too large to represent”.
To avoid implementing the underlying arithmetic itself, PHP needs to add a dependency on a library implementing arbitrary-precision integers.
This patch supports two different libraries, which you can choose between when compiling PHP:
- LibTomMath (the default)
- GMP
A choice is allowed to avoid licensing issues with GMP: while it has better performance, it uses the GNU Lesser General Public License version 3, which may be unacceptable to some people. LibTomMath, by contrast, is very liberally licensed.
Since HashTable has not been and will not be updated to directly support IS_BIGINT keys, indexing by an IS_BIGINT key must be handled somehow. The RFC proposes to simply convert the bigint to a string, thus $x[PHP_INT_MAX + 1] = 3; would be equivalent to $x[(string)(PHP_INT_MAX + 1)] = 3;. This is inconsistent with the behaviour of floats (which are blindly converted, wrapped and truncated by zend_dval_to_lval), but changing their behaviour might cause compatibility issues. If that became a problem, it could be addressed in a follow-up RFC.
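To illustrate what a string key of that kind looks like, here is a small sketch that runs on current PHP without this RFC. It writes the value of PHP_INT_MAX + 1 for a 64-bit build as a string literal by hand, which is an assumption standing in for the conversion the RFC proposes.

```php
<?php
// A numeric string too large for a native integer is kept as a string key
// by current PHP, so this is the kind of key the proposed conversion
// would produce.
$x = [];
$x['9223372036854775808'] = 3; // PHP_INT_MAX + 1 on 64-bit, written as a string

$key = key($x);
var_dump(is_string($key));           // bool(true)
var_dump($x['9223372036854775808']); // int(3)

// By contrast, a numeric string that fits in an integer becomes an int key.
$y = ['42' => 'a'];
var_dump(is_int(key($y)));           // bool(true)
```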
This should have no impact on existing SAPIs.
Any extensions which request numeric parameters as zvals rather than longs or doubles from zend_parse_parameters will need changes. Those dealing with numerical operations specifically will require deeper changes. Obviously, ext/standard will need some updating.
ext/gmp will be updated to handle bigints. However, due to behavioural and implementation differences between GMP objects and the bigint type, it won't just pass through to the built-in operator functions. With the addition of bigints, ext/gmp would quickly become irrelevant except for backwards-compatibility with existing applications, and might eventually be moved to PECL.
Extensions dealing with parts of the Zend API that deal with numbers will need to be modified to deal with changes in signatures and behaviour. (See “Backwards Incompatible Changes”)
Both GMP and LibTomMath can only have one custom allocator, so I weighed the options and chose emalloc rather than malloc. I expect this would pose a problem for opcache, as any bigints would be destroyed at the end of a request, so opcache would need to store bigints persistently. Hence, some sort of import/export mechanism could be added to zend_bigint. It is obviously possible to use strings, but GMP also has its own serialisation format which would be more efficient, so that might be a good approach.
The patch has not yet been updated to support opcache.
None.
No changes.
The patch is unfinished. Many tests are still broken and most extensions will need some updating. It does not work with opcache.
However, there are no open questions.
- op1 (need to do check on its absolute value, and account for sign bit)
- decbin/dechex/decoct (or not)
- --disable-all?
- IS_BIGINT_OR_LONG should be renamed to _IS_BIGINT_OR_LONG for consistency with _IS_BOOL. That way, it's more obviously a fake type.
- u (Z_PARAM_ULONG), to zend_parse_parameters? This is especially useful on 32-bit systems.
- fast_add_function, fast_sub_function, ZEND_SIGNED_MULTIPLY_LONG and shift_left_function, unlike php-src master. For the sake of compilers that aren't GCC 5.0 or clang, some of the old inline assembly routines for this checking could be restored and updated for bigints.

As previously mentioned, the handling of array keys might need to be looked at. Otherwise, it shouldn't affect the behaviour of other PHP functionality. Implementation-wise, if something manipulates zvals directly and looks at their types, then it needs updating for bigints.
None I can think of particularly.
As this is a language change (it affects the language specification), this requires a 2/3 majority. It is a straight Yes/No vote on accepting the RFC.
Voting started on 2015-02-15 and was to end 10 days later on 2015-02-25, but voting was cancelled the same day it started.
A work-in-progress, unfinished pull request for php-src is here: https://github.com/php/php-src/pull/876
The branch itself is here: https://github.com/TazeTSchnitzel/php-src/tree/bigint
The LibTomMath backend (the default) is a work-in-progress. Use --enable-bigint-gmp to use the GMP backend.
Many tests are still broken and, as mentioned previously, I still need to deal with extensions and opcache. It is very much unfinished, but it does work to a degree.
See the TODO section in Open Issues (above) for unfinished areas.
There is also an incomplete language specification pull request here, which currently lacks updated tests: https://github.com/php/php-langspec/pull/112
If/when this is implemented, this section would/will contain