rfc:bigint [2015/02/09 23:26] – ajf | rfc:bigint [2015/02/15 13:58] – Clarified "Unaffected PHP Functionality" ajf |
---|
====== PHP RFC: Big Integer Support ====== | ====== PHP RFC: Big Integer Support ====== |
* Version: 0.1.7 | * Version: 0.1.8 |
* Date: 2014-06-20 (Initial Draft; Put Under Discussion 2014-10-10, Last updated 2015-01-10) | * Date: 2014-06-20 (Initial Draft; Put Under Discussion 2014-10-10, Last updated 2015-02-15) |
* Author: Andrea Faulds, ajf@ajf.me | * Author: Andrea Faulds, ajf@ajf.me |
* Status: Under Discussion | * Status: Voting |
* First Published at: http://wiki.php.net/rfc/bigint | * First Published at: http://wiki.php.net/rfc/bigint |
| |
==== Standard library changes ==== | ==== Standard library changes ==== |
| |
* All math functions are updated to work with bigints. | * Some math functions accepting integers are updated: |
* ''array_sum'' and ''array_product'' are now implemented in the patch using ''add_function'' and ''mul_function'', respectively. This means that they support bigints now, but also internal objects with operator overloading (currently only the GMP extension, to the best of my knowledge). | * ''rand'', ''srand'' and ''getrandmax'' are unchanged, because C's random number generator has no support for arbitrary-precision integers |
| * ''mt_rand'', ''mt_srand'' and ''mt_getrandmax'' are unchanged, because PHP's random number generator always produces a fixed-size value |
| * ''intdiv'' supports big integers and will no longer return ''0'' for ''intdiv(PHP_INT_MIN, -1)'' (this is not a BC break assuming this RFC is accepted for PHP 7, because ''intdiv'' is a function introduced in PHP 7) |
| * ''abs'', ''max'' and ''min'' gain big integer support |
| * ''pow'' gains big integer support as a result of the ''*''''*'' operator being updated |
| * ''array_sum'' and ''array_product'' are now implemented in the patch using ''add_function'' and ''mul_function'', respectively. This means that they now support not only bigints, but also internal objects with operator overloading |
| * ''decbin'', ''decoct'', ''dechex'' TBD |
| * Serialisation and unserialisation support bigints |
| * ''gettype'', ''settype'', ''var_dump'', ''var_export'', ''print_r'', ''is_int''/''is_integer''/''is_long'' and ''debug_zval_dump'' gain bigint support |
| |
==== Examples ==== | ==== Examples ==== |
===== Proposed PHP Version(s) ===== | ===== Proposed PHP Version(s) ===== |
| |
This is proposed for the next PHP X, currently PHP 7. The patch is based off of phpng, and my intention is for it to be merged into phpng. | This is proposed for the next PHP X, currently PHP 7. The patch is based on PHP master (originally, phpng). |
| |
===== RFC Impact ===== | ===== RFC Impact ===== |
Firstly, if you do an operation resulting in an extremely large number, you might hit your request memory limit. | Firstly, if you do an operation resulting in an extremely large number, you might hit your request memory limit. |
| |
Secondly, when trying to calculate a value that would require more memory than size_t can describe, GMP prints the ''overflow in mpz type'' error to the command line and ''abort()''s. Allowing this to happen and kill the PHP process would be very problematic, so instead, [[https://github.com/TazeTSchnitzel/php-src/commit/360e1b0f3212da8b59266cbdcb0966cda69fc4e0|this commit]] introduces a workaround whereby we try to check ahead-of-time if the operation would fall foul of the overflow error and, if so, throw an E_ERROR with the message "Result of integer operation would be too large to represent". For LibTomMath we don't need to check this ourselves because the library has sensible error handling, but we still produce E_ERROR in this case. | Secondly, when trying to calculate a value that would require more memory than ''size_t'' can describe, we have to bail out and throw an E_ERROR with the message "Result of integer operation would be too large to represent". |
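
The ahead-of-time size check described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: the limb type ''limb_t'' and the helpers ''limb_count_fits'' and ''product_fits'' are hypothetical names, and the sketch assumes that the product of an n-limb value and an m-limb value needs at most n + m limbs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limb type; GMP and LibTomMath each define their own. */
typedef uint64_t limb_t;

/* True if a buffer of `limbs` limbs has a byte size that size_t can describe. */
static bool limb_count_fits(size_t limbs)
{
    return limbs <= SIZE_MAX / sizeof(limb_t);
}

/* True if the product of an a_limbs-limb value and a b_limbs-limb value
   (at most a_limbs + b_limbs limbs) can be allocated at all. */
static bool product_fits(size_t a_limbs, size_t b_limbs)
{
    if (a_limbs > SIZE_MAX - b_limbs) {
        return false; /* the limb count itself would wrap around */
    }
    return limb_count_fits(a_limbs + b_limbs);
}
```

A caller would presumably run such a check before the multiplication and raise the E_ERROR when it returns false, rather than letting the library abort.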
| |
==== Licensing and dependency issues ==== | ==== Licensing and dependency issues ==== |
| |
I am currently porting this to use [[http://www.libtom.net/|LibTomMath]], a dual-licensed Public Domain/WTFPL arbitrary-length integer library written in C, which is available packaged for several platforms, and is battle-tested as it is used by Tcl. As it is available under both Public Domain and the WTFPL, with the latter an extremely liberal license, it doesn't pose any licensing issues. Its source is contained in the repo and built with the rest of Zend, which avoids an external dependency. | To avoid implementing the underlying arithmetic itself, PHP needs to add a dependency on a library implementing arbitrary-precision integers. |
| |
At compile-time, it is also possible to choose to use GMP, which is LGPLv3 licensed, but it is not the default. | This patch supports two different libraries, which you can choose between when compiling PHP: |
| |
==== Arrays ==== | * [[http://www.libtom.net/|LibTomMath]] - a dual-licensed Public Domain/WTFPL arbitrary-length implementation. It is used by default, and a version is included within the repository to avoid adding an extra external dependency when building PHP, and also because this is required for the custom allocators to work. |
| * [[https://gmplib.org/|The GNU Multiple Precision Arithmetic Library (GMP)]] - an LGPLv3 implementation. It has greatly improved performance over LibTomMath (up to two orders of magnitude). |
| |
A problem arising from allowing integers to be arbitrarily large is that array keys using strings for numeric keys beyond the maximum size of a long would probably seem weird. At present, bigints are just dealt with as if they were numeric strings when using them as array keys and indices, but this may not be optimal. This RFC aims for integer consistency across platforms, and this would be a remaining inconsistency. It also doesn't make sense from a user perspective to have integers over a certain value suddenly become string keys, though whether this matters much in practice with PHP's type casting and juggling is a different question. | A choice is allowed to avoid licensing issues with GMP: while it has better performance, it uses the GNU Lesser General Public License version 3, which may be unacceptable to some people. LibTomMath, by contrast, is very liberally licensed. |
| |
| ==== Arrays ==== |
| |
This also presents a further issue: inconsistency between longs, bigints and doubles, which **must** be avoided, as integer consistency cross-platform is a key goal of this RFC. Currently in PHP, doubles used as indexes are simply cast to longs, without any regard for size. This means that they overflow if they are larger than the platform's long size, either 32-bit or 64-bit. However, bigints as implemented, will be treated as strings if they are outside of the bounds of a long on the platform. While bigints are likely to break existing code anyway, this would be a particularly bad breakage, as code relying on very large numbers being floats and wrapping when used as indices would break. Hence some sort of solution must be found. Either we cast bigints to longs and let them overflow (not terribly desirable), we don't change the current behaviour (inconsistent), or we change the handling of doubles. Personally, I don't like what PHP does here and would like to go for this last option. | Since ''HashTable'' has not been and will not be updated to directly support ''IS_BIGINT'' keys, indexing by an ''IS_BIGINT'' key must be handled somehow. The RFC proposes to simply convert the bigint to a string, thus <php>$x[PHP_INT_MAX + 1] = 3;</php> would be equivalent to <php>$x[(string)(PHP_INT_MAX + 1)] = 3;</php>. This is inconsistent with the behaviour of floats (which are blindly converted, wrapped and truncated by ''zend_dval_to_lval''), but changing their behaviour might cause compatibility issues. If that became a problem, it could be addressed in a follow-up RFC. |
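
The key rule described above (integers that fit in a long stay integer keys, larger ones become their decimal string) could be sketched in C like this. It is a simplified illustration, not the engine's code: the ''array_key'' struct and ''key_from_decimal'' are hypothetical, ''long long'' stands in for the platform long, and the input is assumed to be the decimal text of the integer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical key representation: either an integer index or "use the string". */
typedef struct {
    bool      is_string; /* true: the decimal string itself is the key */
    long long index;     /* meaningful only when is_string is false */
} array_key;

/* Classify the decimal representation of an integer as an integer key
   (fits in long long) or a string key (out of range or not a plain number). */
static array_key key_from_decimal(const char *decimal)
{
    array_key key = { false, 0 };
    char *end = NULL;

    errno = 0;
    long long value = strtoll(decimal, &end, 10);

    if (errno == ERANGE || end == decimal || *end != '\0') {
        key.is_string = true; /* too large for an integer key: keep the string */
    } else {
        key.index = value;
    }
    return key;
}
```

Note that ''strtoll'' also accepts leading whitespace and a leading ''+'', which a real implementation would likely reject; the sketch ignores that for brevity.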
| |
==== To SAPIs ==== | ==== To SAPIs ==== |
==== To Existing Extensions ==== | ==== To Existing Extensions ==== |
| |
Any which request numeric parameters as zvals rather than longs or doubles from zend_parse_parameters will need changes. Those dealing with numerical operations specifically will require deeper changes. Obviously, ext/standard will need some updating. | Any extensions which request numeric parameters as zvals rather than longs or doubles from zend_parse_parameters will need changes. Those dealing with numerical operations specifically will require deeper changes. Obviously, ext/standard will need some updating. |
| |
ext/gmp will be updated to handle bigints. However, due to behavioural and implementation differences between GMP objects and the bigint type, it won't just pass through to the built-in operator functions. With the addition of bigints, ext/gmp would quickly become irrelevant except for backwards-compatibility with existing applications, and might eventually be moved to PECL. | ext/gmp will be updated to handle bigints. However, due to behavioural and implementation differences between GMP objects and the bigint type, it won't just pass through to the built-in operator functions. With the addition of bigints, ext/gmp would quickly become irrelevant except for backwards-compatibility with existing applications, and might eventually be moved to PECL. |
Both GMP and LibTomMath can only have one custom allocator, so I weighed the options and made that be emalloc, not malloc. I expect this would pose a problem for opcache, as any bigints would be destroyed upon the end of a request, so opcache would need to store bigints persistently. Hence, some sort of import/export mechanism could be added to zend_bigint. It is obviously possible to use strings, but gmp also has its own format for serialisation which would be more efficient, so that might be a good way. | Both GMP and LibTomMath can only have one custom allocator, so I weighed the options and made that be emalloc, not malloc. I expect this would pose a problem for opcache, as any bigints would be destroyed upon the end of a request, so opcache would need to store bigints persistently. Hence, some sort of import/export mechanism could be added to zend_bigint. It is obviously possible to use strings, but gmp also has its own format for serialisation which would be more efficient, so that might be a good way. |
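
One possible shape for the import/export mechanism mentioned above is sketched below in C. Everything here is an assumption made for illustration: ''sketch_bigint'', ''sketch_blob'', ''sketch_export'' and ''sketch_import'' are hypothetical and are not part of zend_bigint, GMP or LibTomMath, and plain ''malloc'' stands in for both the persistent and the per-request allocator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request-local bigint; limbs are least significant first. */
typedef struct {
    int       negative;
    size_t    used;
    uint32_t *limbs;
} sketch_bigint;

/* Hypothetical flat form without internal pointers, so it can be copied
   into persistent or shared memory as one contiguous block. */
typedef struct {
    int      negative;
    size_t   used;
    uint32_t limbs[];
} sketch_blob;

/* Export: copy the value into a single allocated block. Returns NULL on failure. */
static sketch_blob *sketch_export(const sketch_bigint *n)
{
    if (n->used > (SIZE_MAX - sizeof(sketch_blob)) / sizeof(uint32_t)) {
        return NULL; /* byte size would not fit in size_t */
    }
    sketch_blob *blob = malloc(sizeof(sketch_blob) + n->used * sizeof(uint32_t));
    if (blob == NULL) {
        return NULL;
    }
    blob->negative = n->negative;
    blob->used = n->used;
    if (n->used > 0) {
        memcpy(blob->limbs, n->limbs, n->used * sizeof(uint32_t));
    }
    return blob;
}

/* Import: rebuild a bigint from the flat form. Returns 0 on success, -1 on failure. */
static int sketch_import(const sketch_blob *blob, sketch_bigint *out)
{
    size_t bytes = blob->used * sizeof(uint32_t);

    out->limbs = malloc(bytes > 0 ? bytes : 1);
    if (out->limbs == NULL) {
        return -1;
    }
    if (bytes > 0) {
        memcpy(out->limbs, blob->limbs, bytes);
    }
    out->negative = blob->negative;
    out->used = blob->used;
    return 0;
}
```

The point of the flat form is that it contains no pointers into request memory, so a cached copy would survive the end of the request that created it.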
| |
I have not yet dealt with opcache implementation-wise, and I might need help when the time comes. | The patch has not yet been updated to support opcache. |
| |
==== New Constants ==== | ==== New Constants ==== |
===== Open Issues ===== | ===== Open Issues ===== |
| |
The patch is unfinished. Many tests are still broken, I haven't gotten round to updating the extensions, and it almost certainly does not work with opcache. | The patch is unfinished. Many tests are still broken and most extensions will need some updating. It does not work with opcache. |
| |
==== Open Questions ==== | However, there are no open questions. |
| |
* Should we rework array key handling? (See "Arrays" above) | |
| |
==== TODO ==== | ==== TODO ==== |
===== Unaffected PHP Functionality ===== | ===== Unaffected PHP Functionality ===== |
| |
As previously mentioned, the handling of array keys might need to be looked at. Otherwise, it shouldn't affect the behaviour of other PHP functionality, but obviously the implementations of anything dealing with integers may need to be changed. | As previously mentioned, the handling of array keys might need to be looked at. Otherwise, it shouldn't affect the behaviour of other PHP functionality. Implementation-wise, if something manipulates zvals directly and looks at their types, then it needs updating for bigints. |
| |
===== Future Scope ===== | ===== Future Scope ===== |
None I can think of particularly. | None I can think of particularly. |
| |
===== Proposed Voting Choices ===== | ===== Vote ===== |
| |
| As this is a language change (it affects the language specification), this requires a 2/3 majority. It is a straight Yes/No vote on accepting the RFC. |
| |
| Voting started on 2015-02-15 and ends 10 days later on 2015-02-25. |
| |
In some respects this is just an implementation detail, but as this would break backwards-compatibility for some apps and arguably changes the language, I think this requires a 2/3 majority. It would be a straight Yes/No vote. | <doodle title="Big Integer Support RFC" auth="ajf" voteType="single" closed="false"> |
| * Yes |
| * No |
| </doodle> |
| |
===== Patches and Tests ===== | ===== Patches and Tests ===== |
===== Changelog ===== | ===== Changelog ===== |
| |
| * v0.1.8 - Decided on not touching float indexing behaviour for now |
* v0.1.7 - Minor changes, removed some outdated information | * v0.1.7 - Minor changes, removed some outdated information |
* v0.1.6 - LibTomMath built as part of PHP | * v0.1.6 - LibTomMath built as part of PHP |