rfc:bigint

rfc:bigint [2015/01/20 06:03] ajf rfc:bigint [2017/09/22 13:28] (current) – external edit 127.0.0.1
Line 1: Line 1:
 ====== PHP RFC: Big Integer Support ====== ====== PHP RFC: Big Integer Support ======
-  * Version: 0.1.7 +  * Version: 0.1.8 
-  * Date: 2014-06-20 (Initial Draft; Put Under Discussion 2014-10-10, Last updated 2015-01-10)+  * Date: 2014-06-20 (Initial Draft; Put Under Discussion 2014-10-10, Last updated 2015-02-15)
   * Author: Andrea Faulds, ajf@ajf.me   * Author: Andrea Faulds, ajf@ajf.me
-  * Status: Under Discussion+  * Status: Withdrawn
   * First Published at: http://wiki.php.net/rfc/bigint   * First Published at: http://wiki.php.net/rfc/bigint
  
Line 36: Line 36:
 ==== Standard library changes ==== ==== Standard library changes ====
  
-  * All math functions are updated to work with bigints. +  * Some math functions accepting integers are updated: 
-  * ''array_sum'' and ''array_product'' are now implemented in the patch using ''add_function'' and ''mul_function'', respectively. This means that they support bigints now, but also internal objects with operator overloading (currently only the GMP extension, to the best of my knowledge).+      * ''rand'', ''srand'' and ''getrandmax'' are unchanged, because C's random number generator has no support for arbitrary-precision integers 
 +      * ''mt_rand'', ''mt_srand'' and ''mt_getrandmax'' are unchanged, because PHP's random number generator always produces a fixed-size value 
 +      * ''intdiv'' supports big integers and will no longer return ''0'' for ''intdiv(PHP_INT_MIN, -1)'' (this is not a BC break assuming this RFC is accepted for PHP 7, because ''intdiv'' is a function introduced in PHP 7) 
 +      * ''abs'', ''max'' and ''min'' gain big integer support 
 +      * ''pow'' gains big integer support as a result of the ''*''''*'' operator being updated 
 +      * ''array_sum'' and ''array_product'' are now implemented in the patch using ''add_function'' and ''mul_function'', respectively. This means that they now support not only bigints, but also internal objects with operator overloading 
 +      * ''decbin'', ''decoct'', ''dechex'' TBD 
 +  * Serialisation and unserialisation support bigints 
 +  * ''gettype'', ''settype'', ''var_dump'', ''var_export'', ''print_r'', ''is_int''/''is_integer''/''is_long'' and ''debug_zval_dump'' gain bigint support
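
To illustrate why ''intdiv(PHP_INT_MIN, -1)'' is a special case: with a fixed-width integer the mathematical quotient is one larger than the maximum value, so the result has to be promoted. The following is only a minimal sketch in C, not code from the patch. ''checked_intdiv'' and ''div_status'' are hypothetical names, and ''long'' stands in for PHP's native integer type.

```c
#include <limits.h>

/* Hypothetical status codes for this sketch; they are not part of the patch. */
typedef enum { DIV_OK, DIV_BY_ZERO, DIV_NEEDS_BIGINT } div_status;

/* Divides a by b, truncating toward zero. Stores the quotient only when it
 * fits in a long; otherwise reports why it could not be stored. */
static div_status checked_intdiv(long a, long b, long *quotient)
{
    if (b == 0) {
        return DIV_BY_ZERO;
    }
    /* LONG_MIN / -1 equals LONG_MAX + 1 mathematically, which a long cannot
     * hold. Evaluating it directly is undefined behaviour in C. */
    if (a == LONG_MIN && b == -1) {
        return DIV_NEEDS_BIGINT;
    }
    *quotient = a / b;
    return DIV_OK;
}
```

A caller would treat ''DIV_NEEDS_BIGINT'' as the signal to redo the division with the arbitrary-precision library instead of returning ''0''.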
  
 ==== Examples ==== ==== Examples ====
Line 102: Line 110:
 ===== Proposed PHP Version(s) ===== ===== Proposed PHP Version(s) =====
  
-This is proposed for the next PHP X, currently PHP 7. The patch is based off of phpng, and my intention is for it to be merged into phpng.+This is proposed for the next PHP X, currently PHP 7. The patch is based on PHP master (originally, phpng).
  
 ===== RFC Impact ===== ===== RFC Impact =====
Line 116: Line 124:
 Firstly, if you do an operation resulting in an extremely large number, you might hit your request memory limit. Firstly, if you do an operation resulting in an extremely large number, you might hit your request memory limit.
  
-Secondly, when trying to calculate a value that would require more memory than size_t can describe, GMP prints the ''overflow in mpz type'' error to the command line and ''abort()''s. Allowing this to happen and kill the PHP process would be very problematic, so instead, [[https://github.com/TazeTSchnitzel/php-src/commit/360e1b0f3212da8b59266cbdcb0966cda69fc4e0|this commit]] introduces a workaround whereby we try to check ahead-of-time if the operation would fall foul of the overflow error and, if so, throw an E_ERROR with the message "Result of integer operation would be too large to represent". For LibTomMath we don't need to check this ourselves because the library has sensible error handling, but we still produce E_ERROR in this case.+Secondly, when trying to calculate a value that would require more memory than ''size_t'' can describe, we have to bail out and throw an E_ERROR with the message "Result of integer operation would be too large to represent".
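
The ahead-of-time check described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the code from the linked commit: ''limb_t'' and ''product_size_fits'' are hypothetical names, and the bound used (a product needs at most as many limbs as both operands together) is a general property of multiplication rather than something taken from GMP or LibTomMath internals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limb type: one machine word of a big integer's magnitude. */
typedef uint64_t limb_t;

/* Returns true and stores the number of bytes needed to hold the product of
 * two numbers of a_limbs and b_limbs limbs. Returns false when that byte
 * count cannot be represented in size_t, in which case the caller would
 * raise the error instead of attempting the multiplication. */
static bool product_size_fits(size_t a_limbs, size_t b_limbs, size_t *bytes)
{
    /* Reject the limb count itself if the addition would wrap around. */
    if (a_limbs > SIZE_MAX - b_limbs) {
        return false;
    }
    size_t limbs = a_limbs + b_limbs;

    /* Reject the byte count if the multiplication would wrap around. */
    if (limbs > SIZE_MAX / sizeof(limb_t)) {
        return false;
    }
    *bytes = limbs * sizeof(limb_t);
    return true;
}
```

The point of checking before the operation is that the failure can then be reported as an ordinary PHP error rather than being left to the library.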
  
 ==== Licensing and dependency issues ==== ==== Licensing and dependency issues ====
  
-I am currently porting this to use [[http://www.libtom.org/|LibTomMath]], a dual-licensed Public Domain/WTFPL arbitrary-length integer library written in C, which is available packaged for several platforms, and is battle-tested as it is used by Tcl. As it is available under both Public Domain and the WTFPL, with the latter an extremely liberal license, it doesn't pose any licensing issues. Its source is contained in the repo and built with the rest of Zend, which avoids an external dependency.+To avoid implementing the underlying arithmetic itself, PHP needs to add a dependency on a library implementing arbitrary-precision integers.
  
-At compile-time, it is also possible to choose to use GMP, which is LGPLv3 licensed, but it is not the default.+This patch supports two different libraries, which you can choose between when compiling PHP:
  
-==== Arrays ====+  * [[http://www.libtom.net/|LibTomMath]] - a dual-licensed Public Domain/WTFPL arbitrary-length implementation. It is used by default, and a version is included within the repository to avoid adding an extra external dependency when building PHP, and also because this is required for the custom allocators to work. 
 +  * [[https://gmplib.org/|The GNU Multiple Precision Arithmetic Library (GMP)]] - an LGPLv3 implementation. It has greatly improved performance over LibTomMath (up to two orders of magnitude).
  
-A problem arising from allowing integers to be arbitrarily large is that array keys using strings for numeric keys beyond the maximum size of a long would probably seem weird. At present, bigints are just dealt with as if they were numeric strings when using them as array keys and indices, but this may not be optimal. This RFC aims for integer consistency across platforms, and this would be a remaining inconsistency. It also doesn't make sense from a user perspective to have integers over a certain value suddenly become string keys, though whether this matters much in practise with PHP's type casting and juggling is a different question.+A choice is allowed to avoid licensing issues with GMP: while it has better performance, it uses the GNU Lesser General Public License version 3, which may be unacceptable to some people. LibTomMath, by contrast, is very liberally licensed. 
 + 
 +==== Arrays ====
  
-This also presents a further issue: inconsistency between longs, bigints and doubles, which **must** be avoided, as integer consistency cross-platform is a key goal of this RFC. Currently in PHP, doubles used as indexes are simply cast to longs, without any regard for size. This means that they overflow if they are larger than the platform's long size, either 32-bit or 64-bit. However, bigints, as implemented, will be treated as strings if they are outside of the bounds of a long on the platform. While bigints are likely to break existing code anyway, this would be particularly bad breakage, as code relying on very large numbers being floats and wrapping when used as indices would break. Hence some sort of solution must be found. Either we cast bigints to longs and let them overflow (not terribly desirable), we don't change the current behaviour (inconsistent), or we change the handling of doubles. Personally, I don't like what PHP does here and would like to go for this last option.+Since ''HashTable'' has not been and will not be updated to directly support ''IS_BIGINT'' keys, indexing by an ''IS_BIGINT'' key must be handled somehow. The RFC proposes to simply convert the bigint to a string, thus <php>$x[PHP_INT_MAX + 1] = 3;</php> would be equivalent to <php>$x[(string)(PHP_INT_MAX + 1)] = 3;</php>. This is inconsistent with the behaviour of floats (which are blindly converted, wrapped and truncated by ''zend_dval_to_lval''), but changing their behaviour might cause compatibility issues. If that became a problem, it could be addressed in a follow-up RFC.
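
The proposed key handling can be sketched in C as below. This is an illustration only and does not use the real ''HashTable'' API: ''array_key'' and ''key_from_integer_text'' are hypothetical, and the integer is passed in as its decimal text so that the example needs no bigint library. A value that fits in a ''long'' becomes an integer key, and anything larger keeps its decimal string as the key.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical key representation for this sketch. */
typedef struct {
    bool is_string;   /* true: str is the key; false: num is the key */
    long num;
    const char *str;
} array_key;

/* decimal is assumed to be the canonical base-10 text of an integer. */
static array_key key_from_integer_text(const char *decimal)
{
    array_key key = { false, 0, NULL };

    errno = 0;
    long value = strtol(decimal, NULL, 10);
    if (errno == ERANGE) {
        /* Outside the range of a long: use the string form as the key. */
        key.is_string = true;
        key.str = decimal;
    } else {
        key.num = value;
    }
    return key;
}
```

Note that this never wraps or truncates, which is exactly where it differs from the float behaviour mentioned above.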
  
 ==== To SAPIs ==== ==== To SAPIs ====
Line 136: Line 147:
 ==== To Existing Extensions ==== ==== To Existing Extensions ====
  
-Any which request numeric parameters as zvals rather than longs or doubles from zend_parse_parameters will need changes. Those dealing with numerical operations specifically will require deeper changes. Obviously, ext/standard will need some updating.+Any extensions which request numeric parameters as zvals rather than longs or doubles from zend_parse_parameters will need changes. Those dealing with numerical operations specifically will require deeper changes. Obviously, ext/standard will need some updating.
  
 ext/gmp will be updated to handle bigints. However, due to behavioural and implementation differences between GMP objects and the bigint type, it won't just pass through to the built-in operator functions. With the addition of bigints, ext/gmp would quickly become irrelevant except for backwards-compatibility with existing applications, and might eventually be moved to PECL. ext/gmp will be updated to handle bigints. However, due to behavioural and implementation differences between GMP objects and the bigint type, it won't just pass through to the built-in operator functions. With the addition of bigints, ext/gmp would quickly become irrelevant except for backwards-compatibility with existing applications, and might eventually be moved to PECL.
Line 146: Line 157:
 Both GMP and LibTomMath can only have one custom allocator, so I weighed the options and made that be emalloc, not malloc. I expect this would pose a problem for opcache, as any bigints would be destroyed upon the end of a request, so opcache would need to store bigints persistently. Hence, some sort of import/export mechanism could be added to zend_bigint. It is obviously possible to use strings, but gmp also has its own format for serialisation which would be more efficient, so that might be a good way. Both GMP and LibTomMath can only have one custom allocator, so I weighed the options and made that be emalloc, not malloc. I expect this would pose a problem for opcache, as any bigints would be destroyed upon the end of a request, so opcache would need to store bigints persistently. Hence, some sort of import/export mechanism could be added to zend_bigint. It is obviously possible to use strings, but gmp also has its own format for serialisation which would be more efficient, so that might be a good way.
  
-I have not yet dealt with opcache implementation-wise, and I might need help when the time comes.+The patch has not yet been updated to support opcache.
  
 ==== New Constants ==== ==== New Constants ====
Line 158: Line 169:
 ===== Open Issues ===== ===== Open Issues =====
  
-The patch is unfinished. Many tests are still broken, I haven't gotten round to updating the extensions, and it almost certainly does not work with opcache.+The patch is unfinished. Many tests are still broken and most extensions will need some updating. It does not work with opcache.
  
-==== Open Questions ==== +However, there are no open questions.
- +
-  * Should we rework array key handling? (See "Arrays" above)+
  
 ==== TODO ==== ==== TODO ====
Line 168: Line 177:
 === Must be done === === Must be done ===
  
 +  * Check if https://github.com/php/php-src/pull/1073 affects bigints
 +  * Fix left shift overflow check for negative ''op1'' (need to do check on its absolute value, and account for sign bit)
   * Finish LibTomMath port   * Finish LibTomMath port
       * TODOs       * TODOs
Line 182: Line 193:
       * Partially ported:       * Partially ported:
           * standard           * standard
 +             * Agree on some semi-sane new behaviour for ''decbin''/''dechex''/''decoct'' (or not)
       * Compiles, not necessarily fully working:       * Compiles, not necessarily fully working:
           * bz2, core, ctype, curl, date, dom, ereg, exif, fileinfo, gd, gettext, hash, iconv, intl, libxml, mbstring, mysql, mysqli, mysqlnd, pcre, pdo_mysql, pdo_mysql, pdo_sqlite, pgsql, phar, reflection, session, shmop, simplexml, soap, sockets, spl, sqlite3, standard, tidy, tokenizer, wddx, xml, xmlreader, xmlwriter, xsl, zip           * bz2, core, ctype, curl, date, dom, ereg, exif, fileinfo, gd, gettext, hash, iconv, intl, libxml, mbstring, mysql, mysqli, mysqlnd, pcre, pdo_mysql, pdo_mysql, pdo_sqlite, pgsql, phar, reflection, session, shmop, simplexml, soap, sockets, spl, sqlite3, standard, tidy, tokenizer, wddx, xml, xmlreader, xmlwriter, xsl, zip
Line 196: Line 208:
   * Add an unsigned long type, ''u'' (''Z_PARAM_ULONG''), to ''zend_parse_parameters''? This is especially useful on 32-bit systems.   * Add an unsigned long type, ''u'' (''Z_PARAM_ULONG''), to ''zend_parse_parameters''? This is especially useful on 32-bit systems.
   * Optimisations:   * Optimisations:
-      * We currently use clang and GCC 5.0 checked arithmetic builtins to implement faster overflow checks in ''fast_add_function'', ''fast_sub_function'' and ''ZEND_SIGNED_MULTIPLY_LONG'', unlike php-src master. For the sake of compilers that aren't GCC 5.0 or clang, some of the old inline assembly routines for this checking could be restored and updated for bigints+      * We currently use clang and GCC 5.0 checked arithmetic builtins to implement faster overflow checks in ''fast_add_function'', ''fast_sub_function'', ''ZEND_SIGNED_MULTIPLY_LONG'' and ''shift_left_function'', unlike php-src master. For the sake of compilers that aren't GCC 5.0 or clang, some of the old inline assembly routines for this checking could be restored and updated for bigints.
-      * That clz/bsr assembly TODO: We need to do a clz/bsr operation for bit shift overflow checking, and currently we do this with a double conversion and ''frexp''. It would be more efficient to use assembly for this.+
   * Other optimisations:   * Other optimisations:
       * Possibly mark the zend_bigint_* functions as to be inlined and move them to the header       * Possibly mark the zend_bigint_* functions as to be inlined and move them to the header
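
For readers unfamiliar with the checked arithmetic builtins mentioned in the optimisations above, here is a minimal sketch of how such overflow checks can look. It is not the patch's code: ''add_fits_long'' and ''shl_fits_long'' are hypothetical names, the portable fallback is plain C rather than the old inline assembly, and the shift check compares the magnitude against a shifted limit instead of using a clz/bsr instruction. It assumes a two's complement ''long'' without padding bits and, where the builtin is used, a clang recent enough to provide ''%%__builtin_add_overflow%%''.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true and stores a + b if the sum fits in a long. A false return is
 * where the caller would fall back to a bigint addition. */
static bool add_fits_long(long a, long b, long *sum)
{
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 5)
    return !__builtin_add_overflow(a, b, sum);
#else
    if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b)) {
        return false;
    }
    *sum = a + b;
    return true;
#endif
}

/* Returns true and stores a * 2^shift if it fits in a long. The check is done
 * on the absolute value of a, with one bit reserved for the sign. */
static bool shl_fits_long(long a, unsigned shift, long *result)
{
    const unsigned bits = (unsigned)(sizeof(long) * CHAR_BIT);

    if (a == 0) {
        *result = 0;
        return true;
    }
    if (shift >= bits) {
        return false;
    }

    /* Unsigned negation yields |a| even for LONG_MIN. */
    unsigned long magnitude = a < 0 ? 0UL - (unsigned long)a : (unsigned long)a;
    /* Negative results may reach 2^(bits-1); positive ones only LONG_MAX. */
    unsigned long max_magnitude =
        a < 0 ? (unsigned long)LONG_MAX + 1UL : (unsigned long)LONG_MAX;

    if (magnitude > (max_magnitude >> shift)) {
        return false;
    }

    unsigned long shifted = magnitude << shift;
    if (a > 0) {
        *result = (long)shifted;
    } else {
        *result = shifted > (unsigned long)LONG_MAX ? LONG_MIN : -(long)shifted;
    }
    return true;
}
```

Working on the magnitude avoids left-shifting a negative value, which is undefined behaviour in C, and it lets ''-1'' shifted by the full value width still resolve to ''LONG_MIN'' without promotion.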
Line 203: Line 214:
 ===== Unaffected PHP Functionality ===== ===== Unaffected PHP Functionality =====
  
-As previously mentioned, the handling of array keys might need to be looked at. Otherwise, it shouldn't affect the behaviour of other PHP functionality, but obviously the implementations of anything dealing with integers may need to be changed.+As previously mentioned, the handling of array keys might need to be looked at. Otherwise, it shouldn't affect the behaviour of other PHP functionality. Implementation-wise, if something manipulates zvals directly and looks at their types, then it needs updating for bigints.
  
 ===== Future Scope ===== ===== Future Scope =====
Line 209: Line 220:
 None I can think of particularly. None I can think of particularly.
  
-===== Proposed Voting Choices =====+===== Vote ===== 
 + 
 +As this is a language change (it affects the language specification), this requires a 2/3 majority. It is a straight Yes/No vote on accepting the RFC. 
 + 
 +Voting started on 2015-02-15 and was to end 10 days later on 2015-02-25, but voting was cancelled the same day it started.
  
-In some respects this is just an implementation detail, but as this would break backwards-compatibility for some apps and arguably changes the language, I think this requires a 2/3 majority. It would be a straight Yes/No vote.+<doodle title="Big Integer Support RFC" auth="ajf" voteType="single" closed="true"> 
 +   * Yes 
 +   * No 
 +</doodle>
  
 ===== Patches and Tests ===== ===== Patches and Tests =====
Line 248: Line 266:
 ==== General ==== ==== General ====
  
-  * http://www.libtom.org/ and https://github.com/libtom/libtommath - LibTomMath+  * http://www.libtom.net/ and https://github.com/libtom/libtommath - LibTomMath
   * https://gmplib.org/ - The GNU Multiple Precision Arithmetic Library   * https://gmplib.org/ - The GNU Multiple Precision Arithmetic Library
   * Yasuo's [[rfc:gmp_number|gmp_number]] RFC is similar in some respects   * Yasuo's [[rfc:gmp_number|gmp_number]] RFC is similar in some respects
Line 254: Line 272:
 ===== Changelog ===== ===== Changelog =====
  
 +  * v0.1.8 - Decided on not touching float indexing behaviour for now
   * v0.1.7 - Minor changes, removed some outdated information   * v0.1.7 - Minor changes, removed some outdated information
   * v0.1.6 - LibTomMath built as part of PHP   * v0.1.6 - LibTomMath built as part of PHP
rfc/bigint.1421733809.txt.gz · Last modified: 2017/09/22 13:28 (external edit)