Author: Pierre Joye
Status: Draft only, list of thoughts/ideas/discussions summary
This page lists possible changes, updates and additions for PHP 6.
Each of them will then require an RFC, at least the important or large changes.
- OPcache integration
- Improved and actual 64bit support
- Unicode support
- internals API Cleanup
- Warning free code
- Improve OPcodes, compilation and runtime (perf, features, jit, fixed address jump usage, etc)
- JIT compiler (libjit?)
- Annotation Support
- Named argument
- Scalar Type hinting
- HTTP2 support (avoid our own implementation, nghttp2?)
- Reliable, userfriendly RNG APIs (internally and userland)
- Userland APIs improvement for all PHP types (OO instead of breaking BC)
- C++ Usage
- inclusion of the new crypto extension (maybe support other backends than openssl)
- bundle pecl's http (add http2 support too)
OPcache integration
OPcache has been bundled since 5.5.0. A real integration has not happened yet because stabilizing it was the highest priority.
Integration may mean merging it into the engine, partially or totally.
- Change opcodes to ease opcode caching and optimization
- Add the necessary changes to support JIT compilation to native code, most likely only for part of the running code (a function/method or a portion of a function/method). This part is totally undefined for now, see the JIT section
Improved and actual 64bit support
64-bit support has been working well on a couple of supported platforms. However, its implementation is far from a modern, safe and clean 64-bit implementation.
We rely on various arbitrary casts, using long as the base type for PHP integers and int for string lengths, let alone how pointers are handled.
The size_t and int64 RFC (https://wiki.php.net/rfc/size_t_and_int64) has been rejected for 5.6. However, most voters agreed that it should be done in 6.0. The branch is almost complete. Now that its target is 6.0, more options are available and internal code compatibility is less of a problem.
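To make the problem with long and int concrete, here is a minimal sketch in plain C. The helper names are hypothetical and are not part of the PHP source or of the RFC branch; they only show where values stop fitting. On 64-bit Windows (LLP64) long is 32 bits wide, and on every common 64-bit platform int is 32 bits wide, so both checks can fail there.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if a size_t length can be stored in an int
 * (the type used for string lengths) without truncation, 0 otherwise. */
int length_fits_in_int(size_t len)
{
    return len <= (size_t)INT_MAX;
}

/* Hypothetical helper: returns 1 if a 64-bit value can be stored in a long
 * (the base type of the PHP integer) without truncation, 0 otherwise.
 * Always 1 where long is 64 bits; can be 0 where long is 32 bits. */
int value_fits_in_long(int64_t v)
{
    return v >= LONG_MIN && v <= LONG_MAX;
}
```

A string of INT_MAX + 1 bytes is perfectly valid for a 64-bit process, yet its length cannot be represented in an int. Using size_t for lengths and a fixed-width 64-bit type for integers removes both checks.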
Unicode support
Unicode is one of the most requested features. Our last attempt failed and we should be very careful about how we design and implement Unicode support.
UTF-16 has been shown to be a failure (also confirmed by the experience of many other projects). UTF-8 seems to be the best choice as the default (or only?) encoding for string values, besides binary strings (basically what we have now). Unicode support design and implementation, if desired for PHP 6, will be one of the most difficult tasks.
- Use of a fast and lightweight UTF-8 processing library for all core string operations
- Possible libraries:
- Use of intl for any advanced operations, localization or conversion?
- Support of UTF-8 for the language itself. PHP currently allows non-ASCII encodings in scripts; I would recommend dropping that support, except in comments.
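As a small illustration of what UTF-8 aware core string operations imply, here is a sketch in plain C of a length function that counts code points instead of bytes. The function name is hypothetical, it assumes the input is already valid UTF-8, and it does no validation. A real implementation would more likely come from one of the libraries to be evaluated.

```c
#include <stddef.h>

/* Hypothetical helper: count the code points of a NUL-terminated UTF-8
 * string. Every byte that is not a continuation byte (bit pattern 10xxxxxx)
 * starts a new code point. Assumes valid UTF-8; nothing is validated. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

For pure ASCII input the result equals strlen(). For "h\xC3\xA9llo" (the word "héllo") strlen() reports 6 bytes while this function reports 5 code points, which is the difference a binary string and a UTF-8 string would expose.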
internals API Cleanup
PHP's internal APIs have kept growing over the last years. Many features are exposed through several functions (feature, feature_ex, FEATURE_P, etc.). Some areas have no developer-friendly APIs or are very different from what we do in userland.
An example of such simplification is Sara's array proposal:
This proposal could be extended to hashes, which are also widely used for internal storage.
The stream API, while very powerful, suffers from code duplication all over the place. We have stream/IO related functions in the engine, TSRM, main/ and ext/standard. Tests exist in even more locations. This makes streams hard to maintain, let alone to use in a consistent manner.
- Should all stream operations be part of main/ or the Engine?
Warning free code
CI has finally become a critical part of our release process. Between Travis, the OSTC test lab or gcov:
One big part is missing: code quality analysis (static analyzers, fuzzers and the like). Reducing the number of false positives is a must if we ever want usable results from these tools. Having warning-free code is the first step in this direction.
Improve OPcodes, compilation and runtime
PHP has mainly gained new opcodes, growing in almost every major version, but the opcodes never actually got a redesign or revamp. Old design choices (made because of technical limitations or for performance reasons, for example) have been kept in there for ages.
Some of these choices cause us trouble with recent additions (OPcache), like the use of absolute addresses for jump/call operations.
It would be interesting to begin a discussion about what we could change to make them more efficient and maintainable in the long run. It is also necessary to think long term, like how we can support JIT compilation to native code from the opcodes (full or partial).
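To show why absolute addresses are a problem for caching, here is a toy interpreter in plain C that uses relative jump offsets instead. The opcode set, the op_t structure and the run function are all hypothetical and much simpler than the engine's opcodes. The point is only that an op array with relative jumps can be copied to another memory location (a shared memory cache, for instance) and still run without any fix-up pass.

```c
#include <stddef.h>

/* Hypothetical, simplified opcode set; not the engine's. */
enum { OP_INC, OP_JMP_REL, OP_HALT };

typedef struct {
    int opcode;
    ptrdiff_t offset; /* OP_JMP_REL only: distance in ops from this op */
} op_t;

/* Execute ops until OP_HALT and return the accumulator, or -1 on an
 * unknown opcode. Jumps are relative to the current op, so the array
 * contains no pointer into itself. */
long run(const op_t *ops)
{
    long acc = 0;
    const op_t *ip = ops;

    for (;;) {
        switch (ip->opcode) {
        case OP_INC:
            acc++;
            ip++;
            break;
        case OP_JMP_REL:
            ip += ip->offset;
            break;
        case OP_HALT:
            return acc;
        default:
            return -1;
        }
    }
}
```

With absolute addresses, each jump would store a pointer into the original array, and every copy would need those pointers rewritten before it could be executed.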
Ideas posted on the list:
- AST based compiler (Julien)
- Rewrite the engine
Also, as much as I would like a full rewrite of the engine, it hardly seems possible to do it within a reasonable time frame (say two years), unless we suddenly have a couple of developers working full time on this task.
JIT compiler
A JIT compiler to native code would be a huge step forward for the engine. While it won't bring the engine to the performance level of HHVM, it may be much better than what we have now and open whole new areas to our users.
It is by far the most complex task in this document but the effort is worth it. It can also be done step by step; full JIT support (as in getting all parts of the running code JIT'ed) may not be our first goal.
There are a couple of choices when it comes to JIT compilation. One of them, which sounds like the best choice, is LibJIT. It is well maintained, supports all the architectures we need, and we have a developer who knows it very well (hey Gopal! ;).
Dmitry told me that he had made some experiments with a PHP-specific version but is not allowed (yet?) to talk about the results or which strategy was used.
However, I would not recommend going our own way for this feature. It would drastically increase the time and effort needed to implement, stabilize and make this feature available on all supported platforms.
HTTP2 support
HTTP2 is right around the corner. By the time 6.0 is out, HTTP2 may be out of the draft phase or in a phase where it is used more widely. PHP 6.0 must support HTTP2.
Mainly the HTTP part of PHP's stream implementation requires changes. However, it is not as easy as it may sound. I would not recommend implementing our own HTTP2 support but using an existing and well-tested library. nghttp2 is one of the most popular, complete and widely used HTTP2 libraries (used by cURL, for example):
Deciding how to support HTTP2 is a tough task. We can share HTTP1/2 in the same http wrapper (something done by cURL) to ease HTTP usage in userland. Existing functions dealing with metadata, headers and the like may need a redesign as well.
A cleaner solution would be to go with a more modern HTTP client/server solution, something along the lines of PECL's http extension. Using one of the popular PHP HTTP client libraries as the base for the API design of a new HTTP support could be a very valid option as well.
Native Annotation Support
A first attempt to add native annotation support to the core has been rejected:
In the meantime, the same format and concepts have become a de facto standard in the PHP communities. Leading projects use and support annotations.
PHP must support annotations in its core. Annotation support should be designed and implemented with the support of the communities (Symfony/Doctrine, Drupal or ZF, for example). It is a critical part of PHP's future.
Reliable, userfriendly RNG APIs
Provide user-friendly and reliable RNG APIs, available by default, on all supported platforms and for all usages (from weak to crypto safe).
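As a rough idea of what a single internal entry point for the crypto-safe end of that range could look like, here is a sketch in plain C. The function name and its return convention are hypothetical. It assumes a POSIX-like system that exposes /dev/urandom; other platforms (Windows, for example) would need a different backend behind the same function.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: fill buf with len bytes from the operating system's
 * random source. Returns 0 on success and -1 on failure. Assumes
 * /dev/urandom exists, so this backend is for POSIX-like systems only. */
int random_bytes_sketch(unsigned char *buf, size_t len)
{
    FILE *f = fopen("/dev/urandom", "rb");
    size_t got;

    if (f == NULL) {
        return -1;
    }
    got = fread(buf, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}
```

The caller only has to check one return value, and a failure is never silently replaced by a weaker source, which is the kind of reliability the API should guarantee.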
A new getter/setter interface RFC has already been rejected, after a tough vote.
The lesson I took from this RFC is that almost everyone agrees that the current getter/setter support is a mess, hardly usable or developer friendly, and by far the worst implementation among current modern languages.
I would suggest taking a second look at this RFC and getting a new, clean and usable getter/setter implementation into core.
Userland APIs improvement for all PHP types
One of the top complaints about PHP is the lack of consistent APIs. While most of these complaints are valid, we cannot simply remove or change the APIs, not even in a major version like 6. The pain introduced by such changes would be fatal from an adoption point of view.
Alternatives to changing all existing functions have been proposed. Two have received more attention:
- Create aliases
- Add OO APIs for all core types
The second option obviously has my vote. It has only advantages: it is fully BC and causes no namespace pollution with hundreds of aliases. However, both options share the same challenge: defining a clean API. This has to be done carefully and quietly, without FUD or trolls :)