Zend Engine Performance Improvements
- Version: 1.0
- Date: 2010-04-13
- Author: Dmitry Stogov firstname.lastname@example.org, Stanislav Malyshev email@example.com
- Status: Implemented in 5.4
- First Published at: http://wiki.php.net/rfc/performanceimprovements
This RFC offers several Zend Engine changes which together make up to 20% performance improvement on synthetic benchmarks and some real-life applications.
Empty HashTable Optimization
During execution PHP uses a lot of HashTables, and our research showed that a significant part of these tables still empty throughout their entire lifetime. For example each PHP class has fully-allocated & initialized method/constant/properties tables even if it doesn't have method/constants/properties. Note that during HashTable initialization it allocates space for pointers to Buckets (HashTable->arBuckets). Of course, this allocation takes space and time.
The idea for improvement is to delay the actual allocation of HashTable->arBuckets in the various tables until the insertion of first element. As a result PHP will use a bit less memory and save one [e]malloc()/[e]free() for each empty HashTable. On the other hand it'll have to check if the HashTable was initialized on each new element insertion.
The patch improves speed of bench.php from 4.31 sec to 4.06 sec and micro_bench.php from 19.78 sec to 19.36 sec.
Currently ZE uses a virtual machine with fixed 3-operand opcode representation. Each operand is represented as a union which may hold a zval (literal - a constant value known at compile time). So for each opcode we have to allocate space for 3 zvals even if the opcode doesn't use them. Each opcode requires 76 bytes on a 32-bit system.
The proposed patch moves zvals from the opcodes themselves into a separate literal_table. Each op_array will have its own literal_table and its opcodes will contain pointers into this table. In addition the opcode structure was changed a bit to remove znode.u.EA.type and align data in a better way. As result each opcode will take only 28 bytes on 32-bit system.
For string constants the literal_table also contains a pre-calculated hash_value, which can be used during run-time. Some ZE internal functions were extended to receive and use additional zend_literal argument in case of constant operands. Several opcodes which use case-insensitive string operands (e.g. function and class references) allocate two following entries in the literal_table. The opcode itself points to the first entry which contains the original string value and the next entry contains the same lower-cased string. This allows performing most ZE primitive operations (e.g. function call, class fetch, method call, property access, etc) with constant operands without hash_value recalculation and lower-casing.
The patch improves speed of bench.php to 4.01 sec and micro_bench.php to 17.95 sec. It also significantly reduces the memory usage of real-life applications with big code base. For example it reduce resident memory usage of PHP process which loads all Zend Framework classes from 27M to 22M (saves 5M or 18% per process).
Note: In the future this patch may be extended with moving zend_op.lineno into a separate debug_info data structure.
Interned strings is a well known technique originally implemented in functional languages. Its main idea is that each string (or atom) is allocated once and never changed (immutable). (See http://en.wikipedia.org/wiki/String_interning for more details). Our implementation makes the strings which are known at compile-time interned. The attempt to make all the strings interned made a significant slowdown on some applications that perform a lot of dynamic string operations. The patch introduces IS_INTERNED() macro to check if a given char* is interned or regular string. PHP doesn't have to duplicate or destroy interned strings in zval_copy_ctor() and zval_dtor(), but it must also not modify them in place. Interned strings contain pre-calculated hash_values so some opcodes with variable arguments won't require hash_value recalculation any more. For example:
$a = "hello"; $arr[$a] = "world"; // hash_value is pre-calculated at compile-time
In addition interned strings may be compared just using == C operator instead of strcmp(). It makes additional benefit at PHP string comparison and HashTable manipulation.
The patch uses a fixed preallocated memory region to hold all the interned strings. IS_INTERNED() is represented simple as a region boundary checks. In case the region is exceeded all the strings which would going to be interned are still regular char*. To create interned strings the CG(new_interned_string)() callback should be used. Interned strings allocated during request processing are held in memory until the end of request processing. It's done through CG(interned_strings_snapshot)() and CG(interned_strings_restore)(). Opcode caches have to dynamically override these callbacks to hold interned strings in shared memory.
The patch improves speed of bench.php to 3.60 sec and micro_bench.php to 14.97 sec.
Note: The usage of special data structure (e.g. zend_string) instead of char* throughout PHP would probably be a more consistent solution, but it would be a huge break. We are going to look into this solution in the future.
Zend Engine VM Tuning
Mainly involves manual tuning based on profile feedback and CPU performance counters. The patch eliminates redundant instructions, performs additional source-level specialization, allows C compiler to make better register allocation and registers spilling. In general it doesn't change behavior or binary-compatibility except for 3 things:
- ZEND_RECV result value is always IS_CV (previously it could be IS_VAR for arguments named $this or $_GET)
- ZEND_CATCH has to be used with constant class name (previously it could be used with “self”, “parent”, “static”)
- ZEND_FETCH_DIM_? opcode handlers may perform array and index fetching in different way. As result if both of them are undefined the warning messages may be emitted in a different order.
The patch improves speed of bench.php to 3.49 sec and micro_bench.php to 14.63 sec
The previous numbers were got without any opcode caches and optimizers.
The following results show the improvement of real-life applications with patched PHP and modified version of Zend Optimizer+.
The patch for APC is provided in the patches section. The following performance measurements show the similar gain of the patch. Note that the provided patch implements basic support for literal tables and interned strings. It misses support for APC binary dumps.
The patches must be applied to php trunk all together
or in a proper order.
after applying the patches the VM has to be regenerated running the following command in the Zend directory
- php zend_vm_gen.php
Patch for APC