====== PHPNG Implementation Details ======
This page provides technical information about PHPNG internals. General information about PHPNG can be found at [[phpng]], and information for extension maintainers at [[phpng-upgrading]].
===== Value Representation =====
In the existing Zend Engine implementation, all values were allocated on the heap and were subject to reference counting and garbage collection. The engine mostly operated on pointers to zvals (and in many places even on pointers to pointers to zvals).
The new implementation operates on the zval structures themselves (not pointers). It stores zval structures directly on the VM stack, in HashTable buckets, and in property slots. This dramatically reduces the number of heap allocations and deallocations. It also avoids reference counting and garbage collection for primitive values (null, bool, long, double, interned strings, immutable arrays).
The new implementation uses more VM stack space (instead of heap), because the stack now holds zval structures instead of pointers. Overall memory usage is nevertheless reduced. In some cases the new approach performs a full zval copy instead of the copy-on-write used before, but this carries no performance penalty: it requires two memory reads and two memory stores, whereas the previous approach needed a single memory read/store plus a reference counter increment, which amounts to the same two reads and stores.
==== CELL Format (zval) ====
+----------------------------------------------------------------------------------+
| VALUE (64-bit) |
+--------------------------+-------------+-------------+-------------+-------------+
| UNUSED (32-bit) | UNUSED | const_flags | type_flags | TYPE |
+--------------------------+-------------+-------------+-------------+-------------+
63 32 31 24 23 16 15 8 7 0
The value cell is represented as two 64-bit words. The first word contains the actual value (it's defined as a union of the possible value types); the second contains the type tag and some flags. The type and flags may be accessed together as a single 32-bit word for efficiency. The “unused” space may actually be reused for different purposes when cells are embedded into other structures (e.g. for the hash collision list when the value is embedded into a HashTable).
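The layout above can be modelled in plain C. The following is only a minimal sketch: the names ''cell'', ''cell_value'', ''u1'' and ''u2'' are chosen for illustration and are not claimed to match the engine's headers, and which byte of the combined 32-bit word holds the type tag depends on the byte order of the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the 128-bit cell. Field names are illustrative
 * and are not the engine's actual definitions. */
typedef union {
    int64_t lval;  /* integer payload */
    double  dval;  /* floating-point payload */
    void   *ptr;   /* pointer to a heap structure (string, array, ...) */
} cell_value;

typedef struct {
    cell_value value;            /* first 64-bit word: the value itself */
    union {
        struct {
            uint8_t type;        /* type tag */
            uint8_t type_flags;
            uint8_t const_flags;
            uint8_t reserved;
        } v;
        uint32_t type_info;      /* the same four bytes read as one word */
    } u1;
    uint32_t u2;                 /* "unused" half, free for the embedding structure */
} cell;

/* Store a long: one write for the value, one for the whole type word. */
void cell_set_long(cell *c, int64_t n, uint32_t long_type_info) {
    c->value.lval = n;
    c->u1.type_info = long_type_info;
}

/* Checks the layout assumptions made above. */
void cell_check_layout(void) {
    assert(sizeof(cell) == 16);            /* two 64-bit words */
    assert(offsetof(cell, u1) == 8);       /* type word opens the second one */
    assert(sizeof(((cell *)0)->u1) == 4);  /* type and flags fit in 32 bits */
}
```

Writing the type and flags through ''type_info'' is what allows a cell to be initialized with two stores.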
The refactored engine defines the following data types. Most of them are well known from the PHP 5 engine:
* IS_UNDEF – we use a special type for undefined variables
* IS_NULL
* IS_FALSE – we've split IS_BOOL into separate IS_FALSE and IS_TRUE
* IS_TRUE
* IS_LONG
* IS_DOUBLE
* IS_STRING – regular or interned string
* IS_ARRAY – regular or immutable array
* IS_OBJECT
* IS_RESOURCE
* IS_REFERENCE – a separate type for references (it'll be explained later)
* IS_CONSTANT – named constant
* IS_CONSTANT_AST – constant expression
* IS_CALLABLE – used only for type hinting
* _IS_BOOL – used only for type hinting
* IS_INDIRECT – special purpose type to handle pointers to other values
* IS_STR_OFFSET – special purpose type to handle string offsets (used only in VM)
* IS_PTR – pointer to something (e.g. zend_function, zend_class_entry, etc)
Apart from the types themselves, the engine defines a few type flags so that different data types with similar behavior can be handled uniformly.
* IS_TYPE_CONSTANT – the type is a constant (IS_CONSTANT, IS_CONSTANT_AST)
* IS_TYPE_REFCOUNTED – the type is subject to reference counting (IS_STRING excluding interned strings, IS_ARRAY excluding immutable arrays, IS_OBJECT, IS_RESOURCE, IS_REFERENCE). Values of all refcounted types are pointers to corresponding structures that share a common part (zend_refcounted). This structure can be obtained using the Z_COUNTED() macro, and some of its data using Z_GC_TYPE(), Z_GC_FLAGS(), Z_GC_INFO() and Z_GC_TYPE_INFO(). The reference counter can be accessed using the Z_REFCOUNT(), Z_SET_REFCOUNT(), Z_ADDREF() and Z_DELREF() macros.
* IS_TYPE_COLLECTABLE – the type may be the root of an unreferenced cycle and is subject to garbage collection (IS_ARRAY, IS_OBJECT).
* IS_TYPE_COPYABLE – the type has to be duplicated using zval_copy_ctor() on assignment or copy on write (IS_STRING excluding interned strings, IS_ARRAY)
* IS_TYPE_IMMUTABLE - the type can't be changed directly, but may be copied on write. Used by immutable arrays to avoid unnecessary array duplication.
A few constant flags are used as modifiers for IS_CONSTANT. Their meaning is exactly the same as before.
* IS_CONSTANT_UNQUALIFIED
* IS_LEXICAL_VAR
* IS_LEXICAL_REF
* IS_CONSTANT_IN_NAMESPACE
The type of a zval may be read using the Z_TYPE() or Z_TYPE_P() macros, the type flags using Z_TYPE_FLAGS() or Z_TYPE_FLAGS_P(), and the combination of type and flags using Z_TYPE_INFO() or Z_TYPE_INFO_P(). **PHPNG doesn't work with pointers to pointers to zval and it doesn't provide macros with the _PP() suffix anymore (like Z_TYPE_PP).**
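To illustrate how a type tag and its flags can share one 32-bit word, here is a small sketch. All numeric values and helper names (''make_type_info'', ''type_of'', ''flags_of'', ''is_refcounted'') are assumptions made for this example; they only mirror the roles of Z_TYPE(), Z_TYPE_FLAGS() and Z_TYPE_INFO() and are not the engine's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical numeric values, chosen only for this sketch. */
enum { T_LONG = 4, T_STRING = 6, T_ARRAY = 7 };
enum {
    F_REFCOUNTED  = 1u << 0,
    F_COLLECTABLE = 1u << 1,
    F_COPYABLE    = 1u << 2
};

#define FLAGS_SHIFT 8  /* type_flags occupy bits 8..15 */

/* Combine a type tag and its flags into one word. */
uint32_t make_type_info(uint8_t type, uint8_t flags) {
    return (uint32_t)type | ((uint32_t)flags << FLAGS_SHIFT);
}

uint8_t type_of(uint32_t info)  { return (uint8_t)(info & 0xff); }
uint8_t flags_of(uint32_t info) { return (uint8_t)((info >> FLAGS_SHIFT) & 0xff); }

/* A flag test works for every refcounted type without listing the types. */
int is_refcounted(uint32_t info) {
    return (flags_of(info) & F_REFCOUNTED) != 0;
}

void type_info_demo(void) {
    uint32_t dynamic_string  = make_type_info(T_STRING, F_REFCOUNTED | F_COPYABLE);
    uint32_t interned_string = make_type_info(T_STRING, 0);

    /* Same type tag, different behavior, decided by the flags alone. */
    assert(type_of(dynamic_string) == type_of(interned_string));
    assert(is_refcounted(dynamic_string));
    assert(!is_refcounted(interned_string));
}
```

The demo shows why flags are useful: an interned and a dynamic string share the IS_STRING tag but differ in whether they need reference counting.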
==== IS_UNDEF ====
+----------------------------------------------------------------------------------+
| UNUSED (64-bit) |
+--------------------------+-------------+-------------+-------------+-------------+
| UNUSED (32-bit) | 0 | 0 | 0 | IS_UNDEF |
+--------------------------+-------------+-------------+-------------+-------------+
63 32 31 24 23 16 15 8 7 0
We have to use the special IS_UNDEF type for undefined IS_CV variables and empty HashTable slots (previously they were initialized with NULL pointers). The engine needs to support the IS_UNDEF type in only a few places. User-land PHP scripts can't get or use undefined values; they see them as NULL.
The undefined value may be initialized using ZVAL_UNDEF() macro.
==== IS_NULL ====
+----------------------------------------------------------------------------------+
| UNUSED (64-bit) |
+--------------------------+-------------+-------------+-------------+-------------+
| UNUSED (32-bit) | 0 | 0 | 0 | IS_NULL |
+--------------------------+-------------+-------------+-------------+-------------+
63 32 31 24 23 16 15 8 7 0
The null value may be initialized using ZVAL_NULL() macro.
==== IS_FALSE and IS_TRUE ====
+----------------------------------------------------------------------------------+
| UNUSED (64-bit) |
+--------------------------+-------------+-------------+-------------+-------------+
| UNUSED (32-bit) | 0 | 0 | 0 | IS_FALSE |
+--------------------------+-------------+-------------+-------------+-------------+
63 32 31 24 23 16 15 8 7 0
We've split the old IS_BOOL type into separate IS_FALSE and IS_TRUE types. A boolean value can now be checked by looking at the type tag alone, which avoids an additional memory read. Boolean values may still be initialized using the same ZVAL_BOOL(), ZVAL_FALSE() or ZVAL_TRUE() macros.
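A minimal sketch of the idea, assuming hypothetical tag values and a simplified ''cell'' structure (neither is taken from the engine):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { T_FALSE = 2, T_TRUE = 3 };  /* hypothetical tag values */

typedef struct {
    uint64_t value;      /* unused for booleans */
    uint32_t type_info;  /* type tag (flags are zero for booleans) */
    uint32_t u2;
} cell;

/* The boolean lives entirely in the tag; the value word is never written. */
void cell_set_bool(cell *c, int b) {
    assert(c != NULL);
    c->type_info = b ? T_TRUE : T_FALSE;
}

/* One comparison on the tag answers "is this true?". */
int cell_is_true(const cell *c) {
    assert(c != NULL);
    return c->type_info == T_TRUE;
}
```

With a single IS_BOOL type the same check would need to read the tag and then the value word.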
==== IS_LONG ====
+----------------------------------------------------------------------------------+
| LONG VALUE (64-bit or 32-bit) |
+--------------------------+-------------+-------------+-------------+-------------+
| UNUSED (32-bit) | 0 | 0 | 0 | IS_LONG |
+--------------------------+-------------+-------------+-------------+-------------+
63 32 31 24 23 16 15 8 7 0
The actual long value may be retrieved using Z_LVAL() or Z_LVAL_P() macro and initialized using ZVAL_LONG().
==== IS_DOUBLE ====
+----------------------------------------------------------------------------------+
| DOUBLE VALUE (64-bit) |
+--------------------------+-------------+-------------+-------------+-------------+
| UNUSED (32-bit) | 0 | 0 | 0 | IS_DOUBLE |
+--------------------------+-------------+-------------+-------------+-------------+
63 32 31 24 23 16 15 8 7 0
The actual double value may be retrieved using Z_DVAL() or Z_DVAL_P() macro and initialized using ZVAL_DOUBLE().
==== IS_STRING ====
+----------------------------------------------------------------------------------+
| POINTER to zend_string (64-bit or 32-bit) |
+--------------------------+-------------+-------------+-------------+-------------+
| UNUSED (32-bit) | 0 | 0 | type_flags | IS_STRING |
+--------------------------+-------------+-------------+-------------+-------------+
63 32 31 24 23 16 15 8 7 0
zend_string:
+-------------+-------------+------------+-----------------------------------------+
| gc_info | flags | IS_STRING | refcount (32-bit) |
+-------------+-------------+------------+-----------------------------------------+
| hash_value (64-bit or 32-bit) |
+----------------------------------------+-----------------------------------------+
| string characters | string length (32-bit) |
+----------------------------------------+-----------------------------------------+
| string characters (continuation) |
+----------------------------------------------------------------------------------+
63 48 47 40 39 32 31 0
The actual value of this type is kept in the zend_string structure, and the zval keeps just a pointer to it. The first 64-bit word of this structure is actually the zend_refcounted structure. It consists of the reference counter, a type that repeats the type of the zval (possibly with some variations), additional flags, and some data used during GC.
The string itself is represented by hash_value, the string length, and the actual character data. The hash value doesn't have to be calculated up front: it's initialized with zero and calculated on request, but only once.
Strings may be interned or dynamic. For interned strings we don't have to perform reference counting or duplication, but on the other hand they can't be modified in place.
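The lazily computed hash can be sketched as follows. This is a simplified stand-alone model: the ''str'' type and its helpers are hypothetical, the "times 33" hash is only a stand-in for whatever function the engine uses, and forcing the top bit is just one way to keep zero reserved for "not computed yet".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified model of a refcounted string header followed by its characters. */
typedef struct {
    uint32_t refcount;
    uint32_t type_info;  /* type, flags and GC data in the real layout */
    uint64_t hash;       /* 0 means "not computed yet" */
    uint32_t len;
    char     val[];      /* characters are stored right after the header */
} str;

/* Allocate header and characters in one block. Returns NULL on failure. */
str *str_new(const char *chars, uint32_t len) {
    str *s = malloc(sizeof(str) + (size_t)len + 1);
    if (s == NULL) return NULL;
    s->refcount = 1;
    s->type_info = 0;
    s->hash = 0;
    s->len = len;
    memcpy(s->val, chars, len);
    s->val[len] = '\0';
    return s;
}

/* Compute the hash on first request and cache it in the string itself. */
uint64_t str_hash(str *s) {
    assert(s != NULL);
    if (s->hash == 0) {
        uint64_t h = 5381;
        for (uint32_t i = 0; i < s->len; i++)
            h = h * 33 + (unsigned char)s->val[i];
        s->hash = h | (UINT64_C(1) << 63);  /* a computed hash is never zero */
    }
    return s->hash;
}
```

Because the cached hash travels with the string, every HashTable lookup that uses the same string as a key can reuse it.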
The following additional flags might be used for strings:
* IS_STR_PERSISTENT - allocated using malloc (otherwise using emalloc)
* IS_STR_INTERNED - interned string
* IS_STR_PERMANENT – interned string that survives the request boundary
* IS_STR_CONSTANT – constant index
* IS_STR_CONSTANT_UNQUALIFIED - the same as IS_CONSTANT_UNQUALIFIED
* IS_STR_AST - constant expression index
The value of the string may be retrieved using the Z_STRVAL(), Z_STRLEN(), Z_STRHASH() or Z_STR() macros, and it may be initialized using the ZVAL_STRINGL(), ZVAL_STRING(), ZVAL_STR(), ZVAL_INT_STR() or ZVAL_NEW_STR() macros.
Many places in Zend Engine were changed to use zend_string* instead of char*.
==== IS_ARRAY ====
+----------------------------------------------------------------------------------+
| POINTER to zend_array (64-bit or 32-bit) |
+--------------------------+-------------+-------------+-------------+-------------+
| UNUSED (32-bit) | 0 | 0 | type_flags | IS_ARRAY |
+--------------------------+-------------+-------------+-------------+-------------+
63 32 31 24 23 16 15 8 7 0
zend_array:
+-------------+-------------+------------+-----------------------------------------+
| gc_info | flags | IS_ARRAY | refcount (32-bit) |
+-------------+-------------+------------+-----------------------------------------+
| Embedded HashTable |
+----------------------------------------------------------------------------------+
63 48 47 40 39 32 31 0
HashTable:
+----------------------------------------+-----------------------------------------+
| nTableMask (32-bit) | nTableSize (32-bit) |
+----------------------------------------+-----------------------------------------+
| nNumOfElements (32-bit) | nNumUsed (32-bit) |
+-------------+-------------+------------+-----------------------------------------+
| nNextFreeElement (64-bit or 32-bit) |
+----------------------------------------------------------------------------------+
| arData - pointer to array of buckets (64-bit or 32-bit) |
+----------------------------------------------------------------------------------+
| arHash - pointer to array of buckets numbers (64-bit or 32-bit) |
+-------------+-------------+------------+-----------------------------------------+
| UNUSED (16) | apply_count | flags | nInternalPointer (32-bit) FIXME |
+-------------+-------------+------------+-----------------------------------------+
63 48 47 40 39 32 31 0
Bucket:
+----------------------------------------------------------------------------------+
| h - key hash_value or the numeric index value (64-bit or 32-bit) |
+----------------------------------------------------------------------------------+
| key - pointer to zend_string (NULL for numeric indexes) (64-bit or 32-bit) |
+----------------------------------------------------------------------------------+ -+
| VALUE (64-bit) | |
+--------------------------+-------------+-------------+-------------+-------------+ +- Embedded zval
| next- hash collision list| UNUSED | const_flags | type_flags | TYPE | |
+--------------------------+-------------+-------------+-------------+-------------+ -+
63 32 31 24 23 16 15 8 7 0
The IS_ARRAY representation is more or less the same as before. Instead of keeping the reference counter in the zval itself, we moved it into the zend_array structure (similar to zend_string) and embedded the HashTable structure there, so the cost of accessing HashTable fields is the same. On array assignment the engine still uses copy-on-write (incrementing the reference counter) instead of duplication.
The HashTable representation, on the other hand, has changed significantly. First of all, it's now an adaptive data structure that uses a plain array of preallocated Buckets and constructs the hash index only if necessary (in some use cases array values can always be accessed by their index, as in C arrays).
The following HashTable flags are defined:
* HASH_FLAG_PERSISTENT – HashTable must survive the request boundary and its data must be allocated using malloc().
* HASH_FLAG_APPLY_PROTECTION – detect indirect recursion
* HASH_FLAG_PACKED – this is not really a hash but a plain array with numeric indexes (arHash is NULL).
The zend_array and its embedded HashTable may be retrieved using the Z_ARR() and Z_ARRVAL() macros and constructed using ZVAL_ARR(), ZVAL_NEW_ARRAY() and ZVAL_PERSISTENT_ARRAY().
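The adaptive behavior can be sketched with a toy table that stays "packed" while keys arrive as 0, 1, 2, ... and builds a hash index of bucket numbers only when another key appears. This is a simplified model under several assumptions: fixed capacity, integer keys only, no deletion, and no handling of duplicate keys. The names ''table'', ''bucket'' and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAP 8               /* fixed capacity, must be a power of two */
#define INVALID UINT32_MAX  /* "no bucket" marker */

typedef struct {
    uint64_t h;      /* numeric key */
    int64_t  value;  /* stands in for the embedded cell */
    uint32_t next;   /* next bucket number in the collision list */
} bucket;

typedef struct {
    bucket   data[CAP];  /* buckets in insertion order (arData) */
    uint32_t hash[CAP];  /* slot -> first bucket number (arHash) */
    uint32_t used;       /* number of buckets filled so far */
    int      packed;     /* 1 while each key equals its bucket position */
} table;

void table_init(table *t) { t->used = 0; t->packed = 1; }

/* Build the hash index over the buckets stored so far. */
static void build_index(table *t) {
    for (uint32_t i = 0; i < CAP; i++) t->hash[i] = INVALID;
    for (uint32_t i = 0; i < t->used; i++) {
        uint32_t slot = (uint32_t)(t->data[i].h & (CAP - 1));
        t->data[i].next = t->hash[slot];
        t->hash[slot] = i;
    }
    t->packed = 0;
}

/* Append a key/value pair. Returns 0 when the table is full. */
int table_add(table *t, uint64_t key, int64_t value) {
    if (t->used == CAP) return 0;
    if (t->packed && key != t->used) build_index(t);  /* leave packed mode */
    uint32_t idx = t->used++;
    t->data[idx].h = key;
    t->data[idx].value = value;
    t->data[idx].next = INVALID;
    if (!t->packed) {
        uint32_t slot = (uint32_t)(key & (CAP - 1));
        t->data[idx].next = t->hash[slot];
        t->hash[slot] = idx;
    }
    return 1;
}

/* Packed tables index the bucket array directly; others walk the chain. */
int64_t *table_find(table *t, uint64_t key) {
    if (t->packed)
        return key < t->used ? &t->data[key].value : NULL;
    for (uint32_t i = t->hash[key & (CAP - 1)]; i != INVALID; i = t->data[i].next)
        if (t->data[i].h == key) return &t->data[i].value;
    return NULL;
}
```

In this sketch a list-like array never pays for the index, and collision chains are expressed as bucket numbers rather than pointers, in the same spirit as the "next" field of the Bucket layout above.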
==== IS_OBJECT ====
+----------------------------------------------------------------------------------+
| POINTER to zend_object (64-bit or 32-bit) |
+--------------------------+-------------+-------------+-------------+-------------+
| UNUSED (32-bit) | 0 | 0 | type_flags | IS_OBJECT |
+--------------------------+-------------+-------------+-------------+-------------+
63 32 31 24 23 16 15 8 7 0
zend_object:
+-------------+-------------+------------+-----------------------------------------+
| gc_info | flags | IS_OBJECT | refcount (32-bit) |
+-------------+-------------+------------+-----------------------------------------+
| UNUSED (32-bit) | handle (32-bit) FIXME |
+----------------------------------------+-----------------------------------------+
| ce - pointer to zend_class_entry (64-bit or 32-bit) |
+----------------------------------------------------------------------------------+
| handlers - pointer to object handlers (64-bit or 32-bit) |
+----------------------------------------------------------------------------------+
| properties - pointer to HashTable of dynamic properties (64-bit or 32-bit) |
+----------------------------------------------------------------------------------+
| guards - pointer to HashTable used for recursion protection (64-bit or 32-bit) |
+----------------------------------------------------------------------------------+
| properties_table[0] - embedded cell of the first declared property (128-bit) |
+----------------------------------------------------------------------------------+
| ... |
+----------------------------------------------------------------------------------+
| properties_table[N] - embedded cell of the last declared property (128-bit) |
+----------------------------------------------------------------------------------+
| optional user data of internal classes (variable size) |
+----------------------------------------------------------------------------------+
63 48 47 40 39 32 31 0
The object representation has changed more significantly. We removed the double indirection (through the object store handle) and the double reference counting. We kept the object handle for compatibility. Predefined properties are stored in embedded cells allocated together with the zend_object structure. The dynamic properties table is NULL by default and may be lazily constructed on request. In this case it will contain IS_INDIRECT references to the embedded cells.
A few macros may be used to get zend_object contents: Z_OBJ(), Z_OBJ_HT(), Z_OBJ_HANDLER(), Z_OBJ_HANDLE(), Z_OBJCE(), Z_OBJPROP(), Z_OBJDEBUG(). Objects may be constructed using the object_init() or object_init_ex() functions.
On assignment the zend_object structure is not copied; only the reference counter is incremented.
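The two ideas, property slots allocated together with the object header and assignment by reference counting, can be sketched like this. The ''object'' type, its ''nprops'' field and the helper functions are hypothetical simplifications and do not reproduce the engine's structures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { T_NULL = 1 };  /* hypothetical tag value */

typedef struct {
    int64_t  value;
    uint32_t type_info;
    uint32_t u2;
} cell;

/* Simplified object: header plus declared property slots in one block. */
typedef struct {
    uint32_t refcount;
    uint32_t type_info;
    uint32_t handle;             /* kept for compatibility in the real layout */
    uint32_t nprops;             /* assumption: stored here for the sketch */
    void    *dynamic_props;      /* NULL until somebody asks for it */
    cell     properties_table[]; /* embedded cells, one per declared property */
} object;

/* One allocation covers the header and all declared properties. */
object *object_new(uint32_t nprops) {
    object *o = malloc(sizeof(object) + (size_t)nprops * sizeof(cell));
    if (o == NULL) return NULL;
    o->refcount = 1;
    o->type_info = 0;
    o->handle = 0;
    o->nprops = nprops;
    o->dynamic_props = NULL;
    for (uint32_t i = 0; i < nprops; i++) {
        o->properties_table[i].value = 0;
        o->properties_table[i].type_info = T_NULL;
        o->properties_table[i].u2 = 0;
    }
    return o;
}

/* Assignment shares the object and only bumps the counter. */
object *object_assign(object *o) {
    assert(o != NULL);
    o->refcount++;
    return o;
}

/* Drop one owner; free the block when the last owner is gone. */
void object_release(object *o) {
    assert(o != NULL && o->refcount > 0);
    if (--o->refcount == 0) free(o);
}
```

A property read in this model is a single indexed access into ''properties_table'', with no lookup through a separate object store.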
==== IS_RESOURCE ====
+----------------------------------------------------------------------------------+
| POINTER to zend_resource (64-bit or 32-bit) |
+--------------------------+-------------+-------------+-------------+-------------+
| UNUSED (32-bit) | 0 | 0 | type_flags | IS_RESOURCE |
+--------------------------+-------------+-------------+-------------+-------------+
63 32 31 24 23 16 15 8 7 0
zend_resource:
+-------------+-------------+------------+-----------------------------------------+
| 0 | flags | IS_RESOURCE| refcount (32-bit) |
+-------------+-------------+------------+-----------------------------------------+
| handle - long number (64-bit or 32-bit) FIXME |
+----------------------------------------+-----------------------------------------+
| UNUSED (32-bit) | resource type (32-bit) |
+----------------------------------------+-----------------------------------------+
| ptr - pointer to actual resource data (64-bit or 32-bit) |
+----------------------------------------------------------------------------------+
63 48 47 40 39 32 31 0
For resources we now also use direct pointers and avoid double reference counting (however, we keep the handle for compatibility).
Resource data may be retrieved using Z_RES(), Z_RES_HANDLE(), Z_RES_TYPE() and Z_RES_VAL() macros and initialized using ZVAL_RES(), ZVAL_NEW_RES() and ZVAL_PERSISTENT_RES() macros.
==== IS_REFERENCE ====
+----------------------------------------------------------------------------------+
| POINTER to zend_reference (64-bit or 32-bit) |
+--------------------------+-------------+-------------+-------------+-------------+
| UNUSED (32-bit) | 0 | 0 | type_flags | IS_REFERENCE|
+--------------------------+-------------+-------------+-------------+-------------+
63 32 31 24 23 16 15 8 7 0
zend_reference:
+-------------+-------------+------------+-----------------------------------------+
| 0 | flags | IS_REFER...| refcount (32-bit) |
+-------------+-------------+------------+-----------------------------------------+
| Embedded CELL for referenced zval (128-bit) |
+----------------------------------------------------------------------------------+
63 48 47 40 39 32 31 0
The most significant change is the handling of PHP references. Previously we had just an “is_ref” flag in each zval; now the actual referenced value is stored in a separate zend_reference structure with its own reference counter. All the aliased zvals just keep a pointer to the same zend_reference.
Note: The referenced value may be a scalar or a reference-counted value (IS_STRING, IS_ARRAY, IS_OBJECT, IS_RESOURCE), but neither another IS_REFERENCE nor IS_UNDEF.
Note: In the existing engine a PHP reference can simply be turned back into a regular value when its reference counter drops to 1. With the new implementation this is not such a trivial operation.
Whether a value is a reference can be checked using the Z_ISREF() macro, and the referenced value can be read using the Z_REF() and Z_REFVAL() macros. References may be constructed using ZVAL_REF(), ZVAL_NEW_REF() and ZVAL_NEW_PERSISTENT_REF().
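As an illustration, two aliased cells sharing one reference structure might look as follows. The types and the helpers ''make_alias'' and ''deref'' are hypothetical; they only model the relationship between the aliases and the shared value, not the engine's macros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { T_LONG = 4, T_REFERENCE = 10 };  /* hypothetical tag values */

typedef struct reference reference;

typedef struct {
    union {
        int64_t    lval;
        reference *ref;
    } value;
    uint32_t type_info;
    uint32_t u2;
} cell;

/* Simplified reference: its own counter plus the embedded shared cell. */
struct reference {
    uint32_t refcount;
    uint32_t type_info;
    cell     val;
};

/* Turn *a into a reference and make *b a second alias of the same value. */
int make_alias(cell *a, cell *b) {
    assert(a->type_info != T_REFERENCE);  /* a reference never wraps a reference */
    reference *r = malloc(sizeof *r);
    if (r == NULL) return 0;
    r->refcount = 2;            /* two aliases: a and b */
    r->type_info = T_REFERENCE;
    r->val = *a;                /* the value moves into the reference */
    a->value.ref = r;
    a->type_info = T_REFERENCE;
    *b = *a;                    /* b holds the same pointer */
    return 1;
}

/* Follow the reference, if any, to reach the real value. */
cell *deref(cell *c) {
    return c->type_info == T_REFERENCE ? &c->value.ref->val : c;
}
```

A write through one alias is visible through the other because both cells resolve to the same embedded cell.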
==== IS_CONSTANT and IS_CONSTANT_AST ====
IS_CONSTANT actually points to a zend_string structure. These types are handled a bit differently from regular strings and arrays.
==== IS_INDIRECT ====
+----------------------------------------------------------------------------------+
| POINTER to real zval (64-bit or 32-bit) |
+--------------------------+-------------+-------------+-------------+-------------+
| UNUSED (32-bit) | 0 | 0 | 0 | IS_INDIRECT |
+--------------------------+-------------+-------------+-------------+-------------+
63 32 31 24 23 16 15 8 7 0
The new implementation assumes that we store zval structures (not pointers) in arrays and function stack frames. This is not a problem for arrays, because scalar values are simply duplicated and compound values may point to shared reference-counted structures anyway. However, it is a problem for local variables (IS_CV), because they may be referenced both through the stack frame (by index) and through the symbol table (by name), and both must point to the same structure. Values of the IS_INDIRECT type are just weak pointers to the real values. When we lazily create a local symbol table, we store IS_INDIRECT values in it and initialize them with pointers to the corresponding CV slots. This makes CV access by index extremely efficient, as we don't need to perform double or even triple dereferences as before.
Global symbol tables are handled a bit differently. When we enter user code that uses global variables, we copy them from EG(symbol_table) into CV slots and initialize the symbol table values with IS_INDIRECT pointers. On exit we have to restore them.
The same concept is used for access to object properties. If the dynamic properties table is required, it's first initialized with IS_INDIRECT references to the predefined object property slots.
IS_INDIRECT pointers are also used in the VM during execution to pass the addresses of variables between opcode handlers.
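A minimal sketch of the CV case, assuming a simplified ''cell'' type and hypothetical helpers ''set_indirect'' and ''resolve'': the symbol table entry does not own a value, it only points at the slot in the stack frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { T_LONG = 4, T_INDIRECT = 15 };  /* hypothetical tag values */

typedef struct cell {
    union {
        int64_t      lval;
        struct cell *zv;   /* used when the cell is an indirect pointer */
    } value;
    uint32_t type_info;
    uint32_t u2;
} cell;

/* Make a symbol table entry point at the real slot (a weak pointer). */
void set_indirect(cell *entry, cell *slot) {
    assert(entry != NULL && slot != NULL);
    entry->value.zv = slot;
    entry->type_info = T_INDIRECT;
}

/* Access by name: follow the indirection. Access by index needs no such step. */
cell *resolve(cell *c) {
    return c->type_info == T_INDIRECT ? c->value.zv : c;
}
```

Both access paths end at the same cell, so a write by name is immediately visible by index and vice versa.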
==== IS_STR_OFFSET (used internally in VM) ====
+----------------------------------------------------------------------------------+
| POINTER to string value (zval*) (64-bit or 32-bit) |
+--------------------------+-------------+-------------+-------------+-------------+
| offset (32-bit) | 0 | 0 | 0 | IS_STR_OF...|
+--------------------------+-------------+-------------+-------------+-------------+
63 32 31 24 23 16 15 8 7 0
This is another type used only at run time, to pass the address of a string element between opcodes.
==== IS_PTR (used internally by the Engine) ====
+----------------------------------------------------------------------------------+
| POINTER to internal entity (64-bit or 32-bit) |
+--------------------------+-------------+-------------+-------------+-------------+
| UNUSED (32-bit) | 0 | 0 | 0 | IS_PTR |
+--------------------------+-------------+-------------+-------------+-------------+
63 32 31 24 23 16 15 8 7 0
This type may be used to reuse the new HashTable implementation for internal entities that are not related to PHP values (e.g. each zend_class_entry has to keep a HashTable of methods).
===== VM Changes =====
With the new zval implementation, IS_TMP_VAR, IS_VAR and IS_CV operands are handled in a very similar way. All three just refer to a certain slot of the current function stack frame. Such slots are allocated on the segmented VM stack together with the frame header (zend_execute_data). The first slots correspond to CV variables and the following ones to IS_TMP_VAR and IS_VAR. In addition to local and temporary variables, we also allocate space for syntactically nested function calls and for the actual parameters that this function may push.
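The slot addressing can be sketched with a toy frame. This model is deliberately reduced: it uses malloc() instead of a segmented VM stack, keeps only a few header fields, and omits the nested call and argument areas. The ''frame'' type and its helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { T_UNDEF = 0 };  /* hypothetical tag value */

typedef struct {
    int64_t  value;
    uint32_t type_info;
    uint32_t u2;
} cell;

/* Simplified frame: a small header followed by CV slots, then TMP/VAR slots. */
typedef struct frame {
    struct frame *prev;     /* previous call frame */
    uint32_t      num_cv;   /* number of compiled variables */
    uint32_t      num_tmp;  /* number of temporaries */
    cell          slots[];  /* CVs first, temporaries after them */
} frame;

/* Header and slots come from a single allocation; all slots start undefined. */
frame *frame_new(frame *prev, uint32_t num_cv, uint32_t num_tmp) {
    size_t n = (size_t)num_cv + num_tmp;
    frame *f = malloc(sizeof(frame) + n * sizeof(cell));
    if (f == NULL) return NULL;
    f->prev = prev;
    f->num_cv = num_cv;
    f->num_tmp = num_tmp;
    for (size_t i = 0; i < n; i++) {
        f->slots[i].value = 0;
        f->slots[i].type_info = T_UNDEF;
        f->slots[i].u2 = 0;
    }
    return f;
}

/* CV and temporary operands differ only in their offset into the slots. */
cell *frame_cv(frame *f, uint32_t i) {
    assert(i < f->num_cv);
    return &f->slots[i];
}

cell *frame_tmp(frame *f, uint32_t i) {
    assert(i < f->num_tmp);
    return &f->slots[f->num_cv + i];
}
```

In this sketch every operand kind resolves to "frame base plus offset", which is the uniformity the paragraph above describes.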
==== Function Stack Frame (zend_execute_data) ====
+----------------------------------------------------------------------------------+
| opline – instruction pointer (64/32-bit) |
+----------------------------------------------------------------------------------+
| op_array – current function (64/32-bit) |
+----------------------------------------------------------------------------------+
| function_state.function – currently calling function (64/32-bit) |
+----------------------------------------------------------------------------------+
| function_state.arguments – arguments of the currently calling function (64/32-bit|
+----------------------------------------------------------------------------------+
| object – current $this |
+----------------------------------------------------------------------------------+
| scope – static scope of the current function (class where it's defined) |
+----------------------------------------------------------------------------------+
| called_scope – called scope of the function |
+----------------------------------------------------------------------------------+
| symbol_table – current symbol table |
+----------------------------------------------------------------------------------+
| run_time_cache – current run-time cache |
+----------------------------------------------------------------------------------+
| prev_execute_data – pointer to the previous function call frame |
+----------------------------------------------------------------------------------+
| return_value – pointer to the zval where this function has to return its result |
+----------------------------------------------------------------------------------+
| frame_kind – top or nested, function or eval/include code |
+----------------------------------------------------------------------------------+
| ... FIXME |
+----------------------------------------------------------------------------------+
| Embedded CELL for the first local variable value (128-bit) |
+----------------------------------------------------------------------------------+
| ... |
+----------------------------------------------------------------------------------+
| Embedded CELL for the last local variable value (128-bit) |
+----------------------------------------------------------------------------------+
| Embedded CELL for the first temporary variable value (128-bit) |
+----------------------------------------------------------------------------------+
| ... |
+----------------------------------------------------------------------------------+
| Embedded CELL for the last temporary variable value (128-bit) |
+----------------------------------------------------------------------------------+
| Syntactically nested call slot (first) FIXME |
+----------------------------------------------------------------------------------+
| ... FIXME |
+----------------------------------------------------------------------------------+
| Syntactically nested call slot (last) FIXME |
+----------------------------------------------------------------------------------+
| Embedded CELL for the first actual argument value (128-bit) |
+----------------------------------------------------------------------------------+
| ... |
+----------------------------------------------------------------------------------+
| Embedded CELL for the last actual argument value (128-bit) |
+----------------------------------------------------------------------------------+
[TO BE CONTINUED]