VM extension API Scratchpad
This is an area where a number of us are brainstorming and developing ideas around the proposal for an improved interface between the PHP Virtual Machine and its extensions. Some of us have worked on the ProjectZero implementation of PHP so we refer to our experiences there.
Assumptions
- We assume that the extensions may exist in a different address space to the VM. The extensions must not assume that they can read or write VM memory.
- We assume that the extensions cannot use any extension interfaces other than those documented in the interface.
- We assume that whilst there will be (are) many VMs implementing PHP, there should only be one set of extensions which run unchanged on all VMs. What we'd like to do is arrive at an interface that facilitates this.
List of problems to solve
Direct manipulation of Zend Engine internal data structures without using macros
In most cases macros exist to access the ZE data structures such as zvals. By using these macros it is easy to implement the programming interface without matching the layout of the data structures byte for byte. Unfortunately there is also plenty of extension code that accesses data structures such as zvals directly, without using the macros.
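The difference between the two styles can be sketched as follows. This is only an illustration: the `demo_val` layout and the `DEMO_TYPE_P` / `DEMO_LVAL_P` macros are hypothetical stand-ins, loosely modelled on zval accessor macros, and are not the real ZE definitions.

```c
/* Hypothetical value layout. A different VM could reorder or replace
 * these fields and only the accessor macros below would need to change. */
typedef struct {
    int type;               /* hypothetical type tag */
    union {
        long   lval;
        double dval;
    } value;
} demo_val;

enum { DEMO_LONG = 1, DEMO_DOUBLE = 2 };

/* Accessor macros: the only place that knows the layout. */
#define DEMO_TYPE_P(v)  ((v)->type)
#define DEMO_LVAL_P(v)  ((v)->value.lval)

/* Portable style: the extension only goes through the macros. */
static long add_one_portable(const demo_val *v)
{
    return DEMO_TYPE_P(v) == DEMO_LONG ? DEMO_LVAL_P(v) + 1 : 0;
}

/* Problem style: the field names are hard-coded, so this breaks as soon
 * as the layout changes or the value lives in another address space. */
static long add_one_direct(const demo_val *v)
{
    return v->type == DEMO_LONG ? v->value.lval + 1 : 0;
}
```

Both functions behave identically today; only the first survives a VM that redefines the macros as calls across the interface.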
Storage Allocation
This is the most pernicious problem of all.
The Zend Engine recognizes two types of memory: memory that is allocated “persistently” (pemalloc), that is to say it can persist from request to request, and memory that is allocated non-persistently (emalloc). Typically this memory is associated with a zval which participates in the engine garbage collection (a reference counting scheme). If you think about the interaction with an extension there are really three cases:
- Memory that is used only during a single extension function call. At the end of the call the extension retains no references to the memory. Anything that must persist beyond the function call has been stored away inside the VM.
- Memory that is allocated from the temporary heap on one extension function call but accessed on a later call, e.g. the extension caches a pointer to the memory across requests in extension global storage or a resource.
- Persistent memory that persists from request to request.
Case 2 causes a problem if we assume that we do not want the extensions to participate in the VM garbage collection scheme. There is an example of this in the XML extension function xml_set_object:
    ALLOC_ZVAL(parser->object);
    *parser->object = **mythis;
    zval_copy_ctor(parser->object);
    INIT_PZVAL(parser->object);
The above code creates a new zval and sets a reference to it in an XML parser resource which is passed across requests.
Solution used in Project Zero
ProjectZero assumes case 1 for all function calls, i.e. all non-persistent memory is freed when the call returns. ProjectZero had to modify the extension code specifically to remove instances of case 2.
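A minimal sketch of the case 1 model, assuming the VM tracks every temporary allocation made during an extension call and releases them all afterwards. The names `call_arena`, `call_alloc` and `call_release_all` are hypothetical and are not taken from ProjectZero or the Zend Engine.

```c
#include <stdlib.h>

#define ARENA_MAX 64   /* arbitrary limit for the sketch */

/* Hypothetical per-call arena: remembers each temporary block so that
 * nothing allocated during the call can outlive it. */
typedef struct {
    void  *blocks[ARENA_MAX];
    size_t count;
} call_arena;

/* Allocate temporary memory on behalf of the extension. */
static void *call_alloc(call_arena *a, size_t size)
{
    void *p;

    if (a->count == ARENA_MAX) {
        return NULL;               /* arena full */
    }
    p = malloc(size);
    if (p != NULL) {
        a->blocks[a->count++] = p; /* remember it for release */
    }
    return p;
}

/* Called by the VM when the extension function returns. */
static void call_release_all(call_arena *a)
{
    while (a->count > 0) {
        free(a->blocks[--a->count]);
    }
}
```

Under this model any pointer the extension cached for a later call (case 2) would dangle after `call_release_all`, which is why such code has to be rewritten.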
Use of HashTable to represent PHP arrays and also as a library utility function.
- The problem here is that today the Zend HashTable structure is used to represent two conceptually very different things.
- Firstly it is used to represent a PHP user space array. It is an interface type that can be passed to the VM.
- Secondly it is used as a general programming utility to implement anything that an extension might want to use a HashTable for.
- This makes it very difficult to know whether to route an API call to the VM interface for interaction with the VM's model of a PHP array or whether to route it to the general HashTable interface.
Solution used in Project Zero
- We have found so far that the use of array_init to initialise the HashTable tends to signify that the HashTable is going to be used to create a user space array. Thus we use this to determine how to route API requests.
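The routing idea can be sketched as a flag that only the array initialiser sets. Everything here is hypothetical: `demo_table`, `demo_array_init` and `demo_add` merely stand in for the real HashTable, array_init and add functions, and the counters exist only to make the routing visible.

```c
#include <stdbool.h>

/* Hypothetical table handle seen by the extension. */
typedef struct {
    bool is_user_array; /* set only by the array_init stand-in */
    int  vm_calls;      /* operations forwarded to the VM's array model */
    int  local_calls;   /* operations handled by the utility hash table */
} demo_table;

/* Stand-in for plain hash table initialisation: a utility table. */
static void demo_table_init(demo_table *t)
{
    t->is_user_array = false;
    t->vm_calls = 0;
    t->local_calls = 0;
}

/* Stand-in for array_init: marks the table as a user space array. */
static void demo_array_init(demo_table *t)
{
    demo_table_init(t);
    t->is_user_array = true;
}

/* Every later operation consults the flag to decide where to go. */
static void demo_add(demo_table *t)
{
    if (t->is_user_array) {
        t->vm_calls++;      /* would cross the VM interface */
    } else {
        t->local_calls++;   /* would stay inside the extension */
    }
}
```

The heuristic only works while extensions consistently use array_init for user space arrays, which is an observation rather than a guarantee.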
Extension code which manipulates the contents of HashTables directly
The array extension contains a great deal of code that manipulates hash tables directly. The approach taken in ProjectZero was to recode this as an “internal extension” in Java.
Pass by reference and return by reference
A simple scheme for dealing with pass by reference is to copy back any changes to the referenced value into the VM at the end of an extension function call. This simple scheme does not cope with scenarios where an extension function returns a reference to a parameter that was passed by reference. Nor does it deal with a situation where a passed reference must be inserted into an array by reference. However, ProjectZero has yet to encounter any extension which does this (other than the array extension, which we coded internally in ProjectZero).
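The copy-back scheme might look roughly like this. `vm_slot`, `vm_call_by_ref` and `ext_increment` are hypothetical names used only for this sketch.

```c
/* Hypothetical VM-owned value that the extension must not touch directly. */
typedef struct {
    long lval;
} vm_slot;

/* Extension function: it receives a private copy, never VM memory. */
static void ext_increment(long *arg)
{
    *arg += 1;
}

/* VM-side call wrapper: copy in, call the extension, copy back. */
static void vm_call_by_ref(vm_slot *slot, void (*fn)(long *))
{
    long copy = slot->lval; /* marshal the value into the extension */

    fn(&copy);
    slot->lval = copy;      /* copy any change back at the end of the call */
}
```

Because `copy` only lives for the duration of the wrapper, an extension that returned `&copy` as a reference, or stored it in an array, would be left with a pointer to dead storage. That is the limitation described above.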
Access to VM global values
Extensions contain a great deal of code which accesses the various globals (EG, CG etc.). Fortunately, so far this code has all been easy to deal with by using macro substitution to replace the accesses with getter/setter calls across the VM interface.
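As a sketch of the substitution, assume reads and writes of a global can be told apart and rewritten to separate calls. The `DEMO_EG_GET` / `DEMO_EG_SET` macros and the `vm_get_*` / `vm_set_*` functions are hypothetical; they are not the real EG macro or any existing VM API.

```c
/* Hypothetical VM-side state, not directly visible to extensions. */
typedef struct {
    int error_reporting;
} vm_executor_globals;

static vm_executor_globals vm_eg = { 0 };

/* Getter and setter that would sit on the VM side of the interface. */
static int vm_get_error_reporting(void)
{
    return vm_eg.error_reporting;
}

static void vm_set_error_reporting(int v)
{
    vm_eg.error_reporting = v;
}

/* Macro substitution: each access becomes a call instead of a
 * direct read or write of VM memory. */
#define DEMO_EG_GET(name)      vm_get_##name()
#define DEMO_EG_SET(name, val) vm_set_##name(val)

/* Extension code keeps its familiar shape but never touches vm_eg. */
static int ext_silence_errors(void)
{
    int old = DEMO_EG_GET(error_reporting);

    DEMO_EG_SET(error_reporting, 0);
    return old;
}
```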
Size of the interface and duplication
The existing ZE interface contains a great deal of duplication. There are several methods and macros to do almost everything. The approach taken by ProjectZero has been to map these macros to a small set of core APIs which are structured around a table of function pointers, similar to JNI.
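A table of function pointers in the JNI style could be sketched as below. The `vm_api` struct, the integer handles and the `DEMO_*` macros are assumptions made for illustration and do not describe the actual ProjectZero interface.

```c
#define DEMO_SLOTS 4

/* Hypothetical core API: a small table of function pointers supplied
 * by the VM. Values are named by opaque integer handles. */
typedef struct vm_api {
    long (*get_long)(int handle);
    void (*set_long)(int handle, long v);
} vm_api;

/* Toy VM-side storage behind the handles. */
static long demo_store[DEMO_SLOTS];

static long demo_get_long(int h)
{
    return (h >= 0 && h < DEMO_SLOTS) ? demo_store[h] : 0;
}

static void demo_set_long(int h, long v)
{
    if (h >= 0 && h < DEMO_SLOTS) {
        demo_store[h] = v;
    }
}

static const vm_api demo_api = { demo_get_long, demo_set_long };

/* Two convenience macros that duplicate each other collapse onto the
 * same core entry. */
#define DEMO_RETVAL_LONG(api, h, v) ((api)->set_long((h), (v)))
#define DEMO_ZVAL_LONG(api, h, v)   ((api)->set_long((h), (v)))

/* Extension code written against the macros only. */
static void ext_answer(const vm_api *api, int result)
{
    DEMO_RETVAL_LONG(api, result, 42);
}

static void ext_store(const vm_api *api, int handle, long v)
{
    DEMO_ZVAL_LONG(api, handle, v);
}
```

Since the extension only ever sees the table, a different VM can supply its own `vm_api` without the extension being recompiled against VM internals.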
Extensions which write to $php_errormsg
Folks involved in the discussion
We do not want to spam internals with hundreds of emails about our musings as we bounce ideas around, but we do want to do this in the open, hence this page. The folks currently involved in the discussion are as follows. If you are interested in the discussion or feel we are going about things in the wrong way, please contact us:
- Rob Nicholson (nicholsr@php.net)
- Andy Wharmby (wharmby@php.net)
- Iain Lewis (??iains id??)
- Paul Biggar (paul.biggar@gmail.com)