This is an area where a number of us are brainstorming and developing ideas around the proposal for an improved interface between the PHP Virtual Machine and its extensions. Some of us have worked on the ProjectZero implementation of PHP, so we refer to our experiences there.
In most cases, macros exist to access Zend Engine (ZE) data structures such as zvals. By using these macros, it is easy to implement the programming interface without matching the layout of the data structures byte for byte. Unfortunately, there is also plenty of extension code that accesses data structures such as zvals directly, bypassing the macros.
This is the most pernicious problem of all.
The Zend Engine recognizes two types of memory. There is memory that is allocated “persistently” (pemalloc), that is to say it can persist from request to request, and there is memory that is allocated non-persistently (emalloc). Typically, this non-persistent memory is associated with a zval, which participates in the engine's garbage collection (a reference-counting scheme). If you think about the interaction with an extension, there are actually three cases:
Case 2 causes a problem if we assume that we do not want extensions to participate in the VM's garbage collection scheme. There is an example of this in the XML extension function xml_set_object:
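ALLOC_ZVAL(parser->object);         /* allocate a new zval (non-persistent, via emalloc) */
*parser->object = **mythis;         /* shallow-copy the caller's value into it */
zval_copy_ctor(parser->object);     /* copy-construct: duplicate strings/arrays, add a reference to objects */
INIT_PZVAL(parser->object);         /* refcount = 1, not a PHP reference */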
The above code creates a new zval and stores a pointer to it in an XML parser resource, which is passed across requests.
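INIT_PZVAL in the snippet above gives the new zval a reference count of one. The reference-counting scheme that such zvals take part in can be sketched as follows. This is a minimal illustration with hypothetical names (rc_value, rc_new, rc_addref, rc_delref), not engine code; plain malloc/free stand in for emalloc/efree.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted value, loosely modelled on a zval. */
typedef struct {
    unsigned refcount;  /* number of holders referring to this value */
    long payload;
} rc_value;

/* Counts frees so the behaviour can be observed; illustration only. */
static int rc_frees = 0;

static rc_value *rc_new(long payload)
{
    rc_value *v = malloc(sizeof *v);   /* stands in for emalloc */
    if (v == NULL) {
        return NULL;
    }
    v->refcount = 1;                   /* the creator holds one reference */
    v->payload = payload;
    return v;
}

/* A new holder (e.g. a resource that keeps the value) takes a reference. */
static void rc_addref(rc_value *v)
{
    v->refcount++;
}

/* Drop one reference; free the value when the last holder lets go. */
static void rc_delref(rc_value *v)
{
    assert(v->refcount > 0);
    if (--v->refcount == 0) {
        free(v);                       /* stands in for efree */
        rc_frees++;
    }
}
```

In a scheme like this, an extension that keeps a value beyond the call has to hold its own reference, which is exactly what ties it into the VM's collection scheme.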
ProjectZero assumes case 1 for all function calls, i.e. that all non-persistent memory allocated during a call is freed at the end of that call. ProjectZero had to modify the extension code specifically to remove instances of case 2.
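One way a VM could enforce the case 1 assumption is to treat every non-persistent allocation made during an extension call as owned by that call. The following is a minimal sketch, not ProjectZero's actual code. The names call_arena, call_alloc and call_end are hypothetical, and plain malloc/free stand in for the engine allocators. Because the VM releases every recorded block when the call returns, a pointer stashed away for later (case 2, as in xml_set_object above) would dangle.

```c
#include <assert.h>
#include <stdlib.h>

#define CALL_ARENA_MAX 64

/* Hypothetical per-call record of non-persistent allocations. */
typedef struct {
    void *blocks[CALL_ARENA_MAX];
    size_t count;
} call_arena;

/* Non-persistent allocation: tracked so it can be freed at end of call. */
static void *call_alloc(call_arena *a, size_t size)
{
    if (a->count == CALL_ARENA_MAX) {
        return NULL;                  /* sketch only: fixed capacity */
    }
    void *p = malloc(size);
    if (p != NULL) {
        a->blocks[a->count++] = p;
    }
    return p;
}

/* Run by the VM when the extension function returns: everything allocated
   non-persistently during the call is released. Returns blocks freed. */
static size_t call_end(call_arena *a)
{
    assert(a->count <= CALL_ARENA_MAX);
    size_t freed = a->count;
    for (size_t i = 0; i < freed; i++) {
        free(a->blocks[i]);
        a->blocks[i] = NULL;
    }
    a->count = 0;
    return freed;
}
```

In this sketch, persistent (pemalloc-style) allocations would simply bypass the arena and keep their own lifetime.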
The array extension contains a great deal of code that manipulates hash tables directly. The approach taken in ProjectZero was to reimplement it as an “internal extension” in Java.
A simple scheme for dealing with pass-by-reference is to copy any changes to the referenced value back into the VM at the end of an extension function call. This simple scheme does not cope with scenarios where an extension function returns a reference to a parameter that was passed by reference, nor with situations where a passed reference must be inserted into an array by reference. However, ProjectZero has yet to encounter any extension that actually does this (other than the array extension, which we implemented internally in ProjectZero).
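The copy-back scheme can be sketched as follows. This minimal illustration assumes a hypothetical VM-side variable slot (vm_slot) and an extension function that only ever receives a pointer to a local copy. The bridge copies the value in, calls the extension, and writes the possibly modified copy back on return. Because the extension never sees the caller's actual variable, it has no way to return a reference to it or store it in an array, which is the limitation described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical VM-side variable slot (what a PHP variable resolves to). */
typedef struct {
    long value;
} vm_slot;

/* Hypothetical extension function taking its argument "by reference":
   it only ever sees a pointer to a local copy owned by the bridge. */
typedef void (*ext_byref_fn)(long *arg);

static void ext_increment(long *arg)
{
    *arg += 1;
}

/* Bridge: copy in, call the extension, copy back. */
static void call_with_copy_back(vm_slot *slot, ext_byref_fn fn)
{
    assert(slot != NULL && fn != NULL);
    long local = slot->value;   /* copy the referenced value out of the VM */
    fn(&local);                 /* the extension mutates only the local copy */
    slot->value = local;        /* copy any change back when the call ends */
}
```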
Extensions contain a great deal of code that accesses the various globals (EG, CG, etc.). Fortunately, so far this code has all been easy to deal with by using macro substitution to replace the accesses with getter/setter calls across the VM interface.
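The macro substitution for globals might look like the following minimal sketch. The global call_depth, the accessors vm_get_call_depth/vm_set_call_depth and the macros TOY_EG_GET/TOY_EG_SET are all hypothetical. In the Zend Engine, EG(field) expands to a field access on the executor globals. Here, the field name is instead pasted into an accessor name, so the VM keeps its layout private. A function call is not an lvalue, so writes such as EG(x) = v have to be rewritten into a separate setter form.

```c
#include <assert.h>

/* Hypothetical VM-private globals; extensions never see this layout. */
static struct {
    long call_depth;
} vm_globals = { 0 };

/* Hypothetical getter/setter pair exported across the VM interface. */
static long vm_get_call_depth(void)
{
    return vm_globals.call_depth;
}

static void vm_set_call_depth(long v)
{
    assert(v >= 0);   /* the VM can validate writes it now mediates */
    vm_globals.call_depth = v;
}

/* Substitution macros: the field name is pasted into the accessor name. */
#define TOY_EG_GET(field)    vm_get_##field()
#define TOY_EG_SET(field, v) vm_set_##field(v)

/* Extension-style code after substitution: no direct struct access.
   Before substitution it might have read: EG(call_depth)++; */
static long ext_enter(void)
{
    TOY_EG_SET(call_depth, TOY_EG_GET(call_depth) + 1);
    return TOY_EG_GET(call_depth);
}
```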
The existing ZE interface contains a great deal of duplication: there are several functions and macros for doing almost everything. The approach taken by ProjectZero has been to map these macros onto a small set of core APIs, structured around a table of function pointers similar to JNI.
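A JNI-style interface can be sketched as a struct of function pointers that the VM fills in and passes to each extension. Overlapping convenience macros are then defined in terms of a handful of table entries. All names below (vm_value, vm_iface, get_long, set_long, TOY_LVAL, TOY_ZVAL_LONG, TOY_RETVAL_LONG) are hypothetical. They show only the shape of the idea, not the ProjectZero API.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque handle: extension code never needs the VM's value layout. */
typedef struct vm_value vm_value;

/* Hypothetical core interface table, in the spirit of JNI's JNIEnv. */
typedef struct {
    long (*get_long)(const vm_value *v);
    void (*set_long)(vm_value *v, long n);
} vm_iface;

/* Several overlapping convenience macros collapse onto the same entries. */
#define TOY_LVAL(iface, v)            ((iface)->get_long(v))
#define TOY_ZVAL_LONG(iface, v, n)    ((iface)->set_long((v), (n)))
#define TOY_RETVAL_LONG(iface, rv, n) TOY_ZVAL_LONG(iface, rv, n)

/* One possible VM-side implementation of the table. */
struct vm_value {
    long n;
};

static long impl_get_long(const vm_value *v)
{
    assert(v != NULL);
    return v->n;
}

static void impl_set_long(vm_value *v, long n)
{
    assert(v != NULL);
    v->n = n;
}

static const vm_iface the_vm = { impl_get_long, impl_set_long };

/* Extension function: touches values only through the table. */
static void ext_square(const vm_iface *vm, const vm_value *arg, vm_value *retval)
{
    long x = TOY_LVAL(vm, arg);
    TOY_RETVAL_LONG(vm, retval, x * x);
}
```

In a real split, only the opaque handle and the table type would be visible to extension code, and the table itself would live in the VM. Both sides share one file here for brevity.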
We do not want to spam the internals mailing list with hundreds of emails about our musings as we bounce ideas around, but we do want to do this in the open, hence this page. The folks currently involved in the discussion are as follows. If you are interested in the discussion or feel we are going about things the wrong way, please contact us: