Upgrading PHP extensions from PHP5 to NG

Many of the frequently used API functions have changed, such as the HashTable API; this page intends to document as many as possible of those changes that actually affect the way extension and core code is written. It's highly recommended to read the general information about PHPNG implementation at phpng-int, before reading this guide.

This is not a complete guide that covers every possible situation. It is a collection of recipes for the most common cases, and it should be enough for most user-level extensions. However, if you did not find some information here, found a solution and think it may be useful for others, feel free to add your recipe.

General Advice

zval

-  zval *zv;
-  ALLOC_INIT_ZVAL(zv);
-  ZVAL_LONG(zv, 0);
+  zval zv;
+  ZVAL_LONG(&zv, 0);

The zval structure is defined as

struct _zval_struct {
	zend_value        value;			/* value */
	union {
		struct {
			ZEND_ENDIAN_LOHI_4(
				zend_uchar    type,			/* active type */
				zend_uchar    type_flags,
				zend_uchar    const_flags,
				zend_uchar    reserved)	    /* various IS_VAR flags */
		} v;
		zend_uint type_info;
	} u1;
	union {
		zend_uint     var_flags;
		zend_uint     next;                 /* hash collision chain */
		zend_uint     str_offset;           /* string offset */
		zend_uint     cache_slot;           /* literal cache slot */
	} u2;
};

and zend_value as

typedef union _zend_value {
	long              lval;				/* long value */
	double            dval;				/* double value */
	zend_refcounted  *counted;
	zend_string      *str;
	zend_array       *arr;
	zend_object      *obj;
	zend_resource    *res;
	zend_reference   *ref;
	zend_ast_ref     *ast;
	zval             *zv;
	void             *ptr;
	zend_class_entry *ce;
	zend_function    *func;
} zend_value;
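
Z_ADDREF_P() may only be applied to reference-counted zvals:

- Z_ADDREF_P(zv);
+ if (Z_REFCOUNTED_P(zv)) {Z_ADDREF_P(zv);}
# or equivalently
+ Z_TRY_ADDREF_P(zv);

Getting the string value of a zval no longer requires a temporary copy:

- zval tmp;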
- ZVAL_COPY_VALUE(&tmp, zv);
- zval_copy_ctor(&tmp);
- convert_to_string(&tmp);
- // ...
- zval_dtor(&tmp);
+ zend_string *str = zval_get_string(zv);
+ // ...
+ zend_string_release(str);

Look into zend_types.h code for more details: https://github.com/php/php-src/blob/master/Zend/zend_types.h

References

zvals in PHPNG no longer have an is_ref flag. References are implemented as a separate reference-counted complex type, IS_REFERENCE. You may still use the Z_ISREF*() macros to check whether a given zval is a reference; they simply check whether the type of the zval equals IS_REFERENCE. The macros that manipulated the is_ref flag, Z_SET_ISREF*(), Z_UNSET_ISREF*() and Z_SET_ISREF_TO*(), have been removed. Their usage should be changed in the following way:

- Z_SET_ISREF_P(zv);
+ ZVAL_MAKE_REF(zv);

- Z_UNSET_ISREF_P(zv);
+ if (Z_ISREF_P(zv)) {ZVAL_UNREF(zv);}

Previously the type of the referenced value could be checked directly on the reference. Now it has to be checked indirectly through the Z_REFVAL*() macros:

- if (Z_ISREF_P(zv) && Z_TYPE_P(zv) == IS_ARRAY) {
+ if (Z_ISREF_P(zv) && Z_TYPE_P(Z_REFVAL_P(zv)) == IS_ARRAY) {

or perform manual dereferencing using the ZVAL_DEREF() macro:

- if (Z_ISREF_P(zv)) {...}
- if (Z_TYPE_P(zv) == IS_ARRAY) {
+ if (Z_ISREF_P(zv)) {...}
+ ZVAL_DEREF(zv);
+ if (Z_TYPE_P(zv) == IS_ARRAY) {

Booleans

IS_BOOL does not exist anymore but IS_TRUE and IS_FALSE are types on their own:

- if ((Z_TYPE_PP(item) == IS_BOOL || Z_TYPE_PP(item) == IS_LONG) && Z_LVAL_PP(item)) {
+ if (Z_TYPE_P(item) == IS_TRUE || (Z_TYPE_P(item) == IS_LONG && Z_LVAL_P(item))) {

The Z_BVAL*() macros are removed. Be careful, the return value of Z_LVAL*() on IS_FALSE/IS_TRUE is undefined.
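
The following is a minimal, self-contained C sketch of why the check above tests the type instead of the integer payload. It does not use the Zend headers: model_val, the MODEL_* tags and model_is_truthy() are hypothetical names invented for this illustration, and the layout is only a simplified stand-in for a zval.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tagged value: only the type tags needed for the example. */
enum model_type { MODEL_FALSE, MODEL_TRUE, MODEL_LONG };

typedef struct model_val {
    enum model_type type;
    long            lval;   /* meaningful only when type == MODEL_LONG */
} model_val;

/* Truth test that never reads lval for the two boolean types. */
int model_is_truthy(const model_val *v)
{
    assert(v != NULL);
    switch (v->type) {
    case MODEL_TRUE:  return 1;
    case MODEL_FALSE: return 0;
    case MODEL_LONG:  return v->lval != 0;
    }
    return 0;
}
```

In this model a MODEL_FALSE value may carry arbitrary garbage in lval without affecting the result, which mirrors the warning about Z_LVAL*() on IS_FALSE/IS_TRUE.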

Strings

The value and length of a string may be accessed using the same macros, Z_STRVAL*() and Z_STRLEN*(). However, the underlying data structure for string representation is now zend_string (described in a separate section). The zend_string may be retrieved from a zval with the Z_STR*() macros. It's also possible to get the hash value of the string through Z_STRHASH*().

In case code needs to check if the given string is interned or not, now it should be done using zend_string (not char*)

- if (IS_INTERNED(Z_STRVAL_P(zv))) {
+ if (IS_INTERNED(Z_STR_P(zv))) {

The creation of string zvals has changed slightly. Previously macros like ZVAL_STRING() took an additional argument that told whether the given characters should be duplicated or not. Now these macros always have to create a zend_string structure, so this parameter became useless. However, if its actual value was 0, you have to free the original string to avoid a memory leak.

- ZVAL_STRING(zv, str, 1);
+ ZVAL_STRING(zv, str);

- ZVAL_STRINGL(zv, str, len, 1);
+ ZVAL_STRINGL(zv, str, len);

- ZVAL_STRING(zv, str, 0);
+ ZVAL_STRING(zv, str);
+ efree(str);

- ZVAL_STRINGL(zv, str, len, 0);
+ ZVAL_STRINGL(zv, str, len);
+ efree(str);

The same is true for similar macros like RETURN_STRING(), RETVAL_STRINGL(), etc. and for some internal API functions.

- add_assoc_string(zv, key, str, 1);
+ add_assoc_string(zv, key, str);

- add_assoc_string(zv, key, str, 0);
+ add_assoc_string(zv, key, str);
+ efree(str);

The double reallocation may be avoided using zend_string API directly and creating zval directly from zend_string.

- char * str = estrdup("Hello");
- RETURN_STRING(str, 0);
+ zend_string *str = zend_string_init("Hello", sizeof("Hello")-1, 0);
+ RETURN_STR(str);

Z_STRVAL*() should now be treated as read-only. It's not possible to assign anything to it. It's possible to modify individual characters, but before doing so you must be sure that the string is not referenced from anywhere else (it is not interned and its reference counter is 1). After an in-place modification you might also need to reset the calculated hash value.

  SEPARATE_ZVAL(zv);
  Z_STRVAL_P(zv)[0] = Z_STRVAL_P(zv)[0] + ('A' - 'a');
+ zend_string_forget_hash_val(Z_STR_P(zv));

zend_string API

Zend has a new zend_string API. Besides being the underlying structure for string representation in zvals, zend_string is also used throughout much of the codebase where char* and int were used before.

zend_strings (not IS_STRING zvals) may be created using the zend_string_init(char *val, size_t len, int persistent) function. The actual characters may be accessed as str->val and the string length as str->len. The hash value of the string should be accessed through the zend_string_hash_val() function, which recalculates the hash value if necessary.

Strings should be deallocated using the zend_string_release() function, which doesn't necessarily free the memory, because the same string may be referenced from several places.

If you are going to keep a zend_string pointer somewhere, you should increase its reference counter or use the zend_string_copy() function, which does it for you. In many places where code copied characters just to keep the value (not to modify it), this function can be used instead.

- ptr->str = estrndup(Z_STRVAL_P(zv), Z_STRLEN_P(zv));
+ ptr->str = zend_string_copy(Z_STR_P(zv));
  ...
- efree(str);
+ zend_string_release(str);

In case the copied string is going to be changed you may use zend_string_dup() instead

- char *str = estrndup(Z_STRVAL_P(zv), Z_STRLEN_P(zv));
+ zend_string *str = zend_string_dup(Z_STR_P(zv));
  ...
- efree(str);
+ zend_string_release(str);
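
The ownership rules described above (copy shares the buffer, dup creates a private one, release frees only when the last owner lets go) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: it does not include the Zend headers, model_string and the model_string_*() functions are hypothetical names, and details such as interned or persistent strings are deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for zend_string: a refcount, a length and inline bytes. */
typedef struct model_string {
    unsigned refcount;
    size_t   len;
    char     val[1];        /* len bytes plus a trailing NUL follow */
} model_string;

/* Allocates a new string and copies len bytes into it. */
model_string *model_string_init(const char *src, size_t len)
{
    model_string *s = malloc(sizeof(model_string) + len);
    assert(s != NULL);
    s->refcount = 1;
    s->len = len;
    memcpy(s->val, src, len);
    s->val[len] = '\0';
    return s;
}

/* Shares the existing buffer and bumps the reference counter. */
model_string *model_string_copy(model_string *s)
{
    s->refcount++;
    return s;
}

/* Always creates a private duplicate that is safe to modify. */
model_string *model_string_dup(const model_string *s)
{
    return model_string_init(s->val, s->len);
}

/* Drops one reference; returns 1 if the memory was freed, 0 otherwise. */
int model_string_release(model_string *s)
{
    if (--s->refcount == 0) {
        free(s);
        return 1;
    }
    return 0;
}
```

In this model a holder that only reads the value takes a copy, while a holder that intends to change characters takes a dup, and every holder calls release exactly once.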

The code with old macros must be supported as well, so switching to the new ones is not necessary.

In some cases it makes sense to allocate string buffer before the actual string data is known. You may use zend_string_alloc() and zend_string_realloc() functions to do it.

- char *ret = emalloc(16+1);
- md5(something, ret); 
- RETURN_STRINGL(ret, 16, 0);
+ zend_string *ret = zend_string_alloc(16, 0);
+ md5(something, ret->val);
+ RETURN_STR(ret);

Not all extension code has to be converted to use zend_string instead of char*. It's up to the extension maintainer to decide which type is more suitable in each particular case.

Look into zend_string.h code for more details: https://github.com/php/php-src/blob/master/Zend/zend_string.h

smart_str and smart_string

For a consistent naming convention the old smart_str API was renamed to smart_string. It may be used as before, except for the new names.

- smart_str str = {0};
- smart_str_appendl(&str, " ", sizeof(" ") - 1);
- smart_str_0(&str);
- RETURN_STRINGL(str.c, str.len, 0);
+ smart_string str = {0};
+ smart_string_appendl(&str, " ", sizeof(" ") - 1);
+ smart_string_0(&str);
+ RETVAL_STRINGL(str.c, str.len);
+ smart_string_free(&str);

In addition we introduced a new smart_str API that works with zend_string directly:

- smart_str str = {0};
- smart_str_appendl(&str, " ", sizeof(" ") - 1);
- smart_str_0(&str);
- RETURN_STRINGL(str.c, str.len, 0);
+ smart_str str = {0};
+ smart_str_appendl(&str, " ", sizeof(" ") - 1);
+ smart_str_0(&str);
+ if (str.s) {
+   RETURN_STR(str.s);
+ } else {
+   RETURN_EMPTY_STRING();
+ }

smart_str is defined as

typedef struct {
    zend_string *s;
    size_t a;
} smart_str;

The APIs of smart_str and smart_string are very similar, and both closely follow the API used in PHP5, so adapting the code should not be a big problem. The biggest question is which API to select in each particular case, and that depends on how the final result is used.

Note that a previous check for an empty smart_str might need to be changed:

- if (smart_str->c) {
+ if (smart_str->s) {

strpprintf

In addition to the spprintf() and vspprintf() functions we introduced similar functions that produce a zend_string instead of a char*. It's up to you to decide when to switch to the new variants.

PHPAPI zend_string *vstrpprintf(size_t max_len, const char *format, va_list ap);
PHPAPI zend_string *strpprintf(size_t max_len, const char *format, ...);
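
To illustrate what "printf into a counted string" means, here is a standalone sketch built only on the C standard library. It is not the PHP implementation: model_str, model_strpprintf() and model_str_free() are hypothetical names, and treating max_len == 0 as "no limit" is an assumption of this model.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy counted string standing in for zend_string. */
typedef struct model_str {
    size_t len;
    char  *val;
} model_str;

/* Formats into a freshly allocated counted string. In this model a
 * max_len of 0 means "no limit"; otherwise the result is cut to
 * max_len bytes. Returns NULL if the format cannot be processed. */
model_str *model_strpprintf(size_t max_len, const char *format, ...)
{
    va_list ap;
    int needed;
    model_str *s;

    va_start(ap, format);
    needed = vsnprintf(NULL, 0, format, ap);    /* first pass: measure */
    va_end(ap);
    if (needed < 0) {
        return NULL;
    }

    s = malloc(sizeof(*s));
    assert(s != NULL);
    s->len = (size_t)needed;
    if (max_len != 0 && s->len > max_len) {
        s->len = max_len;
    }
    s->val = malloc(s->len + 1);
    assert(s->val != NULL);

    va_start(ap, format);
    vsnprintf(s->val, s->len + 1, format, ap);  /* second pass: write */
    va_end(ap);
    return s;
}

void model_str_free(model_str *s)
{
    free(s->val);
    free(s);
}
```

The caller gets the length together with the bytes and does not need a separate strlen() call, which is the practical difference from a plain char* result.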

Arrays

Arrays are implemented more or less the same as before. However, where the underlying structure was previously a pointer to HashTable, it is now a pointer to zend_array, which keeps the HashTable inside. The HashTable may be read as before using the Z_ARRVAL*() macros, but it's no longer possible to change the pointer to the HashTable. It's only possible to get or set the pointer to the whole zend_array through the Z_ARR*() macros.

The best way to create arrays is to use the old array_init() function, but it's also possible to create new uninitialized arrays using ZVAL_NEW_ARR(), or to initialize a zval from an existing zend_array structure through ZVAL_ARR().

Some arrays may be immutable (this may be checked using the Z_IMMUTABLE() macro). If code needs to modify them, they have to be duplicated first. Iterating over immutable arrays with the internal position pointer is not possible either. Such arrays can be walked using the old iteration API with an external position pointer, or using the new HashTable iteration API described in a separate section.

HashTable API

The HashTable API has changed significantly, and this may cause some trouble when porting extensions.

- if (zend_hash_update(ht, Z_STRVAL_P(key), Z_STRLEN_P(key)+1, (void*)&zv, sizeof(zval**), NULL) == SUCCESS) {
+ if (zend_hash_update(ht, Z_STR_P(key), zv) != NULL) {

- if (zend_hash_find(ht, Z_STRVAL_P(key), Z_STRLEN_P(key)+1, (void**)&zv_ptr) == SUCCESS) {
+ if ((zv = zend_hash_find(ht, Z_STR_P(key))) != NULL) {

- if (zend_hash_find(ht, "value", sizeof("value"), (void**)&zv_ptr) == SUCCESS) {
+ if ((zv = zend_hash_str_find(ht, "value", sizeof("value")-1)) != NULL) {

The key length no longer counts the trailing NUL byte. This also applies to other hashtable-related APIs outside of zend_hash. For example:

- add_assoc_bool_ex(&zv, "valid", sizeof("valid"), 0);
+ add_assoc_bool_ex(&zv, "valid", sizeof("valid") - 1, 0);
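
The off-by-one between the old and new key lengths in the examples above comes straight from how sizeof treats string literals. The sketch below is plain standard C with no Zend dependency; OLD_KEY_LEN, NEW_KEY_LEN and check_key_lengths() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Length as the old calls above pass it: sizeof counts the trailing NUL. */
#define OLD_KEY_LEN(lit) (sizeof(lit))
/* Length as the new calls above pass it: the same value strlen() returns. */
#define NEW_KEY_LEN(lit) (sizeof(lit) - 1)

/* Verifies both conventions against strlen() for one literal key. */
void check_key_lengths(void)
{
    assert(NEW_KEY_LEN("value") == strlen("value"));
    assert(OLD_KEY_LEN("value") == strlen("value") + 1);
}
```

When porting, a call that keeps passing sizeof("key") would look up a key that is one byte longer than intended, so such a lookup would simply not match.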

Functions with a _ptr suffix store and return arbitrary pointers instead of zvals:

- if (zend_hash_find(EG(class_table), Z_STRVAL_P(key), Z_STRLEN_P(key)+1, (void**)&ce_ptr) == SUCCESS) {
+ if ((ce_ptr = zend_hash_find_ptr(EG(class_table), Z_STR_P(key))) != NULL) {

- if (zend_hash_update(EG(class_table), Z_STRVAL_P(key), Z_STRLEN_P(key)+1, (void*)&ce, sizeof(zend_class_entry*), NULL) == SUCCESS) {
+ if (zend_hash_update_ptr(EG(class_table), Z_STR_P(key), ce) != NULL) {

Functions with a _mem suffix copy a memory block of the given size into the table:

- if (zend_hash_update(EG(function_table), Z_STRVAL_P(key), Z_STRLEN_P(key)+1, (void*)func, sizeof(zend_function), NULL) == SUCCESS) {
+ if (zend_hash_update_mem(EG(function_table), Z_STR_P(key), func, sizeof(zend_function)) != NULL) {

Functions with a _new suffix insert an element whose key is known not to be in the table yet:

zval* zend_hash_add_new(HashTable *ht, zend_string *key, zval *zv);
zval* zend_hash_str_add_new(HashTable *ht, char *key, int len, zval *zv);
zval* zend_hash_index_add_new(HashTable *ht, zend_ulong h, zval *zv);
zval* zend_hash_next_index_insert_new(HashTable *ht, zval *zv);
void* zend_hash_add_new_ptr(HashTable *ht, zend_string *key, void *pData);
...

HashTable destructors now receive a zval* instead of a void*:

- void my_ht_destructor(void *ptr)
+ void my_ht_destructor(zval *zv)
  {
-    my_ht_el_t *p = (my_ht_el_t*) ptr;
+    my_ht_el_t *p = (my_ht_el_t*) Z_PTR_P(zv);
     ...
+    efree(p); // this efree() is not always necessary
  }

zend_hash_key is now defined as:

typedef struct _zend_hash_key {
	ulong        h;
	zend_string *key;
} zend_hash_key;

In some cases, it makes sense to change usage of zend_hash_apply*() functions into usage of new HashTable iteration API. This may lead to smaller and more efficient code.

Reviewing zend_hash.h is a very good idea: https://github.com/php/php-src/blob/master/Zend/zend_hash.h

HashTable Iteration API

We provide a few specialized macros to iterate through the elements (and keys) of HashTables. The first argument of each macro is the hashtable; the others are variables that are assigned on each iteration step.

The most suitable macro should be used instead of the old reset, current, and move functions.

- HashPosition pos;
  ulong num_key;
- char *key;
- uint key_len;
+ zend_string *key;
- zval **pzv;
+ zval *zv;
-
- zend_hash_internal_pointer_reset_ex(ht, &pos);
- while (zend_hash_get_current_data_ex(ht, (void**)&pzv, &pos) == SUCCESS) {
-   if (zend_hash_get_current_key_ex(ht, &key, &key_len, &num_key, 0, &pos) == HASH_KEY_IS_STRING){
-   }
+ ZEND_HASH_FOREACH_KEY_VAL(ht, num_key, key, zv) {
+   if (key) { //HASH_KEY_IS_STRING
+   }
    ........
-   zend_hash_move_forward_ex(ht, &pos);
- }
+ } ZEND_HASH_FOREACH_END();
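
For readers unfamiliar with paired opening and closing loop macros, the following standalone sketch shows how such a pair can work. It is a toy model, not the Zend implementation: model_bucket, model_table, MODEL_FOREACH_KEY_VAL, MODEL_FOREACH_END and sum_string_keyed() are hypothetical names, and the table is just a flat array.

```c
#include <assert.h>
#include <stddef.h>

/* Toy bucket: string key (NULL for integer-indexed entries), index, value. */
typedef struct model_bucket {
    const char   *key;
    unsigned long h;
    long          val;
} model_bucket;

typedef struct model_table {
    model_bucket *buckets;
    size_t        used;
} model_table;

/* The opening macro starts the loop and assigns the loop variables;
 * the closing macro supplies the matching braces. */
#define MODEL_FOREACH_KEY_VAL(t, h_, key_, val_) do { \
        const model_table *t_ = (t); \
        size_t i_; \
        for (i_ = 0; i_ < t_->used; i_++) { \
            h_ = t_->buckets[i_].h; \
            key_ = t_->buckets[i_].key; \
            val_ = t_->buckets[i_].val;

#define MODEL_FOREACH_END() \
        } \
    } while (0)

/* Sums the values stored under string keys, skipping integer-indexed ones. */
long sum_string_keyed(const model_table *t)
{
    unsigned long h;
    const char *key;
    long val, sum = 0;

    assert(t != NULL);
    MODEL_FOREACH_KEY_VAL(t, h, key, val) {
        (void)h;
        if (key) {          /* string key, the old HASH_KEY_IS_STRING case */
            sum += val;
        }
    } MODEL_FOREACH_END();
    return sum;
}
```

As in the diff above, the caller declares the key and value variables once and tests the key pointer to tell string keys from integer keys.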

Objects

TODO: …

Custom Objects

TODO: …

zend_object struct is defined as:

struct _zend_object {
    zend_refcounted   gc;
    zend_uint         handle; // TODO: may be removed ???
    zend_class_entry *ce;
    const zend_object_handlers *handlers;
    HashTable        *properties;
    HashTable        *guards; /* protects from __get/__set ... recursion */
    zval              properties_table[1];
};

The properties_table is inlined into zend_object for better access performance, but that also brings a problem. A custom object used to be defined like this:

struct custom_object {
   zend_object std;
   void  *custom_data;
};
 
 
zend_object_value custom_object_new(zend_class_entry *ce TSRMLS_DC) {
 
   zend_object_value retval;
   struct custom_object *intern;
 
   intern = emalloc(sizeof(struct custom_object));
   zend_object_std_init(&intern->std, ce TSRMLS_CC);
   object_properties_init(&intern->std, ce);
   retval.handle = zend_objects_store_put(intern,
        (zend_objects_store_dtor_t)zend_objects_destroy_object,
        (zend_objects_free_object_storage_t) custom_free_storage, 
        NULL TSRMLS_CC);
   intern->handle = retval.handle;
   retval.handlers = &custom_object_handlers;
   return retval;
}
 
struct custom_object* obj = (struct custom_object *)zend_objects_get_address(getThis());

Now zend_object has a variable length (because of the inlined properties_table), so it has to be the last member of the custom struct. The code above should be changed to:

struct custom_object {
   void  *custom_data;
   zend_object std;
};
 
zend_object * custom_object_new(zend_class_entry *ce TSRMLS_DC) {
     /* Allocate sizeof(custom) + sizeof(properties table requirements) */
     struct custom_object *intern = ecalloc(1, 
         sizeof(struct custom_object) + 
         zend_object_properties_size(ce));
     /* Allocating:
      * struct custom_object {
      *    void *custom_data;
      *    zend_object std;
      * }
      * zval[ce->default_properties_count-1]
      */
     zend_object_std_init(&intern->std, ce TSRMLS_CC);
     ...
     custom_object_handlers.offset = XtOffsetOf(struct custom_object, std);
     custom_object_handlers.free_obj = custom_free_storage;
 
     intern->std.handlers = &custom_object_handlers;
 
     return &intern->std;
}
 
/* Fetching the custom object: */
 
static inline struct custom_object * php_custom_object_fetch_object(zend_object *obj) {
      return (struct custom_object *)((char *)obj - XtOffsetOf(struct custom_object, std));
}
 
#define Z_CUSTOM_OBJ_P(zv) php_custom_object_fetch_object(Z_OBJ_P(zv))
 
struct custom_object* obj = Z_CUSTOM_OBJ_P(getThis());
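
The pointer arithmetic behind the fetch function can be tried outside of PHP. The sketch below is a standalone model using only the C standard library: model_object, model_custom and the model_custom_*() functions are hypothetical names, offsetof stands in for XtOffsetOf, and the property slots are plain longs instead of zvals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for zend_object: its last member grows with the class. */
typedef struct model_object {
    int  handle;
    long properties_table[1];
} model_object;

/* Custom data goes first, the variable-length object goes last. */
typedef struct model_custom {
    void        *custom_data;
    model_object std;
} model_custom;

/* Allocates the wrapper plus room for nprops property slots (nprops >= 1)
 * and returns the embedded object, as an object creation hook would. */
model_object *model_custom_new(size_t nprops, void *data)
{
    model_custom *intern;

    assert(nprops >= 1);
    intern = calloc(1, sizeof(model_custom) + (nprops - 1) * sizeof(long));
    assert(intern != NULL);
    intern->custom_data = data;
    return &intern->std;
}

/* Recovers the wrapper from a pointer to the embedded object. */
model_custom *model_custom_from_obj(model_object *obj)
{
    return (model_custom *)((char *)obj - offsetof(model_custom, std));
}

/* Frees the whole allocation, starting from the wrapper's address. */
void model_custom_free(model_object *obj)
{
    free(model_custom_from_obj(obj));
}
```

The engine-facing code only ever sees the embedded object pointer; subtracting the member offset is what lets the extension get back to its own data, and it is also why the start address of the allocation differs from the object pointer.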

zend_object_handlers

A new field, offset, is added to zend_object_handlers. You should always set it to the offset of the zend_object within your custom object struct.

It is used by zend_objects_store_* to find the start address of the allocated memory.

// An example in spl_array
memcpy(&spl_handler_ArrayObject, zend_get_std_object_handlers(), sizeof(zend_object_handlers));
spl_handler_ArrayObject.offset = XtOffsetOf(spl_array_object, std);

The memory of the object is now released by zend_objects_store_*, so you should not free it in your custom object's free_obj handler.

Resources

- long handle = Z_LVAL_P(zv);
- int  type;
- void *ptr = zend_list_find(handle, &type);
+ long handle = Z_RES_P(zv)->handle;
+ int  type = Z_RES_P(zv)->type;
+ void *ptr = Z_RES_P(zv)->ptr;

- long handle = Z_RESVAL_P(zv);
+ long handle = Z_RES_P(zv)->handle;

- ZEND_FETCH_RESOURCE2(ib_link, ibase_db_link *, &link_arg, link_id, LE_LINK, le_link, le_plink);

// if you are sure that link_arg is of type IS_RESOURCE, use:
+ if ((ib_link = (ibase_db_link *)zend_fetch_resource2(Z_RES_P(link_arg), LE_LINK, le_link, le_plink)) == NULL) {
+    RETURN_FALSE;
+ }

// otherwise, if you know nothing about the type of link_arg, use:
+ if ((ib_link = (ibase_db_link *)zend_fetch_resource2_ex(link_arg, LE_LINK, le_link, le_plink)) == NULL) {
+    RETURN_FALSE;
+ }

- REGISTER_RESOURCE(return_value, result, le_result);
+ RETURN_RES(zend_register_resource(result, le_result));

- zend_list_addref(Z_LVAL_P(zv));
+ Z_ADDREF_P(zv);

which is the same as:

- zend_list_addref(Z_LVAL_P(zv));
+ Z_RES_P(zv)->gc.refcount++;

- zend_list_delete(Z_LVAL_P(zv));
+ zend_list_delete(Z_RES_P(zv));

- zend_list_delete(Z_LVAL_P(zv));
+ zend_list_close(Z_RES_P(zv));

Parameters Parsing API changes

- long lval;
+ zend_long lval;
  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "l", &lval) == FAILURE) {

  char *str;
- int len;
+ size_t len;
  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &str, &len) == FAILURE) {

- char *str;
- int len;
- if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &str, &len) == FAILURE) {
+ zend_string *str;
+ if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "S", &str) == FAILURE) {

- zval **pzv;
- if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "Z", &pzv) == FAILURE) {
+ zval *zv;
+ if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zv) == FAILURE) {

- zval ***argv = NULL;
+ zval *argv = NULL;
  int argn;
  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "+", &argv, &argn) == FAILURE) {

- zval **ret;
- if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "Z", &ret) == FAILURE) {
+ zval *ret;
+ if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z/", &ret) == FAILURE) {
    return;
  }
- ZVAL_LONG(*ret, 0);
+ ZVAL_LONG(ret, 0);

Call Frame Changes (zend_execute_data)

Information about each function call is recorded in a chain of zend_execute_data structures. EG(current_execute_data) points to the call frame of the currently executed function (previously zend_execute_data structures were created only for user-level PHP functions). I'll try to explain the difference between old and new call frame structures field by field.

Arguments to functions are stored in zval slots directly after the zend_execute_data structure. They may be accessed using the ZEND_CALL_ARG(execute_data, arg_num) macro. For user PHP functions the first argument overlaps with the first compiled variable (CV0), and so on. If the caller passes more arguments than the callee receives, the extra arguments are copied after all CVs and TMP variables used by the callee.

Executor Globals - EG() Changes

- symbols = zend_hash_num_elements(&EG(symbol_table));
+ symbols = zend_hash_num_elements(&EG(symbol_table).ht);

  zend_execute_data *ex = EG(current_execute_data);
+ while (ex && (!ex->func || !ZEND_USER_CODE(ex->func->type))) {
+    ex = ex->prev_execute_data;
+ }
  if (ex) {

Opcodes changes

temp_variable

PCRE

Some PCRE APIs now use or return zend_string. For example, php_pcre_replace() returns a zend_string and takes a zend_string as its first argument. Double-check their declarations as well as compiler warnings, which are very likely to be about wrong argument types.