internals:engine:objects



This is a bottom-up approach to Zend 2 objects.

Despite the wording, this document is not a specification; it results from analyzing the PHP implementation. Since it attempts to extract general rules from specific code snippets, it may contain wrong inferences. If you find errors, correct them (this requires permission to edit the wiki) or send an e-mail to glopes ~at~ nebm.ist.utl.pt.

Definitions

We'll deal with three separate entities here: references, objects and classes. Classes represent a type and define the behavior of all the objects of that type (the instances of that class). Each object will typically have one or more references to it. These are abstract concepts; in terms of implementation in PHP, references are mapped to object zvals, and the two terms are used interchangeably here.

Internally, the word object is overloaded to also mean “references”; to avoid confusion, I shall not use the term object to refer to references, reserving it for the objects (either the concept or the storage of their data in memory). Additionally, the term type may mean having a certain handler table: two classes may have different behavior (different methods, etc.) and share the same handler table. Finally, the term reference, when applied to a zval, may also mean the zval is part of a reference set (Z_ISREF); context should resolve the ambiguity.

The object zval

A zval that is an object reference will have the type IS_OBJECT. In this case, its value, which can be retrieved with Z_OBJVAL, will be of type zend_object_value, which is defined like this:

typedef struct _zend_object_value {
    zend_object_handle handle;        /* retrieve with Z_OBJ_HANDLE(zval) */
    zend_object_handlers *handlers;   /* retrieve with Z_OBJ_HT(zval) */
} zend_object_value;

The field handle (which can be retrieved with the Z_OBJ_HANDLE macro family) identifies the object to which the reference refers among those of the same type; zend_object_handle is actually just an integer -- that is, an object is uniquely identified by an integer and a zend_object_handlers structure. Consequently, two references are identical in the sense of the === operator if and only if they share both the handlers and the handle.
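The identity rule can be illustrated with a minimal, self-contained sketch. The types toy_handlers and toy_object_value below are hypothetical simplified stand-ins for zend_object_handlers and zend_object_value, not the real Zend definitions; only the comparison logic reflects the rule stated above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for zend_object_handlers (contents irrelevant here). */
typedef struct toy_handlers { int unused; } toy_handlers;

/* Hypothetical stand-in for zend_object_value. */
typedef struct toy_object_value {
    unsigned int handle;          /* plays the role of zend_object_handle */
    const toy_handlers *handlers; /* plays the role of the handler table */
} toy_object_value;

/* === holds if and only if both the handle and the handler table match. */
static bool toy_is_identical(toy_object_value a, toy_object_value b)
{
    return a.handle == b.handle && a.handlers == b.handlers;
}
```

An equal handle paired with a different handler table denotes a different object, which is why the handle alone is not sufficient.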

The handlers field has another purpose besides identifying the referred object. The structure it points to (the handler table) defines, at a low level, the behavior of the objects of that type. A specific handler function can be accessed with Z_OBJ_HANDLER(zval, hf).

The lifecycle of the references is associated with that of the objects. Typically, one wants the object to be destroyed once there are no more references to it. The handler table has two entries for that purpose: add_ref and del_ref. The first should be called when a new reference to the object is created; the second when one is deleted. Functions like zval_copy_ctor, zval_ptr_dtor and zval_dtor take care of calling these handlers. So there are two kinds of refcounts one should keep in mind when dealing with objects: those of the references (i.e., the zvals) and those of the objects themselves.
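The distinction between the two refcounts can be modelled in a few lines of C. This is a hypothetical toy model (toy_object, toy_zval and the toy_* functions are invented for illustration), assuming that creating a distinct zval, as zval_copy_ctor does, calls add_ref, while merely sharing an existing zval only bumps that zval's own refcount.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object: its refcount counts the zvals referring to it. */
typedef struct toy_object {
    unsigned int refcount;
    int destroyed;
} toy_object;

/* Hypothetical zval: its refcount counts the holders sharing this zval. */
typedef struct toy_zval {
    unsigned int refcount;
    toy_object *obj;
} toy_zval;

static void toy_add_ref(toy_object *obj) { obj->refcount++; }

static void toy_del_ref(toy_object *obj)
{
    assert(obj->refcount > 0);
    if (--obj->refcount == 0)
        obj->destroyed = 1;   /* a real object store would free it here */
}

/* A new zval referring to obj: the object gains a reference. */
static toy_zval *toy_zval_new(toy_object *obj)
{
    toy_zval *z = malloc(sizeof *z);
    if (z == NULL)
        abort();
    z->refcount = 1;
    z->obj = obj;
    toy_add_ref(obj);
    return z;
}

/* Like zval_copy_ctor on an object zval: a distinct zval, hence add_ref. */
static toy_zval *toy_zval_copy(const toy_zval *src)
{
    return toy_zval_new(src->obj);
}

/* Sharing the same zval: only the zval refcount changes. */
static void toy_zval_share(toy_zval *z) { z->refcount++; }

/* Like zval_ptr_dtor: only releasing the last holder calls del_ref. */
static void toy_zval_ptr_dtor(toy_zval *z)
{
    assert(z->refcount > 0);
    if (--z->refcount == 0) {
        toy_del_ref(z->obj);
        free(z);
    }
}
```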

The handler table

Let's now explore the members of the handler table, which define the behavior of a class at a low level. Once we introduce the Zend standard object, we'll see what their default values are (TODO).

typedef struct _zend_object_handlers {
    /* general object functions */
    zend_object_add_ref_t              add_ref;
    zend_object_del_ref_t              del_ref;
    zend_object_clone_obj_t            clone_obj;
    /* individual object functions */
    zend_object_read_property_t        read_property;
    zend_object_write_property_t       write_property;
    zend_object_read_dimension_t       read_dimension;
    zend_object_write_dimension_t      write_dimension;
    zend_object_get_property_ptr_ptr_t get_property_ptr_ptr;
    zend_object_get_t                  get;
    zend_object_set_t                  set;
    zend_object_has_property_t         has_property;
    zend_object_unset_property_t       unset_property;
    zend_object_has_dimension_t        has_dimension;
    zend_object_unset_dimension_t      unset_dimension;
    zend_object_get_properties_t       get_properties;
    zend_object_get_method_t           get_method;
    zend_object_call_method_t          call_method;
    zend_object_get_constructor_t      get_constructor;
    zend_object_get_class_entry_t      get_class_entry;
    zend_object_get_class_name_t       get_class_name;
    zend_object_compare_t              compare_objects;
    zend_object_cast_t                 cast_object;
    zend_object_count_elements_t       count_elements;
    zend_object_get_debug_info_t       get_debug_info;
    zend_object_get_closure_t          get_closure;
} zend_object_handlers;

Except where indicated, the arguments are guaranteed not to be null pointers.

add_ref

void (*add_ref)(zval *object TSRMLS_DC)
  • Called when a new zval referring to the object is created. Called by zval_copy_ctor.
  • It may also be called when there is a need to hold some other kind of reference to the object. For instance, some instance method of an object a of class A may create and return an object b of class B that depends on data of the object a that spawned it. In that case, a possible strategy is to add a reference to a when b is created and store in b the handle of a, so that the reference can be deleted when b is destroyed. Alternatively, b may store (e.g. as a property) an object zval referring to a.
  • Should not be NULL.
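The first strategy described for dependent objects (holding a reference to a through its handle) might look roughly like this. The object store below is a hypothetical array of per-handle refcounts standing in for the real object store, and toy_child plays the role of object b; none of these names exist in the engine.

```c
#include <assert.h>

#define TOY_MAX_OBJECTS 8

/* Hypothetical object store: one refcount per handle; 0 means destroyed. */
static unsigned int toy_store_refcount[TOY_MAX_OBJECTS];

static void toy_store_add_ref(unsigned int handle) { toy_store_refcount[handle]++; }

static void toy_store_del_ref(unsigned int handle)
{
    assert(toy_store_refcount[handle] > 0);
    toy_store_refcount[handle]--;   /* reaching 0 would destroy the object */
}

/* Hypothetical object b that depends on data of the object a that spawned it. */
typedef struct toy_child {
    unsigned int parent_handle;     /* handle of a, kept so it can be released */
} toy_child;

static toy_child toy_child_create(unsigned int parent_handle)
{
    toy_child b = { parent_handle };
    toy_store_add_ref(parent_handle);   /* a must outlive b */
    return b;
}

static void toy_child_destroy(toy_child *b)
{
    toy_store_del_ref(b->parent_handle);
}
```

Even after every user-visible zval referring to a is gone, a survives until b is destroyed.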

del_ref

void (*del_ref)(zval *object TSRMLS_DC)
  • Called when a zval referring to the object is destroyed. Called by zval_dtor and therefore by functions that call zval_dtor such as zval_ptr_dtor and the convert_to family.
  • See also add_ref.
  • Should not be NULL.

clone_obj

zend_object_value (*clone_obj)(zval *object TSRMLS_DC)
  • Called when an object is to be cloned (associated with usage of the clone operator in user space).
  • Should return a zend_object_value that refers to a newly created object that is equal to the object referred to by the passed reference. The two objects should not be identical, i.e., == applied to the references should return true but === should return false. The compare_objects handlers must be the same since that is a requisite for == returning true, but typically, as one would want the two objects to have identical behavior, they ought to share all the handlers.
  • The created object should be initialized as if it had one reference.
  • May be NULL to forbid cloning.
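A minimal sketch of the clone_obj contract under simplified assumptions: objects live in a hypothetical store indexed by handle, toy_ref stands in for zend_object_value, and the hypothetical toy_compare handler compares payloads. The clone gets a new handle (so === fails), shares the handler table (so == can succeed) and starts with one reference.

```c
#include <assert.h>

#define TOY_MAX 16

typedef struct toy_handlers toy_handlers;

/* Hypothetical stand-in for zend_object_value. */
typedef struct toy_ref {
    unsigned int handle;
    const toy_handlers *handlers;
} toy_ref;

/* Hypothetical handler table with just the two handlers involved. */
struct toy_handlers {
    toy_ref (*clone_obj)(toy_ref);            /* NULL would forbid cloning */
    int (*compare_objects)(toy_ref, toy_ref);
};

/* Hypothetical store: a payload and a refcount per handle. */
static int toy_payload[TOY_MAX];
static unsigned int toy_refcount[TOY_MAX];
static unsigned int toy_next_handle = 1;

static int toy_compare(toy_ref a, toy_ref b)
{
    int x = toy_payload[a.handle], y = toy_payload[b.handle];
    return (x > y) - (x < y);
}

static toy_ref toy_clone(toy_ref src)
{
    assert(toy_next_handle < TOY_MAX);
    toy_ref copy = { toy_next_handle++, src.handlers }; /* same handler table */
    toy_payload[copy.handle] = toy_payload[src.handle]; /* equal contents */
    toy_refcount[copy.handle] = 1;                      /* as if one reference */
    return copy;
}

static const toy_handlers toy_std_handlers = { toy_clone, toy_compare };
```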

read_property

zval *(*read_property)(zval *object, zval *member, int type TSRMLS_DC)
  • Retrieves a property of an object as a pointer to a zval; corresponds to $obj->prop in (mainly) a reading context in userspace.
  • If the argument member is not of type IS_STRING, you should convert it (after deep copying it!). You may use the convert_to_string function for the conversion.
  • The argument type is of the BP family, which consists of:
/* var status for backpatching */
#define BP_VAR_R          0  /* read */
#define BP_VAR_W          1  /* write */
#define BP_VAR_RW         2  /* read/write */
#define BP_VAR_IS         3  /* check for existence */
#define BP_VAR_NA         4  /* if not applicable; unused? */
#define BP_VAR_FUNC_ARG   5  /* function argument */
#define BP_VAR_UNSET      6  /* unset */
  • The types BP_VAR_R and BP_VAR_IS are the most relevant here. Typically, the type decides how chatty the implementation will be (e.g. whether a notice is emitted for an undefined property). Note that it is has_property that is called when checking the existence of the property itself; BP_VAR_IS is used when retrieving a property to check whether it has a (sub-)dimension or (sub-)property, as in empty($rarF->prop[7][8]).
  • However, BP_VAR_W, BP_VAR_RW and BP_VAR_UNSET are also possible values if the get_property_ptr_ptr handler is undefined or fails. These types are used in write-like operations wherein a (sub-)dimension or (sub-)property of the property value is being targeted (e.g. $obj->prop[32] = $h or unset($obj->prop[32])); the type BP_VAR_W may also appear when assigning or passing the property (or a (sub-)property/dimension thereof) by reference. If these cases are to be supported, one should return either a reference (in the Z_ISREF sense) or a proxy object (see the get handler for more information); otherwise, one should warn the user. For instance, the default handler emits a warning if, for a write, it would return a zval that is not of type IS_OBJECT and is not referenced anywhere else, because the write would necessarily have no effect (a zval of type IS_OBJECT is permitted because it may be a proxy object).
  • The reference count of a returned zval which is not otherwise referenced by the extension or the engine's symbol table should be 0. Likewise, the reference count of a zval being returned that exists elsewhere should not be incremented by this handler (it might be, but only on account of some side effect, for instance creating the property on the fly and storing it in a hash table for future retrieval, not owing to the call to the handler per se).
  • Note this handler itself (and the other ones) has no notion of accessibility.
  • Should return EG(uninitialized_zval_ptr) if the property is undefined.
  • Should not be NULL, even for classes that have no properties, though it's not strictly forbidden. An empty implementation can be:
zval *read_property_empty_implementation(zval *object, zval *member, int type TSRMLS_DC)
{
	/* maybe raise an error/exception here */
	return EG(uninitialized_zval_ptr);
}
  • NOTE (as of PHP 5.3.2): You may think the behavior of read_property is the same as that of read_dimension when get_property_ptr_ptr is undefined/fails. There are, however, subtle differences.
  1. For pre- and post-increments and -decrements on properties, there is a pair of read_property/write_property calls; the read_property call has type BP_VAR_R. For the same operations on dimensions, there is only a read_dimension call with type BP_VAR_RW; the operation can succeed only if a reference (Z_ISREF) or a proxy object is returned. Note that compound assignments on dimensions (e.g. $obj['index'] += 1) are handled with a pair of calls, just like properties.
  2. If read_property returns a zval with refcount 1 not belonging to a reference set in the context of a write-like operation (see discussion of the type argument above), this zval will be turned into a reference. In the case of read_dimension, a notice would be emitted and a reference set with the left part of the assignment and the dimension would not be built. Note that this special “turn into ref” case would not work if the returned zval had a higher refcount. Consider the following implementations:
zval *z; /* the single stored property value (illustration only) */
zval *read_property(zval *object, zval *member, int type TSRMLS_DC)
{
    return z;
}
void write_property(zval *object, zval *member, zval *value TSRMLS_DC)
{
    z = value;
    /* Z_SET_ISREF_P(z); -- if uncommented, both would work */
    zval_add_ref(&value);
}

Then, these two scripts would have different results:

$obj->prop = "hhh";
$a = &$obj->prop;
$a = "bbb";
echo $obj->prop; //echoes "bbb"

$str = "hhh";
$obj->prop = $str;
$a = &$obj->prop;
$a = "bbb";
echo $obj->prop; //echoes "hhh"
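How the type argument might drive the chattiness of read_property, and the use of a shared uninitialized value for undefined properties, can be sketched with a toy model. The BP_VAR_R and BP_VAR_IS values are copied from the list above; toy_zval, toy_prop and the notice counter are hypothetical stand-ins for zvals, the property storage and zend_error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BP_VAR_R  0   /* values as listed in the BP family above */
#define BP_VAR_IS 3

/* Hypothetical stand-in for a zval: a refcount and an integer payload. */
typedef struct toy_zval { unsigned int refcount; long value; } toy_zval;

/* Plays the role of EG(uninitialized_zval_ptr). */
static toy_zval toy_uninitialized = { 1, 0 };
static int toy_notices;   /* counts the notices a real handler would emit */

typedef struct toy_prop { const char *name; toy_zval *value; } toy_prop;

static toy_zval *toy_read_property(toy_prop *props, size_t n,
                                   const char *member, int type)
{
    assert(member != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(props[i].name, member) == 0)
            return props[i].value;   /* refcount is NOT incremented */
    if (type != BP_VAR_IS)
        toy_notices++;               /* e.g. an "undefined property" notice */
    return &toy_uninitialized;       /* quiet when only probing */
}
```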

write_property

void (*write_property)(zval *object, zval *member, zval *value TSRMLS_DC)
  • Writes the value of a property of an object; corresponds to $obj->prop in a writing context in userspace.
  • If the member argument is not a string, it should be (deep) copied and converted into one.
  • The calling function does not transfer ownership of the value to the handler. Hence, if the value is to be stored by the handler, its refcount should be incremented. No modifications to the value are allowed: if one is to modify the value before storing it, one must deep copy it first (i.e., include a call to zval_copy_ctor). The only exception is if the value's refcount is 0; in that case, one may modify it at will and, if one wants to copy the value zval into another zval, one can make a shallow copy.
  • If the value is a reference (in the sense of Z_ISREF), separate it before storing it. Note that this handler is not meant to deal with reference assignments such as $obj->prop = &$var or $var = &$obj->prop, which are the correct ways of making property values part of a reference set (the engine will call get_property_ptr_ptr or, failing that, read_property).
  • Should not be NULL, even for classes that have no modifiable properties, though it's not strictly forbidden. An empty implementation will do nothing except maybe raise an error/exception.
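The storing rules for write_property can be sketched as follows, assuming a hypothetical toy_zval with a refcount and a Z_ISREF-like flag. A plain value is shared by incrementing its refcount; a value that belongs to a reference set is separated into a fresh copy, so the property does not join the caller's reference set. In real code the copy would also need zval_copy_ctor.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical zval stand-in: refcount, Z_ISREF-like flag, payload. */
typedef struct toy_zval { unsigned int refcount; int is_ref; long value; } toy_zval;

static void toy_zval_ptr_dtor(toy_zval *z)
{
    assert(z->refcount > 0);
    if (--z->refcount == 0)
        free(z);
}

/* Stores value in *slot without modifying value itself. */
static void toy_write_property(toy_zval **slot, toy_zval *value)
{
    toy_zval *stored;
    if (value->is_ref) {
        /* Separate: the property must not join the caller's reference set. */
        stored = malloc(sizeof *stored);
        if (stored == NULL)
            abort();
        stored->refcount = 1;
        stored->is_ref = 0;
        stored->value = value->value;  /* a real zval needs zval_copy_ctor */
    } else {
        stored = value;
        stored->refcount++;            /* we keep it, so we hold a reference */
    }
    if (*slot != NULL)
        toy_zval_ptr_dtor(*slot);      /* release the previous value */
    *slot = stored;
}
```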

read_dimension

zval *(*read_dimension)(zval *object, zval *offset, int type TSRMLS_DC)
  • This is similar to read_property, except it's called in response to attempts to treat the object as an array, as in $obj['key'] in (mainly) a reading context.
  • You may not modify (except transiently) the offset argument.
  • The argument offset can be a C NULL (when $obj[] is used). Despite the name, offset may be a zval of any type; if it is an object reference with a get handler, you may want to call it and use that result as the offset instead.
  • The remarks made in read_property with respect to the type argument also apply.
  • Since there is no analogue of get_property_ptr_ptr for dimensions, you ought to return a reference (in the sense of Z_ISREF) or a proxy object (see the get handler) in write-like contexts (types BP_VAR_W, BP_VAR_RW and BP_VAR_UNSET), though it's not mandatory. If read_dimension is called in a write-like context, such as in $val =& $obj['prop'], and you return neither a reference nor an object, the engine emits a notice. Obviously, returning a reference is not enough for those operations to work correctly; modifying the returned zval must actually have some effect. Note that assignments such as $obj['key'] = &$a are still not possible: for that, one would need the dimensions to actually be storable as zvals (which may or may not be the case) and two levels of indirection.
  • The remarks made in read_property about the refcount of the returned value also apply. Should return a C NULL in case the offset does not exist (the engine will then use error_zval or uninitialized_zval depending on whether it's a read or write context).
  • May be NULL when the object is not to be treated as an array.

write_dimension

void (*write_dimension)(zval *object, zval *offset, zval *value TSRMLS_DC)
  • This is similar to write_property, except it's called in response to attempts to treat the object as an array, as in $obj['key'] in a writing context.
  • You may not modify (except transiently) the offset argument.
  • The argument offset can be a C NULL (when $obj[] is used). It can be a zval of any type; if it is an object reference, you may call its get handler and use the result as the offset instead.
  • The same remarks made in write_property apply -- should increment the refcount of value if storing it and should not change the value zval in most circumstances (a deep copy should be made first).
  • If a reference is passed, it should be separated. While you may think you want to store a reference so that the value of the dimension can be changed indirectly (through another symbol), this is not the way to do it. What you can do is $obj['key'] = $a; $a = &$obj['key']; the first assignment is handled by write_dimension and the second by read_dimension returning a reference or a proxy object.
  • May be NULL when the object is not to be treated as an array.
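The NULL-offset conventions of read_dimension and write_dimension can be illustrated with a hypothetical array-like object holding longs rather than zvals. Returning NULL for a missing offset leaves the choice between uninitialized_zval and error_zval to the engine, and a NULL offset in a write is treated as an append, as for $obj[] = $value. The offset is taken as a pointer to const, reflecting the rule that it must not be modified.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_DIM_MAX 8

/* Hypothetical array-like object: slots indexed by integer offsets. */
typedef struct toy_dims {
    long values[TOY_DIM_MAX];
    int used[TOY_DIM_MAX];
    long next_index;            /* where $obj[] = ... appends */
} toy_dims;

/* read_dimension analogue: NULL for a missing offset. A NULL offset
 * ($obj[] used for reading) is simply treated as missing here. */
static long *toy_read_dimension(toy_dims *d, const long *offset)
{
    if (offset == NULL || *offset < 0 || *offset >= TOY_DIM_MAX || !d->used[*offset])
        return NULL;
    return &d->values[*offset];
}

/* write_dimension analogue: a NULL offset means append. */
static int toy_write_dimension(toy_dims *d, const long *offset, long value)
{
    long idx = offset ? *offset : d->next_index;
    if (idx < 0 || idx >= TOY_DIM_MAX)
        return 0;               /* a real handler would raise an error */
    d->values[idx] = value;
    d->used[idx] = 1;
    if (idx >= d->next_index)
        d->next_index = idx + 1;
    return 1;
}
```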

get_property_ptr_ptr

zval **(*get_property_ptr_ptr)(zval *object, zval *member TSRMLS_DC)
  • Returns a property with double indirection so that the caller may directly replace the zval. This may be for efficiency reasons (a read/write pair of calls would otherwise be needed and would be unnecessary if the underlying storage of the properties is in fact made of zvals) or because the nature of the operation requires double indirection -- namely, send by reference and assign by reference of object properties.
  • If the member argument is not a string, it should be converted.
  • The rules about the refcount of the returned value given for the read_property handler also apply here, though it doesn't make much sense to return a pointer to memory that is not held by the extension or the engine here (the case with refcount 0).
  • If the property does not exist, the handler may try to create it on the fly. If one initializes these properties to the same value, one should consider using a single initialization zval, for instance EG(uninitialized_zval): allocate a zval*, give it the value of EG(uninitialized_zval_ptr), increment its refcount and return the address of the created pointer (see the behavior of zend_std_get_property_ptr_ptr()).
  • Returning a C NULL signifies failure and causes a fallback to the read_property or write_property handlers.
  • Prefer an empty implementation (always returning a C NULL) to a NULL in the handler table. A NULL in the handler table ought to have the same effect, but there are bugs.
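Creating a property on the fly with a single shared initialization value might look like this. It is a simplified model, not zend_std_get_property_ptr_ptr(): toy_uninitialized plays the role of EG(uninitialized_zval), and the property table is a small hypothetical fixed-size array.

```c
#include <assert.h>
#include <string.h>

typedef struct toy_zval { unsigned int refcount; long value; } toy_zval;

/* Plays the role of EG(uninitialized_zval): one shared "null" zval. */
static toy_zval toy_uninitialized = { 1, 0 };

#define TOY_PROPS_MAX 4

typedef struct toy_props {
    const char *names[TOY_PROPS_MAX];
    toy_zval *values[TOY_PROPS_MAX];   /* the zval* cells handed out */
    int count;
} toy_props;

/* Returns the address of the zval* cell so the caller can replace the zval;
 * NULL signals failure (the engine would fall back to read/write_property). */
static toy_zval **toy_get_property_ptr_ptr(toy_props *p, const char *member)
{
    assert(member != NULL);
    for (int i = 0; i < p->count; i++)
        if (strcmp(p->names[i], member) == 0)
            return &p->values[i];
    if (p->count == TOY_PROPS_MAX)
        return NULL;
    /* Create on the fly, sharing the single uninitialized zval. */
    p->names[p->count] = member;       /* assumes member outlives the table */
    p->values[p->count] = &toy_uninitialized;
    toy_uninitialized.refcount++;
    return &p->values[p->count++];
}
```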

get

zval* (*get)(zval *object TSRMLS_DC)
  • This handler is called when attempting to treat the object as a scalar value in a read context. That situation arises when using the pre- and post-increment and -decrement operators and compound assignment operators (e.g. $obj++ and $obj += 6), in which case it is followed by a call to set, and as a fallback for type conversions when there is no cast_object handler (it is even preferred to cast_object in a few circumstances, such as when defining a constant or comparing objects to scalars).
  • A common application for get/set handlers is implementing proxy objects. These are used when the underlying storage of the properties or dimensions is not made of zvals (e.g. when the PHP object is an interface to an object in another language). In that case, it would be impossible to make those properties/dimensions part of a reference set and operate in this manner:
$a = &$obj->prop;
$a++;
$a = 6;
  • Proxy objects make this possible. If one returns from read_property or read_dimension an IS_OBJECT zval with get and set handlers, the read in the post-increment would be handled by the get handler and the writes in the post-increment and the assignment would be handled by the set handler.
  • The remarks made in read_property about the refcount of the return value also apply.
  • One should not expect calls to get to be followed by calls to set in the context of compound assignments and increments/decrements. In $obj['prop'] += 1, read_dimension would be called with offset “prop” and type BP_VAR_R. Suppose it then returns a proxy object. The get handler of this proxy object would be called in order to generate a zval; the zval would be separated if not a reference and incremented; then, instead of set, write_dimension would be called so as to write the result.
  • Should not return NULL; if implemented must return a valid zval.
  • May be NULL, in that case the object cannot be treated as a scalar in the mentioned circumstances.

set

void (*set)(zval **object, zval *value TSRMLS_DC)
  • This handler is called when attempting to make a (non-reference) assignment to an object zval, including when using the pre- and post-increment and -decrement operators and compound assignment operators (e.g. $obj++ and $obj += 6) -- in this case, preceded by a call to get.
  • Note the double indirection on the object argument. The pointed zval may be changed or completely replaced by changing the value of *object. Remember to adjust the refcounts and consider whether the zval is part of a reference set.
  • A common application for get/set handlers is when implementing proxy objects. See get for more information.
  • The remarks made in write_property about the value argument also apply.
  • See also the description of get.
  • May be NULL, in that case the object cannot be treated as a scalar in the mentioned circumstances.
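The proxy-object idea behind get and set can be sketched with a hypothetical foreign storage (a plain C int) that cannot hold zvals. The toy_post_increment function only approximates what the engine does for $a++ on a proxy (read through get, increment, write back through set); all names here are invented for illustration, and get returns a small struct by value instead of a zval pointer.

```c
#include <assert.h>

/* Foreign storage that is not made of zvals (e.g. a field of a C struct). */
typedef struct toy_foreign { int counter; } toy_foreign;

/* Hypothetical scalar produced by get; refcount 0 since nothing holds it. */
typedef struct toy_scalar { unsigned int refcount; long lval; } toy_scalar;

/* The proxy object that read_property/read_dimension would hand out. */
typedef struct toy_proxy { toy_foreign *target; } toy_proxy;

static toy_scalar toy_proxy_get(const toy_proxy *p)
{
    toy_scalar v = { 0, p->target->counter };
    return v;
}

static void toy_proxy_set(toy_proxy *p, const toy_scalar *value)
{
    p->target->counter = (int)value->lval;   /* write through to the storage */
}

/* Rough analogue of $a++ when $a holds the proxy: get, increment, set. */
static void toy_post_increment(toy_proxy *p)
{
    toy_scalar v = toy_proxy_get(p);
    v.lval++;
    toy_proxy_set(p, &v);
}
```

An assignment such as $a = 6 would correspond to a single toy_proxy_set call.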

has_property

int (*has_property)(zval *object, zval *member, int has_set_exists TSRMLS_DC)
  • This handler is called whenever the engine needs to determine whether a property exists.
  • If the member argument is not a string, it should be (deep) copied and converted.
  • The parameter has_set_exists can take the following values:
    • 0 -- check whether the property exists and is not NULL; used by the isset operator
    • 1 -- check whether the property exists and is true; semantics of empty; one may want to use zend_is_true
    • 2 -- check whether the property exists, even if it is NULL; used by the property_exists function. Note that this parameter slightly differs from has_dimension's check_empty in that the latter cannot take the value 2.
  • An empty implementation ought not to emit an error/exception (or have any other side effects) even if the type does not admit properties and especially if has_set_exists is 2, so that property_exists can be quiet.
  • Read also the note on the usage of the BP_VAR_IS type for the read_property handler.
  • Should return either 0 (doesn't have the property) or 1 (has the property).
  • Should not be NULL, though it's not strictly forbidden by the engine.
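The three values of has_set_exists can be captured in a small, side-effect-free decision function. The value type and toy_is_true below are hypothetical simplifications (toy_is_true only roughly mimics zend_is_true for the three modelled types), and a NULL value pointer stands for an undefined property.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified value type; the real code uses zvals. */
typedef enum { TOY_NULL, TOY_LONG, TOY_STRING } toy_type;
typedef struct toy_value { toy_type type; long lval; const char *str; } toy_value;

/* Rough analogue of zend_is_true for the types modelled here. */
static int toy_is_true(const toy_value *v)
{
    switch (v->type) {
    case TOY_LONG:   return v->lval != 0;
    case TOY_STRING: return v->str[0] != '\0' && strcmp(v->str, "0") != 0;
    default:         return 0;
    }
}

/* has_set_exists: 0 = isset, 1 = (not) empty, 2 = property_exists. */
static int toy_has_property(const toy_value *value, int has_set_exists)
{
    if (value == NULL)
        return 0;                       /* undefined in every mode */
    switch (has_set_exists) {
    case 0:  return value->type != TOY_NULL;
    case 1:  return toy_is_true(value);
    default: return 1;                  /* 2: exists, even if NULL */
    }
}
```

A has_dimension implementation could reuse the same logic, since its check_empty argument only takes the values 0 and 1.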

unset_property

void (*unset_property)(zval *object, zval *member TSRMLS_DC)
  • Called in order to unset an object property.
  • If the member argument is not a string, it should be (deep) copied and converted.
  • If member refers to a property that does not exist, this function should fail silently (no notices!). However, if the object type does not support properties, an error/exception may be emitted.
  • Should not be NULL, though it's not strictly forbidden by the engine.

has_dimension

int (*has_dimension)(zval *object, zval *member, int check_empty TSRMLS_DC)
  • Determines whether an object has a certain dimension.
  • The argument check_empty has the same meaning as has_property's has_set_exists parameter, with the exception that it cannot take the value 2.
  • This handler should have no side effects.
  • Read also the note on the usage of the BP_VAR_IS type for the read_property handler.
  • Should return either 0 (doesn't have the dimension) or 1 (has the dimension).
  • May be NULL when the object is not to be treated as an array.

unset_dimension

void (*unset_dimension)(zval *object, zval *offset TSRMLS_DC)
  • Called in order to unset an object's dimension.
  • The argument offset may be of any type; if it is an object with a get handler, one may want to call it and use instead the result as the offset.
  • It should fail silently (without notices) if the offset refers to a dimension that does not exist.
  • May be NULL if the object is not to be treated as an array.

get_properties

HashTable *(*get_properties)(zval *object TSRMLS_DC)
  • Retrieves the object as a hash table. This is usually a hash table containing the object's instance properties, and some code may (incorrectly) use this hash table to retrieve object properties. This function is used, even in preference to cast_object, in explicit conversions (i.e. the convert_to_array family and convert_to_explicit_type, not convert_object_to_type) to convert an object into an array.
  • In practice, you may use this function to return other data, for instance dimensions. Since several array functions (such as end, prev, next, reset, current, key, array_walk, array_walk_recursive and array_key_exists) call this handler when an object is passed, it can be used to provide a more array-like experience of the object (together with read_dimension, write_dimension, has_dimension, unset_dimension and count_elements).
  • The garbage collector uses this handler to reach the zvals that the object is holding. If the hash table is lazily generated (on the first call to the handler) and it hasn't been built yet (it's the first call), it may be appropriate to refuse to do so (and return NULL) on calls by the garbage collector. This is especially true if such generation involves the creation of new zvals. The global GC_G(gc_active) tells whether the garbage collector is running.
  • zend_parse_parameters has the specifiers H and A, which accept both arrays and objects.
  • The Z_OBJPROP macro family is a shortcut for accessing this handler. It assumes the handler exists! The Z_OBJDEBUG macro family falls back on this handler if get_debug_info doesn't exist.
  • If the underlying storage of the hash table values are in fact zvals, you may return a hash table that stores the same zval * values. Depending on how the hash table is then exposed in userspace (whether reference sets are separated), this may allow indirect modification of the underlying storage. If those zvals are stored in a hash table, you can go further and return the hash table itself -- this will generally still not allow replacement/addition/deletion of the hash table's values in user space (e.g. turning an object into an array requires the hash table to be copied), yet may be faster and allow internal code to replace/add/delete entries directly to the hash table.
  • The handler owns the hash table. Typically, this handler always returns the same hash table, which accompanies the life cycle of the object (is created when the object is created, etc.).
  • The Zend engine does not forbid it to be NULL, but several extensions (including the standard extension) assume it exists. The built-in function get_object_vars assumes a Zend standard object whenever this handler exists. Prefer an implementation that returns an empty hash table.
  • See also get_debug_info.
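A lazily built properties table that refuses to be generated while the garbage collector is running could be sketched like this. toy_gc_active stands in for GC_G(gc_active) and toy_table for a HashTable; both are hypothetical simplifications, as is the fixed three-element native data.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: a counted array instead of a HashTable, and a
 * flag playing the role of GC_G(gc_active). */
typedef struct toy_table { size_t count; long *values; } toy_table;
static int toy_gc_active;

typedef struct toy_lazy_object {
    long source[3];      /* the object's native (non-zval) data */
    toy_table *props;    /* built on first request, owned by the object */
} toy_lazy_object;

static toy_table *toy_get_properties(toy_lazy_object *obj)
{
    assert(obj != NULL);
    if (obj->props)
        return obj->props;        /* always the same table once built */
    if (toy_gc_active)
        return NULL;              /* don't create new values during GC */
    toy_table *t = malloc(sizeof *t);
    if (t == NULL)
        abort();
    t->count = 3;
    t->values = malloc(t->count * sizeof *t->values);
    if (t->values == NULL)
        abort();
    for (size_t i = 0; i < t->count; i++)
        t->values[i] = obj->source[i];
    obj->props = t;
    return t;
}

/* The table follows the object's life cycle, so the object frees it. */
static void toy_lazy_object_free(toy_lazy_object *obj)
{
    if (obj->props) {
        free(obj->props->values);
        free(obj->props);
        obj->props = NULL;
    }
}
```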

get_method

zend_function *(*get_method)(zval **object_ptr, char *method, int method_len TSRMLS_DC)
  • Called in order to fetch a method as zend_function.
  • Should return NULL if the method does not exist, otherwise should return a zend_function. The rules for who owns the return value are as follows:
    • If the type is ZEND_INTERNAL_FUNCTION, then it's owned by the caller if and only if the subfield common.fn_flags has the flag ZEND_ACC_CALL_VIA_HANDLER. However, note that if the caller then uses the return value to make a function call, it should not free it since it will already have been done.
    • If the type is ZEND_OVERLOADED_FUNCTION or ZEND_OVERLOADED_FUNCTION_TEMPORARY, then it's owned by the caller.
    • For all other cases, the caller is not responsible for freeing the return.
    • If the caller owns the result, and the type is not ZEND_OVERLOADED_FUNCTION, it should also free (with efree) the subfield common.function_name.
  • The argument object_ptr is given with double indirection. Altering *object_ptr allows one to change the this pointer passed to the method and to change the called scope into the class entry of the written value. The refcount of the original value should be decreased and the new value's increased (or set to one if it was created from scratch). (unconfirmed)
  • One ought to convert the method name to lowercase to mimic the usual (half-assed) case insensitivity of method names. See zend_str_tolower_copy.
  • May be NULL if the object is not to support method calls.
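A method lookup that mimics case-insensitive method names might look like the following. The method table and toy_function are hypothetical stand-ins for zend_function entries, and toy_tolower_copy only imitates zend_str_tolower_copy; the returned pointer is owned by the table, matching the last ownership case above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for zend_function: only a lowercase name. */
typedef struct toy_function { const char *lc_name; } toy_function;

static const toy_function toy_methods[] = { { "getvalue" }, { "setvalue" } };

/* Lowercase copy, in the spirit of zend_str_tolower_copy. */
static void toy_tolower_copy(char *dst, const char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (char)tolower((unsigned char)src[i]);
    dst[len] = '\0';
}

/* NULL when the method does not exist; the caller must not free the result. */
static const toy_function *toy_get_method(const char *method, int method_len)
{
    char lc[64];
    if (method_len < 0 || (size_t)method_len >= sizeof lc)
        return NULL;
    toy_tolower_copy(lc, method, (size_t)method_len);
    for (size_t i = 0; i < sizeof toy_methods / sizeof toy_methods[0]; i++)
        if (strcmp(toy_methods[i].lc_name, lc) == 0)
            return &toy_methods[i];
    return NULL;
}
```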

call_method

int (*call_method)(char *method, INTERNAL_FUNCTION_PARAMETERS)
  • This method is called whenever the engine tries to call a function of type ZEND_OVERLOADED_FUNCTION or ZEND_OVERLOADED_FUNCTION_TEMPORARY. It's conceptually related to the __call magic method.
  • The method argument is a string that should identify the function to be called.
  • This method is unique in that its first parameter is not an object zval pointer. The object zval pointer is included in the INTERNAL_FUNCTION_PARAMETERS and can be retrieved with getThis().
  • May be NULL, but in that case you should not return zend_function structures of type ZEND_OVERLOADED_FUNCTION or ZEND_OVERLOADED_FUNCTION_TEMPORARY from get_method or get_constructor (and neither should get_closure if object_ptr is filled with object zvals that do not have this handler).

get_constructor

zend_function *(*get_constructor)(zval *object TSRMLS_DC)
  • This handler has the same semantics as get_method and is called to retrieve a function that is to perform initialization operations on the object.
  • May be NULL.

get_class_entry

zend_class_entry *(*get_class_entry)(const zval *object TSRMLS_DC)
  • Gives a pointer to a zend_class_entry. This structure provides a scope for object operations and defines PHP classes. The default handlers defer to this structure for much of their behavior; additionally, much functionality, such as reflection and the instanceof operator, is restricted to PHP classes.
  • If implemented, all objects of the same class should have a get_class_entry handler returning the same value. Should not return NULL.
  • You may use the Z_OBJCE macro family for accessing the return value of the handler. It resolves to zend_get_class_entry, which either emits a fatal error and returns NULL if the handler does not exist, or returns the result of this handler (non-null for correct implementations).
  • The IS_ZEND_STD_OBJECT and HAS_CLASS_ENTRY macros (the last one should only be used if the zval is known to be an object reference) are a shortcut for determining whether a zval is of a type which has this handler implemented.
  • Will never be NULL for Zend standard objects (and derivations thereof) and will be NULL for all other objects (by definition).

get_class_name

int (*get_class_name)(const zval *object, char **class_name, zend_uint *class_name_len, int parent TSRMLS_DC)
  • Extracts a class name for display or reflection purposes. This name has no special meaning.
  • If parent is 0, the name of the class of the passed object is being requested, otherwise it's the parent class of the passed object. The handler may return FAILURE if there is no parent class or it doesn't know.
  • On success, *class_name should be set to a pointer to a null-terminated string allocated with non-persistent storage (emalloc) and *class_name_len should be set to the length of *class_name (excluding the terminator).
  • Should return SUCCESS or FAILURE. If it fails, *class_name and *class_name_len should retain their original values or be set to NULL/0.
  • May be NULL. Note that, as of PHP 5.3.2, some portions of the standard extension expect the handler to exist and not fail when parent is 0.

compare_objects

int (*compare)(zval *object1, zval *object2 TSRMLS_DC)
  • Compares two objects. Used for the operators ==, !=, <, >, <= and >=.
  • The implementations should follow these rules -- for any objects a, b and c that share the same compare handler:
    1. compare(a, a) = 0
    2. sign(compare(a, b)) = -sign(compare(b, a)) where sign(x) is 1 if x is positive, -1 if it's negative and 0 if it's 0.
    3. if compare(a, b) = 0 and compare(b, c) = 0, then compare(a, c) = 0
  • In effect, one should implement a total order; strictly, this also requires the ordering to be transitive (if compare(a, b) < 0 and compare(b, c) < 0, then compare(a, c) < 0).
  • One may find a complete set of conditions in the documentation of Java's java.lang.Comparable.compareTo(T).
  • The handler may restrict itself to returning only -1, 0 and 1 (a < b, a = b and a > b). If it does not, one is encouraged to implement it so that if compare(a, b) > compare(a, c), then compare(b, c) < 0.
  • Should not be NULL; a possible simple implementation is just returning the result of an object handle subtraction.
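A compare handler restricted to -1, 0 and 1 trivially satisfies the three rules when it orders objects by handle. The sketch assumes unsigned handles (modelled by a hypothetical toy_ref) and compares them instead of subtracting, which avoids depending on how an unsigned difference converts to int.

```c
#include <assert.h>

/* Hypothetical stand-in: only the handle matters for this comparison. */
typedef struct toy_ref { unsigned int handle; } toy_ref;

/* Returns -1, 0 or 1; with unsigned handles, as assumed here, a raw
 * subtraction would wrap around instead of going negative. */
static int toy_compare_objects(toy_ref a, toy_ref b)
{
    return (a.handle > b.handle) - (a.handle < b.handle);
}
```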

cast_object

int (*cast)(zval *readobj, zval *retval, int type TSRMLS_DC)
  • Called when an object is to be converted into another type.
  • If not defined or if the call fails, the engine will use fallback strategies that include calling get, or using a number of default conversion strategies (the strategies used for the standard objects).
  • The readobj contains the object to be converted; it should not be modified in any way.
  • The handler may assume readobj and retval have different values.
  • The retval is an allocated zval on which the handler should write the result. It should first be initialized (INIT_ZVAL) ignoring its previous value.
  • In case of error, FAILURE should be returned. If the retval was already initialized and is holding further resources, it should be destroyed (as in zval_dtor) by the handler; if it was not initialized, it should be left untouched. In case of success, SUCCESS should be returned.
  • See also get and get_properties.
  • May be NULL.
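The contract for readobj and retval can be sketched with hypothetical mock types (mock_zval, mock_cast) in place of real zvals. SUCCESS and FAILURE are defined locally with the engine's values. The sketch behaves as follows:

  • The object is never modified.
  • On success, retval is fully written.
  • On failure, nothing has been initialized, so retval is left untouched.

```c
#include <assert.h>

#define SUCCESS 0
#define FAILURE -1

/* Hypothetical, reduced zval: a type tag plus one integer payload. */
typedef enum { M_NULL, M_LONG, M_BOOL, M_OBJECT } mock_type;

typedef struct {
    mock_type type;
    long      lval;   /* value for M_LONG/M_BOOL; the "handle" for M_OBJECT */
} mock_zval;

/* Hypothetical cast handler: converts an object to long (its handle) or bool. */
static int mock_cast(const mock_zval *readobj, mock_zval *retval, mock_type type)
{
    switch (type) {
    case M_LONG:
        retval->type = M_LONG;          /* initialize, ignoring the old content */
        retval->lval = readobj->lval;
        return SUCCESS;
    case M_BOOL:
        retval->type = M_BOOL;
        retval->lval = 1;
        return SUCCESS;
    default:
        return FAILURE;                 /* retval was never initialized: untouched */
    }
}
```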

count_elements

int (*count_elements)(zval *object, long *count TSRMLS_DC)
  • Called to determine the count of some countable object. A count is a non-negative value.
  • Objects that have array-like access will probably want to implement this, so that they can behave more like an array.
  • Note that this handler is not used by the engine itself, only by count and other extensions.
  • This handler writes a non-negative number in *count and returns SUCCESS if the passed object is countable; returns FAILURE otherwise.
  • May be NULL if the type does not support the notion of “countable”; the effect would be the same as having an implementation that always returns FAILURE.
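A sketch of the count_elements contract, again using a hypothetical mock_object instead of a zval. Note that *count is written only when the object is countable.

```c
#include <assert.h>

#define SUCCESS 0
#define FAILURE -1

/* Hypothetical object: array-like objects carry an element count. */
typedef struct {
    int  is_countable;
    long num_elements;   /* never negative */
} mock_object;

static int mock_count_elements(const mock_object *object, long *count)
{
    if (!object->is_countable)
        return FAILURE;              /* *count is not written */
    *count = object->num_elements;
    return SUCCESS;
}
```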

get_debug_info

HashTable *(*zend_object_get_debug_info_t)(zval *object, int *is_temp TSRMLS_DC)
  • Returns a hash table with arbitrary key/value pairs for debugging purposes.
  • The Z_OBJDEBUG macro is a shortcut to access this handler; it may be used if one knows the handler not to be NULL.
  • The is_temp argument cannot be NULL. The value 1 should be written in *is_temp if the returned hash table is owned by the caller (and hence the caller must destroy and free it with zend_hash_destroy and efree); otherwise 0 should be written.
  • Should not return NULL.
  • Avoid having this handler set to NULL; although the engine does not require its existence, the standard extension does (as of PHP 5.3.2).
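The is_temp ownership rule can be illustrated with hypothetical mock types. Here malloc/free stand in for what the engine would do with a HashTable (zend_hash_destroy and efree). The handler returns one of two kinds of table:

  • a table the object keeps (is_temp = 0);
  • a freshly built table the caller must release (is_temp = 1).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a HashTable. */
typedef struct {
    int num_entries;
} mock_table;

typedef struct {
    mock_table props;          /* table owned by the object itself */
    int        build_temporary; /* whether the handler builds a fresh table */
} mock_object;

static mock_table *mock_get_debug_info(mock_object *obj, int *is_temp)
{
    if (obj->build_temporary) {
        mock_table *t = malloc(sizeof *t);
        if (t == NULL)
            abort();
        t->num_entries = obj->props.num_entries + 1;  /* e.g. one synthetic entry */
        *is_temp = 1;          /* caller owns t */
        return t;
    }
    *is_temp = 0;              /* object keeps ownership */
    return &obj->props;
}

/* What a caller must do with the result. */
static int debug_entry_count(mock_object *obj)
{
    int is_temp;
    mock_table *t = mock_get_debug_info(obj, &is_temp);
    int n = t->num_entries;
    if (is_temp)
        free(t);               /* engine code: zend_hash_destroy + efree */
    return n;
}
```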

get_closure

int (*get_closure)(zval *obj, zend_class_entry **ce_ptr, zend_function **fptr_ptr, zval **zobj_ptr TSRMLS_DC)
  • This handler allows the object to be used as a function.
  • The argument ce_ptr will not be NULL and should be filled with the scope of the function or NULL. The written value will be used as the called scope. The calling scope will be taken from the scope associated with the returned zend_function. Only under exceptional circumstances will it be used as a calling scope (maybe internal functions where the zend_function has no associated scope – unconfirmed).
  • The argument fptr_ptr should be populated with the desired function. The caller is not responsible for freeing it, so the structure should live as long as the class or object. Describing this structure is outside the scope of this document; see, for example, zend_register_functions for how to create internal functions.
  • The argument zobj_ptr may be NULL; if it isn't, *zobj_ptr is to be filled with NULL or, in case the function is an instance method of a standard object stored in EG(objects_store), the object it refers to. One should not increment the refcount of that object only because one is passing it to the caller.
  • See also get_method.
  • Returns SUCCESS or FAILURE.
  • May be NULL.
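A minimal sketch of how a get_closure implementation fills its output arguments. The types mock_class, mock_function and mock_object are hypothetical stand-ins for zend_class_entry, zend_function and zvals, and the TSRMLS arguments are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define SUCCESS 0
#define FAILURE -1

typedef struct mock_class { const char *name; } mock_class;
typedef struct mock_function { const char *name; mock_class *scope; } mock_function;
typedef struct mock_object { mock_class *ce; mock_function *invoke; } mock_object;

static int mock_get_closure(mock_object *obj, mock_class **ce_ptr,
                            mock_function **fptr_ptr, mock_object **zobj_ptr)
{
    if (obj->invoke == NULL)
        return FAILURE;            /* object cannot be used as a function */
    *ce_ptr = obj->ce;             /* becomes the called scope */
    *fptr_ptr = obj->invoke;       /* owned by the class; caller does not free it */
    if (zobj_ptr != NULL)
        *zobj_ptr = obj;           /* no extra reference taken just for this */
    return SUCCESS;
}
```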

The zend_class_entry

TODO

Default handlers

TODO

PHP internal class declaration

Let's assume all the internal functions and custom object handlers are written. A PHP class declaration can then be divided in these tasks (some may be omitted):

  • Definition of zend_function_entry array, which groups the internal functions that were defined.
  • Definition and initialization of the handlers table.
  • Initialization of the class entry.
  • Registration of the class.
  • Declaration of static and instance properties and constants.
  • Other tweaks of the class entry.

We'll cover these items and then address the question of how to properly define a class so that it can be extended in userspace.

zend_function_entry array

The zend_function_entry structure contains the name of the method, a pointer to the (native) function that implements it, arginfo (describing arginfo structures is out of the scope of this text), and some flags for the method. The array is traditionally declared as a static global variable. Its purpose is to group and qualify the functions so that they can be converted to zend_function structures.

The array is terminated with a zeroed structure. Several macros exist for declaring the zend_function_entry structures. The most important are:

PHP_ME(classname, name, arg_info, flags)
PHP_MALIAS(classname, name, alias, arg_info, flags)
PHP_NAMED_ME(zend_name, name, arg_info, flags)
PHP_ME_MAPPING(name, func_name, arg_types, flags)
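As a rough model of why the terminating entry must be zeroed, the sketch below uses a hypothetical reduced mock_function_entry. Its five fields are assumed to follow the order of the terminator {NULL, NULL, NULL, 0, 0}: name, handler, arginfo, argument count and flags. A registration loop simply stops at the first entry without a name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*mock_handler)(void);

/* Hypothetical, reduced stand-in for zend_function_entry. */
typedef struct {
    const char   *fname;
    mock_handler  handler;
    const void   *arg_info;
    unsigned int  num_args;
    unsigned int  flags;
} mock_function_entry;

static void method_a(void) {}
static void method_b(void) {}

static const mock_function_entry mock_methods[] = {
    {"methodA", method_a, NULL, 0, 0x100},
    {"methodB", method_b, NULL, 0, 0x100},
    {NULL, NULL, NULL, 0, 0}            /* zeroed terminator */
};

/* Walks the array until the first entry with no name. */
static size_t count_entries(const mock_function_entry *fe)
{
    size_t n = 0;
    while (fe[n].fname != NULL)
        n++;
    return n;
}
```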

The standard way to declare a method is to use PHP_ME. It takes, in this order:

  • The name of the class. This is an arbitrary name, not reflected in userspace, that is consistent with how the method's internal implementations were declared.
  • The name of the method. This is how the method will be called in userspace AND how it was declared with PHP_METHOD. If you need those two to differ, you should use PHP_MALIAS.
  • An arginfo structure.
  • A bitmask defining the accessibility of the method.

The macro PHP_ME can be used when the method implementation was declared in a standard way, i.e., with PHP_METHOD(classname, name).

The bitmask is built with the ZEND_ACC family of macros. Let's see the relevant part of the family. Some are used only for classes or properties, not methods:

#define ZEND_ACC_STATIC                     0x01     /* fn_flags, zend_property_info.flags */
#define ZEND_ACC_ABSTRACT                   0x02     /* fn_flags */
#define ZEND_ACC_FINAL                      0x04     /* fn_flags */
#define ZEND_ACC_IMPLEMENTED_ABSTRACT       0x08     /* fn_flags */
#define ZEND_ACC_IMPLICIT_ABSTRACT_CLASS    0x10     /* ce_flags */
#define ZEND_ACC_EXPLICIT_ABSTRACT_CLASS    0x20     /* ce_flags */
#define ZEND_ACC_FINAL_CLASS                0x40     /* ce_flags */
#define ZEND_ACC_INTERFACE                  0x80     /* ce_flags */
#define ZEND_ACC_INTERACTIVE                0x10     /* fn_flags */
#define ZEND_ACC_PUBLIC                     0x100    /* fn_flags, zend_property_info.flags */
#define ZEND_ACC_PROTECTED                  0x200    /* fn_flags, zend_property_info.flags */
#define ZEND_ACC_PRIVATE                    0x400    /* fn_flags, zend_property_info.flags */
#define ZEND_ACC_PPP_MASK \
    (ZEND_ACC_PUBLIC | ZEND_ACC_PROTECTED | ZEND_ACC_PRIVATE)
#define ZEND_ACC_CHANGED                    0x800    /* fn_flags, zend_property_info.flags */
#define ZEND_ACC_IMPLICIT_PUBLIC            0x1000   /* zend_property_info.flags; unused (1) */
#define ZEND_ACC_CTOR                       0x2000   /* fn_flags */
#define ZEND_ACC_DTOR                       0x4000   /* fn_flags */
#define ZEND_ACC_CLONE                      0x8000   /* fn_flags */
#define ZEND_ACC_ALLOW_STATIC               0x10000  /* fn_flags */
#define ZEND_ACC_SHADOW                     0x20000  /* fn_flags */
#define ZEND_ACC_DEPRECATED                 0x40000  /* fn_flags */
#define ZEND_ACC_CLOSURE                    0x100000 /* fn_flags */
#define ZEND_ACC_CALL_VIA_HANDLER           0x200000 /* fn_flags */
 
/* (1) ZEND_ACC_IMPLICIT_PUBLIC is unused since zend_do_declare_implicit_property is ifdef'd out */

These apply to methods:

  • ZEND_ACC_PUBLIC, ZEND_ACC_PROTECTED, ZEND_ACC_PRIVATE - exactly one of these flags 'must' be included.
  • ZEND_ACC_STATIC, ZEND_ACC_ABSTRACT and ZEND_ACC_FINAL - define a method as static, abstract or final, respectively.
  • ZEND_ACC_ALLOW_STATIC - allows an instance method to be called statically; also allows an instance method to assume a $this from an incompatible context (see implementation for opcode INIT_STATIC_METHOD_CALL). New code ought not to set this flag.
  • ZEND_ACC_DEPRECATED - marks a method as deprecated.

These also apply to methods, but you needn't include them in your function entries:

  • ZEND_ACC_IMPLEMENTED_ABSTRACT - used if the method is declared abstract somewhere up the hierarchy. Despite the name, the method may have no implementation -- if an abstract subclass does not implement an abstract method from the superclass, the subclass copy of the method will have this flag set. Do not set this; it will be set automatically.
  • ZEND_ACC_CHANGED - used for methods of a subclass that had their visibility increased from protected to public when overridden. Do not set this; it will be set automatically.
  • ZEND_ACC_CALL_VIA_HANDLER is applied to zend_function structures that are generated on-the-fly in response to calls to __call, __callstatic, or by the get_method handler. This determines the memory freeing procedure. It also allows overriding pass-by-value semantics of functions with zend_call_function (used by call_user_func_array). See also get_method.
  • ZEND_ACC_CLONE marks a method as a clone method. This is automatically set for methods named __clone, but appears to have no effect at the engine level (the standard clone handler looks for a method called __clone, with no regard for this flag).
  • ZEND_ACC_CTOR, although commonly set manually in the function entry flags, is set automatically for methods with the appropriate name (either old-style or new-style). Setting it manually on a method that would not be selected as a constructor is an error.
  • ZEND_ACC_DTOR is set automatically for methods named __destruct. Beyond that, it has no effect at the engine level.
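The "exactly one visibility flag" rule can be checked with plain bit arithmetic. The helper below is hypothetical, not an engine function; the flag values are copied from the table above.

```c
#include <assert.h>

/* Values copied from the ZEND_ACC table above. */
#define ZEND_ACC_STATIC     0x01
#define ZEND_ACC_FINAL      0x04
#define ZEND_ACC_PUBLIC     0x100
#define ZEND_ACC_PROTECTED  0x200
#define ZEND_ACC_PRIVATE    0x400
#define ZEND_ACC_PPP_MASK \
    (ZEND_ACC_PUBLIC | ZEND_ACC_PROTECTED | ZEND_ACC_PRIVATE)

/* Returns 1 if exactly one visibility bit is set in flags. */
static int has_single_visibility(unsigned int flags)
{
    unsigned int ppp = flags & ZEND_ACC_PPP_MASK;
    return ppp != 0 && (ppp & (ppp - 1)) == 0;   /* a single bit set */
}
```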

The macro PHP_MALIAS(classname, name, alias, arg_info, flags) allows you to declare a method with a name and expose another name in user space, e.g.:

PHP_METHOD(myclass, mymethod) { ... }
...
PHP_MALIAS(myclass, myMethodInUserspace, mymethod, NULL, 0),
...

The macro PHP_NAMED_ME(zend_name, name, arg_info, flags) goes one level lower. The first argument is the name exposed in userspace, and the second is the actual name of the C function you implemented, e.g.:

ZEND_NAMED_FUNCTION(my_arbitrary_name) { ... }
/* this resolves to void my_arbitrary_name(INTERNAL_FUNCTION_PARAMETERS) { ... } */
...
PHP_NAMED_ME(myMethodInUserspace, my_arbitrary_name, NULL, 0),
...

Finally, PHP_ME_MAPPING(name, func_name, arg_types, flags) is usually used to expose methods that also have a non-OOP interface, e.g.:

PHP_FUNCTION(ext_func) {
    zval *this; /* if not NULL, called as a method */
 
    /* Use zend_parse_method_parameters to parse parameters in double interface implementations */
    if (zend_parse_method_parameters(ZEND_NUM_ARGS() TSRMLS_CC, getThis(), "O",
            &this, ext_class_ce_ptr) == FAILURE) {
        return;
    }
 
    /* alternative with plain zend_parse_parameters */
    /* this = getThis();
     * if (this == NULL) {
     *     if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "O",
     *             &this, ext_class_ce_ptr) == FAILURE) {
     *         return;
     *     }
     * }
     * else if (zend_parse_parameters_none() == FAILURE) {
     *     return;
     * }
     */
}
...
static zend_function_entry ext_functions[] = {
    PHP_FE(ext_func, NULL),
    ...
    {NULL, NULL, NULL, 0, 0}
};
...
static zend_function_entry ext_class_methods[] = {
    PHP_ME_MAPPING(myMethodName, ext_func, NULL, 0),
    ...
    {NULL, NULL, NULL, 0, 0}
};

Namespaced names can be built using ZEND_NS_NAME(namespace, name). Don't prefix the namespace with \.

Set up the Handlers Table

When one is implementing internal PHP classes, it is almost always undesirable to replace all of the standard handlers. The usual procedure for overriding the handlers follows these steps:

  • Implement the handlers you wish to override (read this section to understand their semantics).
  • Define a global variable of type zend_object_handlers. Most likely, the handler table will not need to be referred to from other compilation units, so it can have file scope.
  • On module startup, the standard handlers are copied into the defined zend_object_handlers variable.
  • The fields corresponding to the handlers that are to be overridden are written with pointers to the custom handlers that were implemented.

The handler table should not be initialized with initializer lists. Otherwise, when new handlers are added to the language, they will take the value NULL instead of the default value.

Example:

static zend_object_handlers myclass_handlers;
...
static HashTable *myclass_object_debug_info(zval *object, int *is_temp TSRMLS_DC)
{
    ...
}
static int myclass_object_compare_objects(zval *object1, zval *object2 TSRMLS_DC)
{
    ...
}
static zend_object_value myclass_object_clone(zval *object TSRMLS_DC)
{
    ...
}
...
ZEND_MODULE_STARTUP_D(myext)
{
    ...
    memcpy(&myclass_handlers, zend_get_std_object_handlers(),
        sizeof myclass_handlers);
 
    myclass_handlers.get_debug_info  = myclass_object_debug_info;
    myclass_handlers.compare_objects = myclass_object_compare_objects;
    myclass_handlers.clone_obj       = myclass_object_clone;
    ...
}
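The reason for copying instead of using an initializer list can be seen in a standalone mock. The table type below has a slot (newer_handler) that the extension never mentions, and the memcpy keeps its default. All names are hypothetical; std_handlers plays the role of zend_get_std_object_handlers().

```c
#include <assert.h>
#include <string.h>

typedef int (*mock_handler_t)(int, int);

/* Hypothetical, reduced handler table. */
typedef struct {
    mock_handler_t compare_objects;
    mock_handler_t count_elements;
    mock_handler_t newer_handler;   /* imagine this slot was added in a later version */
} mock_object_handlers;

static int std_compare(int a, int b) { return (a > b) - (a < b); }
static int std_count(int a, int b) { (void) b; return a; }
static int std_newer(int a, int b) { return a + b; }

/* Plays the role of the standard handler table. */
static const mock_object_handlers std_handlers = { std_compare, std_count, std_newer };

static mock_object_handlers myclass_handlers;

/* Custom comparison: reverses the default order. */
static int myclass_compare(int a, int b) { return std_compare(b, a); }

/* Module startup: copy every default, then override only what is customised. */
static void myclass_startup(void)
{
    memcpy(&myclass_handlers, &std_handlers, sizeof myclass_handlers);
    myclass_handlers.compare_objects = myclass_compare;
}
```

Had myclass_handlers been written as an initializer list naming only the first two slots, newer_handler would be NULL instead of std_newer.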

Initialization of the Class Entry

Before registering the class, it's necessary to define and initialize a class entry structure. This structure is temporary -- upon class registration, a new class entry structure is allocated. The initialization is done with one of these macros:

INIT_CLASS_ENTRY(class_container, class_name, functions)
INIT_NS_CLASS_ENTRY(class_container, ns, class_name, functions)
INIT_CLASS_ENTRY_EX(class_container, class_name, class_name_len, functions)
INIT_OVERLOADED_CLASS_ENTRY(class_container, class_name, functions, handle_fcall,
    handle_propget, handle_propset)
INIT_OVERLOADED_NS_CLASS_ENTRY(class_container, ns, class_name, functions,
    handle_fcall, handle_propget, handle_propset)
INIT_OVERLOADED_CLASS_ENTRY_EX(class_container, class_name, class_name_len,
    functions, handle_fcall, handle_propget, handle_propset, handle_propunset,
    handle_propisset)
INIT_OVERLOADED_NS_CLASS_ENTRY_EX(class_container, ns, class_name, functions,
    handle_fcall, handle_propget, handle_propset, handle_propunset,
    handle_propisset)

This is the meaning of the parameters:

  • class_container - The temporary zend_class_entry to initialize.
  • class_name - The class name to expose in userspace (a string).
  • functions - A zend_function_entry array terminated with an empty entry.
  • class_name_len - The length of class_name, excluding the terminator.
  • ns - The namespace of the class. Don't prefix it with \.
  • handle_fcall, handle_propget, handle_propset, handle_propunset and handle_propisset - these are zend_function pointers (or NULL) that can be used to populate the respective fields in the zend_class_entry structure.

Example:

static zend_function_entry myclass_functions[] = {
    PHP_ME(myclass, myMethod, NULL, 0),
    ...
    {NULL, NULL, NULL, 0, 0}
};
...
ZEND_MODULE_STARTUP_D(myext)
{
    zend_class_entry ce;
    ...
 
    INIT_CLASS_ENTRY(ce, "MyClass", myclass_functions);
    ...
}

Do not further modify the class entry. Other modifications should be made in the class entry returned upon class registration. In particular, setting the class flags (e.g. final) at this point will not work.

Class Registration

The registration step serves two purposes:

  1. Automates the definition of certain aspects of the class definition (the zend_class_entry structure)
  2. Exposes the class to userspace.

The following functions/macros are available:

/* Functions */
zend_class_entry *zend_register_internal_class(zend_class_entry *class_entry TSRMLS_DC)
zend_class_entry *zend_register_internal_class_ex(zend_class_entry *class_entry,
    zend_class_entry *parent_ce, char *parent_name TSRMLS_DC)
zend_class_entry *zend_register_internal_interface(
    zend_class_entry *orig_class_entry TSRMLS_DC)
int zend_register_class_alias_ex(const char *name, int name_len,
    zend_class_entry *ce TSRMLS_DC)
/* Macros (they expand to zend_register_class_alias_ex, so return an int) */
zend_register_class_alias(name, ce)
zend_register_ns_class_alias(ns, name, ce)

The functions/macros with “alias” in their name only expose the class to userspace; they do not change the class entry in any way. In general, these should be used only if the class entry pointed to by the argument was previously created with a zend_register_* function.

The function zend_register_internal_class_ex should be used when defining a subclass. If parent_ce is given, the corresponding class will be used as the parent. If it is NULL and parent_name is not NULL, the given superclass name will be resolved. If both are NULL, it will behave like zend_register_internal_class.

The zend_register_internal_* functions execute these steps:

  1. Allocate a new class entry structure.
  2. Copy (in a shallow fashion) the passed class entry into the allocated one.
  3. Set its type to ZEND_INTERNAL_CLASS.
  4. Initialize the new class entry structure by allocating and initializing its hash tables and resetting a few “scalar” fields (the magic methods set in the original class entry through INIT_OVERLOADED_CLASS_ENTRY_EX are not replaced).
  5. Set the flags of the class entry to none or to ZEND_ACC_INTERFACE (according to the function called).
  6. Convert the zend_function_entry structures into zend_functions of type ZEND_INTERNAL_FUNCTION. These functions are added to the class entry's function table. If a method's name matches that of a magic method, the corresponding class entry field is set.
  7. If a parent is given, execute operations related to inheritance, e.g. copying inherited functions from the parent.
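Steps 1, 2, 3 and 5 can be mimicked with a hypothetical reduced class entry. Hash table setup, method conversion and inheritance are left out. The mock also shows why flags set on the temporary entry are lost.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MOCK_INTERNAL_CLASS 1      /* stands in for ZEND_INTERNAL_CLASS */
#define MOCK_ACC_INTERFACE  0x80   /* value of ZEND_ACC_INTERFACE from the table above */

/* Hypothetical, reduced class entry. */
typedef struct {
    const char   *name;
    char          type;
    unsigned int  ce_flags;
} mock_class_entry;

static mock_class_entry *mock_register(const mock_class_entry *tmp, int is_interface)
{
    mock_class_entry *ce = malloc(sizeof *ce);             /* 1. allocate */
    if (ce == NULL)
        return NULL;
    memcpy(ce, tmp, sizeof *ce);                           /* 2. shallow copy */
    ce->type = MOCK_INTERNAL_CLASS;                        /* 3. mark as internal */
    ce->ce_flags = is_interface ? MOCK_ACC_INTERFACE : 0;  /* 5. reset flags */
    return ce;
}
```

The final-class bit (0x40) set on the temporary entry in the test does not survive registration. This is why flags must be set on the returned entry.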

After class registration, the original zend_class_entry variable should not be used anymore.

After registration, it's also possible to retrieve the zend_class_entry variable through the class name. This is done with zend_lookup_class:

int zend_lookup_class(const char *name, int name_length, zend_class_entry ***ce TSRMLS_DC);

Notice the triple (not double) indirection. In practice, most extensions opt to use a global variable (and even export it for other extensions through the macro ZEND_API) so as to avoid the performance penalty associated with the function call/hash table lookup.
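The triple indirection follows from the class table storing zend_class_entry * values. The lookup hands back the address of the stored slot (two stars) through an output parameter, which adds the third. Here is a standalone mock with hypothetical names:

```c
#include <assert.h>
#include <string.h>

#define SUCCESS 0
#define FAILURE -1

typedef struct { const char *name; } mock_class_entry;

/* Hypothetical class table: each slot holds a class entry pointer. */
static mock_class_entry foo_ce = {"foo"};
static mock_class_entry bar_ce = {"bar"};
static mock_class_entry *class_table[] = { &foo_ce, &bar_ce };

/* Same shape as zend_lookup_class: on success, *ce points at the stored slot. */
static int mock_lookup_class(const char *name, mock_class_entry ***ce)
{
    size_t i;
    for (i = 0; i < sizeof class_table / sizeof class_table[0]; i++) {
        if (strcmp(class_table[i]->name, name) == 0) {
            *ce = &class_table[i];   /* address of the slot */
            return SUCCESS;
        }
    }
    return FAILURE;
}
```

A caller declares a mock_class_entry **pce, passes &pce, and dereferences once to get the class entry.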

Properties and Constants

In the module startup, after the PHP class is registered, it is time to add constants and properties.

For constants, the Zend API exposes the following functions:

int zend_declare_class_constant(zend_class_entry *ce, const char *name,
    size_t name_length, zval *value TSRMLS_DC)
int zend_declare_class_constant_null(zend_class_entry *ce, const char *name,
    size_t name_length TSRMLS_DC)
int zend_declare_class_constant_long(zend_class_entry *ce, const char *name,
    size_t name_length, long value TSRMLS_DC)
int zend_declare_class_constant_bool(zend_class_entry *ce, const char *name,
    size_t name_length, zend_bool value TSRMLS_DC)
int zend_declare_class_constant_double(zend_class_entry *ce, const char *name,
    size_t name_length, double value TSRMLS_DC)
int zend_declare_class_constant_stringl(zend_class_entry *ce, const char *name,
    size_t name_length, const char *value, size_t value_length TSRMLS_DC)
int zend_declare_class_constant_string(zend_class_entry *ce, const char *name,
    size_t name_length, const char *value TSRMLS_DC)

These all return SUCCESS or FAILURE. They are all straightforward to use, with the exception of zend_declare_class_constant. The passed zval should be allocated with ALLOC_PERMANENT_ZVAL (and then initialized and the intended value set).

For properties, the following functions are available:

int zend_declare_property(zend_class_entry *ce, char *name, int name_length,
    zval *property, int access_type TSRMLS_DC)
int zend_declare_property_ex(zend_class_entry *ce, const char *name,
    int name_length, zval *property, int access_type, char *doc_comment,
    int doc_comment_len TSRMLS_DC)
int zend_declare_property_null(zend_class_entry *ce, char *name,
    int name_length, int access_type TSRMLS_DC)
int zend_declare_property_bool(zend_class_entry *ce, char *name,
    int name_length, long value, int access_type TSRMLS_DC);
int zend_declare_property_long(zend_class_entry *ce, char *name,
    int name_length, long value, int access_type TSRMLS_DC);
int zend_declare_property_double(zend_class_entry *ce, char *name,
    int name_length, double value, int access_type TSRMLS_DC);
int zend_declare_property_string(zend_class_entry *ce, char *name,
    int name_length, char *value, int access_type TSRMLS_DC);
int zend_declare_property_stringl(zend_class_entry *ce, char *name,
     int name_length, char *value, int value_len, int access_type TSRMLS_DC)

These are analogous to their zend_declare_class_constant* counterparts, with the following differences:

  • There is a zend_declare_property_ex that accepts a doc comment. The doc comment can be retrieved through reflection.
  • All functions accept an access type.

The access type flags are taken from the ZEND_ACC_* family. See the zend_function_entry array section.

These apply to properties and can be set by the extension programmer:

  • ZEND_ACC_STATIC - define a property as static.
  • ZEND_ACC_PUBLIC, ZEND_ACC_PROTECTED, ZEND_ACC_PRIVATE - only one of these flags can be included. If none is included, it will default to ZEND_ACC_PUBLIC.

These are used internally and should not be passed to the functions above:

  • ZEND_ACC_CHANGED - set in instance properties duplicated in the subclass properties where the corresponding superclass property has a) ZEND_ACC_CHANGED, b) ZEND_ACC_PRIVATE or c) ZEND_ACC_SHADOW.
  • ZEND_ACC_SHADOW - set in instance properties copied from the superclass that are not duplicated in the subclass and which have ZEND_ACC_PRIVATE or ZEND_ACC_SHADOW.
  • ZEND_ACC_IMPLICIT_PUBLIC - formerly (?) used for properties implicitly public (e.g. dynamic properties, i.e., undeclared instance properties).

Note that zend_declare_property(_ex) also require a zval allocated with ALLOC_PERMANENT_ZVAL.

Note also that interfaces cannot have properties and access level cannot be decreased in subclasses.

Other class definition tweaks

The class entry structure can be changed in other ways after registration.

See also iterators and serialization callbacks.

Create object handler

Almost all internal classes will want to replace the class entry's create_object handler in order to be able to store arbitrary data in the object's data structure. See the section Data allocation and initialization for more on this.

Class flags

Class flags use the ZEND_ACC_* macro family. See the zend_function_entry array section. At this point, the class may already have the flag ZEND_ACC_INTERFACE if you called zend_register_internal_interface.

These can be set after class registration:

  • ZEND_ACC_FINAL_CLASS, ZEND_ACC_EXPLICIT_ABSTRACT_CLASS - define a class as final or abstract, respectively. It's unnecessary to explicitly set ZEND_ACC_EXPLICIT_ABSTRACT_CLASS if the class has (or inherits) abstract methods.
  • ZEND_ACC_PUBLIC, ZEND_ACC_PROTECTED, ZEND_ACC_PRIVATE - only one of these flags can be included. If none is included, it will default to ZEND_ACC_PUBLIC.

These are automatically set by the engine and should not be set by the programmer:

  • ZEND_ACC_IMPLICIT_ABSTRACT_CLASS - set automatically for classes that have abstract methods. Interfaces may have it too. Internal classes that are not interfaces are also automatically given ZEND_ACC_EXPLICIT_ABSTRACT_CLASS whenever an abstract method is found among their registered functions.
  • ZEND_ACC_CLOSURE - used internally for objects that are closures.
  • ZEND_ACC_IMPLEMENT_INTERFACES - the class implements one or more interfaces. See below.

Implement interfaces

A class may declare it implements one or more interfaces by calling the function:

void zend_class_implements(zend_class_entry *class_entry TSRMLS_DC,
    int num_interfaces, ...)

The ellipsis represents one or more zend_class_entry * variables that point to the class entries of the interfaces to be implemented.
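The variadic convention can be sketched with a hypothetical mock_class_implements, with the TSRMLS argument omitted. The count comes first, and exactly that many class entry pointers are read from the argument list.

```c
#include <assert.h>
#include <stdarg.h>

typedef struct { const char *name; } mock_class_entry;

/* Hypothetical class with room for a few interfaces. */
typedef struct {
    const char       *name;
    mock_class_entry *interfaces[8];
    int               num_interfaces;
} mock_class;

static void mock_class_implements(mock_class *ce, int num_interfaces, ...)
{
    va_list ap;
    int i;

    va_start(ap, num_interfaces);
    for (i = 0; i < num_interfaces && ce->num_interfaces < 8; i++)
        ce->interfaces[ce->num_interfaces++] = va_arg(ap, mock_class_entry *);
    va_end(ap);
}
```

Nothing in a C variadic list records its length, so num_interfaces must match the number of pointers actually passed.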

Designing subclassable classes

Designing internal classes so that they can be extended in userspace is simple. A userspace subclass keeps the parent's class entry create_object handler. It does not get the default one, which sets the standard object handlers and allocates a plain zend_object. Therefore, it is not a problem if the internal class has a different handler table or if its storage is a different data structure.

The only thing about which one must be careful is constructors. Subclasses may define a new constructor that does not call the parent constructor. If the internal class relies on the constructor to set a consistent internal state, it can be changed in the following alternative ways:

  • Moving the necessary initialization to the create_object class entry handler.
  • Overriding the get_constructor handler. It could, for example, be modified to always return a function that does the necessary initializations, calls the default get_constructor handler (zend_std_get_constructor) and then executes the returned constructor, if any.

Often, the internal constructor requires several arguments to be passed. The constructor for the subclass may be defined so that it takes fewer or different arguments. This is clearly a problem that cannot be handled by the first approach. The second one can at least fail if too few or bad arguments are given. Even that is not a very good idea, because the arguments, even if apparently correct, may have different semantics. In sum, if construction requires arguments, there is no good solution except requiring the super constructor to be called. This can be accomplished this way:

static zend_object_handlers object_handlers;
static zend_class_entry *ce_ptr;
static zend_function constr_wrapper_fun;
 
typedef struct test_object {
    zend_object std;
    zend_bool constructed; /* TestClass constructor was called? */
    /* more properties follow */
    ...
} test_object;
 
static zend_object_value ce_create_object(zend_class_entry *class_type TSRMLS_DC)
{
    zend_object_value zov;
    test_object       *tobj;
 
    tobj = emalloc(sizeof *tobj);
    zend_object_std_init((zend_object *) tobj, class_type TSRMLS_CC);
    tobj->constructed = 0;
 
#if PHP_VERSION_ID < 50399
    zend_hash_copy(tobj->std.properties, &(class_type->default_properties),
        (copy_ctor_func_t) zval_add_ref, NULL, sizeof(zval*));
#else
    object_properties_init(&tobj->std, class_type);
#endif
 
    zov.handle = zend_objects_store_put(tobj,
        (zend_objects_store_dtor_t) zend_objects_destroy_object,
        (zend_objects_free_object_storage_t) zend_objects_free_object_storage,
        NULL TSRMLS_CC);
    zov.handlers = &object_handlers;
    return zov;
}
 
PHP_METHOD(testclass, __construct)
{
    zval *this = getThis();
 
    test_object *tobj = zend_object_store_get_object(this TSRMLS_CC);
    assert(tobj != NULL);
 
    tobj->constructed = (zend_bool) 1;
 
    ...
}
 
static zend_function *get_constructor(zval *object TSRMLS_DC)
{
    /* Could always return constr_wrapper_fun, but it's unnecessary to call the
     * wrapper if instantiating the superclass */
    if (Z_OBJCE_P(object) == ce_ptr)
        return zend_get_std_object_handlers()->
            get_constructor(object TSRMLS_CC);
    else
        return &constr_wrapper_fun;
}
 
static void construction_wrapper(INTERNAL_FUNCTION_PARAMETERS) {
    zval *this = getThis();
    test_object *tobj;
    zend_class_entry *this_ce;
    zend_function *zf;
    zend_fcall_info fci = {0};
    zend_fcall_info_cache fci_cache = {0};
    zval *retval_ptr = NULL;
    unsigned i;
 
    tobj = zend_object_store_get_object(this TSRMLS_CC);
    zf = zend_get_std_object_handlers()->get_constructor(this TSRMLS_CC);
    this_ce = Z_OBJCE_P(this);
 
    fci.size = sizeof(fci);
    fci.function_table = &this_ce->function_table;
    fci.object_ptr = this;
    /* fci.function_name = ; not necessary to bother */
    fci.retval_ptr_ptr = &retval_ptr;
    fci.param_count = ZEND_NUM_ARGS();
    fci.params = emalloc(fci.param_count * sizeof *fci.params);
    /* Or use _zend_get_parameters_array_ex instead of loop: */
    for (i = 0; i < fci.param_count; i++) {
        fci.params[i] = (zval **) (zend_vm_stack_top(TSRMLS_C) - 1 -
            (fci.param_count - i));
    }
    fci.no_separation = 0;
 
    fci_cache.initialized = 1;
    fci_cache.called_scope = EG(current_execute_data)->called_scope;
    fci_cache.calling_scope = EG(current_execute_data)->current_scope;
    fci_cache.function_handler = zf;
    fci_cache.object_ptr = this;
 
    zend_call_function(&fci, &fci_cache TSRMLS_CC);
    if (!EG(exception) && tobj->constructed == 0)
        zend_throw_exception(NULL, "parent::__construct() must be called in "
            "the constructor.", 0 TSRMLS_CC);
    efree(fci.params);
    /* retval_ptr may still be NULL if the call failed or threw */
    if (retval_ptr != NULL)
        zval_ptr_dtor(&retval_ptr);
}
 
static zend_function_entry ext_class_methods[] = {
    PHP_ME(testclass, __construct, NULL, ZEND_ACC_PUBLIC)
    ...
    {NULL, NULL, NULL, 0, 0}
};
 
ZEND_MODULE_STARTUP_D(testext)
{
    zend_class_entry ce;
 
    memcpy(&object_handlers, zend_get_std_object_handlers(),
        sizeof object_handlers);
    object_handlers.get_constructor = get_constructor;
 
    INIT_CLASS_ENTRY(ce, "TestClass", ext_class_methods);
    ce_ptr = zend_register_internal_class(&ce TSRMLS_CC);
    ce_ptr->create_object = ce_create_object;
 
    constr_wrapper_fun.type = ZEND_INTERNAL_FUNCTION;
    constr_wrapper_fun.common.function_name = "internal_construction_wrapper";
    constr_wrapper_fun.common.scope = ce_ptr;
    constr_wrapper_fun.common.fn_flags = ZEND_ACC_PROTECTED;
    constr_wrapper_fun.common.prototype = NULL;
    constr_wrapper_fun.common.required_num_args = 0;
    constr_wrapper_fun.common.arg_info = NULL;
#if PHP_VERSION_ID < 50399
    /* moved to common.fn_flags with rev 303381 */
    constr_wrapper_fun.common.pass_rest_by_reference = 0;
    constr_wrapper_fun.common.return_reference = 0;
#endif
    constr_wrapper_fun.internal_function.handler = construction_wrapper;
    constr_wrapper_fun.internal_function.module = EG(current_module);
 
    return SUCCESS;
}

Another option is to check on every internal method call whether the native structure has been properly initialized by the native constructor. Since most instance methods will need to fetch the object, this is a good opportunity to do the check. For instance, the cairo extension does this:

static inline cairo_surface_object* cairo_surface_object_get(zval *zobj TSRMLS_DC)
{
    cairo_surface_object *pobj = zend_object_store_get_object(zobj TSRMLS_CC);
    if (pobj->surface == NULL) {
        php_error(E_ERROR, "Internal surface object missing in %s wrapper, you must call parent::__construct in extended classes", Z_OBJCE_P(zobj)->name);
    }
    return pobj;
}

This has two disadvantages relative to the previous method:

  1. It defers the check until an instance method is called, instead of immediately when the problem occurs (when the user-land constructor doesn't call the parent native constructor).
  2. The check is made on every method call, instead of only once.

However, this is by far the more popular approach, since it's simple and portable: it uses only stable parts of the API.

Finally, another option, certainly less complex but more limiting, is to make the superclass constructor final.

Iterators

TODO

Serialization callbacks

TODO

Object creation and destruction

Object creation involves these steps:

  1. Allocate and initialize the underlying data structure
  2. Store the object
  3. Build a reference to the object
  4. (optional) Call the constructor

Calling the constructor is uncommon internally because there are easier ways to initialize the object (calling a zend_function is verbose). The initialization steps that are common to all the objects of a given type can be done in step 1. The initialization of a particular instance (which e.g. depends on some other data, the kind of data that would be passed to a constructor) can be done in a separate auxiliary C function. Every time an object is instantiated internally, the programmer must also call this function to do instance-specific initialization. A constructor is still necessary to properly support the new operator, but this strategy does not imply duplication of code -- the internal implementation of the constructor may rely on the same auxiliary function.
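The pattern described above has two parts. Type-wide setup happens in the allocation step, and one auxiliary initializer is shared by the constructor and internal callers. In plain C it might look like this. The names rect_object, rect_create, rect_init and rect_new are hypothetical. In a real extension, rect_create would correspond to the create_object handler, and rect_init would be called both from __construct and from internal code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object with type-wide and per-instance state. */
typedef struct {
    int  precision;   /* type-wide default, set in step 1 */
    long width;       /* per-instance data, normally passed to the constructor */
    long height;
} rect_object;

/* Step 1: allocation plus initialization common to all objects of the type. */
static rect_object *rect_create(void)
{
    rect_object *obj = calloc(1, sizeof *obj);
    if (obj != NULL)
        obj->precision = 2;
    return obj;
}

/* Auxiliary instance-specific initialization, shared with the constructor. */
static int rect_init(rect_object *obj, long width, long height)
{
    if (width < 0 || height < 0)
        return -1;    /* a constructor would throw here */
    obj->width = width;
    obj->height = height;
    return 0;
}

/* Internal instantiation: no zend_function call needed. */
static rect_object *rect_new(long width, long height)
{
    rect_object *obj = rect_create();
    if (obj != NULL && rect_init(obj, width, height) != 0) {
        free(obj);
        obj = NULL;
    }
    return obj;
}
```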

Data allocation and initialization

In general, this part is completely domain dependent. The programmer may allocate and initialize an object however he wants.

However, zend standard objects (those with a class entry) rely on the class entry's create_object handler. Their data structure typically embeds a zend_object as its first member, so a pointer to it can be passed to functions that expect a zend_object *. Hence, the typical class entry create_object handler will look like test_create_object in the example below:

typedef struct test_object {
    zend_object std;    /* must be the first member */
    /* more properties follow ... */
} test_object;
 
static zend_object_handlers object_handlers;
 
static zend_object_value test_create_object(zend_class_entry *class_type TSRMLS_DC)
{
    zend_object_value zov;
    test_object       *tobj;
 
    /* ecalloc zeroes the extension-specific fields */
    tobj = ecalloc(1, sizeof *tobj);
    zend_object_std_init((zend_object *) tobj, class_type TSRMLS_CC);
 
#if PHP_VERSION_ID < 50399
    zend_hash_copy(tobj->std.properties, &(class_type->default_properties),
        (copy_ctor_func_t) zval_add_ref, NULL, sizeof(zval*));
#else
    object_properties_init((zend_object*)tobj, class_type);
#endif
 
    /* The destroy and free callbacks should be replaced if necessary */
    zov.handle = zend_objects_store_put(tobj,
        (zend_objects_store_dtor_t) zend_objects_destroy_object,
        (zend_objects_free_object_storage_t) zend_objects_free_object_storage,
        NULL TSRMLS_CC);
 
    /* other specific stuff ... */
 
    zov.handlers = &object_handlers;
    return zov;
}
 
ZEND_MODULE_STARTUP_D(testext)
{
    zend_class_entry ce;
    zend_class_entry *ce_ptr;
 
    memcpy(&object_handlers, zend_get_std_object_handlers(),
        sizeof object_handlers);
    /* replace necessary handlers ... */
 
    INIT_CLASS_ENTRY(ce, "TestClass", ext_class_methods);
    ce_ptr = zend_register_internal_class(&ce TSRMLS_CC);
    ce_ptr->create_object = test_create_object;
 
    /* other startup stuff ... */
 
    return SUCCESS;
}

The create_object handler can also be NULL, in which case the engine falls back to zend_objects_new: the general operations listed in test_create_object are still performed, except that a vanilla zend_object structure is allocated (instead of a test_object) and the standard handler table is used.
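The cast (zend_object *) tobj in test_create_object is valid because std is the first member of test_object: C guarantees that a pointer to a structure, suitably converted, points to its initial member, and vice versa. The sketch below demonstrates that rule with hypothetical stand-in types (base_object for zend_object, derived_object for test_object); it does not reproduce the real field layout of zend_object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for zend_object. */
typedef struct base_object {
    const char *class_name;
    unsigned    refcount;
} base_object;

/* Hypothetical stand-in for test_object: the base is the FIRST member,
 * so a derived_object * can be converted to a base_object * and back. */
typedef struct derived_object {
    base_object std;
    double      extra;   /* extension-specific data */
} derived_object;

/* Generic code that knows only the base type, like engine functions
 * that take a zend_object *. */
static const char *base_class_name(const base_object *obj)
{
    return obj->class_name;
}

/* Extension code recovering its own structure from a base pointer,
 * e.g. from the void * returned by zend_object_store_get_object. */
static derived_object *to_derived(base_object *obj)
{
    return (derived_object *) obj;
}
```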

Object storage

Objects are accessed through their references, and the only thing linking a reference to an object instance is an integer (the object handle). This handle is a key that gives access to the object data structure; how the lookup is done depends entirely on the type of the object.
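A minimal model of the idea that a handle is just a key into a table might look like the following. It is a deliberate simplification, not the engine's implementation: object_store, store_put, store_get and object_ref are hypothetical names, the table has a fixed capacity, and the handler table that real references also carry is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define STORE_CAPACITY 16   /* fixed size keeps the sketch short */

typedef unsigned int object_handle;

/* Hypothetical, simplified objects store: a handle is an index into a
 * table of pointers to arbitrary object data structures. */
typedef struct object_store {
    void         *objects[STORE_CAPACITY];
    object_handle top;   /* next unused slot */
} object_store;

/* Add an object and return its handle (compare zend_objects_store_put). */
static object_handle store_put(object_store *s, void *object)
{
    assert(s->top < STORE_CAPACITY);
    s->objects[s->top] = object;
    return s->top++;
}

/* Look an object up by handle
 * (compare zend_object_store_get_object_by_handle). */
static void *store_get(const object_store *s, object_handle h)
{
    assert(h < s->top);
    return s->objects[h];
}

/* In this model a "reference" carries only the handle. */
typedef struct object_ref {
    object_handle handle;
} object_ref;
```

Two references holding the same handle reach the same object data, which is why copying a reference never copies the object.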

Of particular relevance are, of course, zend standard objects. These are stored in the 'objects store', and the zend objects API exposes these functions:

/* Storage */
typedef void (*zend_objects_store_dtor_t)(void *object,
    zend_object_handle handle TSRMLS_DC);
typedef void (*zend_objects_free_object_storage_t)(void *object TSRMLS_DC);
typedef void (*zend_objects_store_clone_t)(void *object,
    void **object_clone TSRMLS_DC);
zend_object_handle zend_objects_store_put(void *object,
    zend_objects_store_dtor_t dtor, zend_objects_free_object_storage_t storage,
    zend_objects_store_clone_t clone TSRMLS_DC);
 
/* Retrieval */
void *zend_object_store_get_object(const zval *object TSRMLS_DC);
void *zend_object_store_get_object_by_handle(zend_object_handle handle TSRMLS_DC);
 
/* refcount related */
void zend_objects_store_add_ref(zval *object TSRMLS_DC);
void zend_objects_store_del_ref(zval *object TSRMLS_DC);
void zend_objects_store_add_ref_by_handle(zend_object_handle handle TSRMLS_DC);
void zend_objects_store_del_ref_by_handle_ex(zend_object_handle handle,
    const zend_object_handlers *handlers TSRMLS_DC);
void zend_objects_store_del_ref_by_handle(zend_object_handle handle TSRMLS_DC);
zend_uint zend_objects_store_get_refcount(zval *object TSRMLS_DC);
 
/* Misc */
zend_object_value zend_objects_store_clone_obj(zval *object TSRMLS_DC);
/* zend_object_store_set_object:
 * It is ONLY valid to call this function from within the constructor of an
 * overloaded object.  Its purpose is to set the object pointer for the object
 * when you can't possibly know its value until you have parsed the arguments
 * from the constructor function.  You MUST NOT use this function for any other
 * weird games, or call it at any other time after the object is constructed.
 * (This is rarely used)
 */
void zend_object_store_set_object(zval *zobject, void *object TSRMLS_DC);
 
/* Called when the constructor was terminated by an exception. Prevents the
 * "destroy object" store callback from being called */
void zend_object_store_ctor_failed(zval *zobject TSRMLS_DC);
 
/* Used to destroy all the objects in the store */
void zend_objects_store_free_object_storage(zend_objects_store *objects TSRMLS_DC);

The objects store can actually hold any kind of data structure; it doesn't have to be an extension of zend_object. The header file zend_objects.h provides some functions that deal exclusively with zend standard objects:

/* To be used in the create_object class entry handler to initialize the
 * zend_object structure */
void zend_object_std_init(zend_object *object, zend_class_entry *ce TSRMLS_DC);
 
/* Despite the name, this is actually related to object freeing. It frees all
 * the memory used by the inner structures of zend_object */
void zend_object_std_dtor(zend_object *object TSRMLS_DC);
 
/* The default implementation of the create_object handler */
zend_object_value zend_objects_new(zend_object **object,
    zend_class_entry *class_type TSRMLS_DC);
 
/* The default implementation of the "destroy object" store callback. Calls
 * the PHP destructor, if any. */
void zend_objects_destroy_object(zend_object *object,
    zend_object_handle handle TSRMLS_DC);
 
/* Alias of zend_object_store_get_object, except it returns a zend_object
 * pointer instead of void* */
zend_object *zend_objects_get_address(const zval *object TSRMLS_DC);
 
/* Copies the properties of the old_object and calls the class entry
 * clone handler. Used in the implementation of zend_objects_clone_obj
 * In PHP > 5.3, it also initializes the properties before. */
void zend_objects_clone_members(zend_object *new_object,
    zend_object_value new_obj_val, zend_object *old_object,
    zend_object_handle handle TSRMLS_DC);
 
/* Allocates a new object with zend_objects_new and clones the members.
 * It's the default implementation of the clone object handler. The fact
 * it uses zend_objects_new means you almost certainly will want to replace
 * the clone object handler when implementing internal classes. */
zend_object_value zend_objects_clone_obj(zval *object TSRMLS_DC);
 
/* default implementation of the free storage store callback.
 * Calls zend_object_std_dtor and then frees the object itself */
void zend_objects_free_object_storage(zend_object *object TSRMLS_DC);

The function zend_objects_store_put adds an object to the store. It must be called during the creation of the object, as shown in the listing in the previous section. Any of the last three arguments may be NULL.

  • 'Destructor': If NULL, zend_objects_destroy_object is used instead, which calls the PHP destructor, if any. This is called prior to the “free storage” callback when destroying the object. Cleanup of the memory allocated to the object data structures is left to the “free storage” callback. This callback is not called if the object construction failed. If passing a custom store destructor callback, calling the PHP destructor can be delegated to zend_objects_destroy_object.
  • 'Free storage': Used to free the object data structures. For vanilla zend objects, this should be zend_objects_free_object_storage. When extending zend standard objects, the custom callback should first release its own resources and then delegate to zend_objects_free_object_storage the cleanup of the zend_object field and of the outer object data structure; since that call frees the structure itself, it must be the last thing the callback does. There's no default if NULL is specified.
  • 'Clone': Most likely, this should be NULL. One should only use this callback if implementing objects without class entries and using zend_objects_store_clone_obj as a clone_obj handler. Then, that function will call this callback, which should allocate a new object, use the passed double indirection pointer to store a pointer to it, and clone the passed object into this new one.

After the call to zend_objects_store_put, the object has a reference count of 1 in the store.
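The sketch below shows the shape of the three store callbacks, using plain C stand-ins rather than the real Zend types: base_object replaces zend_object, default_destroy and default_free_storage play the roles of zend_objects_destroy_object and zend_objects_free_object_storage, and the TSRMLS arguments are dropped. buffer_object and its callbacks are hypothetical. Note the ordering in the custom "free storage" callback: own resources first, delegation last.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for zend_object (hypothetical field). */
typedef struct base_object {
    int destructor_ran;
} base_object;

/* Hypothetical extension object: the base structure comes first. */
typedef struct buffer_object {
    base_object std;
    char       *buffer;   /* resource owned by the extension */
    size_t      len;
} buffer_object;

/* Plays the role of zend_objects_destroy_object, which would call the
 * user-land destructor; here it only records that it ran. */
static void default_destroy(void *object, unsigned int handle)
{
    (void) handle;
    ((base_object *) object)->destructor_ran = 1;
}

/* Custom "destroy object" callback: only runs for properly constructed
 * objects, so it may rely on their state; then it delegates. */
static void buffer_destroy(void *object, unsigned int handle)
{
    buffer_object *obj = object;
    if (obj->buffer != NULL) {
        memset(obj->buffer, 0, obj->len);   /* illustrative cleanup */
    }
    default_destroy(object, handle);
}

/* Plays the role of zend_objects_free_object_storage: frees the structure
 * itself (the real function also releases the zend_object internals). */
static void default_free_storage(void *object)
{
    free(object);
}

/* Custom "free storage" callback: release extension resources first; the
 * delegated call frees obj itself, so it must come last. */
static void buffer_free_storage(void *object)
{
    buffer_object *obj = object;
    free(obj->buffer);
    default_free_storage(object);
}

/* "Clone" callback: allocate a new object, hand it back through the double
 * indirection pointer, then copy the source into it. */
static void buffer_clone(void *object, void **object_clone)
{
    const buffer_object *src = object;
    buffer_object *dst = calloc(1, sizeof *dst);   /* zeroed fields */

    *object_clone = dst;
    if (dst == NULL || src->buffer == NULL || src->len == 0) {
        return;
    }
    dst->buffer = malloc(src->len);
    if (dst->buffer != NULL) {
        memcpy(dst->buffer, src->buffer, src->len);
        dst->len = src->len;
    }
}
```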

Object reference creation

If we're creating a zend standard object, the create_object handler already returned a zend_object_value. The creation of an object reference zval is handled automatically by the new operator.

To instantiate new objects internally, the following macros and functions are available:

int object_init(zval *arg);
int object_init_ex(zval *arg, zend_class_entry *ce);
 
/* This function requires 'properties' to contain all props declared in the
 * class and all props being public. If only a subset is given or the class
 * has protected members then you need to merge the properties separately by
 * calling zend_merge_properties(). */
int object_and_properties_init(zval *arg, zend_class_entry *ce,
     HashTable *properties);
 
/* This function should be called after the constructor has been called
 * because it may call __set from the uninitialized object otherwise. */
void zend_merge_properties(zval *obj, HashTable *properties,
    int destroy_ht TSRMLS_DC);

These all take a pointer to an allocated zval that is either fully initialized (INIT_ZVAL) or partially initialized (INIT_PZVAL). object_init is not particularly useful, since it always instantiates a stdClass object. object_and_properties_init additionally allows efficient initialization of the object properties, but it has the limitations indicated in the comments. If all instances start with the same property values, the default property values, defined when the class is registered, should be used instead.

Flow of the construction/destruction process

This is an overview of the process of object construction for zend standard object derivatives.

For internal instantiations:

  1. Allocate and (partially) initialize a zval *.
  2. Call object_init_ex. A pointer to the class entry should be available.
    1. Call the class entry's create_object handler.
      1. Allocate the object structure. This structure's first field should be a zend_object.
      2. Call zend_object_std_init to initialize the zend_object part of the object data.
      3. Copy the default properties from the class entry into the properties hash table of the new object. In PHP >= 5.3.99, object_properties_init should be called instead, because non-dynamic properties are stored in C arrays rather than in the properties hash table (though the hash table is still used when it's requested or when there are dynamic properties).
      4. Call zend_objects_store_put, passing a custom “destroy object” callback, which does cleanup specific to properly constructed objects, and a custom “free storage” callback, which frees all the memory and other resources taken by the object (and is always called).
      5. Assign the return value of zend_objects_store_put to the zend_object_value that is to be returned.
      6. Set the field handlers of zend_object_value that's to be returned to the appropriate object handlers table.
    2. Set the zval type to IS_OBJECT and the value to that returned by the create_object handler.
  3. Do post-creation initialization on the new object (the construction phase), typically through an auxiliary function.

For instantiations with new:

  1. Call the class entry's create_object handler.
    1. (see above)
    2. ...
  2. Call the PHP constructor, if any. Typically, the internal implementation of the constructor delegates the construction task to the same auxiliary function referred to in the last step of the list before.

Stored objects should not be destroyed explicitly; in fact, the store API doesn't even expose a function to destroy a particular object. Instead, destruction is driven by the reference count. When it drops to 0, the store calls the object's “destroy object” callback (unless the object construction failed), then its “free storage” callback, and removes the entry from its table. See also the add_ref and del_ref handlers.
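The refcount-driven destruction can be modeled roughly as follows. This is a simplified, hypothetical store (bucket, put, add_ref, del_ref and ctor_failed are names made up for the sketch); it ignores details of the real store such as the free list used to recycle handles, but it follows the rules stated above: a freshly stored object has a reference count of 1, the “destroy object” callback is skipped when construction failed, the “free storage” callback always runs, and the entry is then removed from the table.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJECTS 8

typedef void (*dtor_fn)(void *object, unsigned int handle);
typedef void (*free_fn)(void *object);

/* Hypothetical, simplified store bucket. */
typedef struct bucket {
    int          valid;
    int          destructor_called;   /* also set when the constructor failed */
    unsigned int refcount;
    void        *object;
    dtor_fn      dtor;
    free_fn      free_storage;
} bucket;

static bucket       store[MAX_OBJECTS];
static unsigned int store_top = 0;

static unsigned int put(void *object, dtor_fn dtor, free_fn free_storage)
{
    assert(store_top < MAX_OBJECTS);
    bucket *b = &store[store_top];
    b->valid = 1;
    b->destructor_called = 0;
    b->refcount = 1;   /* one reference right after put */
    b->object = object;
    b->dtor = dtor;
    b->free_storage = free_storage;
    return store_top++;
}

static void add_ref(unsigned int h)
{
    assert(store[h].valid);
    store[h].refcount++;
}

/* Modeled on zend_object_store_ctor_failed: suppress the destructor. */
static void ctor_failed(unsigned int h)
{
    store[h].destructor_called = 1;
}

static void del_ref(unsigned int h)
{
    bucket *b = &store[h];
    assert(b->valid && b->refcount > 0);
    if (--b->refcount > 0) {
        return;
    }
    if (!b->destructor_called) {
        b->destructor_called = 1;
        if (b->dtor != NULL) b->dtor(b->object, h);
    }
    if (b->free_storage != NULL) b->free_storage(b->object);   /* always */
    b->valid = 0;   /* entry removed from the table */
    b->object = NULL;
}

/* Example callbacks that only count how often they run. */
static int dtor_calls = 0, free_calls = 0;
static void counting_dtor(void *object, unsigned int handle)
{
    (void) object; (void) handle;
    dtor_calls++;
}
static void counting_free(void *object)
{
    (void) object;
    free_calls++;
}
```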

internals/engine/objects.1286506307.txt.gz · Last modified: 2017/09/22 13:28 (external edit)