rfc:pecl_http

This is an old revision of the document!


PHP RFC: Add pecl_http to core

Introduction

About

What is pecl_http?

pecl_http is an extension that aims to provide a convenient and powerful set of functionality for one of PHP’s major applications.

It eases handling of HTTP urls, headers and messages, provides means for negotiation of a client's preferred content type, language and charset, as well as a convenient way to send any arbitrary data with caching and resuming capabilities.

It provides powerful request functionality with support for parallel requests and an event loop library.

Why pecl_http?

pecl_http was first created more than ten years ago. Back in that time I was a PEAR guy and maintained a set of HTTP related packages. I ever wondered so much why PHP does not support HTTP in a more sophisticated manner and thought I’ve got to change that.

Back then there wasn’t much more than a few PEAR libraries like HTTP_Request, HTTP_Download and HTTP_Cache. What followed was a journey through PHP4 and 5, OOP and procedural programming compatible APIs with good reception from the users, but bad critics by architects and evangelists. v2 tried to settle on a more concise and modern set of OOP APIs. I still get threats from users because of this. Development was slow and it took about three years until 2.0 stable was released in late 2013.

Ever so often there were requests to bundle pecl_http with the source distribution but until now I never felt like I want to do that nor that the code was ready for such a move. This time has come, though. The only real concern I still have is the burden it creates on all core developers when moving such a big amount of code from the hands of one into the hands of a few.

Code base size

I have to admit, that the amount of code that comes with pecl_http can be called big, but that is mostly only due to core PHP lacking in many if not all areas considered important to more sophisticated processing of HTTP. Everything works out of the box to certain extent with the tools already in core unless you have to dig deeper. pecl_http tries to give you the tools you need, out of the box, instantly available for more differentiated situations, which have become more not less in the last ten years.

Miscellaneous

Usage numbers

Frankly, I don’t know. PECL stats show about 50k source package downloads per month in average, whatever that has to say. It’s placed in the top 10 where good old APC still has the lead.

Test coverage

The current test suite provides a code coverage of about 90% and is subject to improvement.
Coverage resport of v2: http://dev.iworks.at/ext-http/lcov/http/index.html

There’s one test that currently fails for me due to porting the extension to ZE3, because of reference mismatch and a leak in zend_assign_to_variable_ref() as cause.

Licensing

All of the affected code was licensed under 2-clause BSD and will be re-licensed to PHP-3.01 license.

FIG and PSR-7

There are no plans to follow or adhere to the Framework Interoperability Group’s “PHP Standard Recommendation” #7.

The Guts

A fully merged tree can be inspected here:
https://github.com/m6w6/php-src/tree/merge-http

Documentation

The current docs are available here:
http://devel-m6w6.rhcloud.com/mdref/http

The documentation presumably has to be converted to docbook to be included in php.net, though, at php.net/http currently reside the docs for pecl_http v1. As far as I know, the docs team did not come to a conclusion how to handle that situation.

I had hoped that in the meantime the php-docs approach/investigation of using some sort of markdown would have been successful, because the current documentation is written in a simple style of markdown, but anyway, conversion should not be too hard, I even had a volunteer in December, but did not hear back from him.

Markdown sources of the documentation can be found here:
https://github.com/m6w6/mdref-http

Dependencies

libz AKA zlib

  • Type: of dep: hard build dep
  • Minimum version: 1.2.0.4
  • Provided functionality: encoding and decoding of message bodies with zlib/gzip/deflate encodings
  • Current state: essential

libidn

  • Type of dep: soft build dep
  • Provided functionality: IDNA support for http\Url
  • Current state: feature completive, might look into ICU as alternative

libcurl

  • Type of dep: soft build dep
  • Minimum version: 7.18.2
  • Provided functionality: HTTP request support
  • Current state: feature completive; might look into additional alternatives, but nothing has proven competitive enough yet

libevent(2)

  • Type of dep: soft build dep
  • Provided functionality: multi_socket API from libcurl
  • Current state: nice to have - must have for more parallel requests than select() can reasonably handle; might look into additional alternatives like libuv (libev already has libevent compatibility)

pecl/propro

  • Type of dep: hard build dep
  • Provided functionality: byref access to object properties representing C struct members
  • Current state: feature completive, suggested to be merged to ext/standard or main, please discuss
Property Proxy API

ZE2 doxygen reference can be found here:
http://php.github.io/pecl-php-propro/php__propro_8h.html

There’s been similar functionality in ZE2, but it was dysfunctional to my findings back then and has obviously been removed in ZE3.

Internals thread from 2010:
http://marc.info/?t=128351193900002&r=1&w=2

When a property is requested byref (BP_VAR_RW) from an object that stores state in a member of the objects C struct, and that object implementation uses its own property handlers, it can use a property proxy to enable that kind of access to that state.

This is accomplished by returning an instance of the property proxy object instead of the property directly. The proxy has its set and get handlers overridden and does deferred and cascaded fetch/push of the original member of the container (object) to update the state.

Every extension maintaining state outside of real object properties (e.g. dom) can make use of this functionality and does not have to emit the error “Properties of XYClass can not be accessed by ref or array key/index”.

Actual implementation for http\Message properties:
https://github.com/php/pecl-http-pecl_http/blob/phpng/php_http_message.c#L854-L879

static zval *php_http_message_object_read_prop(zval *object, zval *member, int type, void **cache_slot, zval *tmp)
{
	zval *return_value;
	zend_string *member_name = zval_get_string(member);
	php_http_message_object_prophandler_t *handler = php_http_message_object_get_prophandler(member_name);
 
	if (!handler || type == BP_VAR_R || type == BP_VAR_IS) {
		return_value = zend_get_std_object_handlers()->read_property(object, member, type, cache_slot, tmp);
 
		if (handler) {
			php_http_message_object_t *obj = PHP_HTTP_OBJ(NULL, object);
 
			PHP_HTTP_MESSAGE_OBJECT_INIT(obj);
			handler->read(obj, tmp);
 
			zval_ptr_dtor(return_value);
			ZVAL_COPY_VALUE(return_value, tmp);
		}
	} else {
		return_value = php_property_proxy_zval(object, member_name);
	}
 
	zend_string_release(member_name);
 
	return return_value;
}

pecl/raphf

  • Type of dep: hard build dep
  • Provided functionality: unified API for managing (persistent) resource handles
  • Current state: feature completive, suggested to be merged to main, please discuss
Resource And Persistent Handle Factory API

ZE2 doxygen reference can be found here:
http://php.github.io/pecl-php-raphf/php__raphf_8h.html

I once said that raphf provides a similar set of functionality like zend_list, but unfortunately this is a bit misleading, so let’s look at the differences first:

zend_list manages refcounted zend_resources with opaque handles and custom destructors for persistent and non-persistent handles. Persistent handles are returned to the caller as is.

raphf does not need refcount support, because we’re working with objects instead of resources, and so should you probably, too. Instead a kind of copy constructor is optionally supported for object cloning.

Here’s the actual implementation of curl_easy and curl_multi ctor/copy/dtor handlers for raphf:
https://github.com/php/pecl-http-pecl_http/blob/phpng/php_http_client_curl.c#L99-L180

typedef struct php_http_curle_storage {
	char *url;
	char *cookiestore;
	CURLcode errorcode;
	char errorbuffer[0x100];
} php_http_curle_storage_t;
 
static inline php_http_curle_storage_t *php_http_curle_get_storage(CURL *ch) {
	php_http_curle_storage_t *st = NULL;
 
	curl_easy_getinfo(ch, CURLINFO_PRIVATE, &st);
 
	if (!st) {
		st = pecalloc(1, sizeof(*st), 1);
		curl_easy_setopt(ch, CURLOPT_PRIVATE, st);
		curl_easy_setopt(ch, CURLOPT_ERRORBUFFER, st->errorbuffer);
	}
 
	return st;
}
 
static void *php_http_curle_ctor(void *opaque, void *init_arg)
{
	void *ch;
 
	if ((ch = curl_easy_init())) {
		php_http_curle_get_storage(ch);
		return ch;
	}
	return NULL;
}
 
static void *php_http_curle_copy(void *opaque, void *handle)
{
	void *ch;
 
	if ((ch = curl_easy_duphandle(handle))) {
		curl_easy_reset(ch);
		php_http_curle_get_storage(ch);
		return ch;
	}
	return NULL;
}
 
static void php_http_curle_dtor(void *opaque, void *handle)
{
	php_http_curle_storage_t *st = php_http_curle_get_storage(handle);
 
	curl_easy_cleanup(handle);
 
	if (st) {
		if (st->url) {
			pefree(st->url, 1);
		}
		if (st->cookiestore) {
			pefree(st->cookiestore, 1);
		}
		pefree(st, 1);
	}
}
 
static php_resource_factory_ops_t php_http_curle_resource_factory_ops = {
	php_http_curle_ctor,
	php_http_curle_copy,
	php_http_curle_dtor
};
 
static void *php_http_curlm_ctor(void *opaque, void *init_arg)
{
	return curl_multi_init();
}
 
static void php_http_curlm_dtor(void *opaque, void *handle)
{
	curl_multi_cleanup(handle);
}
 
static php_resource_factory_ops_t php_http_curlm_resource_factory_ops = {
	php_http_curlm_ctor,
	NULL,
	php_http_curlm_dtor
};

A resource factory can be created from this ops and be used directly, or can be transparently wrapped by the persistent handle ops to support process/thread wide persistency:

Actual implementation of creating the resource/persistent handle factory for http\Client\Request of the curl driver:
https://github.com/php/pecl-http-pecl_http/blob/phpng/php_http_client_curl.c#L1788-L1816

static php_resource_factory_t *create_rf(php_http_url_t *url)
{
	php_persistent_handle_factory_t *pf;
	php_resource_factory_t *rf = NULL;
	zend_string *id;
	char *id_str = NULL;
	size_t id_len;
 
	if (!url || (!url->host && !url->path)) {
		php_error_docref(NULL, E_WARNING, "Cannot request empty URL");
		return NULL;
	}
 
	id_len = spprintf(&id_str, 0, "%s:%d", STR_PTR(url->host), url->port ? url->port : 80);
	id = php_http_cs2zs(id_str, id_len);
 
	pf = php_persistent_handle_concede(NULL, PHP_HTTP_G->client.curl.driver.request_name, id, NULL, NULL);
	zend_string_release(id);
 
	if (pf) {
		rf = php_resource_factory_init(NULL, php_persistent_handle_get_resource_factory_ops(), pf, (void (*)(void*)) php_persistent_handle_abandon);
	} else {
		rf = php_resource_factory_init(NULL, &php_http_curle_resource_factory_ops, NULL, NULL);
	}
 
	zend_string_release(id);
 
	return rf;
}

For php_persistent_handle_concede() to succeed, a provider has to be registered at MINIT:

	if (SUCCESS != php_persistent_handle_provide(PHP_HTTP_G->client.curl.driver.client_name, &php_http_curlm_resource_factory_ops, NULL, NULL)) {
		return FAILURE;
	}
	if (SUCCESS != php_persistent_handle_provide(PHP_HTTP_G->client.curl.driver.request_name, &php_http_curle_resource_factory_ops, NULL, NULL)) {
		return FAILURE;
	}

php_persistent_handle_concede() would also take pointers to a wakeup and a retire function for persistent handles, so that e.g. network sockets or database handles can be prepared for the idle time or be checked that they are still valid when requested.

This is the last notable difference from zend_list and is not needed by curl, but here’s an example of wakeup and retire functions located in pecl/pq, a PostgreSQL Client:
https://github.com/php/pecl-database-pq/blob/master/src/php_pqconn.c#L563-L651

static void php_pqconn_wakeup(php_persistent_handle_factory_t *f, void **handle TSRMLS_DC)
{
	PGresult *res = PQexec(*handle, "");
	PHP_PQclear(res);
 
	if (CONNECTION_OK != PQstatus(*handle)) {
		PQreset(*handle);
	}
}
 
static void php_pqconn_retire(php_persistent_handle_factory_t *f, void **handle TSRMLS_DC)
{
	php_pqconn_event_data_t *evdata = PQinstanceData(*handle, php_pqconn_event);
	PGcancel *cancel;
	PGresult *res;
 
	/* go away */
	PQsetInstanceData(*handle, php_pqconn_event, NULL);
 
	/* ignore notices */
	PQsetNoticeReceiver(*handle, php_pqconn_notice_ignore, NULL);
 
	/* cancel async queries */
	if (PQisBusy(*handle) && (cancel = PQgetCancel(*handle))) {
		char err[256] = {0};
 
		PQcancel(cancel, err, sizeof(err));
		PQfreeCancel(cancel);
	}
	/* clean up async results */
	while ((res = PQgetResult(*handle))) {
		PHP_PQclear(res);
	}
 
	/* clean up transaction & session */
	switch (PQtransactionStatus(*handle)) {
	case PQTRANS_IDLE:
		res = PQexec(*handle, "RESET ALL");
		break;
	default:
		res = PQexec(*handle, "ROLLBACK; RESET ALL");
		break;
	}
 
	if (res) {
		PHP_PQclear(res);
	}
 
	if (evdata) {
		/* clean up notify listeners */
		zend_hash_apply_with_arguments(&evdata->obj->intern->listeners TSRMLS_CC, apply_unlisten, 1, evdata->obj);
 
		/* release instance data */
		efree(evdata);
	}
}

Any extension providing (networked) services should be able to take advantage of raphf unless it needs to expose resources to userland.

raphf INI setting

There’s a global INI setting (SYSTEM), persistent_handle_limit (defaults to -1, unlimited) of debatable usefulness.

Features

C API

Most of the features are directly accessible through pecl_http's C API.

Globals

Nothing in the global namespace, except the namespace Http.

Client

The HTTP stream wrapper is of limited functionality, but may work for most simple needs.

Better support for more complicated applications like different authentication schemes, proxy types, encodings, SSL/TLS layers and what not is a desirable out of the box functionality.

All of that would actually be available by ext/curl but the existing libcurl binding is in an subpar maintenance state and suffers from its own quirks. Also, to my great surprise, there are only about five people enjoying the libcurl API, or what is available from it in PHP.

Currently only libcurl is implemented as a provider for http\Client, providing most of the functionality of most-current libcurl. This should be a good bet, because libcurl is mature and ubiquitously available.This does not mean that we may not implement our own “PHP” driver for http\Client.

http\Client supports sending parallel requests, optionally driven by an event loop library like libev{,ent}.

Encoding

Actually ext/zlib supports all the same three encodings since I fixed it a few years ago, so there could be the occasion for few shared code lines. What it definitely lacks, though, are incremental encoders/decoders not depending on php_streams in form of stream filters.

AFAIK there no accessible implementation of chunked encoding in core.

Env

http\Env provides negotiation of content type, character set and language which is a feature set often asked for, nothing comparable exists in core.

Most of what the environmental/server request constitutes is represented by superglobals and any request body by php:input, which is absolutely no problem, especially since php:input does not point to an allocated string in the SAPI globals (even possibly allocated twice).

http\Env\Request provides a central access point for all of that data and makes it safe to change/mock it without actually changing the original environment.

http\Env\Response provides features for sending responses beyond header(), ob_start() and readfile() with support for ranges/resuming, caching and throttling.

Message

Message parser and tools. http\Message is the base class of all request and response classes. Note that a “parent message” denotes any message appearing before a message in a stream of messages.

Also message bodies with support for building and (basic) parsing of multipart bodies, utilizing a (temporary) stream for memory efficiency.

Splitting a multipart body creates a chain of http\Message objects.

Header parser and tools.

I'm really not sure what case I should make about a header and message parser implementation in an HTTP package.

Cookie and Set-Cookie headers come in a special format and are ubiquitous, thus they deserve a discrete parser.

One could argue that there is related functionality in core, namely php_default_treat_data(), but it is inaccessible from userland and cumbersome to call internally.

Params

Header params parser; think of a content-type or an accept header. Negotiation, cookie and query parser build on it.

QueryString

Query string parser and tools. Actually builds on http\Params.

parse_str() suffers from its legacy/heritage as it replaces f.e. dots with underlines.

Url

URL parser and tools with UTF-8, locale multibyte and IDNA support (need to check if, and how much it diverges from IRIs). See RFC3987 and RFC3988.

I'm not sure what recommendation parse_url() follows, if any.

Unaffected PHP Functionality

The http: stream wrapper is unaffected by pecl_http.

Vote

50%+1 combined “Yes” votes needed for acceptance.

Changelog

  • 1.1
    • Added “Feature corner stones”
  • 1.2
    • Improvements to “Introduction” and “Proposal”
  • 1.3
    • Added another link to the current docs
  • 1.4
    • Added link to fully merged tree
    • Expand voting options
  • 2.0
    • Complete rewrite
  • 2.1
    • Expanded feature section
  • 2.2
    • Removed optional dependencies on all three extensions (json, iconv, hash), and the one INI entry related to it
  • 2.3
    • Removed http\Env RINIT section
    • Changed namespace from http to Http
    • Fixed some wordings and list formattings
rfc/pecl_http.1423312966.txt.gz · Last modified: 2017/09/22 13:28 (external edit)