PHP RFC: RFC1867 for non-POST HTTP verbs

Proposal

If you need a refresher on RFC1867, there's a section below.

RFC1867 defines the multipart/form-data content type. This content type is used primarily for submitting HTTP forms that contain files. PHP supports the parsing of this content type natively, but only for POST requests. Specifically, if a request has the method POST and the content type multipart/form-data, the request body is consumed before the PHP script starts and is parsed into the $_POST and $_FILES superglobals. This functionality triggers automatically and is not exposed to userland directly.

With the emergence of REST it has become increasingly popular to use other HTTP verbs such as PUT and PATCH, for which multipart/form-data is entirely valid but is not processed by PHP. Applications must then parse the request body of this non-trivial format manually. Handling large amounts of data in userland may also be suboptimal in terms of performance. The corresponding issue (https://bugs.php.net/bug.php?id=55815) has been upvoted over 350 times.

This RFC suggests adding a new function request_parse_body() to expose the existing functionality to userland so that it may be used for other HTTP verbs.

/**
 * @param resource<stream>|null $input_stream
 * @param array<string, int|string>|null $options
 */
function request_parse_body($input_stream = null, ?string $content_type = null, ?array $options = null): array {}
 
// This is a PUT request
var_dump($_POST);  // []
var_dump($_FILES); // []
 
[$_POST, $_FILES] = request_parse_body();
 
var_dump($_POST);  // [...]
var_dump($_FILES); // [...]

The function returns an array pair with index 0 in the shape of $_POST and index 1 in the shape of $_FILES. If desired, the return values can overwrite the $_POST and $_FILES superglobals. This makes it simple to reuse existing POST endpoints.

The function also accepts an optional stream as input. This is useful for applications like RoadRunner that use the CLI SAPI for webserver workers that process multiple requests per execution. If the input stream is not provided, the function will obtain its input directly from sapi_module.read_post(). If the input stream is set, the content type must also be set, for the following reasons:

  • The content type is needed to infer the desired parser, as request_parse_body() also supports application/x-www-form-urlencoded for consistency with the automatically invoked parsing.
  • If the content type is multipart/form-data, the parser requires a boundary, which is embedded in the Content-Type header.
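The second point can be illustrated with a short userland sketch. The helper name describe_content_type() is hypothetical and not part of this proposal; it only shows where the boundary sits inside the header value and assumes a well-formed header.

```php
<?php
/**
 * Hypothetical helper: split a Content-Type header value into its media type
 * and its optional boundary parameter. A sketch, not the internal parser.
 */
function describe_content_type(string $header): array
{
    $segments = explode(';', $header);
    // The first segment is the media type, e.g. "multipart/form-data".
    $mediaType = strtolower(trim(array_shift($segments)));

    $boundary = null;
    foreach ($segments as $segment) {
        // Limit of 2 keeps "=" characters inside the value intact.
        $pair = explode('=', $segment, 2);
        if (count($pair) === 2 && strtolower(trim($pair[0])) === 'boundary') {
            // The value may be wrapped in double quotes.
            $boundary = trim(trim($pair[1]), '"');
        }
    }

    return ['media_type' => $mediaType, 'boundary' => $boundary];
}

$parts = describe_content_type('multipart/form-data; boundary=----example123');
// $parts['media_type'] is "multipart/form-data", $parts['boundary'] is "----example123"
```

Without the header, neither the parser to use nor the delimiter between parts can be derived from the stream alone.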

The function accepts a stream rather than a string because the input may be arbitrarily large (i.e. too large to hold in memory at once).

The function accepts an array of options to override global INI settings. See the “$options parameter” section for more details.

Sanitization

Like the automatically invoked multipart parsing, request_parse_body() fails under the following conditions:

  • If the content size exceeds post_max_size.
  • If Content-Type is missing the boundary attribute.
  • If the number of multipart parts exceeds max_multipart_body_parts.
  • If the number of “POST” (non-file) inputs exceeds max_input_vars.
  • If the number of files exceeds max_file_uploads.
  • If an input is missing both the name and filename attributes.

All of these conditions emit warnings for the automatically invoked multipart parsing. However, with a dedicated function we can benefit from exceptions to make parsing errors unmissable. A new Exception class RequestParseBodyException is created.

class RequestParseBodyException extends Exception {}

It's worth noting that the request_parse_body() function is not idempotent (when reading directly from the SAPI), even if the function throws. It is thus not safe to call request_parse_body() twice for the same request without providing an input stream. This is explained in more detail in the php://input section below.
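A caller might wrap the parsing step as in the following sketch. The helper parse_body_or_status() and the choice of status 400 are assumptions made for illustration. The parser is passed in as a callable (for example fn () => request_parse_body()) so that the sketch also runs where the proposed function is unavailable, and the conditional class declaration is only a stand-in for the proposed exception class.

```php
<?php
// Stand-in so the sketch runs on PHP versions without the proposed class.
if (!class_exists('RequestParseBodyException')) {
    class RequestParseBodyException extends Exception {}
}

/**
 * Hypothetical helper: run a body parser once and turn a parsing failure
 * into an HTTP status instead of letting the exception escape.
 */
function parse_body_or_status(callable $parse): array
{
    try {
        [$post, $files] = $parse();
    } catch (RequestParseBodyException $e) {
        // Assumption: a malformed or oversized body is reported as 400.
        return ['status' => 400, 'error' => $e->getMessage(), 'post' => [], 'files' => []];
    }

    return ['status' => 200, 'error' => null, 'post' => $post, 'files' => $files];
}

$ok = parse_body_or_status(fn () => [['title' => 'Demo'], []]);
$failed = parse_body_or_status(function () {
    throw new RequestParseBodyException('Missing boundary');
});
```

Because the call is not idempotent when reading from the SAPI, the wrapper invokes the parser exactly once and does not retry after a failure.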

Supported content types

Apart from multipart/form-data which is the primary motivation for this RFC, request_parse_body() also supports the application/x-www-form-urlencoded format. For application/x-www-form-urlencoded the $_FILES-equivalent array is empty. If the content type is not supported, an InvalidArgumentException is thrown.
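The return shape for application/x-www-form-urlencoded can be approximated in userland as follows. This is a sketch under assumptions, not the proposed implementation: parse_urlencoded_body() is a hypothetical name, the body is taken as a string rather than a stream, and the existing parse_str() function does the decoding.

```php
<?php
/**
 * Hypothetical userland approximation of the return value for
 * application/x-www-form-urlencoded bodies.
 */
function parse_urlencoded_body(string $body, string $contentType): array
{
    // Ignore parameters such as "; charset=UTF-8" when comparing the media type.
    $mediaType = strtolower(trim(explode(';', $contentType, 2)[0]));
    if ($mediaType !== 'application/x-www-form-urlencoded') {
        throw new InvalidArgumentException("Unsupported content type \"$mediaType\"");
    }

    parse_str($body, $post);

    // Index 0 mirrors $_POST; index 1 mirrors $_FILES and is always empty here.
    return [$post, []];
}

[$post, $files] = parse_urlencoded_body(
    'title=Demo&tags[]=a&tags[]=b',
    'application/x-www-form-urlencoded; charset=UTF-8'
);
// $post is ['title' => 'Demo', 'tags' => ['a', 'b']], $files is []
```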

php://input

Usually, the request's content is accessible through the php://input stream. This stream buffers the content of the request so that it may be read multiple times. For POST requests, the entire content is read and buffered before control is handed over to the PHP script. For non-POST verbs the content remains unread until the PHP script reads it. As the input stream is read, it is buffered on the fly.

The sole exception to this buffering mechanism is multipart/form-data, for which the input stream is empty. The reasoning is most likely that multipart requests should not need to read the input stream again, since the parsed result is available in $_POST and $_FILES. Buffering the input for these requests would essentially mean that all files are written to disk twice, doubling the load on the disk in terms of time and space.

For the same reason, request_parse_body() does not buffer to php://input. This also means that request_parse_body() may not be called twice for the same request without providing an input stream, as it destructively consumes sapi_module.read_post().

If you really need this behavior, you may pass php://input to request_parse_body() which will buffer it on the fly.

$options parameter

A dedicated function presents the opportunity to customize parsing limits based on endpoints rather than globally. For example, your website may have a public and a login-protected multipart form. Increasing post_max_size, upload_max_filesize or similar settings globally may increase the risk for DoS attacks. As such, it may be preferable to increase these limits only for specific endpoints.

request_parse_body() accepts an $options parameter to override the following INI values:

  • max_file_uploads
  • max_input_vars
  • max_multipart_body_parts
  • post_max_size
  • upload_max_filesize

#[Route('/api/videos', methods: ['PUT'])]
public function index(): Response {
    [$post, $files] = request_parse_body(options: [
        'post_max_size' => '128M',
    ]);
 
    // ...
}

This is particularly useful for long-running processes that call request_parse_body() multiple times and would otherwise need to restore the old INI values after each call, as in the following example:

#[Route('/api/videos', methods: ['PUT'])]
public function index(): Response {
    $previousValue = ini_get('post_max_size');
    ini_set('post_max_size', '128M');
    try {
        [$post, $files] = request_parse_body();
    } finally {
        ini_set('post_max_size', $previousValue);
    }
 
    // ...
}

Providing invalid keys or values will throw a ValueError.
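The key check can be sketched in userland as below. The helper check_parse_options() is hypothetical, and the value rule (an int or a string in INI notation such as '128M') is an assumption derived from the example above. The text does not spell out which values count as invalid.

```php
<?php
/**
 * Hypothetical helper: reject option keys that are not in the list of
 * overridable INI values, and values that are neither int nor string.
 */
function check_parse_options(array $options): void
{
    $allowed = [
        'max_file_uploads',
        'max_input_vars',
        'max_multipart_body_parts',
        'post_max_size',
        'upload_max_filesize',
    ];

    foreach ($options as $key => $value) {
        if (!in_array($key, $allowed, true)) {
            throw new ValueError("Unknown option \"$key\"");
        }
        // Assumption: values use the same notation as the INI settings.
        if (!is_int($value) && !is_string($value)) {
            throw new ValueError("Option \"$key\" must be an int or a string");
        }
    }
}

check_parse_options(['post_max_size' => '128M', 'max_input_vars' => 500]); // passes silently
```

A misspelled key such as 'max_post_size' would raise a ValueError in this sketch rather than being silently ignored.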

Why not parse the content automatically?

One could argue that since POST automatically triggers the parsing of application/x-www-form-urlencoded and multipart/form-data requests, the same should be done for PUT, PATCH and other verbs. There are two primary reasons not to do that.

The first one is backwards compatibility. At least for multipart, the request body is consumed without buffering. Existing code that manually parses multipart will break as the input stream will be empty.

The second reason is that a separate function provides more flexibility. An endpoint that does not accept multipart can terminate early, instead of parsing the request, potentially storing large files, erroring, and then deleting the buffered files again. Moreover, a separate function allows the parsing mechanism to be reused for RoadRunner and similar services as explained above.

If you'd like to make use of these benefits for POST, you may disable the enable_post_data_reading INI setting and then call request_parse_body() from your application.

Backwards incompatible changes

Other than reserving the request_parse_body() function and the RequestParseBodyException class in the global namespace, there are no backwards incompatible changes.

RFC1867 refresher

RFC1867 defines the multipart/form-data content type. This content type is used primarily for submitting HTTP forms that contain files. It is similar to application/x-www-form-urlencoded in that it contains a list of key-value pairs, one for each form input. Each input may carry attributes in addition to its content. The inputs are separated by a boundary, an arbitrary string that does not occur in the content of any input. The boundary is specified in the Content-Type header so that the server knows how to split the sections. For files, the original filename and content type are passed as attributes. Here's a simple example of what this might look like.

POST / HTTP/1.1
Host: localhost:9000
Content-Type: multipart/form-data; boundary=---------------------------84000087610663814162942123332

-----------------------------84000087610663814162942123332
Content-Disposition: form-data; name="post_field"

post content
-----------------------------84000087610663814162942123332
Content-Disposition: form-data; name="file_field"; filename="original_filename.txt"
Content-Type: text/plain

file content
-----------------------------84000087610663814162942123332--

The resulting $_POST and $_FILES superglobals may look like this:

var_dump($_POST);
array(1) {
  ["post_field"]=>
  string(12) "post content"
}
var_dump($_FILES);
array(1) {
  ["file_field"]=>
  array(6) {
    ["name"]=>
    string(21) "original_filename.txt"
    ["full_path"]=>
    string(21) "original_filename.txt"
    ["type"]=>
    string(10) "text/plain"
    ["tmp_name"]=>
    string(20) "/tmp/sometmpfilename"
    ["error"]=>
    int(0)
    ["size"]=>
    int(12)
  }
}
echo file_get_contents($_FILES['file_field']['tmp_name']);
// file content

RFC1867 requests are automatically parsed when the request has the POST HTTP verb. Each non-file input is populated to the $_POST superglobal. For files, the content is stored in a temporary file and an entry is created in $_FILES to provide its metadata, along with a path to the temporary file. At the end of the request, any uploaded files that were not moved by the application get cleaned up. This avoids attacks that attempt to fill the server's disk space.
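To make the format concrete, here is a minimal userland sketch that splits a small multipart body held in memory. It is not how PHP parses such requests: the native parser streams file contents to temporary files, whereas this hypothetical split_multipart() helper keeps everything in a string, assumes CRLF line endings and quoted attributes, and silently skips parts without a name.

```php
<?php
/**
 * Hypothetical helper: split a multipart/form-data body into
 * [fields, files]. Simplified: no streaming, no limits, no nested names.
 */
function split_multipart(string $body, string $boundary): array
{
    $fields = [];
    $files = [];

    foreach (explode('--' . $boundary, $body) as $part) {
        if (str_starts_with($part, '--')) {
            break; // closing delimiter "--boundary--"
        }
        if (!str_starts_with($part, "\r\n")) {
            continue; // preamble before the first delimiter
        }
        // Drop the CRLF after the delimiter and the CRLF before the next one.
        $part = substr($part, 2);
        if (str_ends_with($part, "\r\n")) {
            $part = substr($part, 0, -2);
        }

        // Headers and content are separated by an empty line.
        [$rawHeaders, $content] = array_pad(explode("\r\n\r\n", $part, 2), 2, '');
        $headers = [];
        foreach (explode("\r\n", $rawHeaders) as $line) {
            [$name, $value] = array_pad(explode(':', $line, 2), 2, '');
            $headers[strtolower(trim($name))] = trim($value);
        }

        $disposition = $headers['content-disposition'] ?? '';
        preg_match('/\bname="([^"]*)"/', $disposition, $nameMatch);
        preg_match('/\bfilename="([^"]*)"/', $disposition, $fileMatch);
        if (!isset($nameMatch[1])) {
            continue; // simplification: unnamed parts are skipped
        }

        if (isset($fileMatch[1])) {
            $files[$nameMatch[1]] = [
                'name' => $fileMatch[1],
                'type' => $headers['content-type'] ?? '',
                'content' => $content,
            ];
        } else {
            $fields[$nameMatch[1]] = $content;
        }
    }

    return [$fields, $files];
}

$boundary = 'example-boundary';
$body = "--$boundary\r\n"
    . "Content-Disposition: form-data; name=\"post_field\"\r\n\r\n"
    . "post content\r\n"
    . "--$boundary\r\n"
    . "Content-Disposition: form-data; name=\"file_field\"; filename=\"original_filename.txt\"\r\n"
    . "Content-Type: text/plain\r\n\r\n"
    . "file content\r\n"
    . "--$boundary--\r\n";

[$fields, $files] = split_multipart($body, $boundary);
// $fields is ['post_field' => 'post content']
```

Even this reduced version needs to handle delimiters, headers and attributes, which is why exposing the existing parser is preferable to reimplementing it in each application.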

Future scope

Removing files

PHP automatically removes uploaded files at the end of the request to avoid DoS attacks that attempt to fill disk storage. For users of request_parse_body() with an input stream, it is expected that the process may handle multiple requests and thus multiple calls to this function. Waiting until process termination may accumulate many temporary files. It may be desirable to add a function, callable on demand, that cleans up temporary files.

The same could be achieved in userland by inspecting the returned $_FILES-equivalent array, and thus a separate function might not be necessary. This would be slightly inconsistent with the current cleanup mechanism, which tracks uploaded files independently of the $_FILES superglobal.
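The userland variant might look like the following sketch. The helper remove_uploaded_files() is hypothetical. It assumes the array has the shape of $_FILES, where tmp_name is either a string or, for array-style input names, a nested array of strings.

```php
<?php
/**
 * Hypothetical helper: delete the temporary files referenced by a
 * $_FILES-shaped array and return how many were removed.
 */
function remove_uploaded_files(array $files): int
{
    $removed = 0;

    foreach ($files as $entry) {
        // Normalise a single path to a one-element array.
        $paths = (array) ($entry['tmp_name'] ?? []);
        array_walk_recursive($paths, function ($path) use (&$removed) {
            // Files already moved by the application no longer exist and are skipped.
            if (is_string($path) && $path !== '' && is_file($path) && unlink($path)) {
                $removed++;
            }
        });
    }

    return $removed;
}
```

A long-running worker could call such a helper after each handled request, once the application has moved the files it wants to keep.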

Vote

Voting starts 2023-xx-xx and ends 2023-xx-xx.

As this is a language change, a 2/3 majority is required.

Introduce request_parse_body() in PHP 8.x?
Real name Yes No
Final result: 0 0
This poll has been closed.
rfc/rfc1867-non-post.1699964202.txt.gz · Last modified: 2023/11/14 12:16 by ilutov