====== PHP RFC: Improve unserialize() error handling ======
* Version: 1.4
* Date: 2022-09-01
* Author: Tim Düsterhus, timwolla@php.net
* Status: Implemented (Partly Accepted)
* Target Version: PHP 8.3
* Implementation: https://github.com/php/php-src/pull/9629
* First Published at: http://wiki.php.net/rfc/improve_unserialize_error_handling
===== Introduction =====
PHP's current error reporting in unserialize() is very inconsistent, making it hard to reliably handle errors that occur during unserialization.
Depending on how exactly the input string is malformed, PHP will emit an E_NOTICE, an E_WARNING, or throw an arbitrary \Exception or \Error (or even a combination of the 3 options):
getMessage(), PHP_EOL; // Incomplete or ill-typed serialization data
}
try {
unserialize(
'a:2:{i:12345678901234567890;O:19:"SplDoublyLinkedList":3:{i:0;i:0;i:1;N;i:2;a:0:{}}}'
); // Warning: unserialize(): Numerical result out of range in /tmp/php-src/test.php on line 17
// Notice: unserialize(): Unexpected end of serialized data in /tmp/php-src/test.php on line 17
// Notice: unserialize(): Error at offset 83 of 84 bytes in /tmp/php-src/test.php on line 17
} catch (UnexpectedValueException $e) {
echo $e->getMessage(), PHP_EOL; // Incomplete or ill-typed serialization data
}
?>
To reliably handle these currently, the following code is required:
try {
set_error_handler(static function ($severity, $message, $file, $line) {
throw new \ErrorException($message, 0, $severity, $file, $line);
});
$result = unserialize($serialized);
} catch (\Throwable $e) {
// Unserialization failed. Catch block optional if the error should not be handled.
} finally {
restore_error_handler();
}
var_dump($result); // Do something with the $result. Must not appear
// in the 'try' to not 'catch' unrelated errors.
In fact adding a custom error handler before calling unserialize() is the example used in the documentation page for restore_error_handler() (https://www.php.net/manual/en/function.restore-error-handler.php).
===== Proposal =====
The proposed solution to the described problem consists of two parts:
==== Add wrapper Exception 'UnserializationFailedException' ====
A new \UnserializationFailedException will be added. Whenever a \Throwable is thrown during unserialize (e.g. within an __unserialize() handler or because a throwing error handler converts E_NOTICE/E_WARNING into an Exception), this \Throwable will be wrapped in a new instance of \UnserializationFailedException.
This enables a developer to use a single catch(\UnserializationFailedException $e) to:
- Catch all possible \Throwables happening during unserialization, reliably handling all possible unserialization error cases.
- Reliably include more than just the call unserialize() in the try, as unrelated \Throwables will not be caught. This can improve readability, because the try-catch to handle unserialization failures does not need to be mixed with the regular “happy path” logic. It also allows to handle multiple calls to unserialize() with a single try-catch, without accidentally suppressing \Throwables thrown by unrelated logic in-between the calls to unserialize().
The original \Throwable will be accessible through the $previous property of \UnserializationFailedException, allowing the developer to learn about the actual cause of the unserialization failure by inspecting ->getPrevious().
Translated into simplified PHP code, the unserialize() implementation could look roughly like this with the proposal implemented:
function unserialize(string $data, array $options = []): mixed
{
try {
// The existing unserialization logic happens here.
} catch (\Throwable $e) {
throw new \UnserializationFailedException(previous: $e);
}
}
The implementation of the \UnserializationFailedException looks like this:
/**
* @strict-properties
*/
class UnserializationFailedException extends \Exception
{
}
==== Increase the error reporting severity in the unserialize() parser ====
Apart from unserialize handlers (e.g. __unserialize()) throwing, unserialization can also fail because of a syntax error in the input string, out-of-range integers and similar issues.
Currently these cases will not throw. Instead an E_NOTICE or an E_WARNING (or both!) are emitted depending on the type of error.
The severity of these should be unified, because the cases that emit an E_NOTICE are not any more or less actionable by the developer than the cases that emit E_WARNING.
Specifically the severity should be increased, because unserialization failing outside of an object's unserialize handler commonly implies that an fundamentally unsafe operation is performed:
- Storing or transmitting the output of serialize() with a medium that is not binary-safe and truncates at NUL bytes. This will break the unserialization of payloads that include classes that contain protected or private properties.
- Storing the output of serialize() in a storage that will transcode between charsets (e.g. a MySQL TEXT column and setting different connection charsets). This will break the unserialization of payloads that include non-ASCII strings, because the byte length might change.
- Passing an untrusted input string into unserialize(). Apart from allowing an attacker to emit arbitrary errors, this might also allow for remote code execution unless allowed_classes is configured to something safe.
All of these cases have in common that the application will no longer be operating on the original data that itself serialized. In the best case unserialization will outright fail, thus returning false from unserialize(), in the worst case the information will be silently modified (e.g. when attempting to unserialize 64 Bit integers with a 32 Bit binary).
At the very least the E_NOTICE cases should be adjusted to emit an E_WARNING to consistently emit an E_WARNING in all cases.
However changing both the existing E_NOTICE and the existing E_WARNING to throw the new \UnserializationFailedException might be a better solution. For applications that use a throwing error handler, the \Throwable thrown by the error handler for the E_NOTICE or E_WARNING will be handled by the mechanisms described in the previous section and thus will be wrapped into \UnserializationFailedException.
In other words: This would only affect applications that do not use a throwing error handler.
For these applications without a throwing error handler throwing an \UnserializationFailedException might be a steep increase in severity, however these applications already need to be prepared to handle \Throwables thrown from an unserialize handler. The facts that an unserialization failure commonly indicates something inherently unsafe and that unserialize() currently might return bogus data without completely failing (e.g. for out-of-range integers) also warrants that the operation fails “loudly”, instead of silently continuing. Furthermore consistently throwing the proposed \UnserializationFailedException allows the developer to take advantage of the benefits of the unified error handling as described in the previous section, even when they are not yet ready to use a throwing error handler.
==== Resulting code ====
The example in the introduction can be simplified to:
try {
$result = unserialize($serialized);
var_dump($result); // Do something with the $result. Can appear in the 'try' right
// beside the unserialize(), because the 'catch' block is specific
// to unserialization and will not catch anything unrelated.
} catch (\UnserializationFailureException $e) {
// unserialization failed.
}
… if both the \UnserializationFailureException is introduced and the severity is raised to \UnserializationFailureException from E_WARNING/E_NOTICE.
===== Backward Incompatible Changes =====
==== Addition of a new exception class ====
The UnserializationFailedException will no longer be available. The query symbol:UnserializationFailedException language:php
in GitHub's Code Search did not return any non-namespaced results, making this a theoretical issue.
==== Existing error handling for unserialize() might need to be updated ====
This is not considered an issue, because due to the inconsistency in error handling, the only safe solution to handle all possible errors during unserialize() is to catch(\Throwable). This is documented in the [[https://www.php.net/manual/en/function.unserialize.php|manual page for unserialize()]] ("Objects may throw Throwables in their unserialization handlers") and this continues to work as-is.
Every other solution is already broken in the face of untrusted input data, because (internal) classes can and will throw various types of Exception and Error. Examples of unsafe code includes:
* @unserialize() (will fail if a class throws).
* catch(Exception) (will fail for ext/date which throws Errors, will fail for readonly classes that remove properties in a later version).
* catch(Error) (will fail for various extensions which throw Exception or UnexpectedValueException).
Nothing will change for unserializing trusted, well-formed input strings, because unserialization will not fail for them by definition.
===== Proposed PHP Version(s) =====
Next PHP 8.x.
===== RFC Impact =====
==== To SAPIs ====
None.
==== To Existing Extensions ====
Existing extensions may update their error handling in the unserialize handlers to throw \Exceptions that best describe the type of error in the serialized data without needing to concern themselves with compatibility, because anything that is thrown will automatically be wrapped in the unified \UnserializationFailedException.
If an extension uses unserialize() internally then the same backwards compatibility concerns as with userland code applies.
Extensions replacing the serializer by a custom serializer (e.g. igbinary) may need to be adjusted to ensure their behavior is consistent with the behavior of the default serializer.
==== To Opcache ====
None.
==== New Constants ====
None.
==== php.ini Defaults ====
None.
===== Open Issues =====
None.
===== Unaffected PHP Functionality =====
Anything that does not interact with unserialize(). Developers who interact with __unserialize(), __wakeup(), and Serializable::unserialize() might be affected depending on what exactly they are doing with those.
===== Future Scope =====
* A new throw_for_unknown_classes option to throw instead of unserializing to __PHP_Incomplete_Class: https://externals.io/message/118566#118613
* A unserialize_callback_func option as replacement for the ini setting with the same name: https://externals.io/message/118566#118672
* Make trailing data emit an error: https://github.com/php/php-src/pull/9630
===== Proposed Voting Choices =====
==== Add wrapper Exception 'UnserializationFailedException' ====
2/3 required to implement \UnserializationFailedException that wraps any other \Throwable.
* Yes
* No
==== Increase the error reporting severity in the unserialize() parser ====
=== Increase E_NOTICE to E_WARNING ===
2/3 required to increase the severity of the emitted notices to warning.
* Yes
* No
=== Throw Exception instead of emitting warning / notice ===
Independently of whether the E_NOTICE will be increased to E_WARNING (the previous vote), the following vote will decide whether unserialize() will throw \UnserializationFailedException for all types of error/warning/notice in PHP 9.0. The vote will only be valid if the first vote ("Add the \UnserializationFailedException") passes, as it depends on that.
2/3 required to make E_* an \UnserializationFailedException in 9.0.
* Yes
* No
===== Patches and Tests =====
PoC:
* Wrap Exceptions: https://github.com/php/php-src/pull/9425
* Notice to warning: https://github.com/php/php-src/pull/9629
===== Implementation =====
https://github.com/php/php-src/pull/9629
===== References =====
* https://externals.io/message/118311
* Wrap Exceptions: https://github.com/php/php-src/pull/9425
* Notice to warning: https://github.com/php/php-src/pull/9629
===== Rejected Features =====
n/a
===== Changelog =====
* 1.4: Drop secondary vote.
* 1.3: Ranked Choice Voting.
* 1.2: The third vote now is a three-way vote.
* 1.1: Add two options to "Future Scope".