rfc:unserialize_warn_on_trailing_data

PHP RFC: Make unserialize() emit a warning for trailing bytes

Introduction

PHP silently accepts trailing bytes after a valid serialization payload when unserializing. This is undesirable.

Once PHP’s unserialization parser finds the trailing delimiter of the serialized value (`;` for scalars and `}` for arrays and objects), parsing terminates, the serialized value is returned, and any additional input bytes are silently ignored.

As serialize returns a valid serialization payload without trailing bytes, having trailing bytes within the input to unserialize implies that the serialization payload was erroneously modified after-the-fact. Silently accepting such serialization payload therefore is likely to mask (serious) issues within the application’s handling of (serialized) data and is therefore undesirable.

One such issue would be overwriting existing serialized data with a shorter serialization payload, but without properly truncating the existing data to the shorter length. The newly written payload will properly unserialize, because it is well-formed and because trailing bytes are silently ignored. However the old data will still partly be visible within the storage location, possibly exposing sensitive information that was meant to be deleted.

The recent CVE-2023-21036 (“aCropalypse”) is an example of this issue. When cropping screenshots on Google Pixel smartphones, the original screenshot file was overwritten in-place, without truncating the file to the smaller size of the cropped screenshot. For affected screenshots it is possible to restore (parts) of the original pixel data from the trailing bytes, exposing the information that is intended to be cropped.

According to a comment on the GitHub PR implementing this RFC, the “igbinary” serializer already rejects serialization data with trailing bytes. Such far there was only one issue report, where the warning exposed an issue within the application’s file handling (missing 'b' flag on fopen). While “igbinary” is likely used much more rarely compared to PHP’s native serializer, this indicates that warning for trailing bytes exposes real issues, which is a good thing.

Silently accepting trailing bytes can also be confusing to the human reader, when needing to debug issues with serialized data. The human reader might wonder why some information that is part of the serialized payload does not appear within the unserialized return value.

Furthermore emitting a warning for trailing bytes makes it easier to extend the serialization format in the future when giving the trailing bytes a meaning. Existing serialized data with trailing bytes might either change its meaning or fail to unserialize. By emitting a warning for trailing bytes, the developer would be able to detect the issue before it actually becomes an issue.

Proposal

unserialize() shall emit a new E_WARNING whenever the input string contains additional bytes once the unserialization parser terminates after successfully parsing a value. In other words: A warning shall be emitted if bytes can be removed from the end of the input string without changing the return value of unserialize() .

Examples

<?php
 
var_dump(unserialize('i:5;i:6;'));
// Warning: unserialize(): Extra data starting at offset 4 of 8 bytes in %s on line %d
// int(5)
 
var_dump(unserialize('N;i:6;'));
// Warning: unserialize(): Extra data starting at offset 2 of 6 bytes in %s on line %d
// NULL
 
var_dump(unserialize('b:1;i:6;'));
// Warning: unserialize(): Extra data starting at offset 4 of 8 bytes in %s on line %d
// bool(true)
 
var_dump(unserialize('a:1:{s:3:"foo";b:1;}i:6;'));
// Warning: unserialize(): Extra data starting at offset 20 of 24 bytes in %s on line %d
// array(1) {
//   ["foo"]=>
//   bool(true)
// }

Backward Incompatible Changes

A new warning will be emitted when the input to unserialize() contains unconsumed bytes after successfully unserializing something.

For users that use a throwing error handler, this will result in unserialize() failing for this type of modified input string. The documentation already states that passing strings that were not created by serialize() must not be passed to unserialize(), because doing so might introduce security issues.

Users that do not use a throwing error handler might just see the new warning.

Users might not notice anything at all if they suppress unserialize warnings (e.g. by calling @unserialize()).

The new warning is the entire purpose of this RFC, thus the justification for the possible breaks is the reasoning given before.

Proposed PHP Version(s)

Next PHP 8.x (8.3).

RFC Impact

To SAPIs

None.

To Existing Extensions

The same impact as with userland code.

To Opcache

None.

New Constants

None.

php.ini Defaults

None.

Open Issues

None.

Unaffected PHP Functionality

Anything that isn’t unserialize() or more generally related to serialization.

Future Scope

  • Make this an exception.

Proposed Voting Choices

Make unserialize() emit a E_WARNING if the input contains trailing bytes?
Real name Yes No
ashnazg (ashnazg)  
beberlei (beberlei)  
bmajdak (bmajdak)  
crell (crell)  
cschneid (cschneid)  
derick (derick)  
galvao (galvao)  
girgias (girgias)  
ilutov (ilutov)  
jhdxr (jhdxr)  
kguest (kguest)  
nicolasgrekas (nicolasgrekas)  
nielsdos (nielsdos)  
ocramius (ocramius)  
petk (petk)  
sergey (sergey)  
svpernova09 (svpernova09)  
theodorejb (theodorejb)  
timwolla (timwolla)  
weierophinney (weierophinney)  
Final result: 20 0
This poll has been closed.

Patches and Tests

Implementation

References

Rejected Features

  • Throw an Exception instead of emitting a warning.
rfc/unserialize_warn_on_trailing_data.txt · Last modified: 2023/05/01 17:08 by timwolla