====== PHP RFC: Deprecate json_encode() on classes marked as non-serializable ======
* Version: 0.11
* Date: 2024-09-05
* Author: Philip Hofstetter, phofstetter@sensational.ch
* Status: Under Discussion
* First Published at: http://wiki.php.net/rfc/deprecate-json_encode-nonserializable
===== Introduction =====
PHP internally marks some classes as not fit for serialization using a flag ''%%ZEND_ACC_NOT_SERIALIZABLE%%'' which prevents instances of such classes from being serialized using serialize().
However, json_encode() currently does not respect this flag, even though it is another serialization method built into PHP, one with dedicated language support for userland through [[https://www.php.net/manual/en/class.jsonserializable.php|JsonSerializable]]. json_encode() encodes instances of internal classes by their public properties only, which for most of them means ''%%{}%%'', regardless of serializability.
This is especially harmful for Generator: converting a pre-computed ''%%array%%'' into a lazy iterator using Generator is a useful and relatively common refactoring that can otherwise be done transparently to the rest of a code-base. After such a refactoring, json_encode() silently encodes the Generator instance as ''%%{}%%'', changing both the shape and the contents of the output.
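The refactoring hazard described above can be sketched as follows. The function names ''%%idsAsArray()%%'' and ''%%idsAsGenerator()%%'' are hypothetical and only serve as illustration; the output shown is the current behavior, before this RFC.

```php
<?php
// Before the refactoring: the data is a pre-computed array.
function idsAsArray(): array
{
    return [1, 2, 3];
}

// After the refactoring: the same values are produced lazily.
function idsAsGenerator(): Generator
{
    yield 1;
    yield 2;
    yield 3;
}

echo json_encode(idsAsArray()), "\n";     // [1,2,3]
echo json_encode(idsAsGenerator()), "\n"; // {} (the generator is never iterated)
```

Both calls succeed without any diagnostic, so the change in output is easy to miss.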
===== Proposal =====
This RFC proposes to deprecate calling json_encode() on most instances of classes marked with ''%%ZEND_ACC_NOT_SERIALIZABLE%%'', with the longer-term option of throwing an error in the major version of PHP following the one in which this RFC is implemented.
The flag ''%%ZEND_ACC_NOT_SERIALIZABLE%%'' was intended to mark classes as non-serializable because they either represent a temporary local resource (like a file or database handle) which could not possibly be unserialized later on or because serializing them could have large side-effects (in case of Generator and Iterator).
The same reasoning applies to json_encode() which right now doesn't invoke any of the side-effects (good) but also silently encodes any such object instance as ''%%{}%%''.
For temporary resources (file handles, etc.) this is potentially acceptable behavior, albeit somewhat inconsistent with how serialization is handled. For Generator, however, doing this silently is very inconvenient for a developer who is converting a code-base from pre-built arrays to generators for performance or memory consumption reasons.
Such a conversion can be done mostly transparently to the rest of the code-base, but it requires special handling for any json_encode() call, which currently silently does the wrong thing: it not only skips iterating the generator but also changes the shape of the output, potentially breaking API contracts without any notification to the user.
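One possible form of that special handling, given here as a minimal sketch and not as part of the proposal, is to consume the generator explicitly with ''%%iterator_to_array()%%'' before encoding. The function ''%%loadIds()%%'' is a hypothetical stand-in for refactored application code.

```php
<?php
// Hypothetical data source that was refactored from an array to a generator.
function loadIds(): Generator
{
    yield 10;
    yield 20;
    yield 30;
}

// Consume the generator explicitly. Passing false as the second argument
// discards the generator's keys, so the result is a plain list.
$ids = iterator_to_array(loadIds(), false);

echo json_encode($ids), "\n"; // [10,20,30]
```

Here the calling code decides when the generator is consumed, so any side effects of iteration happen at a visible, deliberate point.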
One exception to the rule is anonymous classes, all of which are marked as ''%%ZEND_ACC_NOT_SERIALIZABLE%%'' because their definition will not be present when unserializing, so they cannot possibly be unserialized again.
In the case of json_encode(), though, where no generic unserialize operation is defined anyway, there is no reason to deprecate or forbid encoding anonymous classes, unless their parent class is marked as ''%%ZEND_ACC_NOT_SERIALIZABLE%%'', in which case the above reasoning applies again.
Thus, this RFC proposes to continue to permit json_encode() on anonymous classes unless they extend a class marked as ''%%ZEND_ACC_NOT_SERIALIZABLE%%''.
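The difference between the two mechanisms for anonymous classes can be illustrated with a short sketch. The variable names are illustrative only; the property names ''%%x%%'' and ''%%y%%'' are arbitrary.

```php
<?php
// An anonymous class with two public properties.
$point = new class {
    public int $x = 1;
    public int $y = 2;
};

// json_encode() works and would remain permitted under this RFC,
// because the anonymous class extends no non-serializable parent.
echo json_encode($point), "\n"; // {"x":1,"y":2}

// serialize() rejects the same instance by throwing an exception.
$serializeFailed = false;
try {
    serialize($point);
} catch (Exception $e) {
    $serializeFailed = true;
}

var_dump($serializeFailed); // bool(true)
```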
==== Other options considered ====
This RFC proposes a solution that handles all classes block-listed for serialization, creating consistent behavior between the two serialization mechanisms built into PHP.
Other options considered concern themselves with handling just the Generator case:
- Add a special case to deprecate/disallow JSON encoding of Generator, but otherwise not look at ''%%ZEND_ACC_NOT_SERIALIZABLE%%''. This would be a proposed fallback option if the backwards compatibility concerns are too large to consider. It would complicate the implementation considerably.
- Have json_encode() consume the generator and recurse as if it was encoding an array. While this would probably be the most ergonomic solution for the refactoring case outlined above, given the unforeseeable side-effects generator consumption can have, including endless loops, this is a dangerous operation and was thus discarded as an option.
- Have json_encode() encode generators as ''%%[]%%'': This would help the refactoring case by upholding possible API contracts and would more cleanly match the shape of a generator (which is a list after all), but it would also be lying to the calling code because the generator likely won't be empty.
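Although consuming generators inside json_encode() itself was discarded, userland code that wants this behavior can opt in explicitly through JsonSerializable. The following is a minimal sketch under that assumption; the class ''%%JsonList%%'' and the function ''%%rows()%%'' are hypothetical names, not part of this RFC or of PHP.

```php
<?php
// Hypothetical wrapper: explicitly opts in to consuming an iterable
// when it is JSON-encoded, producing a JSON list.
final class JsonList implements JsonSerializable
{
    public function __construct(private iterable $items)
    {
    }

    public function jsonSerialize(): array
    {
        // Arrays are re-indexed; other iterables are consumed without keys.
        return is_array($this->items)
            ? array_values($this->items)
            : iterator_to_array($this->items, false);
    }
}

// Hypothetical lazy data source.
function rows(): Generator
{
    yield ['id' => 1];
    yield ['id' => 2];
}

$payload = ['rows' => new JsonList(rows())];

echo json_encode($payload), "\n"; // {"rows":[{"id":1},{"id":2}]}
```

Because the wrapper is applied by the caller, the risk of side effects or endless iteration stays with code that has chosen to accept it.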
==== Impacted internal classes ====
At the time of writing, the following classes (and their subclasses) are affected by this RFC; calling json_encode() on their instances will raise a deprecation warning in the future.
Some of them are containers of a sort, where this encoding is especially misleading (aside from Generator, which motivated this RFC, WeakMap stands out in particular).
Most of the non-serializable classes have no public properties and thus encode as ''%%{}%%''.
Those that currently do have public properties are still mostly meant for internal usage and are thus not ideal candidates for json_encode(). However, if code explicitly wants to turn such instances into JSON despite the proposed deprecation, casting them to array before encoding is a valid workaround:
  $a = new SimpleXMLElement('<a><b>3</b><c>foo</c></a>');
  echo json_encode($a); // {"b":"3","c":"foo"}, with deprecation warning
  echo json_encode((array) $a); // {"b":"3","c":"foo"}, no deprecation warning
=== Classes with public fields appearing in json_encode() output ===
* CURLFile (has three public properties)
* PDORow (has one public property, queryString)
* PDOStatement (has one public property, queryString)
* SimpleXMLElement (might be useful)
* ReflectionAttribute
* ReflectionClass
* ReflectionClassConstant
* ReflectionConstant
* ReflectionExtension
* ReflectionFiber
* ReflectionFunctionAbstract
* ReflectionGenerator
* ReflectionParameter
* ReflectionProperty
* ReflectionReference
* ReflectionType
* ReflectionZendExtension
=== Backed by temporary resources ===
* AddressInfo
* Collator
* CurlHandle
* CurlMultiHandle
* CurlShareHandle
* DOMXPath
* Dba\Connection
* DeflateContext
* Dom\Implementation
* Dom\NamespaceInfo
* Dom\TokenList
* Dom\XPath
* Dom\XMLDocument
* EnchantBroker
* EnchantDictionary
* FFI
* FFI\CData
* FFI\CType
* FTP\Connection
* GdFont
* GdImage
* InflateContext
* IntlBreakIterator
* IntlCalendar
* IntlCodePointBreakIterator
* IntlDateFormatter
* IntlDatePatternGenerator
* IntlIterator
* IntlPartsIterator
* IntlRuleBasedBreakIterator
* IntlTimeZone
* LDAP\Connection
* LDAP\Result
* LDAP\ResultEntry
* MessageFormatter
* NumberFormatter
* Odbc\Connection
* Odbc\Result
* OpenSSLAsymmetricKey
* OpenSSLCertificate
* OpenSSLCertificateSigningRequest
* PDO
* Pdo\Dblib
* Pdo\Firebird
* Pdo\Mysql
* Pdo\Odbc
* Pdo\Pgsql
* Pdo\Sqlite
* PgSql\Connection
* PgSql\Lob
* PgSql\Result
* Random\Engine\Secure
* ResourceBundle
* SQLite3
* SQLite3Result
* SQLite3Stmt
* Shmop
* Soap\Sdl
* Soap\Url
* Socket
* SplFileInfo
* Spoofchecker
* SysvMessageQueue
* SysvSemaphore
* SysvSharedMemory
* Transliterator
* UConverter
* XMLParser
* finfo
* variant
=== Other ===
* Closure
* Fiber
* Generator
* InternalIterator
* SensitiveParameterValue
* WeakMap
* WeakReference
===== Backward Incompatible Changes =====
Code that runs json_encode() over instances of classes marked as non-serializable, or over larger structures containing such instances, will raise a deprecation warning where previously there was none.
Given that the encoded output was mostly useless to any consumer of such JSON, and that producing the previous output manually is not hard, this RFC takes the position that the deprecation warning provides more value than the current behavior: given the current output, existing invocations of json_encode() on instances of non-serializable classes are likely unintentional.
===== Proposed PHP Version(s) =====
PHP 8.5
===== RFC Impact =====
==== To SAPIs ====
The deprecation warning will be raised in all SAPIs.
==== To Existing Extensions ====
None
==== To Opcache ====
None
The test case included in the PR was run with Opcache enabled and produced the expected result.
==== New Constants ====
None
===== Open Issues =====
None
===== Unaffected PHP Functionality =====
Passing any other kind of argument to json_encode() is unaffected.
===== Future Scope =====
In the next major version after this RFC passes, the deprecation warning can be changed to an Error, though this will be part of a separate RFC.
===== Proposed Voting Choices =====
Should calling json_encode() on instances of classes marked with ''%%ZEND_ACC_NOT_SERIALIZABLE%%'' be marked as deprecated? Yes, No?
===== Patches and Tests =====
https://github.com/php/php-src/pull/15724
===== Implementation =====
===== References =====
https://github.com/php/php-src/pull/15724