PHP RFC: json_validate
- Version: 0.9
- Date: 2022-08-14
- Author: Juan Carlos Morales, dev.juan.morales@gmail.com
- Status: Under Discussion
- First Published at: http://wiki.php.net/rfc/json_validate
- Implementation: https://github.com/php/php-src/pull/9399
Introduction
This RFC introduces a new function, json_validate(), that checks whether a string contains valid JSON.
Most userland implementations use json_decode(), which by design generates a ZVAL (object/array/etc.) while parsing the string, and therefore uses memory and processing time that could be saved.
Proposal
Description
json_validate(string $json, int $depth = 512, int $flags = 0): bool
Parameters
json
The json string being analyzed.
This function only works with UTF-8 encoded strings.
Note:
PHP implements a superset of JSON as specified in the original RFC 7159.
depth
Maximum nesting depth of the structure being decoded.
flags
Bitmask of JSON_INVALID_UTF8_IGNORE. The behavior of this constant is described on the JSON constants page.
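A minimal sketch of how the depth and flags parameters could behave, based on the signature above. Because json_validate() is only proposed here, the snippet defines a json_decode()-based stand-in when the function is missing; that stand-in and the sample strings are assumptions made for illustration and are not part of the RFC.

```php
<?php
// Stand-in for runtimes without json_validate(). This is an assumption of the
// sketch: it decodes the string, which the proposed function would avoid.
if (!function_exists('json_validate')) {
    function json_validate(string $json, int $depth = 512, int $flags = 0): bool
    {
        json_decode($json, null, $depth, $flags);
        return json_last_error() === JSON_ERROR_NONE;
    }
}

$nested = '[[1]]'; // two levels of nesting

$withinLimit = json_validate($nested, 2); // depth 2 covers both levels
$overLimit   = json_validate($nested, 1); // nesting exceeds depth 1
$depthError  = json_last_error();         // error left by the previous call

$badUtf8      = "\"caf\xE9\"";            // a lone 0xE9 byte is not valid UTF-8
$strict       = json_validate($badUtf8);
$ignoringUtf8 = json_validate($badUtf8, 512, JSON_INVALID_UTF8_IGNORE);

var_dump($withinLimit, $overLimit, $depthError === JSON_ERROR_DEPTH, $strict, $ignoringUtf8);
```

The depth check fails as soon as the nesting goes beyond the given limit, and JSON_INVALID_UTF8_IGNORE lets the parser skip invalid bytes instead of rejecting the string.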
Return values
Returns true if the given string contains valid JSON, otherwise returns false.
Examples
1. Validate a valid json-string
var_dump(json_validate('{ "test": { "foo": "bar" } }'));
Result will be
bool(true)
2. Validate an invalid json-string
var_dump(json_validate('{ "": "": "" } }'));
Result will be
bool(false)
Errors during validation can be fetched by using json_last_error() and/or json_last_error_msg().
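The following sketch shows one way a caller might combine json_validate() with json_last_error() and json_last_error_msg(). The json_decode()-based stand-in is an assumption added so the snippet also runs where the proposed function is not available.

```php
<?php
// Stand-in for runtimes without json_validate() (assumption of this sketch).
if (!function_exists('json_validate')) {
    function json_validate(string $json, int $depth = 512, int $flags = 0): bool
    {
        json_decode($json, null, $depth, $flags);
        return json_last_error() === JSON_ERROR_NONE;
    }
}

$candidate = '{ "": "": "" } }'; // the invalid json-string from example 2

if (json_validate($candidate)) {
    $report = 'valid';
} else {
    // The error state is read right after the failed validation.
    $errorCode = json_last_error();
    $report    = sprintf('invalid (code %d): %s', $errorCode, json_last_error_msg());
}

echo $report, PHP_EOL;
```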
General notes from discussion of the RFC
- Different users have tested the functionality and obtained the promised results. Their feedback was also positive.
- Most of the community on the mailing list expressed a positive opinion about this RFC and looks forward to its integration into PHP.
- Those who reviewed the code also agree that it is a small implementation that is easy to maintain and provides a big benefit for its size.
- The community got very actively involved in the discussion of the RFC, provided all kinds of useful feedback, and took the time to test json_validate() themselves.
Use cases contributed by the community
Although not limited to these examples, some users expressed the following during the discussion on the mailing list:
“Yes well-formed JSON from a trusted source tends to be small-ish. But a validation function also needs to deal with non-well-formed JSON, otherwise you would not need to validate it.”
“If with a new function (json_validate()) it becomes much easier to defend against a Denial-of-Service attack for some parts of a JSON API, then this can be a good addition just for security reasons.”
“fast / efficient validation of a common communication format reduces the attack surface for Denial-of-Service attacks.”
Reasons to have json_validate() function in the core
Based on discussion of the RFC in the mailing list
Disadvantages of using json_decode to only validate a json-string
By design, json_decode() generates a ZVAL (object/array/etc.) while parsing the string, and therefore uses memory and processing time that are not needed when the only goal is to find out whether a string contains valid JSON.
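To make the memory cost concrete, here is a rough sketch that measures how much memory stays allocated after json_decode() has parsed a larger document. The payload (an array of 100,000 integers) is an arbitrary choice for illustration, and the exact number depends on the PHP version and platform.

```php
<?php
// Build a sample JSON document: an array of 100,000 integers.
$json = json_encode(range(1, 100000));

$before  = memory_get_usage();
$decoded = json_decode($json, true); // builds the full PHP array
$after   = memory_get_usage();

// Memory held by the decoded structure, which a pure validity check does not need.
$allocatedByDecode = $after - $before;

printf(
    "json_decode() kept about %d bytes allocated for a %d-byte string\n",
    $allocatedByDecode,
    strlen($json)
);
```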
Disadvantages of using regex
Using a regex for this task leads to differing, error-prone, hard-to-maintain implementations.
Disadvantages of userland solutions
- Writing a JSON parser is not trivial.
- They need to stay up to date with the existing PHP JSON parser used by json_decode(); otherwise a json-string that is valid for the userland solution might not be valid for json_decode(), or vice versa.
- It is redundant work to write a JSON parser in userland, as PHP already has one.
PHP already has a JSON parser
As previously mentioned, PHP already has a JSON parser used by json_decode(). The proposed function will use that parser, guaranteeing 100% compatibility between json_decode() and json_validate().
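The compatibility claim can be spot-checked with a small sketch like the one below. The helper is_json_via_decode() and the sample strings are hypothetical names and data chosen for illustration; the comparison only runs when the runtime actually provides json_validate().

```php
<?php
/** Decode-based check, the common userland idiom (hypothetical helper name). */
function is_json_via_decode(string $json): bool
{
    json_decode($json);
    return json_last_error() === JSON_ERROR_NONE;
}

// A mix of valid and invalid json-strings.
$samples = ['{"a":1}', '[1,2,3]', '"text"', 'null', '{"a":1,}', "{'a':1}", '[1,2'];

$disagreements = [];
foreach ($samples as $sample) {
    $expected = is_json_via_decode($sample);
    // Skip the comparison on runtimes that do not have json_validate().
    if (function_exists('json_validate') && json_validate($sample) !== $expected) {
        $disagreements[] = $sample;
    }
}

var_dump($disagreements);
```

Since both functions share one parser, the list of disagreements is expected to stay empty for any input that uses the same depth and flags.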
Needs from major projects and developers
In the “References” section, there is a list of major open-source PHP projects that could benefit from this new function.
That section also links to one of the most popular Stack Overflow questions on the topic, which reflects the need among developers for a feature like this.
Please check the “References” section.
Complexity added in the core
At the moment, there is a JSON parser in the core, used by json_decode() to do its job, so there is no need to write a new JSON parser for this RFC; the proposed function will use the existing JSON parser exclusively to parse a string without generating any object/array/etc. in memory for it.
Reasons NOT to have json_validate() function in the core
One member of the mailing list expressed that:
- Incorporating such a small implementation, which can be achieved with userland code, is not a good idea.
Quote:
“If we keep the tendency to pollute already bloated standard library with an army of small functions that could have not exists and be replaced with normal PHP counterparts IMHO we'll end with frustration from developers as I believe DX slowly falls down here.”
- json_validate() would only be useful for edge cases.
Quote:
“A `json_decode()` is a substitute that IMO solves 99% of use cases. If I'd follow your logic and accept every small addition that handles 1% of use cases, somebody will raise another RFC for simplexml_validate_string or yaml_validate and the next PhpToken::validate. All above can be valid if we trust that people normally validate 300MB payloads to do nothing if they DON'T fail and there is nothing strange about that.”
- The user also provided an implementation of a JSON parser written in pure PHP. https://gist.github.com/brzuchal/37e888d9b13937891c3e05fead5042bc
Changes in the RFC that happened during discussion
Throw Exception on error
In my initial implementation, the developer could pass a flag to indicate that an exception should be thrown in case of an error during validation.
The ability to throw an exception on error was removed from the implementation, as most users on the mailing list (with good reasons) and the code review pointed out that such a behavior does not make sense.
I also have to admit that, after they presented their arguments, I changed my mind, and now I also think it does not make sense to have such a behavior in the function.
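Callers who still want exception semantics can add them in userland. The sketch below is one possible way to do that; assert_valid_json() is a hypothetical helper that is not part of this RFC, and the json_decode()-based stand-in is an assumption for runtimes without json_validate().

```php
<?php
// Stand-in for runtimes without json_validate() (assumption of this sketch).
if (!function_exists('json_validate')) {
    function json_validate(string $json, int $depth = 512, int $flags = 0): bool
    {
        json_decode($json, null, $depth, $flags);
        return json_last_error() === JSON_ERROR_NONE;
    }
}

/** Hypothetical userland helper: turns a failed validation into an exception. */
function assert_valid_json(string $json): void
{
    if (!json_validate($json)) {
        throw new JsonException(json_last_error_msg(), json_last_error());
    }
}

try {
    assert_valid_json('{"a":'); // truncated object, not valid JSON
    $outcome = 'valid';
} catch (JsonException $e) {
    $outcome = 'rejected: ' . $e->getMessage();
}

echo $outcome, PHP_EOL;
```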
Others
- I removed 3 of the provided examples because they did not fit the RFC's purpose or were not clear.
- I had to adjust the wording regarding the disadvantages of using json_decode(), as pointed out here:
Backward Incompatible Changes
None, as this only adds a new function.
The one exception is the name itself: json_validate will no longer be available as a name for userland functions in the global namespace.
Proposed PHP Version(s)
next PHP 8.x
RFC Impact
This RFC has no impact on SAPIs, existing extensions, Opcache, etc.
Open Issues/Questions
Returning BOOLEAN or INT
Option 1) Return a boolean: TRUE for a valid string or FALSE for an invalid one. In case of error, rely on the functions json_last_error() and json_last_error_msg() to get the proper error. This is how it is implemented at the moment.
Option 2) Return an integer, using the already existing constants for JSON errors. The return values would be:
JSON_ERROR_NONE
No error has occurred
JSON_ERROR_DEPTH
The maximum stack depth has been exceeded
JSON_ERROR_STATE_MISMATCH
Invalid or malformed JSON
JSON_ERROR_CTRL_CHAR
Control character error, possibly incorrectly encoded
JSON_ERROR_SYNTAX
Syntax error
JSON_ERROR_UTF8
Malformed UTF-8 characters, possibly incorrectly encoded
JSON_ERROR_RECURSION
One or more recursive references in the value to be encoded
JSON_ERROR_INF_OR_NAN
One or more NAN or INF values in the value to be encoded
JSON_ERROR_UNSUPPORTED_TYPE
A value of a type that cannot be encoded was given
JSON_ERROR_INVALID_PROPERTY_NAME
A property name that cannot be encoded was given
JSON_ERROR_UTF16
Malformed UTF-16 characters, possibly incorrectly encoded
From an implementation point of view, both are very easy to achieve; we just have to come to an agreement.
Personally I prefer Option 1, the BOOLEAN return value, because TRUE more naturally conveys that the json-string is valid than returning zero or JSON_ERROR_NONE, which in PHP is a falsy value by design. For me, BOOLEAN will avoid confusion in the usage of the function.
Option 2 would force the developers to write code like:
if (!json_validate($input)) { ... }
if (json_validate($input) === JSON_ERROR_NONE) { ... }
if (json_validate($input) === 0) { ... }
... etc.
I also have the feeling that with “Option 2” developers will end up writing a wrapper around json_validate() to use it in a straightforward way in their code.
I prefer Option 1, but this is still open.
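The confusion that Option 2 could cause can be illustrated with a short sketch. json_validate_code() is a hypothetical integer-returning variant, emulated here with json_decode(); it is not part of the current implementation.

```php
<?php
/** Hypothetical "Option 2" variant: returns a JSON_ERROR_* code instead of a bool. */
function json_validate_code(string $json): int
{
    json_decode($json);
    return json_last_error();
}

$input = '{"ok":true}'; // a valid json-string

// JSON_ERROR_NONE is 0, so a plain truthiness test reads backwards:
$looksValid = (bool) json_validate_code($input);              // falsy for valid input
$isValid    = json_validate_code($input) === JSON_ERROR_NONE; // explicit comparison

var_dump($looksValid, $isValid);
```

With the boolean return value of Option 1, the natural form if (json_validate($input)) { ... } already means "the string is valid".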
Future Scope
- (To be defined by future discussions if needed)
Proposed Voting Choices
- (To be defined)
Implementation
- Github: https://github.com/php/php-src/pull/9399
References
Major open-source projects that will benefit from this
class JsonValidator extends ConstraintValidator
public function validateJson($attribute, $value)
{
    if (is_array($value)) {
        return false;
    }

    if (! is_scalar($value) && ! is_null($value) && ! method_exists($value, '__toString')) {
        return false;
    }

    json_decode($value);

    return json_last_error() === JSON_ERROR_NONE;
}
public static function isJson($value) {
protected function isValidJsonValue($value)
{
    if (in_array($value, ['null', 'false', '0', '""', '[]'])
        || (json_decode($value) !== null && json_last_error() === JSON_ERROR_NONE)
    ) {
        return true;
    }
    //JSON last error reset
    json_encode([]);
    return false;
}
public function isValid($string)
{
    if ($string !== false && $string !== null && $string !== '') {
        json_decode($string);
        if (json_last_error() === JSON_ERROR_NONE) {
            return true;
        }
    }
    return false;
}
public static function validateJson($value, $params)
{
    return (bool) (@json_decode($value));
}
final class Json extends AbstractRule
{
    /**
     * {@inheritDoc}
     */
    public function validate($input): bool
    {
        if (!is_string($input) || $input === '') {
            return false;
        }

        json_decode($input);

        return json_last_error() === JSON_ERROR_NONE;
    }
}
public static function isJson($string)
{
    json_decode($string);
    return json_last_error() == JSON_ERROR_NONE;
}
function is_json( $argument, $ignore_scalars = true ) {
    if ( ! is_string( $argument ) || '' === $argument ) {
        return false;
    }

    if ( $ignore_scalars && ! in_array( $argument[0], [ '{', '[' ], true ) ) {
        return false;
    }

    json_decode( $argument, $assoc = true );

    return json_last_error() === JSON_ERROR_NONE;
}
if (\is_string($value)) {
    json_decode($value); //<------ HERE

    // Check if value is a valid JSON string.
    if ($value !== '' && json_last_error() !== JSON_ERROR_NONE) {
        /**
         * If the value is not empty and is not a valid JSON string,
         * it is most likely a custom field created in Joomla 3 and
         * the value is a string that contains the file name.
         */
        if (is_file(JPATH_ROOT . '/' . $value)) {
            $value = '{"imagefile":"' . $value . '","alt_text":""}';
        } else {
            $value = '';
        }
    }
Stack Overflow questions related to this
One of the highest-ranked Stack Overflow questions related to JSON and PHP is “Fastest way to check if a string is JSON in PHP?”. The question
Viewed 484k times. The ranking
A person asking how to do exactly this, and also providing a real use case; even though it is about Python, the programming language is not important. In Python
Someone has also been doing exactly this in Java. In Java