PHP RFC: var_representation() : readable alternative to var_export()

Introduction

var_export() is a function that gets structured information about the given variable. It is similar to var_dump() with one exception: the returned representation is (often) valid PHP code. However, the output of var_export() is inconvenient to work with in many ways, especially since that function was introduced in PHP 4.2.0 and predates both namespaces and the short [] array syntax. Because the output format of var_export() is depended upon in PHP's own unit tests, in tests of PECL modules, and in the behavior or unit tests of various applications written in PHP, changing var_export() itself may be impractical. This RFC proposes adding a new function var_representation(mixed $value, int $flags = 0): string to convert a variable to a string in a way that fixes the shortcomings of var_export().

Proposal

Add a new function var_representation(mixed $value, int $flags = 0): string that always returns a string. This has the following differences from var_export():

  1. Unconditionally return a string instead of printing to standard output.
  2. Use null instead of NULL - the former is recommended by more coding guidelines such as PSR-2.
  3. Escape control characters including tabs, newlines, etc., unlike var_export()/var_dump(). See the appendix Comparison of string encoding with other languages to learn more.
  4. Change the way indentation is done for arrays/objects: always add 2 spaces for every level of nesting (never 3, as var_export() does in objects), and put the start of a nested array or object on the same line as its key.
  5. Render lists as “['item1']” rather than “array(\n 0 => 'item1',\n)”.
  6. Always render empty lists on a single line instead of two lines.
  7. Prepend \ to class names so that generated code snippets can be used in namespaces without any issues.
  8. Support the bit flag VAR_REPRESENTATION_SINGLE_LINE=1 in a new optional parameter int $flags = 0 accepting a bitmask. If the value of $flags includes this flag, var_representation() will return a single-line representation for arrays/objects.
php > echo var_representation(true);
true
php > echo var_representation(1);
1
php > echo var_representation(1.00);
1.0
php > echo var_representation(null);  // differs from uppercase NULL from var_export
null
php > echo var_representation(['key' => 'value']);  // uses short arrays, unlike var_export
[
  'key' => 'value',
]
php > echo var_representation(['a','b']);  // uses short arrays, and omits array keys if array_is_list() would be true
[
  'a',
  'b',
]
php > echo var_representation(['a', 'b', 'c'], VAR_REPRESENTATION_SINGLE_LINE);  // can dump everything on one line.
['a', 'b', 'c']
php > echo var_representation([]);  // always print zero-element arrays without a newline
[]
// lines are indented by a multiple of 2, similar to var_export but not exactly the same
php > echo var_representation([(object) ['key' => (object) ['inner' => [1.0], 'other' => new ArrayObject([2])], 'other' => false]]);
[
  (object) [
    'key' => (object) [
      'inner' => [
        1.0,
      ],
      'other' => \ArrayObject::__set_state([
        2,
      ]),
    ],
    'other' => false,
  ],
]
php > echo var_representation(fopen('test','w'));  // resources are output as null, like var_export
 
Warning: var_representation does not handle resources in php shell code on line 1
null
php > $x = new stdClass(); $x->x = $x; echo var_representation($x);
 
Warning: var_representation does not handle circular references in php shell code on line 1
(object) [
  'x' => null,
]
// If there are any control characters (\x00-\x1f and \x7f), use double quotes instead of single quotes
// (that includes "\r", "\n", "\t", etc.)
php > echo var_representation("Content-Length: 42\r\n"); 
"Content-Length: 42\r\n"
php > echo var_representation("uses double quotes: \$\"'\\\n");
"uses double quotes: \$\"'\\\n"
php > echo var_representation("uses single quotes: \$\"'\\");
'uses single quotes: $"\'\\'
 
 
php > echo var_representation(implode('', array_map('chr', range(0, 0x1f)))), "\n"; // ascii \x00-0x1f
"\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
php > echo var_representation(implode('', array_map('chr', range(0x20, 0x7f)))), "\n"; // ascii \x20-0x7f
" !\"#\$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f"
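The array layout rules above (short syntax, omitted list keys, 2-space indentation, and the single-line flag) can be approximated in userland. The following is only a rough sketch under stated assumptions, not the proposed implementation: sketch_repr() and SKETCH_SINGLE_LINE are hypothetical names, objects and resources are out of scope, and strings are delegated to var_export(), so control characters are not escaped here.

```php
<?php
// Hypothetical userland stand-in for the proposed VAR_REPRESENTATION_SINGLE_LINE flag.
const SKETCH_SINGLE_LINE = 1;

/**
 * Sketch of the proposed layout for null, scalars and arrays only.
 * Strings go through var_export(), so control characters are NOT escaped.
 */
function sketch_repr($value, int $flags = 0, string $indent = ''): string
{
    if ($value === null) {
        return 'null'; // var_export() would emit NULL
    }
    if (!is_array($value)) {
        return var_export($value, true);
    }
    if ($value === []) {
        return '[]'; // empty arrays always fit on one line
    }
    // Equivalent to array_is_list() for non-empty arrays.
    $isList = array_keys($value) === range(0, count($value) - 1);
    $parts = [];
    foreach ($value as $key => $item) {
        $prefix = $isList ? '' : var_export($key, true) . ' => ';
        $parts[] = $prefix . sketch_repr($item, $flags, $indent . '  ');
    }
    // Bitmask test: the flag is set if its bit survives the AND.
    if (($flags & SKETCH_SINGLE_LINE) !== 0) {
        return '[' . implode(', ', $parts) . ']';
    }
    $inner = $indent . '  ';
    return "[\n" . $inner . implode(",\n" . $inner, $parts) . ",\n" . $indent . ']';
}

echo sketch_repr(['a', 'b', 'c'], SKETCH_SINGLE_LINE), "\n"; // ['a', 'b', 'c']
echo sketch_repr(['key' => [1.0, null]]), "\n";
```

The second call prints the nested array over six lines, with the inner list starting on the same line as 'key'.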

Advantages over var_export

Encoding binary data

This does a better job at encoding binary data in a form that is easy to edit. var_export() does not escape any bytes except for \\, \', and \0, not even control characters such as tabs, vertical tabs, backspaces, carriage returns, newlines, etc.

php > echo var_representation("\x00\r\n\x00");
"\x00\r\n\x00"
// var_export gives no visual indication that there is a carriage return before that newline
php > var_export("\x00\r\n\x00");
'' . "\0" . '
' . "\0" . ''
// Attempting to print control characters to your terminal with var_export may cause unexpected side effects
// and unescaped control characters are unreadable
php > var_export(implode('', array_map('chr', range(0, 0x1f))));
'' . "\0" . '
 
 
hp > // (first character and closing ' was hidden by those control characters)
php > echo var_representation(implode('', array_map('chr', range(0, 0x1f))));
"\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
 
 
// Bytes \x80 and above are passed through with no modification or encoding checks.
// PHP strings are internally just arrays of bytes and
// different php applications use different encodings.
// E.g. for this interactive shell session in a terminal expecting output with the utf-8 encoding:
php > echo var_representation('pi=π'); 
'pi=π'
php > var_export('pi=π');
'pi=π'
php > echo var_representation("\xcf\x80");
'π'

Cleaner output

This omits array keys when none of the array keys are required (i.e. when array_is_list() is true), and puts array values on the same line as array keys. Additionally, this outputs null or unrepresentable values as null instead of NULL, following modern coding guidelines such as PSR-2.

Supporting namespaces

var_export() was written in PHP 4.2, long before PHP supported namespaces. Because of that, the output of var_export() has never included backslashes to fully qualify class names, which is inconvenient for objects that do implement __set_state (aside: ArrayObject currently doesn't).

php > echo var_representation(new ArrayObject([1,['key' => [true]]]));
\ArrayObject::__set_state([
  1,
  [
    'key' => [
      true,
    ],
  ],
])
php > echo var_representation(new ArrayObject([1,['key'=>[true]]]),VAR_REPRESENTATION_SINGLE_LINE);
\ArrayObject::__set_state([1, ['key' => [true]]])
php > var_export(new ArrayObject([1,['key' => [true]]]));
ArrayObject::__set_state(array(
   0 => 1,
   1 => 
  array (
    'key' => 
    array (
      0 => true,
    ),
  ),
))

Without the backslash, using var_export to build a snippet such as NS\Something::__set_state([]) will cause the class to be incorrectly resolved to OtherNS\NS\Something if the output of var_export is used as part of a PHP file generated in any namespace other than the global namespace (here, OtherNS).

php > namespace NS { class Something { public static function __set_state($data) {} }}
php > $code = "namespace OtherNS; return " . var_export(new NS\Something(), true) . ";\n";
php > echo $code;
namespace OtherNS; return NS\Something::__set_state(array(
));
php > eval($code);
 
Warning: Uncaught Error: Class "OtherNS\NS\Something" not found in php shell code(1) : eval()'d code:1
Stack trace:
#0 php shell code(1): eval()
#1 {main}
  thrown in php shell code(1) : eval()'d code on line 1
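The effect of the leading backslash can be checked in isolation. The sketch below uses hard-coded snippets rather than the output of any dump function, so it only illustrates PHP's name resolution; sketch_try_snippet() is a hypothetical helper, and the class is declared through eval() purely so that the example file itself stays in the global namespace.

```php
<?php
// Declare NS\Something; eval() keeps this file itself in the global namespace.
eval('namespace NS; class Something {
    public static function __set_state(array $data) { return new self(); }
}');

/** Evaluates a generated snippet and reports the outcome as a short string. */
function sketch_try_snippet(string $code): string
{
    try {
        $result = eval($code);
        return 'ok: ' . get_class($result);
    } catch (\Error $e) {
        return 'error: ' . $e->getMessage();
    }
}

// Unqualified name: resolved relative to the snippet's namespace,
// i.e. as OtherNS\NS\Something, which does not exist.
echo sketch_try_snippet('namespace OtherNS; return NS\Something::__set_state([]);'), "\n";

// Fully qualified name: resolves to \NS\Something from any namespace.
echo sketch_try_snippet('namespace OtherNS; return \NS\Something::__set_state([]);'), "\n";
```

The first call reports a class-not-found error, while the second prints ok: NS\Something.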

When would a user use var_representation?

https://externals.io/message/112967#112968

My hesitation remains that this is just duplicating existing functionality with only cosmetic differences.

As a user of PHP 8.1, how would I decide whether to use print_r, var_dump, var_export, or var_representation?

And under what circumstances would I bother to write “var_representation($var, VAR_REPRESENTATION_SINGLE_LINE);”?

An end user may wish to use these functions in the following situations: (These are the personal opinions of the RFC's author)

Use var_representation when:

var_representation returns a parsable string representation of a variable that is easier to read than var_export.

It may be useful when any of the following apply:

Use VAR_REPRESENTATION_SINGLE_LINE when:

This flag may be useful when any of the following apply:

    $this->assertSame("[\\NS\\MyClass::__set_state(['prop' => true]), 2]", $repr);
    // instead of the much longer and harder to type
    $this->assertSame("[\n  \\NS\\MyClass::__set_state([\n    'prop' => true,\n  ]),\n  2,\n]", $repr);
 

Use var_export when:

var_export returns a parsable string representation of a variable (that has the limitations described in this RFC)

Use var_dump when:

Use debug_zval_dump when:

debug_zval_dump dumps a string representation of an internal zend value to output.

php > $y = [new stdClass()]; $y[1] = &$y[0]; 
php > debug_zval_dump($y);
array(2) refcount(2){
  [0]=>
  &object(stdClass)#1 (0) refcount(1){
  }
  [1]=>
  &object(stdClass)#1 (0) refcount(1){
  }
}
php > var_dump($y);
array(2) {
  [0]=>
  &object(stdClass)#1 (0) {
  }
  [1]=>
  &object(stdClass)#1 (0) {
  }
}
php > var_export($y);  // here, you get valid php code but don't see the object ids and can't tell if they're different objects
array (
  0 => 
  (object) array(
  ),
  1 => 
  (object) array(
  ),
)

Use print_r when:

print_r prints human-readable information about a variable - it is like print() but recursive.

The below snippet is an example of where you may not want to use print_r().

php > print_r([['key' => 'first', 'other' => 'second', 'third' => '1'], '1', 1, 1.0, true, false, null, '']);
Array
(
    [0] => Array
        (
            [key] => first
            [other] => second
            [third] => 1
        )
 
    [1] => 1
    [2] => 1
    [3] => 1
    [4] => 1
    [5] => 
    [6] => 
    [7] => 
)

Backward Incompatible Changes

None, except for newly added function and constant names. The output format of var_export() is not changed in any way.

Proposed PHP Version(s)

8.1

RFC Impact

To SAPIs

None

To Existing Extensions

No

To Opcache

No impact

New Constants

VAR_REPRESENTATION_SINGLE_LINE

Unaffected PHP Functionality

var_export() does not change in any way.

Future Scope

Extending $flags

Future RFCs may extend $flags by adding more flags, or by allowing an array to be passed to $flags.

Adding more flags in this RFC would increase its scope, as well as the complexity of implementing the change and of reviewing and understanding the implementation.

Supporting an indent option

This was left out since I felt it would increase the scope of the RFC too much.

An indent option might be supported by also allowing var_representation(value: $value, flags: ['flags' => VAR_REPRESENTATION_SINGLE_LINE, 'indent' => "\t"]) or by bitmask flags such as VAR_REPRESENTATION_INDENT_FOUR_SPACES/VAR_REPRESENTATION_INDENT_TABS/VAR_REPRESENTATION_INDENT_NONE.

The fact that embedded newlines are no longer emitted as parts of strings makes it easier to efficiently convert the indentation to spaces or tabs using preg_replace or preg_replace_callback.

php > echo var_representation([[['key' => 'value  with  space']]]);
[
  [
    [
      'key' => 'value  with  space',
    ],
  ],
]
php > echo preg_replace('/^((  )+)/m', '\1\1', var_representation([[['key' => 'value  with  space']]]));
[
    [
        [
            'key' => 'value  with  space',
        ],
    ],
]
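Converting to tabs needs a callback, because the replacement length depends on the nesting depth. A minimal sketch follows, assuming the input uses the 2-space format shown above and contains no raw newlines inside strings; sketch_indent_with_tabs() is a hypothetical helper, and the input is hard-coded because var_representation() is only proposed.

```php
<?php
/**
 * Converts leading two-space indentation levels to one tab each.
 * Only safe when no string in the input contains a raw newline.
 */
function sketch_indent_with_tabs(string $code): string
{
    return preg_replace_callback(
        '/^(?:  )+/m',
        static function (array $m): string {
            // One tab per pair of leading spaces.
            return str_repeat("\t", intdiv(strlen($m[0]), 2));
        },
        $code
    );
}

// Hard-coded text in the proposed output format.
$repr = "[\n  [\n    'key' => 'value  with  space',\n  ],\n]";
echo sketch_indent_with_tabs($repr), "\n";
```

Because the pattern is anchored to the start of each line, the double spaces inside 'value  with  space' are left untouched.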

Adding magic methods such as __toRepresentation() to PHP

This is outside of the scope of this RFC, but it is possible future RFCs by others may amend the representation of var_representation() before php 8.1 is released or through adding new options to $flags.

Others have suggested adding magic methods that would convert objects to a better representation. No concrete proposals have been made yet. Multiline formatting and the detection of recursive data structures is a potential concern.

Another possibility is to add a magic method such as __toConstructorArgs(): array which would allow converting `$point` to the string 'new Point(x: 1, y: 2)' or 'new Point(1, 2)' if that magic method is defined.
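To make the idea concrete, here is a userland sketch of how such a method could be consumed. Everything in it is hypothetical: __toConstructorArgs() is not an existing PHP magic method, Point is an example class, and sketch_constructor_repr() is an illustrative helper that uses var_export() for the argument values.

```php
<?php
// Hypothetical example class; __toConstructorArgs() is NOT an existing magic method.
final class Point
{
    public $x;
    public $y;

    public function __construct(int $x, int $y)
    {
        $this->x = $x;
        $this->y = $y;
    }

    public function __toConstructorArgs(): array
    {
        return ['x' => $this->x, 'y' => $this->y];
    }
}

/** Renders an object as a constructor call built from its declared arguments. */
function sketch_constructor_repr(object $object): string
{
    $args = [];
    foreach ($object->__toConstructorArgs() as $name => $value) {
        $rendered = var_export($value, true);
        // String keys become named arguments, integer keys positional ones.
        $args[] = is_string($name) ? "$name: $rendered" : $rendered;
    }
    return 'new \\' . get_class($object) . '(' . implode(', ', $args) . ')';
}

echo sketch_constructor_repr(new Point(1, 2)), "\n"; // new \Point(x: 1, y: 2)
```

The sketch fully qualifies the class name, in line with the namespace handling proposed for var_representation().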

Customizing string representations

It may be useful to override this string representation through additional flags, callbacks, or other mechanisms. However, I don't know if there's widespread interest in that, and this would increase the scope of this RFC.

Emitting code comments in result about references/types/recursion

Adding a comment such as /* resource(2) of type (stream) */ null to the var_representation output with an opt-in flag (e.g. VAR_REPRESENTATION_ADD_TYPE_COMMENTS) to add this information may be useful to explore in follow-up work (to meet more use cases of var_dump).

(Or /* RECURSION */ NULL, or [/* reference */ 123, /* reference */ 123], etc.)

Discussion

PHP already has a lot of ways to dump variables

https://externals.io/message/112924#112943

While I agree that all the suggestions in this thread would improve var_export, I worry that it is failing a “smell test” that I often apply:

“If you're struggling to come up with the appropriate name for something that you're creating, maybe you're creating the wrong thing.”

In this case, the reason it's difficult to name is that PHP already has rather a lot of different ways to produce a human-readable string from a variable. The synopses in the manual aren't particularly enlightening:

  • print_r - Prints human-readable information about a variable
  • var_dump - Dumps information about a variable
  • var_export - Outputs or returns a parsable string representation of a variable

Then there's the slightly more exotic (and rather less useful than it once was) debug_zval_dump; serialization formats that are reasonably human-friendly like json_encode; and any number of frameworks and userland libraries that define their own “dumper” functions because they weren't satisfied with any of the above.

The name of any new function in this crowded space needs to somehow tell the user why they'd use this one over the others - and, indeed, when they wouldn't use it over the others.

Should we be aiming for a single function that can take over from some or all of the others, and deprecate them, rather than just adding to the confusion?

https://externals.io/message/112924#112953

IMO print_r/var_dump should be kept out of this discussion. Those are human readable outputs for human consumption. var_export() is about a machine readable output for recreating initial state within a runtime. The requirements presented are wholly different.

-Sara

If the goal of var_export is only to have some machine-readable output, the following will do it:

<?php
function my_var_export(mixed $x): string {
    $serialized = \base64_encode(\serialize($x));
    return "\unserialize(\base64_decode('$serialized'))";
}
?>

In reality, the output of var_export() is both machine-readable and human-readable.

—Claude

I believe that the improvements of var_representation make adding a new function worth it. See the section "Use var_representation when".

As mentioned earlier, a lot of existing php code depends on the exact default output of var_export() (e.g. unit tests of php-src itself and otherwise), which was introduced in php 4.2 and predates namespaces and short arrays. Changing it would result in a lot of work in php-src, PECL, and projects written in PHP to support both old and new syntaxes for var_export.

The last time var_export() changed was from stdClass::__set_state(array()) to (object) [] in PHP 7.3.0, but that was something that had a clearer reason to fix - stdClass::__set_state is an undeclared function and many users were inconvenienced by being unable to generate code for stdClass instances.

Vote

This is a Yes/No vote, requiring 2/3 majority. Voting started on 2021-02-05 and ended 2021-02-19.

Add var_representation($value, int $flags=0): string to php?
Real name Yes No
alcaeus (alcaeus)  
asgrim (asgrim)  
ashnazg (ashnazg)  
bmajdak (bmajdak)  
brzuchal (brzuchal)  
cschneid (cschneid)  
davey (davey)  
derick (derick)  
galvao (galvao)  
kalle (kalle)  
kguest (kguest)  
nicolasgrekas (nicolasgrekas)  
ocramius (ocramius)  
pollita (pollita)  
reywob (reywob)  
rjhdby (rjhdby)  
santiagolizardo (santiagolizardo)  
sergey (sergey)  
tandre (tandre)  
Final result: 9 10
This poll has been closed.

References

Appendix

Comparison of string encoding with other languages

See https://man7.org/linux/man-pages/man7/ascii.7.html for details about ascii

ASCII is the American Standard Code for Information Interchange. It is a 7-bit code (with 128 characters). Many 8-bit codes (e.g., ISO 8859-1) contain ASCII as their lower half. The international counterpart of ASCII is known as ISO 646-IRV.

If there are any control characters (in the ranges \x00-\x1f and \x7f), var_representation() uses double quotes instead of single quotes. If there are no control characters, strings are represented the way var_export() currently represents them.

php > echo var_representation(implode('', array_map('chr', range(0, 0x1f)))), "\n"; // ascii \x00-0x1f
"\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
php > echo var_representation(implode('', array_map('chr', range(0x20, 0x7f)))), "\n"; // ascii \x20-0x7f
" !\"#\$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f"
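The quoting rule described above can be sketched in userland. This is an approximation written from the examples in this RFC, not the proposed C implementation; sketch_repr_string() is a hypothetical name. Bytes \x80 and above are passed through unchanged.

```php
<?php
/**
 * Sketch of the proposed string encoding: single quotes when there are
 * no control characters, otherwise double quotes with escapes.
 */
function sketch_repr_string(string $s): string
{
    if (!preg_match('/[\x00-\x1f\x7f]/', $s)) {
        // Same as var_export(): only backslash and single quote are escaped.
        return "'" . strtr($s, ['\\' => '\\\\', "'" => "\\'"]) . "'";
    }
    $escaped = preg_replace_callback(
        '/[\x00-\x1f\x7f"$\\\\]/',
        static function (array $m): string {
            $c = $m[0];
            switch ($c) {
                case "\t": return '\t';
                case "\n": return '\n';
                case "\r": return '\r';
                case '"':
                case '$':
                case '\\':
                    return '\\' . $c; // characters special inside double quotes
                default:
                    return sprintf('\x%02x', ord($c)); // other control characters
            }
        },
        $s
    );
    return '"' . $escaped . '"';
}

echo sketch_repr_string("Content-Length: 42\r\n"), "\n"; // "Content-Length: 42\r\n"
echo sketch_repr_string("it's"), "\n";                   // 'it\'s'
```

Since $ is always escaped in the double-quoted form, the result can be evaluated without triggering variable interpolation.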

Python appears to have the same inner representation with shorter representations only for \t\n\r (Python allows escaping inside of single quoted strings).

# \x00-\x1f
print(repr(''.join(chr(c) for c in range(0, 0x20))))                          
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f'
# \x20-\x7f
print(repr(''.join(chr(c) for c in range(0x20, 0x80))))                       
' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f'

JSON escapes a wider range of control characters, but the format does not require escaping the delete character (\x7f), which is permitted in string literals.

> console.log(JSON.stringify("\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"));
"\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n\u000b\f\r\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f"
> console.log(JSON.stringify(" !\"#$%&'()*+,-.\/0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f"));
" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"

Ruby has additional shorter escapes for \a\b\v\f and also escapes backslashes. For many users, \a\b\v\f are obscure terminal/text file functionality and the hex representation may be more useful.

puts("\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f".inspect)
"\u0000\u0001\u0002\u0003\u0004\u0005\u0006\a\b\t\n\v\f\r\u000E\u000F\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001A\e\u001C\u001D\u001E\u001F"
puts(" !\"#$%&'()*+,-.\/0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f".inspect)
" !\"\#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u007F"

Rejected Features

Printing to stdout by default or configurably

Printing to stdout and creating a string representation are two distinct behaviors, which some would argue should not be combined into the same function. It is simple enough to explicitly write echo var_representation($value);

The name var_representation() was chosen to make it clearer that the function returns a representation, rather than performing an action such as dumping or exporting the value.

https://externals.io/message/112924#112925

The formatting of var_export is certainly a recurring complaint, and previous discussions were not particularly open to changing current var_export behavior, so adding a new function seems to be the way to address the issue (the alternative would be to add a flag to var_export).

I like the idea of the “one line” flag. Actually, this is the main part I'm interested in :) With the one line flag, this produces the ideal formatting for PHPT tests that want to print something like “$v1 + $v2 = $v3”. None of our current dumping functions are suitable for this purpose (json_encode comes closest, but has edge cases like lack of NAN support.)

Some notes:

  • You should drop the $return parameter and make it always return. As this is primarily an export and not a dumping function, printing to stdout doesn't make sense to me.
  • For strings, have you considered printing them as double-quoted and escaping more characters? This would avoid newlines in oneline mode. And would allow you to escape more control characters. I also find the current '' . "\0" . '' format for encoding null bytes quite awkward.
  • I don't like the short_var_export() name. Is “short” really the primary characteristic of this function? Both var_export_pretty and var_export_canonical seem better to me, though I can't say they're great either. I will refrain from proposing real_var_export() ... oops :P

Regards,

Nikita

Calling this var_export_something

The var_export() function will print to stdout by default, unless $return = true is passed in. I would find it extremely inconsistent and confusing to add a new global function var_export_something() that does not print to stdout by default.

Using an object-oriented api

This was rejected because the most common use cases would not need the ability to customize the output. Additionally, it is possible to use $flags (possibly also allowing an array containing callbacks) to achieve a similar result to method overrides.

https://externals.io/message/112924#112944

Alternatively how about making a VarExporter class.

$exporter = new VarExporter; // Defaults to basic set of encoding options TBD
$exporter->setIndent('  '); // 2 spaces, 1 tab, whatever blows your dress up
$exporter->setUserShortArray(false); // e.g. use array(...)
etc...
 
$serialized = $exporter->serialize($var); // Exports to a var
$exporter->serializeToFile($var, '/tmp/include.inc'); // Exports to a file
$exporter->serializeToStream($var, $stream); // Exports to an already open stream

And if you want the defaults, then just:

$serialized = (new VarExporter)->serialize($var);

Potentially, one could also allow overriding helper methods to perform transformations along the way:

// VarExporter which encodes all strings as base64 blobs.
class Base64StringVarExporter extends VarExporter {
    public function encodeString(string $var): string {
      // parent behavior is `return '"' . addslashes($var) . '"';`
      return "base64_decode('" . base64_encode($var) . "')";
    }
}

Not the most performant thing, but extremely powerful.

Dumping to a stream

https://externals.io/message/112924#112944

* You should drop the $return parameter and make it always return. As this is primarily an export and not a dumping function, printing to stdout doesn't make sense to me.

I'd argue the opposite. If dumping a particularly large tree of elements, serializing that to a single string before then being able to write it to file or wherever seems like packing on a lot of unnecessary effort. What I would do is expand the purpose of the $output parameter to take a stream. STDOUT by default, a file stream for writing to include files (one of the more common uses), or even a tmpfile() if you do actually want it in a var.

There are three drawbacks to that proposal:

  1. If a function taking a stream were to throw or encounter a fatal error while converting an object to a stream, then you'd write an incomplete object to the stream or file, which would have to be deleted.
    E.g. internally, fprintf() and printf() call sprintf before writing anything to the stream for related reasons.
  2. This may be much slower and end users may not expect that - a lot of small stream writes with dynamic C function calls would be something I'd expect to take much longer than converting to a string then writing to the stream. (e.g. I assume a lot of small echo $str; is much faster than \fwrite(\STDOUT, $str); in the internal C implementation) (if we call ->serialize() first, then there's less of a reason to expose ->serializeFile() and ->serializeStream())
  3. Adding even more ways to dump to a stream/file. Should that include stream wrappers such as http://? For something like XML/YAML/CSV, being able to write to a file makes sense because those are formats many other applications/languages can consume, which isn't the case for var_export.

Changing var_dump

var_dump is a function whose goals I consider incompatible with those of var_representation. If an exact representation of reference cycles, identical objects, and circular object data is needed, the code snippet unserialize("....") can be generated using var_representation(serialize($value)) (or var_export).
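The serialize-based approach can be tried with existing functions. The sketch below uses var_export() only to quote the serialized string, since var_representation() is merely proposed, and assumes PHP 7.3 or later (where stdClass is exported as (object) array(...)). The variable names are illustrative.

```php
<?php
$shared = new stdClass();
$pair = [$shared, $shared]; // the same object twice

// Generate code around serialize(); var_export() only quotes the string here.
$code = 'return unserialize(' . var_export(serialize($pair), true) . ');';
$restored = eval($code);
var_dump($restored[0] === $restored[1]); // bool(true): object identity survives

// Generating code from the structure itself yields two separate objects.
$copied = eval('return ' . var_export($pair, true) . ';');
var_dump($copied[0] === $copied[1]); // bool(false)
```

The trade-off is readability: the serialized payload preserves identity and references, but it is much harder to read or edit by hand than the structural output.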

In particular, var_dump() dumps object ids, indicates objects that are identical to each other, shows recursion, and shows the presence of references. It also redundantly annotates values with their types, and generates output for types that cannot be evaluated (e.g. resource(2) of type (stream)).

Adding a comment such as /* resource(2) of type (stream) */ null to the var_representation output with an opt-in flag to add this information may be useful to explore in follow-up work.

https://externals.io/message/112967#112970

Changelog