This RFC proposes adding support for signed integers with specific endianness to PHP's pack() and unpack() functions using Perl's endianness modifier syntax. This addresses GitHub issue #17068.
Currently, PHP's pack/unpack functions support:
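- Signed integers in machine byte order: s, l, q (2, 4, 8 bytes)
- Unsigned integers in machine byte order: S, L, Q (2, 4, 8 bytes)
- Unsigned integers with fixed endianness (little/big): v/n, V/N, P/J (2, 4, 8 bytes)

However, there is no support for signed integers with a specific endianness, which forces developers to use manual workarounds: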
```php
<?php
// Current manual approach for a signed little-endian 4-byte integer
$unpackToSignedInt = static function (string $v): int {
    // 'v' reads bytes 0-1 as an unsigned 16-bit LE value, 'C' reads byte 2
    // unsigned, and 'c' reads byte 3 signed, so shifting it left carries the sign.
    $unpacked = unpack('va/Cb/cc', $v);
    return ($unpacked['c'] << 24) | ($unpacked['b'] << 16) | $unpacked['a'];
};

// Proposed approach with modifiers
$value = unpack('l<', $binaryData)[1]; // signed little-endian 4-byte
?>
```
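Until such modifiers exist, the manual approach above can be generalized. The following is a minimal userland sketch, assuming a 64-bit build of PHP: it reads the value with the existing unsigned code that has the same byte order (v/n, V/N, P/J) and reinterprets the result as two's complement. The function name unpack_signed() and its lookup table are hypothetical and not part of PHP.

```php
<?php
// Hypothetical polyfill (not a PHP API); assumes a 64-bit PHP build.
function unpack_signed(string $code, string $bytes, int $offset = 0): int
{
    // Proposed code => [existing unsigned code with the same byte order, width in bits]
    $map = [
        's<' => ['v', 16], 's>' => ['n', 16],
        'l<' => ['V', 32], 'l>' => ['N', 32],
        'q<' => ['P', 64], 'q>' => ['J', 64],
    ];
    if (!isset($map[$code])) {
        throw new ValueError("Unsupported code '$code'");
    }
    [$existing, $bits] = $map[$code];
    if ($offset < 0 || strlen($bytes) - $offset < intdiv($bits, 8)) {
        throw new ValueError("Not enough input for '$code'");
    }
    $value = unpack($existing, $bytes, $offset)[1];
    if ($bits === 64) {
        // On 64-bit PHP, P and J already return the raw bit pattern as a signed int.
        return $value;
    }
    // Values with the sign bit set wrap around to negative (two's complement).
    return $value >= (1 << ($bits - 1)) ? $value - (1 << $bits) : $value;
}
```

With such a helper, the closure above reduces to unpack_signed('l<', $binaryData), which is the value the proposed unpack('l<', $binaryData)[1] is meant to return.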
According to the Perl documentation (https://perldoc.perl.org/functions/pack), Perl handles signed integers with endianness using modifier syntax:
- s<: signed 16-bit, little-endian byte order
- s>: signed 16-bit, big-endian byte order
- l<: signed 32-bit, little-endian byte order
- l>: signed 32-bit, big-endian byte order
- q<: signed 64-bit, little-endian byte order
- q>: signed 64-bit, big-endian byte order
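For packing, these byte layouts can already be reproduced in PHP, because pack() writes only the low-order bytes of its integer argument, and for a negative number those are its two's-complement representation. The sketch below, assuming a 64-bit build, prints the little- and big-endian bytes for the values used in this RFC's examples using the existing v/n, V/N and P/J codes; these match what Perl's s< and s>, l< and l>, q< and q> produce. The missing piece is on the unpack side, where those codes read the bytes back as unsigned.

```php
<?php
// Assumes 64-bit PHP. Each row: value, little-endian code, big-endian code.
$cases = [
    [-258, 'v', 'n'],                    // 2 bytes
    [-16909060, 'V', 'N'],               // 4 bytes
    [-72340172838076673, 'P', 'J'],      // 8 bytes
];

foreach ($cases as [$n, $leCode, $beCode]) {
    printf("%d LE=%s BE=%s\n", $n, bin2hex(pack($leCode, $n)), bin2hex(pack($beCode, $n)));
}
// -258 LE=fefe BE=fefe
// -16909060 LE=fcfcfdfe BE=fefdfcfc
// -72340172838076673 LE=fffefefefefefefe BE=fefefefefefefeff
```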
This RFC proposes adding endianness modifiers (< and >) to PHP's pack/unpack functions, following Perl's established syntax.
Proposed Syntax:
- s< / s> for signed 2-byte (little/big endian)
- l< / l> for signed 4-byte (little/big endian)
- q< / q> for signed 8-byte (little/big endian)
- S< / S> for unsigned 2-byte (little/big endian)
- L< / L> for unsigned 4-byte (little/big endian)
- Q< / Q> for unsigned 8-byte (little/big endian)

Here are the pros of this approach:

- The < and > symbols visually suggest the byte-order direction

Example Usage:
```php
<?php
// Little-endian signed integers
$data = pack('s<l<q<', -258, -16909060, -72340172838076673);

// Big-endian signed integers
$data = pack('s>l>q>', -258, -16909060, -72340172838076673);

// Unsigned integers with explicit endianness
$data = pack('S<L>Q<', 258, 16909060, 72340172838076673);

// Mixed endianness (little-endian 16-bit, big-endian 32-bit)
$data = pack('s<2l>2', 258, -2, 16909060, -16909060);

// Unpacking with modifiers
[$int16_le, $int32_le] = array_values(unpack('s<a/l<b', $data));
[$uint16_be, $uint32_le] = array_values(unpack('S>a/L<b', $data));
?>
```
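In these examples the modifier sits between the format letter and the optional repeat count (s<2, l>2), as in Perl. The hypothetical tokenizer below is written for illustration only and is not taken from any implementation; it shows that reading of a pack() format string: a letter, an optional < or >, then an optional count or *. It only splits the string (unrecognized characters are skipped), and checking which letters accept a modifier is a separate step.

```php
<?php
// Hypothetical tokenizer for pack() format strings with the proposed modifiers.
// Assumed grammar per token: letter, optional '<' or '>', optional count or '*'.
function tokenize_pack_format(string $format): array
{
    preg_match_all('/([A-Za-z@])([<>]?)(\*|\d*)/', $format, $matches, PREG_SET_ORDER);

    $tokens = [];
    foreach ($matches as $match) {
        $modifier = $match[2] ?? '';
        $count = $match[3] ?? '';
        $tokens[] = [
            'code' => $match[1],
            'modifier' => $modifier === '' ? null : $modifier,
            // No count means 1; '*' is kept as-is; digits become an int.
            'count' => $count === '' ? 1 : ($count === '*' ? '*' : (int) $count),
        ];
    }
    return $tokens;
}

print_r(tokenize_pack_format('s<2l>2'));
```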
Error Handling:
Using a modifier with an unsupported format letter should throw a ValueError rather than fail silently:
```php
<?php
// Using modifiers with unsupported format letters
pack('a<', 'test'); // ValueError: Endianness modifier '<' is not supported for format code 'a'
pack('Z>', 'test'); // ValueError: Endianness modifier '>' is not supported for format code 'Z'

// Using modifiers on formats with inherent endianness
pack('v<', 42); // ValueError: Endianness modifier '<' cannot be applied to format code 'v' which already has inherent endianness
pack('N>', 42); // ValueError: Endianness modifier '>' cannot be applied to format code 'N' which already has inherent endianness
?>
```
Following Perl's design, endianness modifiers are prohibited on format codes that already have inherent endianness. This prevents ambiguity about which endianness takes precedence.
Formats that CANNOT use modifiers:
- v/n: 2-byte unsigned with inherent endianness (little/big)
- V/N: 4-byte unsigned with inherent endianness (little/big)
- P/J: 8-byte unsigned with inherent endianness (little/big)
Perl explicitly prohibits modifiers on inherent-endian formats to avoid conflicts. For example, attempting v< in Perl raises: “'<' allowed only after types sSiIlLqQjJfFdDpP( in pack”.
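The two rules (modifiers only on s, S, l, L, q, Q; never on the inherent-endian letters) can be expressed as a small check. This is a sketch only: assert_endian_modifier_allowed() is a hypothetical name, and the messages are the ones proposed in this RFC rather than output from any released PHP version.

```php
<?php
// Hypothetical check mirroring the proposed rules and error messages.
function assert_endian_modifier_allowed(string $code, string $modifier): void
{
    $isSingleLetter = strlen($code) === 1;

    // Rule 1: letters that already fix the byte order reject modifiers.
    if ($isSingleLetter && str_contains('vnVNPJ', $code)) {
        throw new ValueError(sprintf(
            "Endianness modifier '%s' cannot be applied to format code '%s' which already has inherent endianness",
            $modifier,
            $code
        ));
    }

    // Rule 2: only the machine-endian integer letters accept a modifier.
    if (!$isSingleLetter || !str_contains('sSlLqQ', $code)) {
        throw new ValueError(sprintf(
            "Endianness modifier '%s' is not supported for format code '%s'",
            $modifier,
            $code
        ));
    }
}
```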
When to use modifiers vs inherent formats:
```php
<?php
// For SIGNED integers, only modifiers work
pack('s<', -42); // Signed little-endian 16-bit - no equivalent format exists
pack('l>', -42); // Signed big-endian 32-bit - no equivalent format exists

// For UNSIGNED integers, both work
pack('S<', 42) === pack('v', 42); // Both: unsigned 2-byte little-endian
pack('S>', 42) === pack('n', 42); // Both: unsigned 2-byte big-endian
pack('L<', 42) === pack('V', 42); // Both: unsigned 4-byte little-endian
pack('L>', 42) === pack('N', 42); // Both: unsigned 4-byte big-endian
?>
```
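Why only the modifiers help for signed values can be seen on current PHP (a 64-bit build is assumed): the inherent-endian codes accept a negative number when packing, but unpack() reads the same bytes back as unsigned. For the bytes below, the proposed s< would be expected to return -42.

```php
<?php
// Runs on current PHP; assumes a 64-bit build.
$bytes = pack('v', -42);                   // two bytes: 0xD6 0xFF
var_dump(bin2hex($bytes));                 // string(4) "d6ff"
var_dump(unpack('v', $bytes)[1]);          // int(65494), i.e. 65536 - 42
var_dump(unpack('N', pack('N', -42))[1]);  // int(4294967254), i.e. 2**32 - 42
```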
Alternative 1: New Format Letters
Initially, new format letters were proposed: w/W (2-byte), t/T (4-byte), r/R (8-byte).
This was rejected because:
- Letter pairs don't visually convey endianness the way the modifiers (< and >) do

Alternative 2: Creating a New Function
A completely new function for binary packing could be designed with modern syntax.
This was rejected as well because:
Perl vs PHP (Proposed):
| Perl Specification | Proposed PHP Implementation |
|---|---|
| s< (signed 2-byte LE) | s< |
| s> (signed 2-byte BE) | s> |
| S< (unsigned 2-byte LE) | S< |
| S> (unsigned 2-byte BE) | S> |
| l< (signed 4-byte LE) | l< |
| l> (signed 4-byte BE) | l> |
| L< (unsigned 4-byte LE) | L< |
| L> (unsigned 4-byte BE) | L> |
| q< (signed 8-byte LE) | q< |
| q> (signed 8-byte BE) | q> |
| Q< (unsigned 8-byte LE) | Q< |
| Q> (unsigned 8-byte BE) | Q> |
Complete PHP Format Letter Organization:
| Type | 2-byte | 4-byte | 8-byte |
|---|---|---|---|
| Unsigned LE (inherent) | v | V | P |
| Unsigned BE (inherent) | n | N | J |
| Unsigned machine-endian | S | L | Q |
| Unsigned LE (modifier) | S< (proposed) | L< (proposed) | Q< (proposed) |
| Unsigned BE (modifier) | S> (proposed) | L> (proposed) | Q> (proposed) |
| Signed machine-endian | s | l | q |
| Signed LE (modifier) | s< (proposed) | l< (proposed) | q< (proposed) |
| Signed BE (modifier) | s> (proposed) | l> (proposed) | q> (proposed) |
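The machine-endian rows (s, S, l, L, q, Q) follow the byte order of the host CPU, so data written with them is not portable across architectures; the proposed modifiers pin the order explicitly. A quick way to see which order a given host uses is to compare a machine-endian code with an inherent one. host_is_little_endian() is a hypothetical helper name used only for this sketch.

```php
<?php
// Hypothetical helper: 'S' uses host byte order, 'v' is always little-endian.
function host_is_little_endian(): bool
{
    return pack('S', 1) === pack('v', 1);
}

echo host_is_little_endian() ? "little-endian host\n" : "big-endian host\n";
```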
32-bit Platform Behavior:
On 32-bit platforms, the 8-byte format codes (q<, q>, Q< and Q>) will throw a ValueError with the message “64-bit format codes are not available for 32-bit versions of PHP”, consistent with the existing behavior for q, Q, P and J.
```php
<?php
// On 32-bit platforms
try {
    pack('q<', 1); // signed 64-bit
} catch (ValueError $e) {
    echo $e->getMessage(); // "64-bit format codes are not available..."
}

try {
    pack('Q>', 1); // unsigned 64-bit
} catch (ValueError $e) {
    echo $e->getMessage(); // "64-bit format codes are not available..."
}
?>
```
Modifier Applicability:
Endianness modifiers are supported for both signed and unsigned machine-endian integer format codes (s, l, q, S, L, Q). While unsigned integers already have dedicated endian-specific letters (v/n, V/N, P/J), also accepting modifiers on the uppercase letters is easier to remember and keeps the signed and unsigned codes symmetric, as in Perl. Using a modifier with any other format code throws a ValueError.
While this RFC covers both signed and unsigned integer modifiers, Perl supports endianness modifiers on additional format types that could be considered in future RFCs. For example, support could be added for the floating-point formats (f, d).
Group Modifiers
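Perl's () group syntax allows applying an endianness modifier to every format in a group at once; for example, in Perl pack('(sl)<', -42, 4711) is equivalent to pack('s<l<', -42, 4711).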
There are no backward compatibility concerns. The modifier syntax is entirely opt-in:
- The < and > characters are not currently valid in pack/unpack format strings (they are rejected as unknown format codes)

Proposed PHP version: PHP 8.6 (next minor version)
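Because < and > are rejected by current versions of pack(), a library that must run both on PHP 8.6 and on earlier versions could detect support at runtime rather than compare version numbers. A minimal sketch, with a hypothetical function name:

```php
<?php
// Hypothetical feature check: without this RFC, '<' is an unknown format
// code and pack() throws a ValueError.
function pack_supports_endian_modifiers(): bool
{
    try {
        pack('s<', 1);
        return true;
    } catch (ValueError) {
        return false;
    }
}

var_dump(pack_supports_endian_modifiers());
```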