PHP RFC: Add pack()/unpack() support for endianness modifiers on integers
- Version: 1.1
- Date: 2025-11-21
- Author: Alexandre Daubois, alexandredaubois@php.net
- Status: Under Discussion
- Implementation: https://github.com/php/php-src/pull/19368
Introduction
This RFC proposes adding support for signed integers with specific endianness to PHP's pack() and unpack() functions using Perl's endianness modifier syntax. This addresses GitHub issue #17068.
Currently, PHP's pack/unpack functions support:
- Machine-endian signed integers: s, l, q (2, 4, 8 bytes)
- Machine-endian unsigned integers: S, L, Q (2, 4, 8 bytes)
- Endian-specific unsigned integers: v/n, V/N, P/J (2, 4, 8 bytes)
However, there is no support for signed integers with specific endianness, forcing developers to use manual workarounds:
<?php
// Current manual approach for signed little-endian 4-byte integer
$unpackToSignedInt = static function (string $v) {
    $unpacked = unpack('va/Cb/cc', $v);
    return ($unpacked['c'] << 24) | ($unpacked['b'] << 16) | $unpacked['a'];
};

// Proposed approach with modifiers
$value = unpack('l<', $binaryData)[1]; // signed little-endian 4-byte
?>
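The manual workaround above can be generalized on current PHP versions. The following is a minimal sketch, not part of the RFC: the helper name `unpackSignedInt` and its parameters are hypothetical, and it assumes a 64-bit PHP build so that unsigned 32-bit values fit in a native integer. It reads the value with an existing unsigned fixed-endian code and then reinterprets it as two's complement.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the RFC): reads a signed 2-byte or 4-byte
 * integer with an explicit byte order. Assumes a 64-bit PHP build.
 */
function unpackSignedInt(string $bytes, int $size, bool $littleEndian): int
{
    // Existing unsigned codes per width: [big-endian, little-endian].
    $codes = [2 => ['n', 'v'], 4 => ['N', 'V']];
    if (!isset($codes[$size])) {
        throw new ValueError('Only 2-byte and 4-byte integers are handled in this sketch');
    }
    if (strlen($bytes) < $size) {
        throw new ValueError('Not enough input bytes');
    }

    $unsigned = unpack($codes[$size][(int) $littleEndian], $bytes)[1];

    // Two's complement: values with the top bit set represent negatives.
    $signBit = 1 << ($size * 8 - 1);
    return $unsigned >= $signBit ? $unsigned - ($signBit << 1) : $unsigned;
}

echo unpackSignedInt("\xFC\xFC\xFD\xFE", 4, true), "\n";  // -16909060
echo unpackSignedInt("\xFE\xFD\xFC\xFC", 4, false), "\n"; // -16909060
echo unpackSignedInt("\xFE\xFF", 2, true), "\n";          // -2
```

With the proposed modifiers, each of these calls would reduce to a single unpack() with l<, l>, or s<.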
Perl Specification Reference
According to the Perl documentation (https://perldoc.perl.org/functions/pack), Perl handles signed integers with endianness using modifier syntax:
s<  signed 16-bit, little-endian byte order
s>  signed 16-bit, big-endian byte order
l<  signed 32-bit, little-endian byte order
l>  signed 32-bit, big-endian byte order
q<  signed 64-bit, little-endian byte order
q>  signed 64-bit, big-endian byte order
Proposed Solution
This RFC proposes adding endianness modifiers (< and >) to PHP's pack/unpack functions, following Perl's established syntax.
Proposed Syntax:
- s< / s> for signed 2-byte (little/big endian)
- l< / l> for signed 4-byte (little/big endian)
- q< / q> for signed 8-byte (little/big endian)
- S< / S> for unsigned 2-byte (little/big endian)
- L< / L> for unsigned 4-byte (little/big endian)
- Q< / Q> for unsigned 8-byte (little/big endian)
This approach has several advantages:
- Consistency with Perl: Maintains compatibility with Perl's well-established syntax, reducing cognitive load for developers working across languages
- Intuitive semantics: The < and > symbols visually suggest byte order direction
- Backward compatibility: Modifiers are opt-in; existing code continues to work unchanged
- No arbitrary choices: Unlike inventing new format letters, this leverages proven syntax with 15+ years of usage in Perl
- Minimal implementation: Proof-of-concept shows straightforward implementation without parser rewrite
Example Usage:
<?php
// Little-endian signed integers
$data = pack('s<l<q<', -258, -16909060, -72340172838076673);

// Big-endian signed integers
$data = pack('s>l>q>', -258, -16909060, -72340172838076673);

// Unsigned integers with explicit endianness
$data = pack('S<L>Q<', 258, 16909060, 72340172838076673);

// Mixed endianness (little-endian 16-bit, big-endian 32-bit)
$data = pack('s<2l>2', 258, -2, 16909060, -16909060);

// Unpacking with modifiers
[$int16_le, $int32_le] = array_values(unpack('s<a/l<b', $data));
[$uint16_be, $uint32_le] = array_values(unpack('S>a/L<b', $data));
?>
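To make the expected byte layout concrete, the sketch below runs on current PHP versions and prints the bytes that l< and l> would be expected to produce for one of the values above. It relies on the documented behavior that pack() gives the same result for signed and unsigned format codes, so the existing V and N codes can stand in for the proposed modifiers when packing; that equivalence between V/N and l</l> output is an assumption drawn from the RFC's description, not a statement about the implementation.

```php
<?php
declare(strict_types=1);

// pack() keeps only the low-order bytes of its argument, so the unsigned
// fixed-endian codes emit the two's-complement bytes of a negative value.
$value = -16909060; // 0xFEFDFCFC as a 32-bit two's-complement pattern

$littleEndian = bin2hex(pack('V', $value)); // layout expected from the proposed 'l<'
$bigEndian    = bin2hex(pack('N', $value)); // layout expected from the proposed 'l>'

echo $littleEndian, "\n"; // fcfcfdfe
echo $bigEndian, "\n";    // fefdfcfc
```

The gap this RFC closes is therefore mainly on the unpack() side, where V and N always return non-negative values.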
Error Handling:
The modifiers throw a ValueError when used with unsupported format letters, preventing silent failures:
<?php
// Using modifiers with unsupported format letters
pack('a<', 'test'); // ValueError: Endianness modifier '<' is not supported for format code 'a'
pack('Z>', 'test'); // ValueError: Endianness modifier '>' is not supported for format code 'Z'

// Using modifiers on formats with inherent endianness
pack('v<', 42); // ValueError: Endianness modifier '<' cannot be applied to format code 'v' which already has inherent endianness
pack('N>', 42); // ValueError: Endianness modifier '>' cannot be applied to format code 'N' which already has inherent endianness
?>
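The validation rules behind these errors can be sketched in userland PHP. This is an illustrative mirror only: the function name `assertModifierAllowed` is hypothetical, and the actual check is performed inside the C implementation of pack() and unpack(). The messages are copied from the examples above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland mirror of the modifier validation rules.
 * Throws ValueError when $modifier may not follow $code.
 */
function assertModifierAllowed(string $code, string $modifier): void
{
    $supported = ['s', 'S', 'l', 'L', 'q', 'Q']; // accept '<' and '>'
    $inherent  = ['v', 'n', 'V', 'N', 'P', 'J']; // byte order already fixed

    if (in_array($code, $supported, true)) {
        return;
    }
    if (in_array($code, $inherent, true)) {
        throw new ValueError(sprintf(
            "Endianness modifier '%s' cannot be applied to format code '%s' which already has inherent endianness",
            $modifier,
            $code
        ));
    }
    throw new ValueError(sprintf(
        "Endianness modifier '%s' is not supported for format code '%s'",
        $modifier,
        $code
    ));
}

assertModifierAllowed('l', '<'); // passes silently

try {
    assertModifierAllowed('a', '<');
} catch (ValueError $e) {
    echo $e->getMessage(), "\n";
}
```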
Modifier Restrictions
Following Perl's design, endianness modifiers are prohibited on format codes that already have inherent endianness. This prevents ambiguity about which endianness takes precedence.
Formats that CANNOT use modifiers:
- v/n - 2-byte unsigned with inherent endianness (little/big)
- V/N - 4-byte unsigned with inherent endianness (little/big)
- P/J - 8-byte unsigned with inherent endianness (little/big)
Perl explicitly prohibits modifiers on inherent-endian formats to avoid conflicts. For example, attempting v< in Perl raises: “'<' allowed only after types sSiIlLqQjJfFdDpP( in pack”.
When to use modifiers vs inherent formats:
<?php
// For SIGNED integers, only modifiers work
pack('s<', -42); // Signed little-endian 16-bit - no equivalent format exists
pack('l>', -42); // Signed big-endian 32-bit - no equivalent format exists

// For UNSIGNED integers, both work
pack('S<', 42) === pack('v', 42); // Both: unsigned 2-byte little-endian
pack('S>', 42) === pack('n', 42); // Both: unsigned 2-byte big-endian
pack('L<', 42) === pack('V', 42); // Both: unsigned 4-byte little-endian
pack('L>', 42) === pack('N', 42); // Both: unsigned 4-byte big-endian
?>
Considered Alternatives
Alternative 1: New Format Letters
Initially, new format letters were proposed: w/W (2-byte), t/T (4-byte), r/R (8-byte).
This was rejected because:
- Needless divergence from Perl: the letters are arbitrary and bear no logical relationship to the underlying integer types or to Perl's base letters
- Unlike directional modifiers (< / >), letter pairs don't visually convey endianness
Alternative 2: Creating a New Function
A completely new function for binary packing could be designed with modern syntax.
This was rejected as well because:
- This RFC aims to complete pack/unpack functionality, not replace it
Comparison Tables
Perl vs PHP (Proposed):
| Perl Specification | Proposed PHP Implementation |
|---|---|
| s< (signed 2-byte LE) | s< |
| s> (signed 2-byte BE) | s> |
| S< (unsigned 2-byte LE) | S< |
| S> (unsigned 2-byte BE) | S> |
| l< (signed 4-byte LE) | l< |
| l> (signed 4-byte BE) | l> |
| L< (unsigned 4-byte LE) | L< |
| L> (unsigned 4-byte BE) | L> |
| q< (signed 8-byte LE) | q< |
| q> (signed 8-byte BE) | q> |
| Q< (unsigned 8-byte LE) | Q< |
| Q> (unsigned 8-byte BE) | Q> |
Complete PHP Format Letter Organization:
| Type | 2-byte | 4-byte | 8-byte |
|---|---|---|---|
| Unsigned LE (inherent) | v | V | P |
| Unsigned BE (inherent) | n | N | J |
| Unsigned machine-endian | S | L | Q |
| Unsigned LE (modifier) | S< (proposed) | L< (proposed) | Q< (proposed) |
| Unsigned BE (modifier) | S> (proposed) | L> (proposed) | Q> (proposed) |
| Signed machine-endian | s | l | q |
| Signed LE (modifier) | s< (proposed) | l< (proposed) | q< (proposed) |
| Signed BE (modifier) | s> (proposed) | l> (proposed) | q> (proposed) |
Platform Considerations
32-bit Platform Behavior:
On 32-bit platforms, 8-byte format codes (q<, q>, Q<, Q>) will throw a ValueError with the message “64-bit format codes are not available for 32-bit versions of PHP”, consistent with existing behavior for q/Q/P/J.
<?php
// On 32-bit platforms
try {
    pack('q<', 1); // signed 64-bit
} catch (ValueError $e) {
    echo $e->getMessage(); // "64-bit format codes are not available..."
}

try {
    pack('Q>', 1); // unsigned 64-bit
} catch (ValueError $e) {
    echo $e->getMessage(); // "64-bit format codes are not available..."
}
?>
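Code that must run on both 32-bit and 64-bit builds can check the integer width up front instead of catching the exception. The sketch below is one possible guard; the helper name `supports64BitFormatCodes` is hypothetical, while PHP_INT_SIZE is the built-in constant giving the size of an integer in bytes. It uses the existing P code so that it runs on current PHP versions.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: true when native integers are 8 bytes wide, which is
// the condition under which 64-bit format codes are usable.
function supports64BitFormatCodes(): bool
{
    return PHP_INT_SIZE >= 8;
}

if (supports64BitFormatCodes()) {
    // 'P' is the existing unsigned little-endian 64-bit code.
    echo bin2hex(pack('P', 1)), "\n"; // 0100000000000000
} else {
    echo "64-bit format codes are unavailable on this build\n";
}
```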
Modifier Applicability:
Endianness modifiers are supported for both signed and unsigned machine-endian integer format codes (s, l, q, S, L, Q). While unsigned integers already have dedicated endian-specific letters (v/n, V/N, P/J), supporting modifiers on the uppercase letters makes the syntax easier to remember and more consistent. Using modifiers with any other format code throws a ValueError.
Future Scope
While this RFC covers both signed and unsigned integer modifiers, Perl supports endianness modifiers on additional format types that could be considered in future RFCs. For example, support could be added for floating-point formats (f, d).
Group Modifiers
Perl's () group syntax allows applying an endianness modifier to multiple formats at once. For example, the Perl template (s l)< is equivalent to s< l<.
Backward Incompatible Changes
There are no backward compatibility concerns. The modifier syntax is entirely opt-in:
- Existing format strings without modifiers continue to work unchanged
- No existing format codes are removed or altered
- The < and > characters are not currently used in pack format strings
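Because the modifier characters are not valid format codes today, libraries that need to support several PHP versions could probe for the feature at runtime. The following is a minimal sketch under stated assumptions: the function name `packSupportsEndianModifiers` is hypothetical, and the probe assumes PHP 8.0 or later, where pack() throws a ValueError for an unknown format code.

```php
<?php
declare(strict_types=1);

// Hypothetical feature check: returns true when pack() accepts the proposed
// modifiers. Assumes PHP 8.0+, where an unknown format code throws ValueError.
function packSupportsEndianModifiers(): bool
{
    try {
        // With the proposal applied, this should yield the two bytes 01 00.
        return pack('S<', 1) === "\x01\x00";
    } catch (ValueError $e) {
        return false;
    }
}

var_dump(packSupportsEndianModifiers());
```

A caller could use the result to choose between the modifier syntax and a manual fallback for signed values.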
Proposed PHP Version(s)
PHP 8.6 (next minor version)
Voting Choices
References
- GitHub Issue: https://github.com/php/php-src/issues/17068
- Implementation PR: https://github.com/php/php-src/pull/19368
- Discussion Thread: https://externals.io/message/128702
- Perl pack documentation: https://perldoc.perl.org/functions/pack
- PHP pack documentation: https://www.php.net/manual/en/function.pack.php