This RFC proposes adding support for signed integers with specific endianness to PHP's pack() and unpack() functions. This addresses GitHub issue #17068 and fixes the format letter choices in the proposed implementation (PR #19368).
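Currently, PHP's pack() and unpack() functions support the following fixed-width integer formats:
- Signed, machine byte order: s, l, q (2, 4, 8 bytes)
- Unsigned, machine byte order: S, L, Q (2, 4, 8 bytes)
- Unsigned, little/big endian: v/n, V/N, P/J (2, 4, 8 bytes)
However, there is no support for signed integers with a specific endianness, forcing developers to use manual workarounds: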
<?php
// Current manual approach for a signed little-endian 4-byte integer
$unpackToSignedInt = static function (string $v): int {
    // 'v' = unsigned 16-bit LE, 'C' = unsigned byte, 'c' = signed byte (carries the sign)
    $unpacked = unpack('va/Cb/cc', $v);
    return ($unpacked['c'] << 24) | ($unpacked['b'] << 16) | $unpacked['a'];
};

// Proposed approach
$value = unpack('t', $binaryData)[1]; // signed little-endian 4-byte
?>
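To make the gap concrete, the following sketch decodes the same two bytes with the existing codes. It only uses format codes that exist today; the variable names are illustrative. The fixed-endian code `v` is unsigned, so the sign is lost, while `s` is signed but follows the byte order of the machine running the script.

```php
<?php
// The bytes FE FF encode -2 as a signed 16-bit little-endian integer.
$bytes = "\xFE\xFF";

// 'v' fixes the byte order (little endian) but is unsigned.
$unsigned = unpack('v', $bytes)[1]; // 65534

// 's' is signed, but its byte order depends on the host CPU,
// so the result is -2 only on a little-endian machine.
$machineDependent = unpack('s', $bytes)[1];
```

Neither code gives a portable signed result, which is why the manual bit arithmetic shown above is needed today.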
According to the Perl documentation (https://perldoc.perl.org/functions/pack), Perl handles signed integers with endianness using modifier syntax:
- s< signed 16-bit, little-endian byte order
- s> signed 16-bit, big-endian byte order
- l< signed 32-bit, little-endian byte order
- l> signed 32-bit, big-endian byte order
- q< signed 64-bit, little-endian byte order
- q> signed 64-bit, big-endian byte order
The Perl documentation states: “Starting with Perl 5.10.0, integer and floating-point formats... may all be followed by the '>' or '<' endianness modifiers to respectively enforce big- or little-endian byte-order.”
While Perl's specification is the natural reference, its syntax may not be the best fit for PHP, for several technical reasons:
1. Base Letters Already Taken
PHP already uses the base letters for machine-endian signed integers:
- s = signed 16-bit (machine endian)
- l = signed 32-bit (machine endian)
- q = signed 64-bit (machine endian)
2. Parser Architecture Limitations
Perl uses modifier syntax where endianness indicators (<, >) follow the base format letter. PHP's pack format parser is designed around single-character format codes in switch/case statements, not compound expressions like s< or s>.
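The difficulty can be illustrated with a simplified tokenizer written in PHP. This is a sketch under the assumption that each format code is exactly one character followed by an optional repeater (digits or `*`); the function `tokenizePackFormat()` is hypothetical and is not PHP's actual C implementation.

```php
<?php
// Illustrative only: a single-character format tokenizer in the style
// described above. Not taken from the PHP source.
function tokenizePackFormat(string $format): array
{
    $tokens = [];
    $length = strlen($format);
    $i = 0;
    while ($i < $length) {
        // Exactly one character is consumed as the format code.
        $code = $format[$i++];
        if ($i < $length && $format[$i] === '*') {
            $repeat = '*';
            $i++;
        } else {
            // Any run of digits after the code is the repeater.
            $digits = $i < $length ? strspn($format, '0123456789', $i) : 0;
            $repeat = $digits > 0 ? substr($format, $i, $digits) : '1';
            $i += $digits;
        }
        $tokens[] = ['code' => $code, 'repeat' => $repeat];
    }
    return $tokens;
}

$tokens = tokenizePackFormat('s<2');
```

With this design, `s<2` is read as two separate codes, `s` and `<`, rather than one compound code, so supporting modifiers would require changing how the format string is scanned and not just adding new cases.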
3. Different Design Philosophy
PHP has an established pattern of using distinct letters for endian-specific variants rather than Perl-style modifiers. The unsigned endian-specific codes are an example: v/n (2-byte), V/N (4-byte), P/J (8-byte).
The proposed PR #19368 introduces arbitrary format letters that don't follow any logical pattern, triggering the creation of this RFC:
- m/y for signed 2-byte (little/big endian)
- M/Y for signed 4-byte (little/big endian)
- p/j for signed 8-byte (little/big endian)
Issues with current choices:
[...] s, l, q)
Currently Used Letters:
Lowercase: a, c, d, e, f, g, h, i, j, l, m, n, p, q, s, v, x, y
Uppercase: A, C, E, G, H, I, J, L, M, N, P, Q, S, V, X, Y, Z
Available Letters:
Lowercase: b, k, o, r, t, u, w, z
Uppercase: B, D, F, K, O, R, T, U, W
This RFC proposes to add the last two missing integer format families, signed little-endian and signed big-endian, to pack() and unpack().
Proposed Format Letters:
- w/W for signed 2-byte (little/big endian)
- t/T for signed 4-byte (little/big endian)
- r/R for signed 8-byte (little/big endian)
Rationale:
- Because the s/l/q base letters are already taken, these letters provide a systematic alternative: lowercase for little endian, uppercase for big endian.
Implementing Perl's modifier syntax (s<, s>) was considered but that would mean a significant overhaul of the parser, as well as starting a complex migration path for users.
That would also lead to confusion, because there would be multiple ways to express the same format (e.g., both s< and w for signed 2-byte little-endian). It would also imply trying to match Perl's format letters more broadly, which may not be feasible for all types.
A new function would solve all the problems above, but it is out of scope for this RFC and may not be worth adding, as nearly all formats are already available through pack() and unpack(). This RFC aims to add the very last formats missing from pack() and unpack().
Perl vs PHP Approaches:
| Perl Specification | Current PR (Wrong) | Proposed Solution |
|---|---|---|
| s< (signed 2-byte LE) | m | w |
| s> (signed 2-byte BE) | y | W |
| l< (signed 4-byte LE) | M | t |
| l> (signed 4-byte BE) | Y | T |
| q< (signed 8-byte LE) | p | r |
| q> (signed 8-byte BE) | j | R |
PHP Format Letter Organization:
| Type | 2-byte | 4-byte | 8-byte |
|---|---|---|---|
| Unsigned LE | v | V | P |
| Unsigned BE | n | N | J |
| Signed LE | w (proposed) | t (proposed) | r (proposed) |
| Signed BE | W (proposed) | T (proposed) | R (proposed) |
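The table pairs each proposed signed code with an existing unsigned code of the same width and byte order. That pairing can be used to emulate the intended semantics in userland today. The sketch below is an illustration only: the helpers `packSigned()` and `unpackSigned()` and the constant `SIGNED_TO_UNSIGNED` are hypothetical names, a 64-bit PHP build is assumed, and only the 2-byte and 4-byte codes are covered.

```php
<?php
// Hypothetical emulation of the proposed codes; not part of PHP.
// Maps each proposed signed code to [existing unsigned code, width in bits].
const SIGNED_TO_UNSIGNED = [
    'w' => ['v', 16], 'W' => ['n', 16],
    't' => ['V', 32], 'T' => ['N', 32],
];

function packSigned(string $code, int $value): string
{
    [$unsignedCode, $bits] = SIGNED_TO_UNSIGNED[$code];
    // Keep only the low $bits bits (two's complement representation).
    return pack($unsignedCode, $value & ((1 << $bits) - 1));
}

function unpackSigned(string $code, string $bytes): int
{
    [$unsignedCode, $bits] = SIGNED_TO_UNSIGNED[$code];
    $unsigned = unpack($unsignedCode, $bytes)[1];
    // Sign-extend: flip the sign bit, then subtract its weight.
    $signBit = 1 << ($bits - 1);
    return ($unsigned ^ $signBit) - $signBit;
}

$bytes = packSigned('T', -2);        // "\xFF\xFF\xFF\xFE"
$value = unpackSigned('T', $bytes);  // -2
```

Under the proposal, the same round trip would be written as `pack('T', -2)` and `unpack('T', $bytes)[1]` without any helper.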
32-bit Platform Behavior:
On 32-bit platforms, 8-byte format codes (r/R) will throw a ValueError with the message “64-bit format codes are not available for 32-bit versions of PHP”, consistent with existing behavior for q/Q/P/J.
<?php
// On 32-bit platforms
try {
    pack('r', 1);
} catch (ValueError $e) {
    echo $e->getMessage(); // "64-bit format codes are not available..."
}
?>
There are no backward compatibility concerns for existing code. The proposed letters (w, W, t, T, r, R) are currently unused in PHP's pack/unpack implementation.
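Because the letters are unused today, passing one of them to pack() on a current PHP 8 release raises a ValueError for an unknown format code. Code that must run on versions both with and without this RFC could probe for support at runtime. The following is a minimal sketch assuming PHP 8.0 or later; the helper name `supportsSignedEndianCodes()` is hypothetical.

```php
<?php
// Hypothetical helper: returns true if pack() accepts the proposed 'w' code.
function supportsSignedEndianCodes(): bool
{
    try {
        pack('w', 0);
        return true;
    } catch (\ValueError) {
        // Raised for unknown format codes on versions without the new letters.
        return false;
    }
}

$hasNativeSupport = supportsSignedEndianCodes();
```

A library could use the result to choose between the native codes and a manual fallback.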
PHP 8.6 (next minor version)