PHP RFC: Add pack()/unpack() support for signed integers with specific endianness
- Version: 1.0
- Date: 2025-09-15
- Author: Alexandre Daubois, alexandredaubois@php.net
- Status: Draft
- Implementation: https://github.com/php/php-src/pull/19368
Introduction
This RFC proposes adding support for signed integers with specific endianness to PHP's pack() and unpack() functions. This addresses GitHub issue #17068 and fixes the format letter choices in the current implementation (PR #19368).
Currently, PHP's pack/unpack functions support:
- Machine-endian signed integers: s, l, q (2, 4, 8 bytes)
- Machine-endian unsigned integers: S, L, Q (2, 4, 8 bytes)
- Endian-specific unsigned integers: v/n, V/N, P/J (2, 4, 8 bytes)
However, there is no support for signed integers with specific endianness, forcing developers to use manual workarounds:
<?php
// Current manual approach for a signed little-endian 4-byte integer
$unpackToSignedInt = static function (string $v) {
    // 'v' reads the low 16 bits, 'C' the next byte, 'c' the signed top byte
    $unpacked = unpack('va/Cb/cc', $v);
    return ($unpacked['c'] << 24) | ($unpacked['b'] << 16) | $unpacked['a'];
};

// Proposed approach
$value = unpack('t', $binaryData)[1]; // signed little-endian 4-byte
?>
Perl Specification Reference
According to the Perl documentation (https://perldoc.perl.org/functions/pack), Perl handles signed integers with endianness using modifier syntax:
s<  signed 16-bit, little-endian byte order
s>  signed 16-bit, big-endian byte order
l<  signed 32-bit, little-endian byte order
l>  signed 32-bit, big-endian byte order
q<  signed 64-bit, little-endian byte order
q>  signed 64-bit, big-endian byte order
The Perl documentation states: “Starting with Perl 5.10.0, integer and floating-point formats... may all be followed by the '>' or '<' endianness modifiers to respectively enforce big- or little-endian byte-order.”
Why Perl's Approach Cannot Be Used in PHP
While Perl's specification provides the ideal reference, PHP cannot adopt Perl's exact syntax for several technical reasons:
1. Base Letters Already Taken
PHP already uses the base letters for machine-endian signed integers:
- s = signed 16-bit (machine endian)
- l = signed 32-bit (machine endian)
- q = signed 64-bit (machine endian)
2. Parser Architecture Limitations
Perl uses modifier syntax where endianness indicators (<, >) follow the base format letter. PHP's pack format parser is designed around single-character format codes in switch/case statements, not compound expressions like s< or s>.
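One way to see the mismatch from userland is sketched below. It assumes unpack() keeps its current grammar (format code, optional repeater, optional element name, then a / separator); under that grammar a character placed after a code is read as the element name, not as a modifier. The input bytes are chosen for illustration only.

```php
<?php
// Illustration only: how today's unpack() grammar treats a Perl-style "s<".
// The bytes "\x01\x01" decode to 257 in either byte order, so the value
// does not depend on the host's endianness.
$result = unpack('s<', "\x01\x01");

// The "<" ends up as the array key (element name), not as a modifier.
var_dump(array_keys($result));
```

Reusing < and > as modifiers would therefore change the meaning of format strings that are accepted today.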
3. Different Design Philosophy
PHP established a pattern of using completely different letters for endian-specific variants:
- Unsigned endian-specific: v/n (2-byte), V/N (4-byte), P/J (8-byte)
- Rather than modifiers like Perl's approach
Current Implementation Problems
The current PR #19368 introduces arbitrary format letters that don't follow any logical pattern:
- m/y for signed 2-byte (little/big endian)
- M/Y for signed 4-byte (little/big endian)
- p/j for signed 8-byte (little/big endian)
Issues with current choices:
- No relationship to Perl's base letters (s, l, q)
- No logical pairing with existing unsigned endian formats
- Arbitrary selection that doesn't follow PHP's established patterns
Format Letter Analysis
Currently Used Letters:
Lowercase: a, c, d, e, f, g, h, i, j, l, m, n, p, q, s, v, x, y
Uppercase: A, C, E, G, H, I, J, L, M, N, P, Q, S, V, X, Y, Z
Available Letters:
Lowercase: b, k, o, r, t, u, w, z
Uppercase: B, D, F, K, O, R, T, U, W
Proposed Solution
Replace the current arbitrary letter choices with letters that follow PHP's established conventions and create logical relationships with existing formats:
Proposed Format Letters:
- w/W for signed 2-byte (little/big endian)
- t/T for signed 4-byte (little/big endian)
- r/R for signed 8-byte (little/big endian)
Rationale:
- Follows the convention of PHP's existing float codes (e/E, g/G): lowercase = little-endian, uppercase = big-endian
- Systematic approach: Creates consistent pairs rather than arbitrary letter choices
- Available letters: All proposed letters are currently unused
- Closest to Perl's intent: While we can't use Perl's exact `s`/`l`/`q` base letters (already taken), these letters provide a systematic alternative
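The intended semantics of the 2-byte and 4-byte letters can be approximated in userland today. The sketch below is not the implementation from the PR: the constant PROPOSED_SIGNED_CODES and the function unpackSignedSketch() are hypothetical names, a 64-bit PHP build is assumed, and the 8-byte codes (r/R) are left out.

```php
<?php
// Hypothetical mapping: proposed letter => [existing unsigned code, width in bits].
const PROPOSED_SIGNED_CODES = [
    'w' => ['v', 16], // signed 2-byte little-endian
    'W' => ['n', 16], // signed 2-byte big-endian
    't' => ['V', 32], // signed 4-byte little-endian
    'T' => ['N', 32], // signed 4-byte big-endian
];

// Hypothetical helper emulating the proposed codes (assumes 64-bit PHP).
function unpackSignedSketch(string $code, string $data): int
{
    if (!isset(PROPOSED_SIGNED_CODES[$code])) {
        throw new InvalidArgumentException("Unsupported code: $code");
    }
    [$unsignedCode, $bits] = PROPOSED_SIGNED_CODES[$code];

    if (strlen($data) < intdiv($bits, 8)) {
        throw new InvalidArgumentException("Not enough input for code: $code");
    }

    // Read the bytes as an unsigned integer in the requested byte order.
    $value = unpack($unsignedCode, $data)[1];

    // Two's-complement sign extension: flip the sign bit, then subtract it.
    $signBit = 1 << ($bits - 1);
    return ($value ^ $signBit) - $signBit;
}
```

For example, unpackSignedSketch('w', "\xFE\xFF") and unpackSignedSketch('W', "\xFF\xFE") both decode the 16-bit value -2.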
Comparison Tables
Perl vs PHP Approaches:
Perl Specification | Current PR (Wrong) | Proposed Solution |
---|---|---|
s< (signed 2-byte LE) | m | w |
s> (signed 2-byte BE) | y | W |
l< (signed 4-byte LE) | M | t |
l> (signed 4-byte BE) | Y | T |
q< (signed 8-byte LE) | p | r |
q> (signed 8-byte BE) | j | R |
PHP Format Letter Organization:
Type | 2-byte | 4-byte | 8-byte |
---|---|---|---|
Unsigned LE | v | V | P |
Unsigned BE | n | N | J |
Signed LE | w (proposed) | t (proposed) | r (proposed) |
Signed BE | W (proposed) | T (proposed) | R (proposed) |
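The table applies to both pack() and unpack(), but the signed/unsigned distinction mainly matters when unpacking. The following sketch uses only existing codes and illustrative values: packing a negative number with an unsigned code already produces two's-complement bytes, while unpacking those bytes with the same code returns the unsigned interpretation.

```php
<?php
// Packing -2 with existing unsigned, endian-specific codes.
$le16 = pack('v', -2); // 2-byte little-endian
$be16 = pack('n', -2); // 2-byte big-endian
$le32 = pack('V', -2); // 4-byte little-endian

// Hex dumps of the packed bytes.
echo bin2hex($le16), ' ', bin2hex($be16), ' ', bin2hex($le32), PHP_EOL;

// Unpacking with the unsigned code does not restore the sign.
$roundTrip = unpack('v', $le16)[1];
var_dump($roundTrip);
```

The proposed signed codes would close this gap on the unpack() side without manual sign extension.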
Platform Considerations
32-bit Platform Behavior:
On 32-bit platforms, 8-byte format codes (r/R) will throw a ValueError with the message “64-bit format codes are not available for 32-bit versions of PHP”, consistent with existing behavior for q/Q/P/J.
<?php
// On 32-bit platforms
try {
    pack('r', 1);
} catch (ValueError $e) {
    echo $e->getMessage(); // "64-bit format codes are not available..."
}
?>
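Code that has to run on both 32-bit and 64-bit builds could check the integer width up front instead of catching the exception. A minimal sketch follows; has64BitPackCodes() is a hypothetical helper name, and the existing q code stands in for the proposed r/R codes.

```php
<?php
// Hypothetical helper: true when this PHP build has 64-bit integers.
function has64BitPackCodes(): bool
{
    return PHP_INT_SIZE >= 8;
}

// 'q' is an existing 8-byte code; the proposed r/R codes would be guarded the same way.
$packed = has64BitPackCodes() ? pack('q', 1) : null;
```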
Backward Incompatible Changes
This change modifies the format letters introduced in PR #19368. Since that PR hasn't been released yet, there are no backward compatibility concerns for existing code.
The proposed letters (w, W, t, T, r, R) are currently unused in PHP's pack/unpack implementation.
Proposed PHP Version(s)
PHP 8.6 (next minor version)
Voting Choices
Implementation
The implementation is available in PR #19368, which requires updating the format letters from the current arbitrary choices to the proposed systematic approach outlined in this RFC.
Changes required in the pull request if this RFC is accepted:
- Replace m with w, y with W
- Replace M with t, Y with T
- Replace p with r, j with R
References
- GitHub Issue: https://github.com/php/php-src/issues/17068
- Current Implementation: https://github.com/php/php-src/pull/19368
- Perl pack documentation: https://perldoc.perl.org/functions/pack
- PHP pack documentation: https://www.php.net/manual/en/function.pack.php