====== PHP RFC: Add pack()/unpack() support for signed integers with specific endianness ======

  * Version: 1.0
  * Date: 2025-09-15
  * Author: Alexandre Daubois,
  * Status: Draft
  * Implementation: https://github.com/php/php-src/pull/19368

===== Introduction =====

This RFC proposes adding support for signed integers with specific endianness to PHP's pack() and unpack() functions. It addresses GitHub issue #17068 and revises the format letters chosen in the current implementation (PR #19368).

Currently, PHP's pack()/unpack() functions support:

  * Machine-endian signed integers: s, l, q (2, 4, 8 bytes)
  * Machine-endian unsigned integers: S, L, Q (2, 4, 8 bytes)
  * Endian-specific unsigned integers: v/n, V/N, P/J (2, 4, 8 bytes; little/big endian)

However, there is **no support for signed integers with specific endianness**, so developers have to resort to manual workarounds.

===== Perl Specification Reference =====

According to the Perl documentation (https://perldoc.perl.org/functions/pack), Perl handles signed integers with explicit endianness through modifier syntax:

  * s< : signed 16-bit, little-endian byte order
  * s> : signed 16-bit, big-endian byte order
  * l< : signed 32-bit, little-endian byte order
  * l> : signed 32-bit, big-endian byte order
  * q< : signed 64-bit, little-endian byte order
  * q> : signed 64-bit, big-endian byte order

The Perl documentation states: "Starting with Perl 5.10.0, integer and floating-point formats... may all be followed by the '>' or '<' endianness modifiers to respectively enforce big- or little-endian byte-order."

===== Why Perl's Approach Cannot Be Used in PHP =====

Although Perl's specification is the natural reference, PHP cannot adopt Perl's exact syntax, for the following technical reasons.

**1. Base Letters Already Taken**

PHP already uses the base letters for machine-endian signed integers:

  * s = signed 16-bit (machine endian)
  * l = signed 32-bit (machine endian)
  * q = signed 64-bit (machine endian)

**2. Parser Architecture Limitations**

Perl uses modifier syntax, in which an endianness indicator (< or >) follows the base format letter. PHP's pack() format parser is built around single-character format codes (each optionally followed by a repeat count or *) handled in switch/case statements, not compound codes such as s< or s>.

**3. Different Design Philosophy**

PHP has established a pattern of using distinct letters, rather than Perl-style modifiers, for endian-specific variants:

  * Unsigned endian-specific: v/n (2-byte), V/N (4-byte), P/J (8-byte)

===== Current Implementation Problems =====

PR #19368 currently introduces format letters that do not follow any discernible pattern:

  * m/y for signed 2-byte (little/big endian)
  * M/Y for signed 4-byte (little/big endian)
  * p/j for signed 8-byte (little/big endian)

**Issues with the current choices:**

  * No relationship to Perl's base letters (s, l, q)
  * No logical pairing with the existing unsigned endian-specific formats
  * Arbitrary selection that does not follow PHP's established patterns

===== Format Letter Analysis =====

**Currently used letters** (including m, y, M, Y, p and j from PR #19368):

  * Lowercase: a, c, d, e, f, g, h, i, j, l, m, n, p, q, s, v, x, y
  * Uppercase: A, C, E, G, H, I, J, L, M, N, P, Q, S, V, X, Y, Z

**Available letters:**

  * Lowercase: b, k, o, r, t, u, w, z
  * Uppercase: B, D, F, K, O, R, T, U, W

===== Proposed Solution =====

Replace the current arbitrary letter choices with a systematic set of currently unused letters.

**Proposed format letters:**

  * w/W for signed 2-byte (little/big endian)
  * t/T for signed 4-byte (little/big endian)
  * r/R for signed 8-byte (little/big endian)

**Rationale:**

  * **Case distinguishes byte order**: within each pair, lowercase is little-endian and uppercase is big-endian
  * **Systematic approach**: creates consistent pairs rather than arbitrary letter choices
  * **Available letters**: all proposed letters are currently unused
  * **Closest to Perl's intent**: while Perl's exact s/l/q base letters cannot be reused (they are already taken), these letters provide a systematic alternative

===== Comparison Tables =====

**Perl vs. PHP approaches:**

^ Perl ^ Current PR #19368 ^ Proposed ^
| s< (signed 2-byte LE) | m | w |
| s> (signed 2-byte BE) | y | W |
| l< (signed 4-byte LE) | M | t |
| l> (signed 4-byte BE) | Y | T |
| q< (signed 8-byte LE) | p | r |
| q> (signed 8-byte BE) | j | R |

**PHP format letter organization:**

^ Type ^ 2-byte ^ 4-byte ^ 8-byte ^
| Unsigned LE | v | V | P |
| Unsigned BE | n | N | J |
| Signed LE | w (proposed) | t (proposed) | r (proposed) |
| Signed BE | W (proposed) | T (proposed) | R (proposed) |

===== Platform Considerations =====

**32-bit platform behavior:**

On 32-bit platforms, the 8-byte format codes (r/R) will throw a ValueError with the message "64-bit format codes are not available for 32-bit versions of PHP", consistent with the existing behavior of q/Q/P/J.

<code php>
<?php
// On a 32-bit build of PHP
try {
    pack('r', 1);
} catch (ValueError $e) {
    echo $e->getMessage(); // "64-bit format codes are not available..."
}
?>
</code>

===== Backward Incompatible Changes =====

This change modifies the format letters introduced in PR #19368. Since that PR has not been released yet, there are no backward compatibility concerns for existing code. The proposed letters (w, W, t, T, r, R) are currently unused in PHP's pack()/unpack() implementation.

===== Proposed PHP Version(s) =====

PHP 8.6 (the next minor version)

===== Voting Choices =====

  * Yes
  * No

===== Implementation =====

The implementation is available in PR #19368. If this RFC is accepted, the pull request needs to replace its current letter choices with the ones proposed here:

  * Replace m with w, and y with W
  * Replace M with t, and Y with T
  * Replace p with r, and j with R

===== References =====

  - GitHub issue: https://github.com/php/php-src/issues/17068
  - Current implementation: https://github.com/php/php-src/pull/19368
  - Perl pack documentation: https://perldoc.perl.org/functions/pack
  - PHP pack documentation: https://www.php.net/manual/en/function.pack.php
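For comparison with the proposed w/W, t/T and r/R codes, the kind of manual workaround mentioned in the Introduction can be sketched in userland PHP. This is a minimal sketch, not part of the proposed implementation. It assumes a 64-bit build of PHP 8, and the helper name unpack_signed() is hypothetical. The helper decodes with the existing unsigned endian-specific codes (v/n, V/N, P/J) and then reinterprets the result as two's complement.

<code php>
<?php
declare(strict_types=1);

/**
 * Hypothetical userland helper (not part of PHP): decode one signed integer
 * of $bytes width (2, 4 or 8) with an explicit byte order.
 * Assumes a 64-bit build of PHP 8.
 */
function unpack_signed(string $data, int $bytes, bool $littleEndian): int
{
    // Existing unsigned, endian-specific format codes for each width.
    $formats = [
        2 => $littleEndian ? 'v' : 'n',
        4 => $littleEndian ? 'V' : 'N',
        8 => $littleEndian ? 'P' : 'J',
    ];
    if (!isset($formats[$bytes])) {
        throw new ValueError("Unsupported width: $bytes bytes");
    }
    if (strlen($data) < $bytes) {
        throw new ValueError("Need at least $bytes bytes of input");
    }

    $value = unpack($formats[$bytes], $data)[1];

    // PHP ints are signed 64-bit, so 8-byte values already come back signed.
    if ($bytes === 8) {
        return $value;
    }

    // Reinterpret the unsigned 16- or 32-bit result as two's complement.
    $bits = $bytes * 8;
    $signBit = 1 << ($bits - 1);
    return ($value & $signBit) !== 0 ? $value - (1 << $bits) : $value;
}
</code>

For example, unpack('v', "\xfe\xff")[1] returns 65534, while unpack_signed("\xfe\xff", 2, true) returns -2. Packing needs no helper, because the unsigned codes already write the low-order bytes of a negative integer: pack('v', -2) yields "\xfe\xff". Under this RFC, unpack('w', $data) and its siblings would return the signed value directly.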