PHP RFC: Add pack()/unpack() support for signed integers with specific endianness

Introduction

This RFC proposes adding support for signed integers with specific endianness to PHP's pack() and unpack() functions. This addresses GitHub issue #17068 and fixes the format letter choices in the current implementation (PR #19368).

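Currently, PHP's pack/unpack functions support:

S, L, Q        unsigned 16/32/64-bit integers, machine byte order
s, l, q        signed 16/32/64-bit integers, machine byte order
v/n, V/N, P/J  unsigned 16/32/64-bit integers, little/big-endian byte order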

However, there is no support for signed integers with specific endianness, forcing developers to use manual workarounds:

<?php
// Current manual approach for a signed little-endian 4-byte integer:
// bytes 0-1 are read unsigned ('v'), byte 2 unsigned ('C') and byte 3
// signed ('c'), so shifting the top byte left by 24 carries the sign.
$unpackToSignedInt = static function (string $v): int {
    $unpacked = unpack('va/Cb/cc', $v);
    return ($unpacked['c'] << 24) | ($unpacked['b'] << 16) | $unpacked['a'];
};

// Proposed approach
$value = unpack('t', $binaryData)[1]; // signed little-endian 4-byte
?>
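The following sketch uses only format codes that exist today and assumes a 64-bit build of PHP. It illustrates where the gap actually lies: pack() can already emit a negative value with explicit endianness, but unpack() returns it as unsigned, so the sign has to be restored by hand.

```php
<?php
// pack() keeps only the low-order bytes of an integer, so negative values
// are already written in two's complement with explicit endianness.
$data = pack('V', -2);
echo bin2hex($data), PHP_EOL;          // feffffff

// unpack() is where the sign is lost: 'V' is unsigned, so a 64-bit build
// returns 4294967294 instead of -2.
$unsigned = unpack('V', $data)[1];
var_dump($unsigned);                   // int(4294967294)

// The machine-order signed code 'l' would give -2 only on little-endian
// hosts, so it cannot stand in for an explicit-endian format.
// Recovering the value portably means applying two's complement by hand:
$signed = $unsigned >= 0x80000000 ? $unsigned - 0x100000000 : $unsigned;
var_dump($signed);                     // int(-2)
```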

Perl Specification Reference

According to the Perl documentation (https://perldoc.perl.org/functions/pack), Perl handles signed integers with endianness using modifier syntax:

s<   signed 16-bit, little-endian byte order
s>   signed 16-bit, big-endian byte order
l<   signed 32-bit, little-endian byte order
l>   signed 32-bit, big-endian byte order
q<   signed 64-bit, little-endian byte order
q>   signed 64-bit, big-endian byte order

The Perl documentation states: “Starting with Perl 5.10.0, integer and floating-point formats... may all be followed by the '>' or '<' endianness modifiers to respectively enforce big- or little-endian byte-order.”

Why Perl's Approach Cannot Be Used in PHP

While Perl's specification provides the ideal reference, PHP cannot adopt Perl's exact syntax for several technical reasons:

1. Base Letters Already Taken

PHP already uses the base letters s, l and q for signed 16-, 32- and 64-bit integers in machine byte order.

2. Parser Architecture Limitations

Perl uses modifier syntax where endianness indicators (<, >) follow the base format letter. PHP's pack format parser is designed around single-character format codes in switch/case statements, not compound expressions like s< or s>.

3. Different Design Philosophy

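PHP established a pattern of using completely different letters for endian-specific variants:

v/n  unsigned 16-bit, little/big endian (instead of a modifier on S)
V/N  unsigned 32-bit, little/big endian (instead of a modifier on L)
P/J  unsigned 64-bit, little/big endian (instead of a modifier on Q)
g/G  float, little/big endian (instead of a modifier on f)
e/E  double, little/big endian (instead of a modifier on d)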
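The letter-per-endianness convention can be observed directly: each existing little/big-endian pair writes the same value with the byte order reversed. A minimal sketch, assuming a 64-bit build (P and J are unavailable on 32-bit builds):

```php
<?php
// Existing convention: endianness is selected by choosing a different
// letter, not by a modifier. Each pair writes the same value both ways.
$pairs = [
    'v' => 'n', // unsigned 16-bit: little-endian / big-endian
    'V' => 'N', // unsigned 32-bit
    'P' => 'J', // unsigned 64-bit (64-bit builds only)
];

foreach ($pairs as $little => $big) {
    $le = pack($little, 0x0102);
    $be = pack($big, 0x0102);
    // The big-endian string is the byte-reversed little-endian string.
    printf("%s=%s %s=%s\n", $little, bin2hex($le), $big, bin2hex($be));
}
// v=0201 n=0102
// V=02010000 N=00000102
// P=0201000000000000 J=0000000000000102
```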

Current Implementation Problems

The current PR #19368 introduces arbitrary format letters that don't follow any logical pattern:

m/y  for signed 2-byte (little/big endian)
M/Y  for signed 4-byte (little/big endian)
p/j  for signed 8-byte (little/big endian)

Issues with current choices:

Format Letter Analysis

Currently Used Letters (including j, m, p, y, M and Y added by PR #19368):

Lowercase: a, c, d, e, f, g, h, i, j, l, m, n, p, q, s, v, x, y

Uppercase: A, C, E, G, H, I, J, L, M, N, P, Q, S, V, X, Y, Z

Available Letters:

Lowercase: b, k, o, r, t, u, w, z

Uppercase: B, D, F, K, O, R, T, U, W
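The available-letter lists follow mechanically from the used ones. This short check recomputes them from the lists above and confirms that none of the proposed letters (w, W, t, T, r, R) collide with an existing code:

```php
<?php
// Format letters already in use, copied from the lists above.
$usedLower = str_split('acdefghijlmnpqsvxy');
$usedUpper = str_split('ACEGHIJLMNPQSVXYZ');

// Free letters are the alphabet minus the used ones.
$freeLower = array_values(array_diff(range('a', 'z'), $usedLower));
$freeUpper = array_values(array_diff(range('A', 'Z'), $usedUpper));

echo implode(', ', $freeLower), PHP_EOL; // b, k, o, r, t, u, w, z
echo implode(', ', $freeUpper), PHP_EOL; // B, D, F, K, O, R, T, U, W

// Every proposed letter must come from the free set.
$proposed = ['w', 'W', 't', 'T', 'r', 'R'];
$clashes = array_diff($proposed, $freeLower, $freeUpper);
echo $clashes === [] ? "no clashes\n" : 'clashes: ' . implode(', ', $clashes) . "\n";
```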

Proposed Solution

Replace the current arbitrary letter choices with letters that follow PHP's established conventions and create logical relationships with existing formats:

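Proposed Format Letters:

w/W  for signed 2-byte (little/big endian)
t/T  for signed 4-byte (little/big endian)
r/R  for signed 8-byte (little/big endian)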

Rationale:

Considered Alternatives

Implementing Perl's modifier syntax (s<, s>) was considered, but it would require a significant overhaul of the parser and a complex migration path for users.

It would also cause confusion, because the same format could be expressed in multiple ways (e.g., both s< and w for signed 2-byte little-endian). Adopting the modifiers would further raise the question of matching Perl's format letters in general, which may not be feasible for all types.

A new function would solve all of the problems above, but it is out of scope for this RFC and may not be worth adding, since nearly all formats are already available through pack() and unpack().

Comparison Tables

Perl vs PHP Approaches:

Perl specification       Current PR (Wrong)   Proposed Solution
s< (signed 2-byte LE)    m                    w
s> (signed 2-byte BE)    y                    W
l< (signed 4-byte LE)    M                    t
l> (signed 4-byte BE)    Y                    T
q< (signed 8-byte LE)    p                    r
q> (signed 8-byte BE)    j                    R

PHP Format Letter Organization:

Type          2-byte         4-byte         8-byte
Unsigned LE   v              V              P
Unsigned BE   n              N              J
Signed LE     w (proposed)   t (proposed)   r (proposed)
Signed BE     W (proposed)   T (proposed)   R (proposed)
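The table suggests that each proposed signed letter behaves like the unsigned letter in the same column plus a two's complement fix-up. The sketch below illustrates that relationship as a userland polyfill; the function name unpackSignedPolyfill() is hypothetical and not part of PHP or of PR #19368, it decodes a single value per call, and it assumes a 64-bit build.

```php
<?php
/**
 * Hypothetical polyfill (illustrative only, not a PHP API): decode one value
 * for a proposed signed format letter by reading it with the unsigned letter
 * from the same table column and restoring the sign.
 * Assumes a 64-bit build of PHP.
 */
function unpackSignedPolyfill(string $format, string $data): int
{
    // proposed letter => [existing unsigned letter, size in bytes]
    $map = [
        'w' => ['v', 2], 'W' => ['n', 2],
        't' => ['V', 4], 'T' => ['N', 4],
        'r' => ['P', 8], 'R' => ['J', 8],
    ];
    if (!isset($map[$format])) {
        throw new ValueError("Unsupported format letter: $format");
    }
    [$unsigned, $size] = $map[$format];
    $value = unpack($unsigned, $data)[1];

    if ($size === 8) {
        // On 64-bit builds the 8 bytes fill a PHP int directly, so P/J
        // already return the signed value.
        return $value;
    }
    $bits = $size * 8;
    // Sign bit set: subtract 2^bits to obtain the negative value.
    return $value >= (1 << ($bits - 1)) ? $value - (1 << $bits) : $value;
}
```

For example, unpackSignedPolyfill('W', "\xFF\xFE") returns -2. The 8-byte shortcut only holds on 64-bit builds; on 32-bit builds unpack() rejects P and J, in line with the platform behaviour described for r/R.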

Platform Considerations

32-bit Platform Behavior:

On 32-bit platforms, 8-byte format codes (r/R) will throw a ValueError with the message “64-bit format codes are not available for 32-bit versions of PHP”, consistent with existing behavior for q/Q/P/J.

<?php
// On 32-bit platforms
try {
    pack('r', 1);
} catch (ValueError $e) {
    echo $e->getMessage(); // "64-bit format codes are not available..."
}
?>

Backward Incompatible Changes

This change modifies the format letters introduced in PR #19368. Since that PR hasn't been released yet, there are no backward compatibility concerns for existing code.

The proposed letters (w, W, t, T, r, R) are currently unused in PHP's pack/unpack implementation.

Proposed PHP Version(s)

PHP 8.6 (next minor version)

Voting Choices

Add signed integer endianness support to pack()/unpack() with the proposed format letters? (Yes / No)

This poll has been closed. Final result: 0 Yes, 0 No.

Implementation

The implementation is available in PR #19368, which requires updating the format letters from the current arbitrary choices to the proposed systematic approach outlined in this RFC.

Changes required:

References

  1. Perl pack documentation: https://perldoc.perl.org/functions/pack