PHP RFC: Add RFC 4648 compliant data encoding API

Introduction

To improve interoperability between PHP and other programming languages, and to simplify data encoding in PHP, we propose adding native support for encoding and decoding data using the family of RFC 4648 algorithms (Base16, Base32, and Base64). Currently, PHP only supports a limited subset of RFC 4648. With this RFC, we aim to provide full compliance with the standard and introduce the missing encoding algorithms for developers.

Downsides of the current approach

PHP provides partial support for Base64 via the base64_encode and base64_decode functions but they do not provide:

PHP provides partial support for Base16 via the bin2hex and hex2bin functions but they do not provide:

PHP currently lacks native support for Base32 encoding and decoding. Beyond the absence of this algorithm, the ecosystem suffers from a fragmented landscape of user-land packages, many of which claim Base32 compliance without clearly specifying which variant they implement. This lack of a consistent reference becomes problematic when applications rely on a specific Base32 configuration to process incoming or outgoing data. This challenge, which also applies to other RFC 4648 algorithms, makes working with data encoding in PHP more complex than necessary.

PHP currently offers Base58 encoding and decoding via a PECL extension. This proposal seeks to integrate these functions directly into the PHP core, with additional support for the Flickr variant of Base58. Although Base58 is not defined in RFC 4648, it has seen widespread adoption in production systems, most notably in Bitcoin and other cryptocurrencies for encoding addresses and keys, as well as in platforms like Flickr for generating compact, URL-safe identifiers. Compared to Base16 and Base32, Base58 yields shorter output; compared to Base64, it avoids visually ambiguous characters (such as 0, O, I, and l) and is inherently safe for use in URLs without additional encoding, at the cost of marginally longer output. The Base58 algorithm is simple, deterministic, and has remained stable and well-understood in the software ecosystem for over a decade.

The goal of this RFC is to add the encoding and decoding functionality defined in RFC 4648, as well as Base58, to the PHP standard library. It also introduces a native, constant-time implementation option to address timing-related security concerns in data encoding. Once adopted, this feature will simplify data encoding in PHP, enhance interoperability with other programming languages, and strengthen security within the PHP ecosystem.

Proposal

A new, always-available Encoding namespace will be added to the standard library. The namespace will contain classes and functions for encoding and decoding strings or byte sequences.

For this purpose, the following internal classes and functions are added:

namespace Encoding {
    class EncodingException extends \Exception
    {
    }
 
    class UnableToDecodeException extends EncodingException
    {
    }
 
    enum Base16
    {
        case Upper;
        case Lower;
    }
 
    enum Base32
    {
        case Ascii;
        case Hex;
        case Crockford;
        case Z;
    }
 
    enum Base58
    {
        case Bitcoin;
        case Flickr;
    }
 
    enum Base64
    {
        case Standard;
        case UrlSafe;
        case Imap;
    }
 
    enum PaddingMode
    {
        case VariantControlled;
        case StripPadding;
        case PreservePadding;
    }
 
    enum DecodingMode
    {
        case Lenient;
        case Strict;
    }
 
    enum TimingMode
    {
        case Unprotected;
        case ConstantTime;
    }
}

The following Base16 functions are added:

namespace Encoding {
    function base16_encode(
        string $decoded,
        Base16 $variant = Base16::Upper,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
 
    /**
     * @throws UnableToDecodeException
     */
    function base16_decode(
        string $encoded,
        Base16 $variant = Base16::Upper,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
}

The following Base32 functions are added:

namespace Encoding {
    function base32_encode(
        string $decoded,
        Base32 $variant = Base32::Ascii,
        PaddingMode $paddingMode = PaddingMode::VariantControlled,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
 
    /**
     * @throws UnableToDecodeException
     */
    function base32_decode(
        string $encoded,
        Base32 $variant = Base32::Ascii,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
}

The following Base58 functions are added:

namespace Encoding {
    function base58_encode(
        string $decoded,
        Base58 $variant = Base58::Bitcoin,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
 
    /**
     * @throws UnableToDecodeException
     */
    function base58_decode(
        string $encoded,
        Base58 $variant = Base58::Bitcoin,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
}

The following Base64 functions are added:

namespace Encoding {
    function base64_encode(
        string $decoded,
        Base64 $variant = Base64::Standard,
        PaddingMode $paddingMode = PaddingMode::VariantControlled,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
 
    /**
     * @throws UnableToDecodeException
     */
    function base64_decode(
        string $encoded,
        Base64 $variant = Base64::Standard,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
}

API Design

The RFC chooses to use a functions-based API instead of a class-based API for the following reasons:

The RFC chooses to use enum-based options rather than boolean or arbitrary string values to improve readability and developer experience when using the API.

The general signature pattern chosen for each algorithm is the following:

For encoding:

function algo_encode(string $decoded, Enum ...$options): string;

For decoding:

/**
 * @throws UnableToDecodeException
 */
function algo_decode(string $encoded, Enum ...$options): string;

where:

During decoding, an UnableToDecodeException is thrown on any error. In lenient mode (DecodingMode::Lenient), some deviations in the encoded string are tolerated, but decoding can still throw an UnableToDecodeException if the string remains invalid after the lenient adjustments have been applied.
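To illustrate these semantics, the following is a minimal userland sketch of Base16 decoding driven by enum options and a single exception type. The names SketchUnableToDecodeException, SketchBase16, SketchDecodingMode and sketch_base16_decode are hypothetical stand-ins for the proposed Encoding\UnableToDecodeException, the enums and Encoding\base16_decode. It assumes that strict mode accepts only the variant's own letter case and that lenient mode accepts either case (as the hex2bin migration example suggests); it is not the proposed implementation.

```php
<?php

// Hypothetical stand-ins for the proposed Encoding\UnableToDecodeException and enums.
class SketchUnableToDecodeException extends Exception {}

enum SketchBase16 { case Upper; case Lower; }

enum SketchDecodingMode { case Lenient; case Strict; }

function sketch_base16_decode(
    string $encoded,
    SketchBase16 $variant = SketchBase16::Upper,
    SketchDecodingMode $decodingMode = SketchDecodingMode::Strict,
): string {
    // Strict: only the variant's own alphabet. Lenient (assumption): either case.
    $pattern = match (true) {
        $decodingMode === SketchDecodingMode::Lenient => '/\A(?:[0-9A-Fa-f]{2})*\z/',
        $variant === SketchBase16::Upper => '/\A(?:[0-9A-F]{2})*\z/',
        default => '/\A(?:[0-9a-f]{2})*\z/',
    };

    // Any error, including an odd length, surfaces as one exception type.
    if (preg_match($pattern, $encoded) !== 1) {
        throw new SketchUnableToDecodeException('Invalid Base16 input.');
    }

    return hex2bin($encoded);
}

try {
    sketch_base16_decode('48656c6c6f'); // lowercase input, strict Upper variant
} catch (SketchUnableToDecodeException $e) {
    echo $e->getMessage(), PHP_EOL; // Invalid Base16 input.
}
echo sketch_base16_decode('48656c6c6f', decodingMode: SketchDecodingMode::Lenient), PHP_EOL; // hello
```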

Parameters

String Parameters

Options

Variant support

Base encodings support a range of alphabets and extra configurations that can collectively be referred to as variants. The following enums are introduced to help developers choose the correct variant.

Base16 Variants
<?php
 
enum Base16
{
   case Upper;
   case Lower;
}

Base16 does not define multiple alphabets, but it can be encoded using either uppercase or lowercase letters.

The default variant is Base16::Upper because RFC 4648 defines the Base16 alphabet using uppercase letters.
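For comparison with the existing API, here is a minimal sketch of how the two Base16 variants relate to PHP's current bin2hex function, which always produces lowercase output. SketchBase16Variant and sketch_base16_encode are hypothetical names used only for illustration.

```php
<?php

// Hypothetical stand-in for the proposed Encoding\Base16 enum.
enum SketchBase16Variant { case Upper; case Lower; }

function sketch_base16_encode(
    string $decoded,
    SketchBase16Variant $variant = SketchBase16Variant::Upper,
): string {
    $hex = bin2hex($decoded); // bin2hex() always emits lowercase digits a-f
    return $variant === SketchBase16Variant::Upper ? strtoupper($hex) : $hex;
}

echo sketch_base16_encode("\x0A\xFF"), PHP_EOL;                             // 0AFF
echo sketch_base16_encode("\x0A\xFF", SketchBase16Variant::Lower), PHP_EOL; // 0aff
```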

Base32 Variants
<?php
 
enum Base32
{
   case Ascii;
   case Hex;
   case Crockford;
   case Z;
}

Base32 supports multiple variants, and we provide the most common ones out of the box:

The default variant is Base32::Ascii because it uses the standard Base32 alphabet defined in section 6 of RFC 4648 (the uppercase letters A to Z followed by the digits 2 to 7).
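To show that the Ascii and Hex variants share one algorithm and differ only in their alphabet, here is a minimal sketch of RFC 4648 Base32 encoding. The constant and function names are hypothetical, Crockford and z-base-32 are not covered, and the sketch is not the proposed implementation.

```php
<?php

// Hypothetical alphabets: RFC 4648 section 6 (Ascii) and section 7 (extended hex).
const SKETCH_BASE32_ASCII = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
const SKETCH_BASE32_HEX   = '0123456789ABCDEFGHIJKLMNOPQRSTUV';

function sketch_base32_encode(string $decoded, string $alphabet, bool $pad = true): string
{
    $out = '';
    $buffer = 0; // pending bits, most significant first
    $bits = 0;   // number of pending bits in $buffer
    $len = strlen($decoded);
    for ($i = 0; $i < $len; $i++) {
        $buffer = ($buffer << 8) | ord($decoded[$i]);
        $bits += 8;
        // Emit one character per complete 5-bit group.
        while ($bits >= 5) {
            $bits -= 5;
            $out .= $alphabet[($buffer >> $bits) & 0x1F];
        }
        $buffer &= (1 << $bits) - 1; // keep only the unconsumed bits
    }
    if ($bits > 0) {
        // Left-align the remaining bits in a final, zero-filled 5-bit group.
        $out .= $alphabet[($buffer << (5 - $bits)) & 0x1F];
    }
    if ($pad) {
        // Complete the last 8-character block with '='.
        $out = str_pad($out, intdiv(strlen($out) + 7, 8) * 8, '=');
    }
    return $out;
}

echo sketch_base32_encode('foobar', SKETCH_BASE32_ASCII), PHP_EOL;             // MZXW6YTBOI======
echo sketch_base32_encode('foobar', SKETCH_BASE32_HEX), PHP_EOL;               // CPNMUOJ1E8======
echo sketch_base32_encode('foobar', SKETCH_BASE32_ASCII, pad: false), PHP_EOL; // MZXW6YTBOI
```

The expected values are the RFC 4648 section 10 test vectors.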

Base58 Variants
<?php
 
enum Base58
{
   case Bitcoin;
   case Flickr;
}

Base58 supports multiple variants, and we provide the most common ones out of the box:

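The default variant is Base58::Bitcoin as it is the most widely used Base58 variant. Note that the Bitcoin and Flickr variants use the same 58 characters and differ only in their order: the Bitcoin alphabet lists uppercase letters before lowercase letters, while the Flickr alphabet lists lowercase letters first.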
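A minimal sketch of Base58 encoding, showing that both variants share one algorithm and differ only in the alphabet. It treats the input as a big-endian number, converts it to base 58 by repeated multiply-and-carry over a digit array, and encodes each leading zero byte as the first alphabet character. The names are hypothetical, and this quadratic-time sketch is not the proposed implementation.

```php
<?php

// Hypothetical alphabet constants for the two variants.
const SKETCH_BASE58_BITCOIN = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';
const SKETCH_BASE58_FLICKR  = '123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ';

function sketch_base58_encode(string $decoded, string $alphabet): string
{
    $len = strlen($decoded);

    // Each leading zero byte is encoded as the first alphabet character.
    $zeros = 0;
    while ($zeros < $len && $decoded[$zeros] === "\0") {
        $zeros++;
    }

    $digits = []; // base-58 digits of the remaining number, least significant first
    for ($i = $zeros; $i < $len; $i++) {
        // Multiply the accumulated number by 256 and add the next byte.
        $carry = ord($decoded[$i]);
        $count = count($digits);
        for ($j = 0; $j < $count; $j++) {
            $carry += $digits[$j] << 8;
            $digits[$j] = $carry % 58;
            $carry = intdiv($carry, 58);
        }
        while ($carry > 0) {
            $digits[] = $carry % 58;
            $carry = intdiv($carry, 58);
        }
    }

    // Emit the digits most significant first.
    $out = str_repeat($alphabet[0], $zeros);
    for ($k = count($digits) - 1; $k >= 0; $k--) {
        $out .= $alphabet[$digits[$k]];
    }
    return $out;
}

echo sketch_base58_encode("\x00\xFF", SKETCH_BASE58_BITCOIN), PHP_EOL; // 15Q
echo sketch_base58_encode("\x00\xFF", SKETCH_BASE58_FLICKR), PHP_EOL;  // 15p
```

Here 0xFF = 255 = 4 × 58 + 23, so both variants emit the digits 4 and 23; only the characters assigned to index 23 differ.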

Base64 Variants
<?php
 
enum Base64
{
   case Standard;
   case UrlSafe;
   case Imap;
}

Base64 supports multiple variants, and we provide the most common ones out of the box. All Base64 variants are case-sensitive.

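The default variant is Base64::Standard because it uses the standard Base64 alphabet defined in section 4 of RFC 4648, which is not URL-safe; the URL-safe alphabet from section 5 is available through Base64::UrlSafe.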
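The three Base64 alphabets differ only in the characters used for the values 62 and 63. The following minimal sketch derives them from PHP's existing base64_encode. It assumes that Base64::UrlSafe omits padding by default (consistent with the "__8" usage example below) and that Base64::Imap follows the modified Base64 used for IMAP mailbox names (RFC 3501), which replaces '/' with ',' and omits padding. SketchBase64 and sketch_base64_variant_encode are hypothetical names.

```php
<?php

// Hypothetical stand-in for the proposed Encoding\Base64 enum.
enum SketchBase64 { case Standard; case UrlSafe; case Imap; }

function sketch_base64_variant_encode(string $decoded, SketchBase64 $variant): string
{
    $standard = base64_encode($decoded); // RFC 4648 section 4 alphabet, padded
    return match ($variant) {
        SketchBase64::Standard => $standard,
        // RFC 4648 section 5: '-' and '_' replace '+' and '/' (padding assumed omitted).
        SketchBase64::UrlSafe => rtrim(strtr($standard, '+/', '-_'), '='),
        // IMAP modified Base64 (assumption): ',' replaces '/', no padding.
        SketchBase64::Imap => rtrim(strtr($standard, '/', ','), '='),
    };
}

$bytes = "\xFB\xFF"; // encodes to the values 62, 63 and 60
echo sketch_base64_variant_encode($bytes, SketchBase64::Standard), PHP_EOL; // +/8=
echo sketch_base64_variant_encode($bytes, SketchBase64::UrlSafe), PHP_EOL;  // -_8
echo sketch_base64_variant_encode($bytes, SketchBase64::Imap), PHP_EOL;     // +,8
```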

Padding presence during encoding

<?php
 
enum PaddingMode
{
  case VariantControlled;
  case StripPadding;
  case PreservePadding;
}

Base32 and Base64 use a padding character. The padding character has a technical role: it ensures that the encoded output represents complete blocks of data and allows the decoder to reconstruct the original binary input unambiguously. However, to improve readability or interoperability, some variants omit the padding from their output. This option tells the encoder whether the padding character must be present at the end of the encoded output, when applicable.

The default padding mode is PaddingMode::VariantControlled, indicating that the padding character is added only when mandated by the chosen variant.
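The amount of padding follows directly from the block sizes: Base64 maps 3 bytes to 4 characters and Base32 maps 5 bytes to 8 characters, so a final partial block is completed with '='. The following sketch computes how many padding characters a padded encoder appends; the function name is hypothetical.

```php
<?php

/**
 * Hypothetical helper: number of '=' characters needed to complete the last block.
 * Use $bitsPerChar = 6, $blockChars = 4 for Base64 and 5, 8 for Base32.
 */
function sketch_padding_length(int $byteCount, int $bitsPerChar, int $blockChars): int
{
    // Characters that actually carry data: ceil(8 * bytes / bitsPerChar).
    $dataChars = intdiv(8 * $byteCount + $bitsPerChar - 1, $bitsPerChar);
    // Round up to a whole number of blocks.
    $paddedChars = intdiv($dataChars + $blockChars - 1, $blockChars) * $blockChars;
    return $paddedChars - $dataChars;
}

foreach ([1, 2, 3, 4, 5] as $n) {
    printf(
        "%d byte(s): Base64 %d, Base32 %d\n",
        $n,
        sketch_padding_length($n, 6, 4),
        sketch_padding_length($n, 5, 8),
    );
}
```

For example, one input byte needs two '=' in Base64 and six in Base32, matching RFC 4648 test vectors such as "Zg==" and "MY======" for "f".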

Decoding Mode

<?php
 
enum DecodingMode
{
  case Lenient;
  case Strict;
}

For all decoding functions, the caller can specify how decoding is performed. By default, $decodingMode is set to DecodingMode::Strict, meaning the algorithm strictly follows the rules defined by the RFC. Alternatively, $decodingMode can be set to DecodingMode::Lenient. In this mode, several adjustments are applied to the $encoded string before the actual decoding process begins:

Independent of the mode:

Although the lenient decoding mode is available, it is intentionally restricted to account for the security considerations outlined in Section 12 of RFC 4648.

The default decoding mode is DecodingMode::Strict.
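The usage examples in this RFC show one of the lenient adjustments: missing padding is tolerated in lenient mode but rejected in strict mode. The following is a minimal sketch of that single behaviour for standard Base64, built on PHP's existing base64_decode. The function name and the use of UnexpectedValueException in place of the proposed UnableToDecodeException are assumptions for illustration; other lenient adjustments are not modelled.

```php
<?php

/**
 * Hypothetical sketch: strict vs. lenient handling of Base64 padding only.
 */
function sketch_base64_decode(string $encoded, bool $lenient = false): string
{
    if ($lenient) {
        // Restore missing padding so the input forms complete 4-character blocks.
        $encoded = str_pad($encoded, intdiv(strlen($encoded) + 3, 4) * 4, '=');
    }

    // Strict structure: complete blocks, standard alphabet, '=' only at the end.
    if (strlen($encoded) % 4 !== 0 || preg_match('~\A[A-Za-z0-9+/]*={0,2}\z~', $encoded) !== 1) {
        throw new UnexpectedValueException('Unable to decode');
    }

    $decoded = base64_decode($encoded, true);
    if ($decoded === false) {
        throw new UnexpectedValueException('Unable to decode');
    }
    return $decoded;
}

$unpadded = 'VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw';
echo sketch_base64_decode($unpadded, lenient: true), PHP_EOL; // This is an encoded string
```

Note that restoring padding does not rescue misplaced '=' characters: "dG9===0bw" is still rejected in lenient mode, in line with the Section 12 restrictions.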

Timing generation mode

<?php
 
enum TimingMode
{
   case Unprotected;
   case ConstantTime;
}

In some cases, for security reasons, you may prefer an implementation that prevents information leakage during encoding or decoding. A typical implementation's processing time can depend on the data being processed (for example, through table lookups or data-dependent branches), which can leak information about secret values such as keys through timing side channels. An optional enum is therefore proposed to allow developers to opt into a more secure approach. For now, a constant-time algorithm is provided alongside the standard implementation, which does not protect against timing attacks. Depending on the implementation, this option may not be available for all encoding algorithms.

The default timing mode is TimingMode::Unprotected.
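To illustrate what a constant-time approach means, here is a minimal sketch of branch-free Base16 encoding: instead of looking up each nibble in a table or branching on its value, the output character is computed with arithmetic and a bit mask, similar to the approach used by libsodium's bin2hex (exposed in PHP as sodium_bin2hex). The function name is hypothetical, and userland PHP cannot guarantee constant-time execution at the engine level, which is one reason for providing this option natively.

```php
<?php

/**
 * Hypothetical sketch: Base16 encoding without data-dependent branches or table lookups.
 * For a nibble n in 0..15, ((n - 10) >> 8) is -1 when n < 10 and 0 otherwise,
 * so the mask selects the digit range or the letter range arithmetically.
 */
function sketch_ct_base16_encode(string $decoded, bool $upper = true): string
{
    // 'A' - 10 = 55 and 'a' - 10 = 87; adding ~6 (-7) or ~38 (-39) lands on '0' (48).
    $offset = $upper ? 55 : 87; // depends on the public variant, not on the data
    $mask = $upper ? ~6 : ~38;
    $out = '';
    $len = strlen($decoded);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($decoded[$i]);
        $hi = $byte >> 4;
        $lo = $byte & 0x0F;
        $out .= chr($offset + $hi + ((($hi - 10) >> 8) & $mask));
        $out .= chr($offset + $lo + ((($lo - 10) >> 8) & $mask));
    }
    return $out;
}

echo sketch_ct_base16_encode('Hello world!'), PHP_EOL; // 48656C6C6F20776F726C6421
```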

Usage examples

Using the Encoding\base64_encode and Encoding\base64_decode functions

<?php
 
use Encoding\Base64;
use Encoding\PaddingMode;
use Encoding\DecodingMode;
 
use function Encoding\base64_encode;
use function Encoding\base64_decode;
 
$decoded = 'This is an encoded string';
 
echo base64_encode($decoded); // "VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw=="
echo base64_encode($decoded, paddingMode: PaddingMode::StripPadding); // "VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw"
echo base64_decode("VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw"); // throws an UnableToDecodeException (strict mode requires padding)
echo base64_decode("VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw", decodingMode: DecodingMode::Lenient); // returns 'This is an encoded string'
 
$decoded = chr(0xFF) . chr(0xFF);
echo base64_encode($decoded); // "//8="
echo base64_encode($decoded, variant: Base64::UrlSafe); // "__8"
echo base64_encode($decoded, paddingMode: PaddingMode::StripPadding); // "//8"

Using the Encoding\base16_encode and Encoding\base16_decode functions

<?php
 
use Encoding\Base16;
use Encoding\DecodingMode;
 
use function Encoding\base16_encode;
use function Encoding\base16_decode;
 
$decoded = 'Hello world!';
$encodedUpper = "48656C6C6F20776F726C6421"; // using uppercase characters
$encodedLower = "48656c6c6f20776f726c6421"; // using lowercase characters
 
echo base16_encode($decoded); // returns $encodedUpper: RFC 4648 dictates that the output is uppercase
echo base16_decode($encodedLower, decodingMode: DecodingMode::Strict); // throws an UnableToDecodeException
echo base16_decode($encodedLower, decodingMode: DecodingMode::Lenient); // 'Hello world!'
echo base16_decode($encodedLower, variant: Base16::Lower); // 'Hello world!'

Migration path

Due to the widespread use of the current API, this RFC proposes a gradual migration path to help users transition to the new API. However, the full deprecation and removal of the current functions (base64_encode, base64_decode, hex2bin, and bin2hex) will be handled separately through the traditional RFC deprecation process, which occurs before each PHP version release. This ensures users have sufficient time to adopt the new API.

Base16 functions

bin2hex

The bin2hex function encodes a string using the Base16 algorithm, but it always uses a lowercase alphabet, which contradicts RFC 4648. To migrate a bin2hex call to the new API while preserving the current behaviour, use:

<?php
 
$decoded = 'Hello world!';
 
//before
echo bin2hex($decoded);
//after
echo Encoding\base16_encode($decoded, variant: Encoding\Base16::Lower);
hex2bin

The hex2bin function is lenient and accepts both lowercase and uppercase input. To migrate a hex2bin call to the new API while preserving the current behaviour, use:

<?php
 
$encoded = "6578616d706c65206865782064617461";
 
//before
echo hex2bin($encoded);
//after
echo Encoding\base16_decode($encoded, decodingMode: Encoding\DecodingMode::Lenient);

Base64 functions

base64_encode

This function already follows the standard Base64 encoding algorithm. Migrating is straightforward:

$decoded = 'This is an encoded string';
//before
echo base64_encode($decoded);
//after
echo Encoding\base64_encode($decoded);
base64_decode

Migrating base64_decode is more complex. The current function behaves leniently by default, accepting non-alphabet characters and misplaced padding:

base64_decode('dG9===0bw??'); // returns 'toto' 

However, the proposed API enforces stricter rules, as recommended in RFC 4648, Section 12. This includes rejecting invalid characters and padding in non-terminal positions for security reasons, even in lenient mode:

Encoding\base64_decode('dG90bw??', decodingMode: Encoding\DecodingMode::Lenient);  // will throw because of outside alphabet letter
Encoding\base64_decode('dG9===0bw', decodingMode: Encoding\DecodingMode::Lenient); // will throw because of unsafe use of the padding character
Encoding\base64_decode('dG90bw', decodingMode: Encoding\DecodingMode::Lenient);    // returns 'toto'

To ease the transition, we propose updating the signature of base64_decode in the global namespace:

function base64_decode(string $string, bool|Encoding\DecodingMode $strict = false): string|false;

Impact:

This allows developers to opt into the enum-based approach and move away from the insecure default.

If the code already uses strict mode:

$str = 'VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw==';
//before
echo base64_decode($str, true);
//after
echo Encoding\base64_decode($str);

(There is no need to specify the mode: strict is the default in the new API.)

If using the non-strict (default) legacy mode:

The developer can begin by opting into explicit lenient decoding:

$str = 'VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw==';
//before
echo base64_decode($str);
//after
echo base64_decode($str, strict: Encoding\DecodingMode::Lenient);

If this step causes errors, it indicates the original code was relying on unsafe decoding behaviour.

Then, finalize the migration:

$str = 'VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw==';
- echo base64_decode($str, strict: Encoding\DecodingMode::Lenient);
+ echo Encoding\base64_decode($str, decodingMode: Encoding\DecodingMode::Lenient);
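Before switching, it may help to find stored or incoming values that only decode today because of the legacy non-strict behaviour. The following is a minimal audit sketch, assuming that the proposed lenient mode accepts only standard-alphabet characters with '=' as trailing padding and tolerates missing padding; the actual list of lenient adjustments may differ, and the function name is hypothetical.

```php
<?php

/**
 * Hypothetical audit helper: true when $encoded is only accepted by the legacy
 * non-strict base64_decode(), which silently skips invalid characters and padding.
 * Assumption: the new lenient mode tolerates missing padding but nothing else.
 */
function sketch_relies_on_unsafe_decoding(string $encoded): bool
{
    $wellFormed = preg_match('~\A[A-Za-z0-9+/]*={0,2}\z~', $encoded) === 1
        // A single leftover character can never encode a whole byte.
        && strlen(rtrim($encoded, '=')) % 4 !== 1;

    return !$wellFormed;
}

var_dump(base64_decode('dG9===0bw??'));                    // string(4) "toto"
var_dump(sketch_relies_on_unsafe_decoding('dG9===0bw??')); // bool(true)
var_dump(sketch_relies_on_unsafe_decoding('dG90bw'));      // bool(false)
```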

To incorporate this change, an additional optional vote will be included in the RFC to determine whether the updated base64_decode signature, supporting both boolean and enum-based decoding modes, should be accepted as part of this proposal.

In other Languages

Go

Go's standard library supports all RFC 4648 algorithms as well as the Ascii85 format.

Python

Python has updated its encoding support and now supports all RFC 4648 algorithms as well as the Ascii85 format. Python also supports several Base85 variants.

JavaScript/NodeJS

Supports neither Base32 nor Base85 natively.

C#

Natively supports only Base64 (not Base64URL).

Java

Natively supports only Base64.

Open questions

Backward Incompatible Changes

The namespace Encoding is now reserved

Proposed PHP Version(s)

The next minor PHP version (PHP 8.5).

RFC Impact

To SAPIs

None.

To Existing Extensions

None.

To Opcache

None.

Implementation

Tim Düsterhus has volunteered to do the implementation and will check whether a constant-time implementation is possible for all combinations of options.

Future Scope

References