PHP RFC: Add RFC 4648 compliant data encoding API

Version: 1.3
Date: 2025-06-19
Author: Ignace Nyamagana Butera, nyamsprod@gmail.com
Status: Under Discussion
First Published at: https://wiki.php.net/rfc/data_encoding_api

Introduction

To improve interoperability between PHP and other programming languages, and to simplify data encoding in PHP, we propose adding native support for encoding and decoding data using the family of RFC 4648 algorithms (Base16, Base32, and Base64). Currently, PHP only supports a limited subset of RFC 4648. With this RFC, we aim to provide full compliance with the standard and introduce the missing encoding algorithms for developers.

Downsides of the current approach

PHP provides partial support for Base64 via the base64_encode and base64_decode functions but they do not provide:

support for base64 URL alphabet and specific settings
support for base64 IMAP alphabet and specific settings
support for padding character removal during encoding
support for constant-time encoding

PHP provides partial support for Base16 via the bin2hex and hex2bin functions but they do not provide:

support for strict decoding mechanism
support for strict encoding (PHP uses lowercase letters whereas the RFC recommends uppercase letters)
support for constant-time encoding

PHP currently lacks native support for Base32 encoding and decoding. In addition to the absence of this algorithm, the ecosystem suffers from a fragmented landscape of user-land packages—many of which claim Base32 compliance without clearly specifying which variant they implement. This lack of a consistent reference can become problematic when applications rely on a specific Base32 configuration for processing incoming or outgoing data. This challenge, which also applies to other RFC 4648 algorithms, makes working with data encoding in PHP more complex than necessary.

PHP currently offers Base58 encoding and decoding via a PECL extension. This proposal seeks to integrate these functions directly into the PHP core, with additional support for the Flickr variant of Base58. Although Base58 is not defined in RFC 4648, it has seen widespread adoption in production systems—most notably in Bitcoin and other cryptocurrencies for encoding addresses and keys, as well as in platforms like Flickr for generating compact, URL-safe identifiers. Compared to Base64, Base58 yields shorter output, avoids visually ambiguous characters (such as 0, O, I, and l), and is inherently safe for use in URLs without additional encoding. The Base58 algorithm is simple, deterministic, and has remained stable and well-understood in the software ecosystem for over a decade.

The goal of this RFC is to add the encoding and decoding functionalities defined in RFC 4648, as well as Base58, to the PHP standard library. It also introduces a native, constant-time implementation to address security concerns in data encoding. Once adopted, this feature will simplify data encoding in PHP, enhance interoperability with other programming languages, and strengthen security within the PHP ecosystem.

Proposal

A new, always available Encoding namespace will be added to the standard library. The namespace will contain classes and functions for encoding and decoding string or byte sequences.

For this purpose, the following internal classes and functions are added:

namespace Encoding {
    class EncodingException extends \Exception
    {
    }
 
    class UnableToDecodeException extends EncodingException
    {
    }
 
    enum Base16
    {
        case Upper;
        case Lower;
    }
 
    enum Base32
    {
        case Ascii;
        case Hex;
        case Crockford;
        case Z;
    }
 
    enum Base58
    {
        case Bitcoin;
        case Flickr;
    }
 
    enum Base64
    {
        case Standard;
        case UrlSafe;
        case Imap;
    }
 
    enum PaddingMode
    {
        case VariantControlled;
        case StripPadding;
        case PreservePadding;
    }
 
    enum DecodingMode
    {
        case Forgiving;
        case Strict;
    }
 
    enum TimingMode
    {
        case Variable;
        case Constant;
    }
}

The following Base16 functions are added:

namespace Encoding {
    function base16_encode(
        string $data,
        Base16 $variant = Base16::Upper,
        TimingMode $timingMode = TimingMode::Variable,
    ): string;
 
    /**
     * @throws UnableToDecodeException
     */
    function base16_decode(
        string $data,
        Base16 $variant = Base16::Upper,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Variable,
    ): string;
}

The following Base32 functions are added:

namespace Encoding {
    function base32_encode(
        string $data,
        Base32 $variant = Base32::Ascii,
        PaddingMode $paddingMode = PaddingMode::VariantControlled,
        TimingMode $timingMode = TimingMode::Variable,
    ): string;
 
    /**
     * @throws UnableToDecodeException
     */
    function base32_decode(
        string $data,
        Base32 $variant = Base32::Ascii,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Variable,
    ): string;
}

The following Base58 functions are added:

namespace Encoding {
    function base58_encode(
        string $data,
        Base58 $variant = Base58::Bitcoin,
        TimingMode $timingMode = TimingMode::Variable,
    ): string;
 
    /**
     * @throws UnableToDecodeException
     */
    function base58_decode(
        string $data,
        Base58 $variant = Base58::Bitcoin,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Variable,
    ): string;
}

The following Base64 functions are added:

namespace Encoding {
    function base64_encode(
        string $data,
        Base64 $variant = Base64::Standard,
        PaddingMode $paddingMode = PaddingMode::VariantControlled,
        TimingMode $timingMode = TimingMode::Variable,
    ): string;
 
    /**
     * @throws UnableToDecodeException
     */
    function base64_decode(
        string $data,
        Base64 $variant = Base64::Standard,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Variable,
    ): string;
}

API Design

The RFC chooses to use a functions-based API instead of a class-based API for the following reasons:

most PHP scripts use encoding in a one-off fashion, and a class-based API would feel overly complicated for a quick encode or decode operation.
using functions emphasises that encoding/decoding operations have no internal state or side effects.
creating a class-based API on top of a function-based API, in user-land, is trivial.
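To illustrate the last point, a user-land object wrapper over a function pair can be sketched in a few lines. The class name Base64Codec is hypothetical, and because the Encoding namespace does not exist yet, the sketch wraps the current global base64_encode and base64_decode functions rather than the proposed ones.

```php
<?php

// Hypothetical user-land wrapper; it is not part of this RFC.
final class Base64Codec
{
    public function encode(string $data): string
    {
        return base64_encode($data);
    }

    public function decode(string $encoded): string
    {
        // The second argument enables the existing strict mode of base64_decode().
        $decoded = base64_decode($encoded, true);
        if ($decoded === false) {
            throw new InvalidArgumentException('Unable to decode the given string.');
        }

        return $decoded;
    }
}

$codec = new Base64Codec();
echo $codec->decode($codec->encode('Hello world!')), PHP_EOL; // Hello world!
```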

The RFC chooses to use enum-based options rather than boolean or arbitrary string values to improve readability and developer experience when using the API.

The general signature semantic chosen for each algorithm is the following:

For encoding:

function algo_encode(string $data, Enum ...$options): string;

For decoding:

/**
 * @throws UnableToDecodeException
 */
function algo_decode(string $data, Enum ...$options): string;

where:

algo is the name of the underlying encoding algorithm.
$options is a list of options, represented by Enum instances, which MAY be encoding specific.

When decoding fails, an UnableToDecodeException is thrown. In forgiving (non-strict) mode, some tolerance toward the encoded string is allowed, but decoding still throws an UnableToDecodeException if the string remains invalid after the tolerant adjustments have been applied.

Parameters

String Parameters

$data : the string to encode or decode;

Options

Variant support

Base encodings support a range of alphabets and extra configurations that can collectively be referred to as variants. The following enums are introduced to help developers choose the correct variant to use.

Base16 Variants

<?php
 
enum Base16
{
   case Upper;
   case Lower;
}

Base16 does not define multiple alphabets, but it can be encoded using either uppercase or lowercase letters.

The default variant is Base16::Upper because RFC 4648 defines the Base16 alphabet using uppercase letters.
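The difference between the two variants can be approximated today in user-land. The following is a minimal sketch, not the proposed implementation: SketchBase16, sketch_base16_encode and sketch_base16_is_valid are hypothetical names, and the validity check ignores the whitespace handling described later in this RFC.

```php
<?php

// Stand-in for the proposed Encoding\Base16 enum (user-land sketch).
enum SketchBase16
{
    case Upper;
    case Lower;
}

function sketch_base16_encode(string $data, SketchBase16 $variant = SketchBase16::Upper): string
{
    $hex = bin2hex($data); // bin2hex() always returns lowercase digits

    return $variant === SketchBase16::Upper ? strtoupper($hex) : $hex;
}

// Strict check: even length and only characters from the selected alphabet.
function sketch_base16_is_valid(string $encoded, SketchBase16 $variant = SketchBase16::Upper): bool
{
    $pattern = $variant === SketchBase16::Upper
        ? '/^(?:[0-9A-F]{2})*$/D'
        : '/^(?:[0-9a-f]{2})*$/D';

    return preg_match($pattern, $encoded) === 1;
}

echo sketch_base16_encode('Hello world!'), PHP_EOL;                      // 48656C6C6F20776F726C6421
echo sketch_base16_encode('Hello world!', SketchBase16::Lower), PHP_EOL; // 48656c6c6f20776f726c6421
var_dump(sketch_base16_is_valid('48656c6c6f20776f726c6421'));            // bool(false)
```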

Base32 Variants

<?php
 
enum Base32
{
   case Ascii;
   case Hex;
   case Crockford;
   case Z;
}

Base32 supports multiple variants, and we provide the most common ones out of the box:

Ascii : the RFC 4648 Standard variant (case sensitive)
Hex : the RFC 4648 Hexadecimal variant (case sensitive)
Crockford : the Douglas Crockford Base32 variant (case insensitive)
Z : the Z-base-32 variant (case sensitive)

The default variant is Base32::Ascii because it is the standard Base32 alphabet defined in RFC 4648, which uses uppercase letters.
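For readers unfamiliar with Base32, here is a minimal user-land sketch of the RFC 4648 standard variant with padding. The function name sketch_base32_encode is hypothetical and the code is only an illustration of the algorithm, not the proposed native implementation. The Hex variant uses the same bit grouping with a different alphabet, so it can be derived here by character substitution.

```php
<?php

// User-land sketch of RFC 4648 Base32 encoding (standard alphabet, padded).
function sketch_base32_encode(string $data): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
    $output = '';
    $buffer = 0; // bits read but not yet emitted
    $bits = 0;   // number of pending bits in $buffer

    for ($i = 0, $length = strlen($data); $i < $length; $i++) {
        $buffer = ($buffer << 8) | ord($data[$i]);
        $bits += 8;
        while ($bits >= 5) {
            $bits -= 5;
            $output .= $alphabet[($buffer >> $bits) & 0x1F];
        }
        $buffer &= (1 << $bits) - 1; // keep only the pending bits
    }

    if ($bits > 0) {
        // Left-align the remaining bits inside a final 5-bit group.
        $output .= $alphabet[($buffer << (5 - $bits)) & 0x1F];
    }

    // Pad with "=" up to a multiple of 8 characters.
    $remainder = strlen($output) % 8;

    return $remainder === 0 ? $output : $output . str_repeat('=', 8 - $remainder);
}

$standard = sketch_base32_encode('foobar');
echo $standard, PHP_EOL; // MZXW6YTBOI======

// Same groups, "Extended Hex" alphabet from RFC 4648.
echo strtr($standard, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567', '0123456789ABCDEFGHIJKLMNOPQRSTUV'), PHP_EOL; // CPNMUOJ1E8======
```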

Base58 Variants

<?php
 
enum Base58
{
   case Bitcoin;
   case Flickr;
}

Base58 supports multiple variants, and we provide the most common ones out of the box:

Bitcoin : the base58 Bitcoin variant (case sensitive)
Flickr : the Flickr variant (case sensitive)

The default variant is Base58::Bitcoin as it is the most widely used Base58 variant. Of note, the only difference between the Bitcoin and the Flickr variants is the order of the characters in the alphabet.
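The following user-land sketch shows how the two variants relate. The function name sketch_base58_encode and the two constants are hypothetical, and the code favours readability over speed. It treats the input as one big-endian number, converts it to base 58, and maps each leading zero byte to the first character of the alphabet.

```php
<?php

const SKETCH_BASE58_BITCOIN = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';
const SKETCH_BASE58_FLICKR  = '123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ';

// User-land sketch of Base58 encoding for a given 58-character alphabet.
function sketch_base58_encode(string $data, string $alphabet = SKETCH_BASE58_BITCOIN): string
{
    $length = strlen($data);

    // Each leading zero byte is encoded as the first alphabet character.
    $zeros = 0;
    while ($zeros < $length && $data[$zeros] === "\0") {
        $zeros++;
    }

    $digits = []; // base-58 digits, least significant first
    for ($i = $zeros; $i < $length; $i++) {
        $carry = ord($data[$i]);
        foreach ($digits as $index => $digit) {
            $carry += $digit << 8; // multiply the running number by 256
            $digits[$index] = $carry % 58;
            $carry = intdiv($carry, 58);
        }
        while ($carry > 0) {
            $digits[] = $carry % 58;
            $carry = intdiv($carry, 58);
        }
    }

    $output = str_repeat($alphabet[0], $zeros);
    for ($i = count($digits) - 1; $i >= 0; $i--) {
        $output .= $alphabet[$digits[$i]];
    }

    return $output;
}

$data = hex2bin('0000287fb4cd');
echo sketch_base58_encode($data), PHP_EOL;                       // 11233QC4
echo sketch_base58_encode($data, SKETCH_BASE58_FLICKR), PHP_EOL; // 11233pc4
```

Because only the character order differs, one output can also be converted into the other with strtr() over the two alphabets.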

Base64 Variant

<?php
 
enum Base64
{
   case Standard;
   case UrlSafe;
   case Imap;
}

Base64 supports multiple variants, and we provide the most common ones out of the box. All Base64 variants are case-sensitive.

Standard : the RFC 4648 Standard variant
UrlSafe : the RFC 4648 URL and Filename Safe variant
Imap : the RFC 3501 IMAP variant

The default variant is Base64::Standard because the default Base64 alphabet defined in RFC 4648 is the standard, non URL-safe one.
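The three alphabets differ by only a few characters, which a user-land sketch built on the current global base64_encode can show. The helper names below are hypothetical. The sketch assumes that the UrlSafe variant omits padding by default, as the usage examples in this RFC suggest, and follows RFC 3501 section 5.1.3 for IMAP, where "," replaces "/" and padding is not used.

```php
<?php

// Hypothetical helpers approximating the proposed variants in user-land.
function sketch_base64_standard(string $data): string
{
    return base64_encode($data);
}

function sketch_base64_urlsafe(string $data): string
{
    // "+" becomes "-", "/" becomes "_"; padding is assumed to be stripped.
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

function sketch_base64_imap(string $data): string
{
    // Modified Base64 from RFC 3501: "," instead of "/", no padding.
    return rtrim(strtr(base64_encode($data), '/', ','), '=');
}

$data = chr(0xFF) . chr(0xFF);
echo sketch_base64_standard($data), PHP_EOL; // //8=
echo sketch_base64_urlsafe($data), PHP_EOL;  // __8
echo sketch_base64_imap($data), PHP_EOL;     // ,,8
```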

Padding presence during encoding

<?php
 
enum PaddingMode
{
  case VariantControlled;
  case StripPadding;
  case PreservePadding;
}

Base32 and Base64 use a padding character. The padding character has a technical role: it ensures that the encoded output represents complete blocks of data and allows the decoder to reconstruct the original binary input unambiguously. To improve readability or interoperability, however, some variants omit it from their encoded output. This option MUST tell the encoding mechanism whether the padding character needs to be present at the end of the encoding process, when applicable.

Values

VariantControlled — Padding is included or omitted according to the rules defined by the selected variant.
StripPadding — Padding characters are removed from the encoded output.
PreservePadding — Padding characters are retained in the encoded output.

Rules

If the selected variant does not support padding and PaddingMode::PreservePadding is specified, a ValueError MUST be thrown.
If the selected variant requires padding and PaddingMode::StripPadding is specified, a ValueError MUST be thrown.

The default padding mode is PaddingMode::VariantControlled, indicating that the padding character is added only when mandated by the chosen variant.
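How the three modes interact with a variant can be sketched in user-land as follows. SketchPaddingMode and sketch_base64_encode_padded are hypothetical names, and the two boolean parameters are illustrative stand-ins for properties that the real API would derive from the selected variant. Only the first rule above is modelled.

```php
<?php

// Stand-in for the proposed Encoding\PaddingMode enum (user-land sketch).
enum SketchPaddingMode
{
    case VariantControlled;
    case StripPadding;
    case PreservePadding;
}

function sketch_base64_encode_padded(
    string $data,
    SketchPaddingMode $mode = SketchPaddingMode::VariantControlled,
    bool $variantPadsByDefault = true,   // assumption: what the variant does on its own
    bool $variantSupportsPadding = true, // assumption: whether padding is allowed at all
): string {
    if ($mode === SketchPaddingMode::PreservePadding && !$variantSupportsPadding) {
        throw new ValueError('The selected variant does not support padding.');
    }

    $keepPadding = match ($mode) {
        SketchPaddingMode::VariantControlled => $variantPadsByDefault,
        SketchPaddingMode::StripPadding => false,
        SketchPaddingMode::PreservePadding => true,
    };

    $encoded = base64_encode($data); // the global function always pads

    return $keepPadding ? $encoded : rtrim($encoded, '=');
}

$data = chr(0xFF) . chr(0xFF);
echo sketch_base64_encode_padded($data), PHP_EOL;                                  // //8=
echo sketch_base64_encode_padded($data, SketchPaddingMode::StripPadding), PHP_EOL; // //8
```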

Decoding Mode

<?php
 
enum DecodingMode
{
  case Forgiving;
  case Strict;
}

For all functions, you MUST be able to specify how decoding is performed. By default, the $decodingMode is set to DecodingMode::Strict, meaning the algorithm strictly follows the rules defined by the RFC. Alternatively, you can set $decodingMode to DecodingMode::Forgiving. In this mode, several adjustments are applied to the $data string before the actual decoding process begins:

When applicable, the $data string is converted into the correct character casing.
When applicable, the padding length is corrected to allow correct decoding.

Independent of the mode:

The alphabet is treated as a sequence of byte values without any special treatment for multi-byte UTF-8.
The following characters: \r, \t, \n and the space character are all ignored during the decoding process.
The $data string should be protected against the presence of NULL bytes.

Although the forgiving decoding mode is available, it is intentionally restricted to account for the security considerations outlined in section 12 of RFC 4648.

The default decoding mode is DecodingMode::Strict.
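The difference between the two modes can be sketched in user-land for the standard Base64 alphabet. All names below (SketchDecodingMode, SketchUnableToDecodeException, sketch_base64_decode) are hypothetical stand-ins. The sketch only repairs the padding length in forgiving mode, and it is an approximation of the rules above rather than the proposed implementation.

```php
<?php

enum SketchDecodingMode
{
    case Forgiving;
    case Strict;
}

final class SketchUnableToDecodeException extends Exception
{
}

function sketch_base64_decode(string $data, SketchDecodingMode $mode = SketchDecodingMode::Strict): string
{
    // Independent of the mode: refuse NULL bytes, ignore \r, \t, \n and spaces.
    if (str_contains($data, "\0")) {
        throw new SketchUnableToDecodeException('NULL byte found in the encoded string.');
    }
    $data = str_replace(["\r", "\t", "\n", ' '], '', $data);

    if ($mode === SketchDecodingMode::Forgiving) {
        // Repair the trailing padding length only; nothing else is tolerated.
        $data = rtrim($data, '=');
        $data .= str_repeat('=', (4 - strlen($data) % 4) % 4);
    }

    // Full groups of four, optionally followed by one correctly padded group.
    $shape = '~^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$~D';
    if (preg_match($shape, $data) !== 1) {
        throw new SketchUnableToDecodeException('Invalid Base64 string.');
    }

    $decoded = base64_decode($data, true);
    if ($decoded === false) {
        throw new SketchUnableToDecodeException('Invalid Base64 string.');
    }

    return $decoded;
}

echo sketch_base64_decode('dG90bw=='), PHP_EOL;                              // toto
echo sketch_base64_decode('dG90bw', SketchDecodingMode::Forgiving), PHP_EOL; // toto
```

With this sketch, 'dG90bw' in strict mode, and 'dG90bw??' or 'dG9===0bw' in either mode, all raise the exception.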

Timing generation mode

<?php
 
enum TimingMode
{
   case Variable;
   case Constant;
}

In some cases, for security reasons, you may prefer an algorithm whose running time does not depend on the data being processed, to prevent information leakage during the encoding or decoding process. Since different algorithms can have varying processing times, an optional enum is proposed to allow developers to opt into this more secure approach. For now, a constant-time algorithm is provided alongside the standard implementation, which does not protect against timing attacks. Depending on the implementation, this option may not be available for all encoding algorithms.

The default timing mode is TimingMode::Variable.
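To give an idea of what a constant-time approach looks like, the sketch below encodes Base16 with arithmetic only, avoiding lookup tables indexed by the data and branches that depend on it. The function name is hypothetical. This is a user-land illustration of the technique and offers no hard timing guarantee, since the PHP engine itself may introduce variations; a native implementation would be needed for that.

```php
<?php

// User-land sketch: uppercase Base16 without data-dependent lookups or branches.
function sketch_base16_encode_branchless(string $data): string
{
    $output = '';
    for ($i = 0, $length = strlen($data); $i < $length; $i++) {
        $byte = ord($data[$i]);
        foreach ([$byte >> 4, $byte & 0x0F] as $nibble) {
            // (9 - $nibble) >> 8 is 0 for 0..9 and -1 (all bits set) for 10..15,
            // so the mask adds 7 only for letters: 48 + 10 + 7 = 65 = "A".
            $output .= chr(48 + $nibble + (((9 - $nibble) >> 8) & 7));
        }
    }

    return $output;
}

echo sketch_base16_encode_branchless('Hello world!'), PHP_EOL; // 48656C6C6F20776F726C6421
```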

Usage examples

Using the Encoding\base64_encode and Encoding\base64_decode functions

<?php
 
use Encoding\Base64;
use Encoding\PaddingMode;
use Encoding\DecodingMode;
 
use function Encoding\base64_encode;
use function Encoding\base64_decode;
 
$data = 'This is an encoded string';
 
echo base64_encode($data); // "VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw=="
echo base64_encode($data, paddingMode: PaddingMode::StripPadding); // "VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw"
echo base64_decode("VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw"); // throws a UnableToDecodeException exception
echo base64_decode("VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw", decodingMode: DecodingMode::Forgiving); // returns 'This is an encoded string'
 
$data = chr(0xFF) . chr(0xFF);
echo base64_encode($data); // "//8="
echo base64_encode($data, variant: Base64::UrlSafe); // "__8"
echo base64_encode($data, paddingMode: PaddingMode::StripPadding); // "//8"

Using the Encoding\base16_encode and Encoding\base16_decode functions

<?php
 
use Encoding\Base16;
use Encoding\DecodingMode;
 
use function Encoding\base16_encode;
use function Encoding\base16_decode;
 
$data = 'Hello world!';
$encodedUpper = "48656C6C6F20776F726C6421"; // using uppercase characters
$encodedLower = "48656c6c6f20776f726c6421"; // using lowercase characters
 
echo base16_encode($data); // returns $encodedUpper; RFC 4648 dictates that the return value should be uppercased
echo base16_decode($encodedLower, decodingMode: DecodingMode::Strict); // throws an UnableToDecodeException exception
echo base16_decode($encodedLower, decodingMode: DecodingMode::Forgiving); // 'Hello world!'

Migration path

Due to the widespread use of the current API, this RFC proposes a gradual migration path to help users transition to the new API. However, the full deprecation and removal of the current functions—base64_encode, base64_decode, hex2bin, and bin2hex—will be handled separately through the traditional RFC deprecation process, which occurs before each PHP version release. This ensures users have sufficient time to adopt the new API.

Base16 functions

bin2hex

The bin2hex function encodes a string using the Base16 algorithm, but it defaults to a lowercase alphabet, which contradicts the recommendation in RFC 4648. To migrate a bin2hex call to the new API while preserving current behaviour use:

<?php
 
$data = 'Hello world!';
 
//before
echo bin2hex($data);
//after
echo Encoding\base16_encode($data, variant: Encoding\Base16::Lower);

hex2bin

The hex2bin function is lenient and accepts both lowercase and uppercase input. To migrate a hex2bin call to the new API while preserving current behaviour use:

<?php
 
$data = "6578616d706c65206865782064617461";
 
//before
echo hex2bin($data);
//after
echo Encoding\base16_decode($data, decodingMode: Encoding\DecodingMode::Forgiving);

Base64 functions

base64_encode

This function already follows the standard Base64 encoding algorithm. Migrating is straightforward:

$data = 'This is an encoded string';
//before
echo base64_encode($data);
//after
echo Encoding\base64_encode($data);

base64_decode

Migrating base64_decode is more complex. The current function behaves leniently by default, accepting non-alphabet characters and misplaced padding:

base64_decode('dG9===0bw??'); // returns 'toto'

However, the proposed API enforces stricter rules, as recommended in RFC 4648, Section 12. For security reasons, this includes rejecting invalid characters and padding in non-terminal positions:

Encoding\base64_decode('dG90bw??', decodingMode: Encoding\DecodingMode::Forgiving);  // will throw because of outside alphabet letter
Encoding\base64_decode('dG9===0bw', decodingMode: Encoding\DecodingMode::Forgiving); // will throw because of unsafe use of the padding character
Encoding\base64_decode('dG90bw', decodingMode: Encoding\DecodingMode::Forgiving);    // returns 'toto'

To ease the transition, we propose updating the signature of base64_decode in the global namespace:

base64_decode(string $string, bool|DecodingMode $strict = false);

Impact:

$strict = Encoding\DecodingMode::Strict would be identical to $strict = true
$strict = Encoding\DecodingMode::Forgiving would have the same behaviour as in the proposed API
$strict = false would preserve the current unsafe behaviour (which is not part of the new API)

This allows developers to opt into the enum-based approach and move away from the insecure default.
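The three cases listed above can be pictured with a small user-land sketch of how a bool|enum argument might be interpreted. SketchStrictFlag and sketch_describe_strict are hypothetical names; the real change would live inside the engine's base64_decode, which this code does not touch.

```php
<?php

// Stand-in for the proposed Encoding\DecodingMode enum (user-land sketch).
enum SketchStrictFlag
{
    case Forgiving;
    case Strict;
}

// Hypothetical helper: names the behaviour selected by the $strict argument.
function sketch_describe_strict(bool|SketchStrictFlag $strict = false): string
{
    return match ($strict) {
        true, SketchStrictFlag::Strict => 'strict',
        SketchStrictFlag::Forgiving => 'forgiving',
        false => 'legacy lenient',
    };
}

echo sketch_describe_strict(), PHP_EOL;                            // legacy lenient
echo sketch_describe_strict(true), PHP_EOL;                        // strict
echo sketch_describe_strict(SketchStrictFlag::Forgiving), PHP_EOL; // forgiving
```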

If the function is using strict mode:

$str = 'VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw==';
//before
echo base64_decode($str, true);
//after
echo Encoding\base64_decode($str);

(No need to specify mode—strict is the default in the new API.)

If using the non-strict (default) legacy mode:

The developer can begin by opting into explicit lenient decoding:

$str = 'VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw==';
echo base64_decode($str);
echo base64_decode($str, strict: Encoding\DecodingMode::Forgiving);

If this step causes errors, it indicates the original code was relying on unsafe decoding behaviour.

Then, finalize the migration:

$str = 'VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw==';
- echo base64_decode($str, strict: Encoding\DecodingMode::Forgiving);
+ echo Encoding\base64_decode($str, decodingMode: Encoding\DecodingMode::Forgiving);

To incorporate this change, an additional optional vote will be included in the RFC to determine whether the updated base64_decode signature—supporting both boolean and enum-based decoding modes—should be accepted as part of this proposal.

In other Languages

Go

In its standard library, Go supports all RFC 4648 algorithms as well as the ascii85 format.

Python

Python has updated its encoding support and now supports all RFC 4648 algorithms as well as the ascii85 format. Python also has extensive support for many Base85 variants.

JavaScript/NodeJS

Does not support base32 natively nor base85.

C#

Only natively supports base64 (not base64 URL)

Java

Only natively supports base64

Open questions

Should we allow users to specify their own alphabet for Base32?
Should we allow users to specify their own padding character where applicable?

Backward Incompatible Changes

The namespace Encoding is now reserved

Proposed PHP Version(s)

The next minor PHP version (PHP 8.6).

RFC Impact

To SAPIs

None.

To Existing Extensions

None.

To Opcache

None.

Implementation

Tim Düsterhus has volunteered to do the implementation, but will check whether or not a constant time implementation is possible for all combinations of options.

Future Scope

Add support for ascii85 used in PDF format and by Git
The current functions for Base64 and Base16 can be deprecated at some distant point of time
Add Base64 support to PHP convert.base64-encode and convert.base64-decode stream filters

References

RFC 4648: https://datatracker.ietf.org/doc/html/rfc4648
Douglas Crockford Base32: https://www.crockford.com/base32.html
Z-Base32: https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt
IMAP Base64: https://datatracker.ietf.org/doc/html/rfc3501#section-5.1.3
Base58 Bitcoin: https://bitcoinwiki.org/wiki/base58
