PHP RFC: Base16, Base32, and Base64 Data Encodings
- Version: 1.0
- Date: 2025-06-15
- Author: Ignace Nyamagana Butera, nyamsprod@gmail.com
- Status: Draft
- First Published at: https://wiki.php.net/rfc/data_encoding_api
Introduction
To improve interoperability between PHP and other programming languages, and to simplify data encoding in PHP, we propose adding the ability for the core language to encode and decode data using the family of RFC4648 algorithms (base16, base32 and base64).
PHP currently supports only a limited subset of RFC4648. This RFC aims to provide full support for that specification and to give developers the encoding algorithms that are currently missing.
Downsides of the current approach
PHP provides partial support for Base64 via the base64_encode and base64_decode functions, but they do not provide:
- support for the Base64 URL alphabet
- support for removing the padding character during encoding
- support for constant-time encoding.
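To illustrate the gap, here is the kind of userland workaround that URL-safe, unpadded Base64 requires today. This is only a sketch: the helper names `base64url_encode` and `base64url_decode` are hypothetical and are not part of this proposal.

```php
<?php
// Hypothetical userland helper: URL-safe Base64 without padding,
// built on top of the existing global base64_encode() function.
function base64url_encode(string $data): string
{
    // Swap '+' and '/' for '-' and '_', then drop the trailing '=' padding.
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

// Hypothetical inverse: restore the standard alphabet and the padding,
// then decode with the existing strict mode of base64_decode().
function base64url_decode(string $encoded): string|false
{
    $standard = strtr($encoded, '-_', '+/');
    $standard .= str_repeat('=', (4 - strlen($standard) % 4) % 4);

    return base64_decode($standard, true);
}

echo base64url_encode("\xFF\xFF"); // __8
```

Every project that needs this variant currently has to carry such helpers itself, which is the duplication the proposed alphabet and padding options are meant to remove.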
PHP provides partial support for Base16 via the bin2hex and hex2bin functions, but they do not provide:
- support for a strict decoding mechanism
- support for constant-time encoding.
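As an illustration of the missing strict mode: hex2bin accepts uppercase and lowercase input alike, so enforcing the uppercase RFC4648 alphabet currently needs an extra check in userland. The sketch below assumes a hypothetical helper named `strict_base16_decode`; it only covers the casing and length checks and is not the proposed implementation.

```php
<?php
// Hypothetical userland helper: accept only the uppercase RFC 4648
// Base16 alphabet and an even number of characters.
function strict_base16_decode(string $encoded): string
{
    if (strlen($encoded) % 2 !== 0 || preg_match('/\A[0-9A-F]*\z/', $encoded) !== 1) {
        throw new InvalidArgumentException('Input is not valid uppercase Base16.');
    }

    // The input is now known to be well formed, so hex2bin() cannot fail.
    return hex2bin($encoded);
}

echo strict_base16_decode('48656C6C6F'); // Hello
```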
PHP currently does not provide any Base32 feature. On top of the missing algorithm, many PHP user-land packages claim Base32 support without stating explicitly which variant they implement. This becomes critical when an application relies on that encoding to handle data generated by other systems or by other programming languages. As a result, using data encodings in PHP is more complex than it should be.
The goal of this RFC is to add the encoding/decoding functionality described in RFC4648 to the PHP standard library. The RFC also introduces a native constant-time implementation to address security challenges in the data encoding field. Once implemented, the feature would simplify data encoding in PHP while improving interoperability with other programming languages and security within the PHP ecosystem.
Proposal
A new, always available Encoding namespace is to be added to the standard library. The namespace would contain classes, enums and functions for encoding and decoding strings or byte sequences.
For this purpose, the following internal classes and functions are added:
namespace Encoding {
    class EncodingException extends \Exception { }
    class UnableToDecodeException extends EncodingException { }

    enum Base16Alphabet {
        case Upper;
        case Lower;
    }

    enum Base32Alphabet {
        case Ascii;
        case Hex;
        case Crockford;
        case Z;
    }

    enum Base64Alphabet {
        case Standard;
        case SafeUrl;
        case Imap;
    }

    enum PaddingMode {
        case AlphabetControlled;
        case StripPadding;
        case PreservePadding;
    }

    enum DecodingMode {
        case Lenient;
        case Strict;
    }

    enum TimingMode {
        case Unprotected;
        case ConstantTime;
    }
}
The following Base16 functions are added:
namespace Encoding {
    function base16_encode(
        string $decoded,
        Base16Alphabet $alphabet = Base16Alphabet::Upper,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;

    /**
     * @throws UnableToDecodeException
     */
    function base16_decode(
        string $encoded,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
}
The following Base32 functions are added:
namespace Encoding {
    function base32_encode(
        string $decoded,
        Base32Alphabet $alphabet = Base32Alphabet::Ascii,
        PaddingMode $paddingMode = PaddingMode::AlphabetControlled,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;

    /**
     * @throws UnableToDecodeException
     */
    function base32_decode(
        string $encoded,
        Base32Alphabet $alphabet = Base32Alphabet::Ascii,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
}
The following Base64 functions are added:
namespace Encoding {
    function base64_encode(
        string $decoded,
        Base64Alphabet $alphabet = Base64Alphabet::Standard,
        PaddingMode $paddingMode = PaddingMode::AlphabetControlled,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;

    /**
     * @throws UnableToDecodeException
     */
    function base64_decode(
        string $encoded,
        Base64Alphabet $alphabet = Base64Alphabet::Standard,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
}
Function-based API
The RFC uses a function-based API instead of a class-based API for the following reasons:
- most PHP scripts use encoding in a one-off fashion; a class-based API would feel overly complicated for a quick encode or decode operation.
- using functions emphasises that encoding/decoding has no internal state or side effects.
- creating a class-based API on top of a function-based API, in user-land, is trivial.
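As a rough illustration of that last point, a class-based wrapper takes only a few lines. The class name `Base64Codec` is hypothetical, and because the proposed Encoding functions do not exist yet, this sketch wraps today's global base64_encode and base64_decode functions as a stand-in.

```php
<?php
// Hypothetical userland wrapper; it delegates to the existing global
// base64 functions as a stand-in for the proposed Encoding\base64_* ones.
final class Base64Codec
{
    public function __construct(private bool $strict = true)
    {
    }

    public function encode(string $decoded): string
    {
        return base64_encode($decoded);
    }

    public function decode(string $encoded): string
    {
        $decoded = base64_decode($encoded, $this->strict);
        if ($decoded === false) {
            // Turn the "false on failure" convention into an exception.
            throw new UnexpectedValueException('Unable to decode the given string.');
        }

        return $decoded;
    }
}

$codec = new Base64Codec();
echo $codec->decode($codec->encode('Hello world!')); // Hello world!
```

The options are fixed once in the constructor, so the object adds configuration reuse but no behaviour of its own.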
The RFC uses enum-based options rather than boolean or arbitrary string values to improve readability and developer experience when using the new API.
The general signature semantic chosen for each algorithm is the following:
For encoding:
function algo_encode(string $decoded, Enum ...$options): string;
For decoding:
/**
 * @throws UnableToDecodeException
 */
function algo_decode(string $encoded, Enum ...$options): string;
where algo is the name of the underlying encoding algorithm.
When decoding fails, an UnableToDecodeException is thrown. In non-strict (lenient) mode some tolerance toward the encoded string is allowed, but decoding can still throw an UnableToDecodeException if the string remains invalid after the lenient corrections have been applied.
Parameters
String Parameters
- $decoded : the string to encode;
- $encoded : the string to decode;
Options
Alphabets support
Base16 Alphabets
<?php enum Base16Alphabet { case Upper; case Lower; }
Base16 does not have multiple alphabets but can be encoded using uppercase or lowercase letters.
To be compliant with RFC4648, the default value is Base16Alphabet::Upper.
Base32 Alphabets
<?php enum Base32Alphabet { case Ascii; case Hex; case Crockford; case Z; }
Base32 can be used with different alphabets. We will support the most used alphabets out of the box:
- Ascii: the RFC4648 standard alphabet (case sensitive)
- Hex: the RFC4648 extended hexadecimal alphabet (case sensitive)
- Crockford: the Douglas Crockford alphabet (case insensitive)
- Z: the z-base-32 alphabet (case sensitive)
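To make the difference between these alphabets concrete, the sketch below encodes the same bytes with each of them. It is a minimal userland illustration, not the proposed implementation: the constant `BASE32_ALPHABETS`, the function `sketch_base32_encode`, its string keys and its boolean padding flag are all hypothetical simplifications of the enum-based API.

```php
<?php
// The four 32-character alphabets (RFC 4648 standard and extended hex,
// Crockford, z-base-32). The array keys are hypothetical labels.
const BASE32_ALPHABETS = [
    'ascii'     => 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567',
    'hex'       => '0123456789ABCDEFGHIJKLMNOPQRSTUV',
    'crockford' => '0123456789ABCDEFGHJKMNPQRSTVWXYZ',
    'z'         => 'ybndrfg8ejkmcpqxot1uwisza345h769',
];

// Hypothetical userland encoder: emits one character per 5 input bits.
function sketch_base32_encode(string $decoded, string $alphabet = 'ascii', bool $padding = true): string
{
    $chars = BASE32_ALPHABETS[$alphabet];
    $encoded = '';
    $buffer = 0; // bit accumulator
    $bits = 0;   // number of pending bits in the accumulator

    for ($i = 0, $length = strlen($decoded); $i < $length; $i++) {
        $buffer = ($buffer << 8) | ord($decoded[$i]);
        $bits += 8;
        while ($bits >= 5) {
            $bits -= 5;
            $encoded .= $chars[($buffer >> $bits) & 0x1F];
        }
        $buffer &= (1 << $bits) - 1; // keep only the pending bits
    }

    if ($bits > 0) {
        // Left-align the remaining bits inside a final 5-bit group.
        $encoded .= $chars[($buffer << (5 - $bits)) & 0x1F];
    }

    if ($padding) {
        // Pad the output to a multiple of 8 characters.
        $encoded .= str_repeat('=', (8 - strlen($encoded) % 8) % 8);
    }

    return $encoded;
}

echo sketch_base32_encode('foobar'), "\n";        // MZXW6YTBOI======
echo sketch_base32_encode('foobar', 'hex'), "\n"; // CPNMUOJ1E8======
```

The two outputs shown are the RFC4648 test vectors for "foobar". The same input produces a different string for each alphabet, which is why a package that does not name its variant cannot be relied on for interoperability.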
Base64 Alphabets
<?php enum Base64Alphabet { case Standard; case SafeUrl; case Imap; }
Base64 can be used with different alphabets. We will support the most used alphabets out of the box. All Base64 alphabets are case sensitive.
- Standard: the RFC4648 standard alphabet
- SafeUrl: the RFC4648 URL and filename safe alphabet
- Imap: the modified Base64 alphabet defined in RFC3501 (IMAP)
Padding presence during encoding
<?php enum PaddingMode { case AlphabetControlled; case StripPadding; case PreservePadding; }
Base32 and Base64 use a padding character. The padding character has a technical role: it ensures that the encoded output represents complete blocks of data and allows the decoder to reconstruct the original binary input unambiguously. To improve readability, however, some alphabets omit it from the encoded output. This option MUST tell the encoding mechanism whether padding needs to be present at the end of the encoding process.
By default the padding mode is PaddingMode::AlphabetControlled, meaning the padding character will be present only if it is mandatory for the chosen alphabet.
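The amount of padding follows directly from the block sizes in RFC4648: Base64 maps 3-byte blocks to 4 characters and Base32 maps 5-byte blocks to 8 characters. The following sketch works through that arithmetic; the function names are hypothetical and only serve the illustration.

```php
<?php
// Number of '=' characters needed for an input of $length bytes.
function base64_padding_length(int $length): int
{
    // Base64 encodes 3-byte blocks as 4 characters.
    return (3 - $length % 3) % 3;
}

function base32_padding_length(int $length): int
{
    // Base32 encodes 5-byte blocks as 8 characters. A trailing partial
    // block of r bytes produces ceil(8 * r / 5) characters.
    $trailingChars = intdiv(($length % 5) * 8 + 4, 5);

    return (8 - $trailingChars) % 8;
}

echo base64_padding_length(25), "\n"; // 2
echo base32_padding_length(6), "\n";  // 6
```

For example, a 25-byte input leaves one byte in the last Base64 block, so two padding characters are required; PaddingMode::StripPadding would simply omit them.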
Decoding Mode
<?php enum DecodingMode { case Lenient; case Strict; }
For all functions, during decoding, you MUST be able to specify how decoding will be performed. By default, the $decodingMode is DecodingMode::Strict and the algorithm strictly follows the rules set by the RFC.
You can also set the $decodingMode to DecodingMode::Lenient. When using this decoding mode, several manipulations are performed on the `$encoded` string before the actual decoding process:
- When applicable, the $encoded string is converted into the correct character casing.
- When applicable, the padding length is corrected to allow correct decoding.
Regardless of the mode:
- The alphabet is treated as a sequence of byte values without any special treatment for multi-byte UTF-8.
- The following characters: \r, \t, \n and the space character are all ignored during the decoding process.
- There should be a protection against the presence of NULL bytes in the $encoded string.
The lenient process, while available, is kept restrictive to take into account the security considerations covered in section 12 of RFC 4648.
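The pre-processing rules described above can be sketched for the RFC4648 Base32 alphabet as follows. This is an assumption-laden illustration: the function name `sketch_base32_normalize` is hypothetical, and the exact order and scope of the corrections in the final implementation may differ.

```php
<?php
// Hypothetical lenient pre-processing for the RFC 4648 Base32 alphabet.
function sketch_base32_normalize(string $encoded): string
{
    // Protection against NUL bytes, applied regardless of the mode.
    if (str_contains($encoded, "\0")) {
        throw new InvalidArgumentException('NUL bytes are not allowed.');
    }

    // CR, LF, tab and space are ignored regardless of the mode.
    $encoded = str_replace(["\r", "\n", "\t", ' '], '', $encoded);

    // Lenient only: correct the casing, then rebuild the padding so the
    // length is a multiple of 8 characters.
    $encoded = rtrim(strtoupper($encoded), '=');

    return $encoded . str_repeat('=', (8 - strlen($encoded) % 8) % 8);
}

echo sketch_base32_normalize("mzxw6 ytboi\r\n"); // MZXW6YTBOI======
```

The normalised string can still be invalid, for instance if it contains a character outside the alphabet, which is why lenient decoding can still end in an UnableToDecodeException.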
Timing generation mode
<?php enum TimingMode { case Unprotected; case ConstantTime; }
Sometimes, for security reasons, you MAY want to use a more secure algorithm to avoid leaking information during the encoding/decoding process. Because a different algorithm MAY result in a different processing time, an optional enum is proposed to opt in to the changed process. For now, a constant-time algorithm is added alongside the standard process, which does not protect against timing attacks. Depending on the implementation, this option might not be available for every algorithm.
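To give an idea of what a constant-time variant changes, the sketch below encodes Base16 without table lookups or data-dependent branches, using integer arithmetic only. It is a userland illustration under stated assumptions: the function name `ct_hex_encode_upper` is hypothetical, the proposed native implementation may work differently, and PHP code running in userland cannot by itself guarantee constant-time execution.

```php
<?php
// Hypothetical branch-free, table-free Base16 (uppercase) encoder.
function ct_hex_encode_upper(string $bytes): string
{
    $out = '';
    for ($i = 0, $length = strlen($bytes); $i < $length; $i++) {
        $byte = ord($bytes[$i]);
        $hi = $byte >> 4;
        $lo = $byte & 0x0F;

        // For a nibble below 10, ($n - 10) >> 8 is -1, so the mask subtracts 7
        // and the result lands on '0'..'9' (48 + $n). Otherwise the shift
        // yields 0 and the result lands on 'A'..'F' (55 + $n).
        $out .= chr(55 + $hi + ((($hi - 10) >> 8) & ~6));
        $out .= chr(55 + $lo + ((($lo - 10) >> 8) & ~6));
    }

    return $out;
}

echo ct_hex_encode_upper('Hello world!'); // 48656C6C6F20776F726C6421
```

The same work is performed for every input byte whatever its value, which is the property TimingMode::ConstantTime is meant to provide natively.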
Usage examples
Using the Encoding\base64_encode and Encoding\base64_decode functions
<?php
use function Encoding\base64_encode;
use Encoding\Base64Alphabet;
use Encoding\PaddingMode;

$decoded = 'This is an encoded string';
echo base64_encode($decoded); // "VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw=="
echo base64_encode($decoded, paddingMode: PaddingMode::StripPadding); // "VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw"

$decoded = chr(0xFF) . chr(0xFF);
echo base64_encode($decoded); // "//8="
echo base64_encode($decoded, alphabet: Base64Alphabet::SafeUrl); // "__8"
echo base64_encode($decoded, paddingMode: PaddingMode::StripPadding); // "//8"
Using the Encoding\base16_encode and Encoding\base16_decode functions
<?php
use function Encoding\base16_encode;
use function Encoding\base16_decode;
use Encoding\DecodingMode;

$decoded = 'Hello world!';
$encodedUpper = "48656C6C6F20776F726C6421"; // using uppercase characters
$encodedLower = "48656c6c6f20776f726c6421"; // using lowercase characters

echo base16_encode($decoded); // returns $encodedUpper; RFC4648 dictates that the return value should be uppercased
echo base16_decode($encodedLower, decodingMode: DecodingMode::Strict); // throws an UnableToDecodeException
echo base16_decode($encodedLower, decodingMode: DecodingMode::Lenient); // 'Hello world!'
In other Languages
Go
In its standard library, Go supports [all RFC4648 algorithms as well as the ascii85 format](https://pkg.go.dev/encoding@go1.24.4).
Python
Python has updated its encoding support and now supports [all RFC4648 algorithms as well as the ascii85 format](https://docs.python.org/3/library/base64.html). Python also has extensive support for many base85 alphabets.
JavaScript/NodeJs
Supports neither base32 nor base85 natively.
C#
Natively supports only base64 (not base64 URL).
Java
Natively supports only base64.
Open questions
- Should we allow users to specify their own alphabet for base32?
- Should we allow users to specify their own padding character where applicable?
Backward Incompatible Changes
The namespace Encoding is now reserved.
Proposed PHP Version(s)
The next minor PHP version (PHP 8.5).
RFC Impact
To SAPIs
None.
To Existing Extensions
None.
To Opcache
None.
Implementation
Tim Düsterhus has volunteered to do the implementation, but will first check whether a constant-time implementation is possible for all combinations of options.
Future Scope
References
- Douglas Crockford base32: https://www.crockford.com/base32.html