To improve interoperability between PHP and other programming languages, and to simplify data encoding in PHP, we propose adding the ability for the core language to encode and decode data using the RFC4648 family of encoding/decoding algorithms (base16, base32 and base64).
Currently PHP supports only a limited subset of RFC4648. With this RFC we aim to provide full support for RFC4648 and to give developers the encoding algorithms that are currently missing.
PHP provides partial support for Base64 via the base64_encode and base64_decode functions but they do not provide:
PHP provides partial support for Base16 via the bin2hex and hex2bin functions but they do not provide:
PHP currently does not provide any Base32 feature. Compounding the missing algorithm, the many PHP userland packages that claim Base32 support often do not state explicitly which variant they implement. The situation becomes critical if your application relies on that encoding to handle data generated by other systems or by other programming languages. As a result, data encoding in PHP is more complex than it should be.
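To make the gap concrete, here is a minimal userland sketch of Base32 encoding with the RFC4648 section 6 alphabet. The function name `sketch_base32_encode` is hypothetical and is not part of this proposal; it only illustrates the kind of code applications have to carry or import today.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical userland helper (not part of the proposed API):
 * encodes $bytes with the RFC4648 section 6 Base32 alphabet, with padding.
 */
function sketch_base32_encode(string $bytes): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
    $encoded = '';
    $buffer = 0; // bit accumulator
    $bits = 0;   // number of not yet emitted bits held in $buffer

    $length = strlen($bytes);
    for ($i = 0; $i < $length; $i++) {
        $buffer = ($buffer << 8) | ord($bytes[$i]);
        $bits += 8;

        // Emit one character for every complete group of 5 bits.
        while ($bits >= 5) {
            $bits -= 5;
            $encoded .= $alphabet[($buffer >> $bits) & 0x1F];
        }

        // Keep only the bits that have not been emitted yet.
        $buffer &= (1 << $bits) - 1;
    }

    // Left-align the remaining bits into a final 5-bit group.
    if ($bits > 0) {
        $encoded .= $alphabet[($buffer << (5 - $bits)) & 0x1F];
    }

    // Pad the output to a multiple of 8 characters.
    $remainder = strlen($encoded) % 8;
    if ($remainder !== 0) {
        $encoded .= str_repeat('=', 8 - $remainder);
    }

    return $encoded;
}

echo sketch_base32_encode('foobar'), PHP_EOL; // prints "MZXW6YTBOI======" (RFC4648 test vector)
```

Other variants (Hex, Crockford, z-base-32) differ in alphabet and padding rules, which is exactly where undocumented userland implementations tend to diverge.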
The goal of the RFC is to add the encoding/decoding functionality described in RFC4648 to the PHP standard library. The RFC also introduces a native constant-time implementation of the feature to tackle security challenges in the data encoding field. Once implemented, the feature would simplify data encoding in PHP while improving interoperability with other programming languages and security within the PHP ecosystem.
A new, always available Encoding namespace is to be added to the standard library. The namespace would contain classes and functions for encoding and decoding strings or byte sequences.
For this purpose, the following internal classes and functions are added:
namespace Encoding {
    class EncodingException extends \Exception { }
    class UnableToDecodeException extends EncodingException { }

    enum Base16Alphabet {
        case Upper;
        case Lower;
    }

    enum Base32Alphabet {
        case Ascii;
        case Hex;
        case Crockford;
        case Z;
    }

    enum Base64Alphabet {
        case Standard;
        case SafeUrl;
        case Imap;
    }

    enum PaddingMode {
        case AlphabetControlled;
        case StripPadding;
        case PreservePadding;
    }

    enum DecodingMode {
        case Lenient;
        case Strict;
    }

    enum TimingMode {
        case Unprotected;
        case ConstantTime;
    }
}
The following Base16 functions are added:
namespace Encoding {
    function base16_encode(
        string $decoded,
        Base16Alphabet $alphabet = Base16Alphabet::Upper,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;

    /**
     * @throws UnableToDecodeException
     */
    function base16_decode(
        string $encoded,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
}
The following Base32 functions are added:
namespace Encoding {
    function base32_encode(
        string $decoded,
        Base32Alphabet $alphabet = Base32Alphabet::Ascii,
        PaddingMode $paddingMode = PaddingMode::AlphabetControlled,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;

    /**
     * @throws UnableToDecodeException
     */
    function base32_decode(
        string $encoded,
        Base32Alphabet $alphabet = Base32Alphabet::Ascii,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
}
The following Base64 functions are added:
namespace Encoding {
    function base64_encode(
        string $decoded,
        Base64Alphabet $alphabet = Base64Alphabet::Standard,
        PaddingMode $paddingMode = PaddingMode::AlphabetControlled,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;

    /**
     * @throws UnableToDecodeException
     */
    function base64_decode(
        string $encoded,
        Base64Alphabet $alphabet = Base64Alphabet::Standard,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
}
The RFC chooses to use a functions-based API instead of a class-based API for the following reasons:
The RFC chooses enum-based options over boolean or arbitrary string values to improve readability and developer experience when using the new API.
The general signature semantic chosen for each algorithm is the following:
For encoding:
function algo_encode(string $decoded, Enum ...$options): string;
For decoding:
/**
 * @throws UnableToDecodeException
 */
function algo_decode(string $encoded, Enum ...$options): string;
where algo is the name of the underlying encoding algorithm.
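The signature shape above can be approximated in userland today. The sketch below is only an illustration under stated assumptions: it lives in a hypothetical `Sketch\Encoding` namespace, omits the `TimingMode` option, and builds on the existing `bin2hex` function rather than on the proposed native implementation.

```php
<?php

declare(strict_types=1);

// Hypothetical namespace, chosen to avoid clashing with the proposed Encoding namespace.
namespace Sketch\Encoding;

enum Base16Alphabet
{
    case Upper;
    case Lower;
}

/**
 * Userland approximation of the proposed base16_encode() signature.
 * The TimingMode option is left out of this sketch.
 */
function base16_encode(string $decoded, Base16Alphabet $alphabet = Base16Alphabet::Upper): string
{
    $hex = bin2hex($decoded); // bin2hex() always returns lowercase digits

    return match ($alphabet) {
        Base16Alphabet::Upper => strtoupper($hex),
        Base16Alphabet::Lower => $hex,
    };
}

echo base16_encode('Hello world!'), PHP_EOL;
// prints "48656C6C6F20776F726C6421"
echo base16_encode('Hello world!', alphabet: Base16Alphabet::Lower), PHP_EOL;
// prints "48656c6c6f20776f726c6421"
```

Named arguments combined with enum cases keep each call site self-describing, which is the readability benefit the enum-based options aim for.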
When decoding is performed, an UnableToDecodeException exception is thrown on any error. When the mode is not strict, some tolerance toward the encoded string is allowed, but decoding can still throw an UnableToDecodeException exception if the string is still invalid after the lenient transformations have been applied to it.
<?php
enum Base16Alphabet {
    case Upper;
    case Lower;
}
Base16 does not have multiple alphabets but can be encoded using uppercase or lowercase letters.
To be compliant with RFC4648, the default value will be Base16Alphabet::Upper.
<?php
enum Base32Alphabet {
    case Ascii;
    case Hex;
    case Crockford;
    case Z;
}
Base32 can be used with different alphabets. We will support the most used alphabets out of the box.
The default value will be Base32Alphabet::Ascii.
<?php
enum Base64Alphabet {
    case Standard;
    case SafeUrl;
    case Imap;
}
Base64 can be used with different alphabets. We will support the most used alphabets out of the box. All Base64 alphabets are case sensitive.
The default value will be Base64Alphabet::Standard.
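As an illustration of what an alphabet switch amounts to, the URL-safe alphabet of RFC4648 section 5 differs from the standard one only in its last two characters. The sketch below derives it from the existing global `base64_encode` function; the helper name `sketch_base64url_encode` is hypothetical, and padding is deliberately left untouched here.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: re-maps the standard Base64 output to the
 * URL-safe alphabet ('+' becomes '-', '/' becomes '_'). Padding is kept as-is.
 */
function sketch_base64url_encode(string $bytes): string
{
    return strtr(base64_encode($bytes), '+/', '-_');
}

$bytes = "\xFB\xFF";

echo base64_encode($bytes), PHP_EOL;           // prints "+/8="
echo sketch_base64url_encode($bytes), PHP_EOL; // prints "-_8="
```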
<?php
enum PaddingMode {
    case AlphabetControlled;
    case StripPadding;
    case PreservePadding;
}
Base32 and Base64 use a padding character. The padding character has a technical role: it ensures that the encoded output represents complete blocks of data and allows the decoder to reconstruct the original binary input unambiguously. To improve readability, however, some alphabets omit it from their encoded output. This option MUST tell the encoding mechanism whether padding needs to be present at the end of the encoded output.
By default the padding mode is PaddingMode::AlphabetControlled, meaning the padding character will be present only if it is mandatory for the chosen alphabet.
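The effect of stripping and preserving padding can be sketched with existing string functions. Both helper names below are hypothetical; the restore step assumes Base64, whose encoded length is always a multiple of 4 once padded.

```php
<?php

declare(strict_types=1);

/** Hypothetical helper: removes trailing '=' padding characters. */
function sketch_strip_padding(string $encoded): string
{
    return rtrim($encoded, '=');
}

/** Hypothetical helper: re-adds the '=' padding a Base64 string needs. */
function sketch_restore_base64_padding(string $encoded): string
{
    $remainder = strlen($encoded) % 4;

    // A single leftover character can never come from valid Base64 output.
    if ($remainder === 1) {
        throw new InvalidArgumentException('Invalid unpadded Base64 length.');
    }

    return $remainder === 0
        ? $encoded
        : $encoded . str_repeat('=', 4 - $remainder);
}

$padded = base64_encode("\xFF\xFF");
$stripped = sketch_strip_padding($padded);

echo $padded, PHP_EOL;                                  // prints "//8="
echo $stripped, PHP_EOL;                                // prints "//8"
echo sketch_restore_base64_padding($stripped), PHP_EOL; // prints "//8="
```

Because the padding can be recomputed from the length alone, omitting it loses no information for a single encoded value, which is why some alphabets can afford to drop it.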
<?php
enum DecodingMode {
    case Lenient;
    case Strict;
}
For all functions, you MUST be able to specify how decoding will be performed. By default, the $decodingMode is DecodingMode::Strict and the algorithm strictly follows the rules set by the RFC.
You can also set the $decodingMode to DecodingMode::Lenient. When using this decoding mode, several manipulations are performed on the `$encoded` string before the actual decoding process:
Regardless of the mode:
\r, \t, \n and the space character are all ignored during the decoding process.
NULL bytes presence in the $encoded string.
The lenient process, while available, is made restrictive to take into account the security considerations covered in section 12 of RFC 4648.
By default the decoding mode is DecodingMode::Strict.
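The difference between the two modes can be sketched for Base16 in userland. This is an illustration only: the names `SketchDecodingMode`, `SketchDecodeException` and `sketch_base16_decode` are hypothetical, and the sketch assumes that leniency amounts to accepting lowercase digits. The proposed implementation may apply other or additional transformations.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins for the proposed DecodingMode and UnableToDecodeException.
enum SketchDecodingMode
{
    case Lenient;
    case Strict;
}

final class SketchDecodeException extends RuntimeException
{
}

function sketch_base16_decode(
    string $encoded,
    SketchDecodingMode $mode = SketchDecodingMode::Strict,
): string {
    // Assumption: the only lenient transformation in this sketch is case folding.
    if ($mode === SketchDecodingMode::Lenient) {
        $encoded = strtoupper($encoded);
    }

    // Whatever the mode, the string must now be valid uppercase Base16.
    if (strlen($encoded) % 2 !== 0 || preg_match('/\A[0-9A-F]*\z/', $encoded) !== 1) {
        throw new SketchDecodeException('Input is not valid uppercase Base16.');
    }

    return hex2bin($encoded);
}

try {
    sketch_base16_decode('48656c6c6f'); // lowercase digits, strict mode
} catch (SketchDecodeException $exception) {
    echo 'strict: ', $exception->getMessage(), PHP_EOL;
}

echo sketch_base16_decode('48656c6c6f', SketchDecodingMode::Lenient), PHP_EOL; // prints "Hello"
```

Note that the lenient path still ends in the same validation, so input that remains invalid after the tolerant step is rejected in both modes.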
<?php
enum TimingMode {
    case Unprotected;
    case ConstantTime;
}
Sometimes, for security reasons, you MAY want to use a more secure algorithm to avoid leaking information during the encoding/decoding process. Because a different algorithm MAY result in a different processing time, an optional enum is proposed to opt in to the changed process. For now, a constant-time algorithm is added alongside the standard one, which does not protect against timing attacks. Depending on the implementation, this option MAY not be made available for every algorithm.
By default the timing mode is TimingMode::Unprotected.
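To give an idea of what a constant-time variant involves, the sketch below encodes uppercase Base16 without table lookups or data-dependent branches, selecting each digit with integer arithmetic instead. The function name `sketch_constant_time_hex_encode` is hypothetical and this is not the implementation the RFC proposes. A userland sketch also cannot guarantee constant-time execution inside the PHP engine.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: uppercase Base16 encoding without lookups or
 * branches that depend on the input bytes.
 */
function sketch_constant_time_hex_encode(string $bytes): string
{
    $hex = '';
    $length = strlen($bytes);

    for ($i = 0; $i < $length; $i++) {
        $byte = ord($bytes[$i]);
        $high = $byte >> 4;
        $low = $byte & 0x0F;

        // For a nibble n: (n - 10) >> 8 is -1 when n < 10 and 0 otherwise.
        // Masking with ~6 (-7) therefore subtracts 7 only for 0..9, so
        // 55 + n - 7 gives '0'..'9' and 55 + n gives 'A'..'F'.
        $hex .= chr(55 + $high + ((($high - 10) >> 8) & ~6));
        $hex .= chr(55 + $low + ((($low - 10) >> 8) & ~6));
    }

    return $hex;
}

echo sketch_constant_time_hex_encode('Hello world!'), PHP_EOL;
// prints "48656C6C6F20776F726C6421"
```

The output is identical to the regular encoding; only the way each character is selected changes, which is why the option can be exposed as a separate enum without affecting results.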
Using the Encoding\base64_encode and Encoding\base64_decode functions
<?php
use Encoding\Base64Alphabet;
use Encoding\PaddingMode;
use function Encoding\base64_encode;

$decoded = 'This is an encoded string';
echo base64_encode($decoded); // "VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw=="
echo base64_encode($decoded, paddingMode: PaddingMode::StripPadding); // "VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw"

$decoded = chr(0xFF) . chr(0xFF);
echo base64_encode($decoded); // "//8="
echo base64_encode($decoded, alphabet: Base64Alphabet::SafeUrl); // "__8"
echo base64_encode($decoded, paddingMode: PaddingMode::StripPadding); // "//8"
Using the Encoding\base16_encode and Encoding\base16_decode functions
<?php
use Encoding\DecodingMode;
use function Encoding\base16_decode;
use function Encoding\base16_encode;

$decoded = 'Hello world!';
$encodedUpper = "48656C6C6F20776F726C6421"; // using uppercase characters
$encodedLower = "48656c6c6f20776f726c6421"; // using lowercase characters

echo base16_encode($decoded); // returns $encodedUpper; RFC4648 dictates that the return value should be uppercased
echo base16_decode($encodedLower, decodingMode: DecodingMode::Strict); // throws an UnableToDecodeException exception
echo base16_decode($encodedLower, decodingMode: DecodingMode::Lenient); // 'Hello world!'
In its standard package Go supports [all RFC4648 algorithms as well as the ascii85 format](https://pkg.go.dev/encoding@go1.24.4)
Python has updated its encoding support and now supports [all RFC4648 algorithms as well as the ascii85 format](https://docs.python.org/3/library/base64.html). Python also has extensive support for many base85 alphabets.
Does not support base32 natively nor base85.
Only support natively base64 (not base64 URL)
Only support natively base64
The namespace Encoding is now reserved.
The next minor PHP version (PHP 8.5).
None.
None.
None.
Tim Düsterhus has volunteered to do the implementation, but will check whether a constant-time implementation is possible for all combinations of options.