====== PHP RFC: Add RFC 4648 compliant data encoding API ======

  * Version: 1.3
  * Date: 2025-06-19
  * Author: Ignace Nyamagana Butera, nyamsprod@gmail.com
  * Status: Under Discussion
  * First Published at: https://wiki.php.net/rfc/data_encoding_api

===== Introduction =====

To improve interoperability between PHP and other programming languages, and to simplify data encoding in PHP, we propose adding native support for encoding and decoding data using the family of RFC 4648 algorithms (Base16, Base32, and Base64). Currently, PHP only supports a limited subset of RFC 4648. With this RFC, we aim to provide full compliance with the standard and introduce the missing encoding algorithms for developers.

==== Downsides of the current approach ====

PHP provides partial support for Base64 via the ''base64_encode'' and ''base64_decode'' functions, but they do not provide:

  * support for the Base64 URL alphabet and its specific settings
  * support for the Base64 IMAP alphabet and its specific settings
  * support for removing the padding character during encoding
  * support for generating the encoded string in constant time

PHP provides partial support for Base16 via the ''bin2hex'' and ''hex2bin'' functions, but they do not provide:

  * support for a strict decoding mechanism
  * support for strict encoding (PHP uses lowercase letters whereas the RFC recommends uppercase letters)
  * support for generating the encoded string in constant time

PHP currently lacks native support for Base32 encoding and decoding. In addition to the absence of this algorithm, the ecosystem suffers from a fragmented landscape of user-land packages, many of which claim Base32 compliance without clearly specifying which variant they implement. This lack of a consistent reference becomes a problem when applications rely on a specific Base32 configuration for processing incoming or outgoing data. This challenge, which also applies to the other RFC 4648 algorithms, makes working with data encoding in PHP more complex than necessary.
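To illustrate the Base64 gaps listed in this section, the sketch below shows the kind of user-land helper that projects typically write today to obtain URL-safe output without padding. The function names ''base64url_encode_userland'' and ''base64url_decode_userland'' are hypothetical and are not part of this proposal; the sketch relies only on the existing ''base64_encode'', ''base64_decode'', ''strtr'' and ''rtrim'' functions and assumes PHP 8.0 or later.

```php
<?php

declare(strict_types=1);

// Hypothetical user-land helper, not part of the proposed API.
function base64url_encode_userland(string $decoded): string
{
    // Swap the two characters that are not URL-safe, then drop the trailing padding.
    return rtrim(strtr(base64_encode($decoded), '+/', '-_'), '=');
}

// Hypothetical user-land helper, not part of the proposed API.
function base64url_decode_userland(string $encoded): string|false
{
    // Restore the standard alphabet, then decode with the existing strict flag.
    // The existing strict flag accepts input whose padding has been removed.
    return base64_decode(strtr($encoded, '-_', '+/'), true);
}

echo base64url_encode_userland("\xfb\xff"), PHP_EOL; // -_8
```

Each project has to get the alphabet swap and the padding handling right on its own, which is the kind of duplication a native variant option would remove.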
PHP currently offers Base58 encoding and decoding via a PECL extension. This proposal seeks to integrate these functions directly into the PHP core, with additional support for the Flickr variant of Base58. Although Base58 is not defined in RFC 4648, it has seen widespread adoption in production systems, most notably in Bitcoin and other cryptocurrencies for encoding addresses and keys, as well as in platforms like Flickr for generating compact, URL-safe identifiers. Compared to Base64, Base58 avoids visually ambiguous characters (such as 0, O, I, and l) and is inherently safe for use in URLs without additional encoding. The Base58 algorithm is simple, deterministic, and has remained stable and well understood in the software ecosystem for over a decade.

The goal of this RFC is to add the encoding and decoding functionality defined in RFC 4648, as well as Base58, to the PHP standard library. It also introduces a native, [[https://github.com/paragonie/constant_time_encoding/|constant-time implementation]] to address security concerns in data encoding. Once adopted, this feature will simplify data encoding in PHP, enhance interoperability with other programming languages, and strengthen security within the PHP ecosystem.

===== Proposal =====

A new, always available ''Encoding'' namespace will be added to the standard library. The namespace will contain classes and functions for encoding and decoding string or byte sequences.
For this purpose, the following internal classes and functions are added:

<code php>
namespace Encoding {
    class EncodingException extends \Exception
    {
    }

    class UnableToDecodeException extends EncodingException
    {
    }

    enum Base16
    {
        case Upper;
        case Lower;
    }

    enum Base32
    {
        case Ascii;
        case Hex;
        case Crockford;
        case Z;
    }

    enum Base58
    {
        case Bitcoin;
        case Flickr;
    }

    enum Base64
    {
        case Standard;
        case UrlSafe;
        case Imap;
    }

    enum PaddingMode
    {
        case VariantControlled;
        case StripPadding;
        case PreservePadding;
    }

    enum DecodingMode
    {
        case Lenient;
        case Strict;
    }

    enum TimingMode
    {
        case Unprotected;
        case ConstantTime;
    }
}
</code>

The following Base16 functions are added:

<code php>
namespace Encoding {
    function base16_encode(
        string $decoded,
        Base16 $variant = Base16::Upper,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;

    /**
     * @throws UnableToDecodeException
     */
    function base16_decode(
        string $encoded,
        Base16 $variant = Base16::Upper,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
}
</code>

The following Base32 functions are added:

<code php>
namespace Encoding {
    function base32_encode(
        string $decoded,
        Base32 $variant = Base32::Ascii,
        PaddingMode $paddingMode = PaddingMode::VariantControlled,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;

    /**
     * @throws UnableToDecodeException
     */
    function base32_decode(
        string $encoded,
        Base32 $variant = Base32::Ascii,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
}
</code>

The following Base58 functions are added:

<code php>
namespace Encoding {
    function base58_encode(
        string $decoded,
        Base58 $variant = Base58::Bitcoin,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;

    /**
     * @throws UnableToDecodeException
     */
    function base58_decode(
        string $encoded,
        Base58 $variant = Base58::Bitcoin,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
}
</code>

The following Base64 functions are added:

<code php>
namespace Encoding {
    function base64_encode(
        string $decoded,
        Base64 $variant = Base64::Standard,
        PaddingMode $paddingMode = PaddingMode::VariantControlled,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;

    /**
     * @throws UnableToDecodeException
     */
    function base64_decode(
        string $encoded,
        Base64 $variant = Base64::Standard,
        DecodingMode $decodingMode = DecodingMode::Strict,
        TimingMode $timingMode = TimingMode::Unprotected,
    ): string;
}
</code>

==== API Design ====

The RFC chooses a function-based API instead of a class-based API for the following reasons:

  * most PHP scripts use encoding in a one-off fashion, and a class-based API would feel overly complicated for a quick encode or decode operation;
  * using functions emphasises that encoding and decoding operations have no internal state or side effects;
  * creating a class-based API on top of a function-based API, in user-land, is trivial.

The RFC chooses enum-based options rather than boolean or arbitrary string values to improve readability and developer experience when using the API.

The general signature chosen for each algorithm is the following.

For encoding:

<code php>
function algo_encode(string $decoded, Enum ...$options): string;
</code>

For decoding:

<code php>
/**
 * @throws UnableToDecodeException
 */
function algo_decode(string $encoded, Enum ...$options): string;
</code>

where:

  * __algo__ is the name of the underlying encoding algorithm.
  * __$options__ is a list of options, represented by ''Enum'' instances, which MAY be encoding specific.

When decoding fails, an ''UnableToDecodeException'' is thrown. In lenient mode, some tolerance toward the encoded string is allowed, but decoding can still throw an ''UnableToDecodeException'' if the string remains invalid after the lenient adjustments have been applied.
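As a rough illustration of two points from this section (decoding failures surface as exceptions, and a class-based API can be layered on top of functions), here is a minimal user-land sketch. It is built on the existing global ''base64_encode'' and ''base64_decode'' functions and does not use the proposed implementation. The ''Sketch'' namespace, the ''base64_decode_or_throw'' function and the ''Base64Codec'' class are hypothetical names chosen for this example, and the existing strict flag is only an approximation of the strict mode described in this RFC.

```php
<?php

declare(strict_types=1);

namespace Sketch;

// Hypothetical stand-ins for the exception classes named in this RFC.
class EncodingException extends \Exception {}
class UnableToDecodeException extends EncodingException {}

/**
 * User-land approximation of a throwing decoder, built on the existing
 * global base64_decode() with its strict flag enabled.
 *
 * @throws UnableToDecodeException
 */
function base64_decode_or_throw(string $encoded): string
{
    $decoded = \base64_decode($encoded, true);
    if ($decoded === false) {
        throw new UnableToDecodeException('The string is not valid Base64.');
    }

    return $decoded;
}

// A thin, stateless class-based facade layered on top of the functions.
final class Base64Codec
{
    public function encode(string $decoded): string
    {
        return \base64_encode($decoded);
    }

    /**
     * @throws UnableToDecodeException
     */
    public function decode(string $encoded): string
    {
        return base64_decode_or_throw($encoded);
    }
}
```

Because the facade holds no state, it adds nothing beyond grouping, which is consistent with the reasoning given above for preferring functions.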
==== Parameters ====

=== String Parameters ===

  * **$decoded** : the string to encode;
  * **$encoded** : the string to decode;

==== Options ====

=== Variant support ===

Base encodings support a range of alphabets and extra configurations that can collectively be referred to as variants. The following enums are introduced to help developers choose the correct variant.

== Base16 Variants ==

Base16 does not define multiple alphabets, but it can be encoded using either uppercase or lowercase letters.

The default variant is ''Base16::Upper'' because RFC 4648 defines the Base16 alphabet using uppercase letters.

== Base32 Variants ==

Base32 supports multiple variants, and we provide the most common ones out of the box:

  * Ascii : the RFC 4648 Standard variant (case sensitive)
  * Hex : the RFC 4648 Hexadecimal variant (case sensitive)
  * Crockford: [[https://www.crockford.com/base32.html|Douglas Crockford's Base32]] (case insensitive)
  * Z: the [[https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt|Z-base-32 variant]] (case sensitive)

The default variant is ''Base32::Ascii'' because RFC 4648 defines the Base32 alphabet using uppercase letters.

== Base58 Variants ==

Base58 supports multiple variants, and we provide the most common ones out of the box:

  * Bitcoin : the [[https://bitcoinwiki.org/wiki/base58|base58 Bitcoin]] variant (case sensitive)
  * Flickr : the Flickr variant (case sensitive)

The default variant is ''Base58::Bitcoin'' as it is the most used Base58 variant. Of note, the only difference between the Bitcoin and the Flickr variants is the order of the characters in the alphabet.

== Base64 Variants ==

Base64 supports multiple variants, and we provide the most common ones out of the box.
**All Base64 variants are case-sensitive.**

  * Standard : the RFC 4648 Standard variant
  * UrlSafe : the RFC 4648 URL and Filename Safe variant
  * Imap: the [[https://datatracker.ietf.org/doc/html/rfc3501#section-5.1.3|RFC 3501]] IMAP variant

The default variant is ''Base64::Standard'' because the standard RFC 4648 Base64 alphabet is not URL-safe.

=== Padding presence during encoding ===

Base32 and Base64 use a padding character. The padding character has a technical role: it ensures that the encoded output represents complete blocks of data and allows the decoder to reconstruct the original binary input unambiguously. However, to improve readability or interoperability, some variants omit it from the encoded output.

This option MUST tell the encoding mechanism whether the padding character has to be present at the end of the encoded string, when applicable.

The default padding mode is ''PaddingMode::VariantControlled'', indicating that the padding character is added only when mandated by the chosen variant.

=== Decoding Mode ===

For all decoding functions, you MUST be able to specify how decoding is performed.

By default, the **$decodingMode** is set to ''DecodingMode::Strict'', meaning the algorithm strictly follows the rules defined by the RFC.

Alternatively, you can set **$decodingMode** to ''DecodingMode::Lenient''. In this mode, several adjustments are applied to the **$encoded** string before the actual decoding process begins:

  * When applicable, the **$encoded** string is converted into the correct character casing.
  * When applicable, the padding length is corrected to allow correct decoding.

Independent of the mode:

  * The alphabet is treated as a sequence of byte values without any special treatment for multi-byte UTF-8.
  * The following characters: ''\r'', ''\t'', ''\n'' and the space character are all ignored during the decoding process.
  * The **$encoded** string should be protected against the presence of ''NULL'' bytes.
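The decoding rules above can be sketched in user-land for the simplest algorithm, Base16. This is only an illustration under stated assumptions, not the proposed implementation: ''SketchDecodingMode'' and ''sketch_base16_decode'' are hypothetical names, ''InvalidArgumentException'' stands in for the proposed ''UnableToDecodeException'', only the uppercase variant is handled, and the sketch assumes that protecting against ''NULL'' bytes means rejecting such input. It requires PHP 8.1 or later for enums.

```php
<?php

declare(strict_types=1);

// Hypothetical enum mirroring the DecodingMode proposed in this RFC.
enum SketchDecodingMode
{
    case Lenient;
    case Strict;
}

// Hypothetical user-land decoder for the uppercase Base16 variant only.
function sketch_base16_decode(
    string $encoded,
    SketchDecodingMode $mode = SketchDecodingMode::Strict,
): string {
    // Assumption: NUL bytes are rejected outright, whatever the mode.
    if (str_contains($encoded, "\0")) {
        throw new InvalidArgumentException('NUL byte found in the encoded string.');
    }

    // CR, TAB, LF and the space character are ignored, whatever the mode.
    $encoded = str_replace(["\r", "\t", "\n", ' '], '', $encoded);

    // Lenient mode first converts the input to the expected character casing.
    if ($mode === SketchDecodingMode::Lenient) {
        $encoded = strtoupper($encoded);
    }

    // Only complete pairs from the uppercase RFC 4648 Base16 alphabet are accepted.
    if (preg_match('/\A(?:[0-9A-F]{2})*\z/', $encoded) !== 1) {
        throw new InvalidArgumentException('Invalid Base16 string.');
    }

    // The input has been validated above, so hex2bin() cannot fail here.
    return hex2bin($encoded);
}

echo sketch_base16_decode("74 6F\n746F"), PHP_EOL;                          // toto
echo sketch_base16_decode('746f746f', SketchDecodingMode::Lenient), PHP_EOL; // toto
```

In this sketch the lenient step runs before validation, so input that is still invalid after the casing adjustment is rejected in both modes.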
Although the lenient decoding mode is available, it is intentionally restricted to account for [[https://datatracker.ietf.org/doc/html/rfc4648.html#section-12|the security considerations outlined in section 12 of RFC 4648]].

The default decoding mode is ''DecodingMode::Strict''.

=== Timing generation mode ===

In some cases, for security reasons, you may prefer to use a more secure algorithm to prevent information leakage during the encoding or decoding process. Since different algorithms can have varying processing times, an optional enum is proposed to allow developers to opt into a more secure approach. For now, a constant-time generation algorithm is provided alongside the standard implementation, which does not protect against [[https://blog.ircmaxell.com/2014/11/its-all-about-time.html|timing attacks]]. Depending on the implementation, this option may not be available for all encoding algorithms.

The default timing mode is ''TimingMode::Unprotected''.

==== Usage examples ====

Using the ''Encoding\base64_encode'' and ''Encoding\base64_decode'' functions

Using the ''Encoding\base16_encode'' and ''Encoding\base16_decode'' functions

==== Migration path ====

Due to the widespread use of the current API, this RFC proposes a gradual migration path to help users transition to the new API. However, the full deprecation and removal of the current functions (''base64_encode'', ''base64_decode'', ''hex2bin'', and ''bin2hex'') will be handled separately through the traditional RFC deprecation process, which occurs before each PHP version release. This ensures users have sufficient time to adopt the new API.

=== Base16 functions ===

== bin2hex ==

The ''bin2hex'' function encodes a string using the Base16 algorithm, but it defaults to a lowercase alphabet, which contradicts the recommendation in RFC 4648.
To migrate a ''bin2hex'' call to the new API while preserving the current behaviour, use:

== hex2bin ==

The ''hex2bin'' function is lenient and accepts both lowercase and uppercase input.

To migrate a ''hex2bin'' call to the new API while preserving the current behaviour, use:

=== Base64 functions ===

== base64_encode ==

This function already follows the standard Base64 encoding algorithm. Migrating is straightforward:

<code php>
$decoded = 'This is an encoded string';

//before
echo base64_encode($decoded);

//after
echo Encoding\base64_encode($decoded);
</code>

== base64_decode ==

Migrating ''base64_decode'' is more complex. The current function behaves leniently by default, accepting non-alphabet characters and misplaced padding:

<code php>
base64_decode('dG9===0bw??'); // returns 'toto'
</code>

However, the proposed API enforces stricter rules as recommended in [[https://www.rfc-editor.org/rfc/rfc4648.html#section-12|RFC 4648, Section 12]]. This includes rejecting invalid characters and padding in non-terminal positions for security reasons:

<code php>
Encoding\base64_decode('dG90bw??', decodingMode: Encoding\DecodingMode::Lenient); // will throw because of a character outside the alphabet
Encoding\base64_decode('dG9===0bw', decodingMode: Encoding\DecodingMode::Lenient); // will throw because of unsafe use of the padding character
Encoding\base64_decode('dG90bw', decodingMode: Encoding\DecodingMode::Lenient); // returns 'toto'
</code>

To ease the transition, we propose updating the signature of ''base64_decode'' in the global namespace:

<code php>
base64_decode(string $string, bool|DecodingMode $strict = false);
</code>

Impact:

  * ''$strict = Encoding\DecodingMode::Strict'' would be identical to ''$strict = true''
  * ''$strict = Encoding\DecodingMode::Lenient'' would have the same behaviour as in the proposed API
  * ''$strict = false'' would preserve the current unsafe behaviour (which is not part of the new API)

This allows developers to opt into the enum-based approach and move away from the insecure default.
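Before changing any call site, it can help to find out which inputs currently decode only because of the legacy non-strict behaviour. The sketch below is a hypothetical audit helper (the name ''relies_on_legacy_leniency'' is an assumption made for this example). It uses nothing but the existing global ''base64_decode'' function and treats its existing strict flag as an approximation of the stricter rules proposed here, which may differ in detail.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical audit helper: reports whether an input is rejected by the
 * existing strict flag, meaning it only decodes thanks to the legacy
 * non-strict behaviour of the global base64_decode().
 */
function relies_on_legacy_leniency(string $encoded): bool
{
    return base64_decode($encoded, true) === false;
}

var_dump(relies_on_legacy_leniency('dG9===0bw??')); // bool(true)
var_dump(relies_on_legacy_leniency('dG90bw=='));    // bool(false)
```

Inputs flagged by such a helper are the ones most likely to start failing once a call is moved away from the legacy default.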
If the function is using strict mode:

<code php>
$str = 'VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw==';

//before
echo base64_decode($str, true);

//after
echo Encoding\base64_decode($str);
</code>

(No need to specify the mode: strict is the default in the new API.)

If using the non-strict (default) legacy mode, the developer can begin by opting into explicit lenient decoding:

<code php>
$str = 'VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw==';

//before
echo base64_decode($str);

//after
echo base64_decode($str, strict: Encoding\DecodingMode::Lenient);
</code>

If this step causes errors, it indicates the original code was relying on unsafe decoding behaviour.

Then, finalize the migration:

<code diff>
$str = 'VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw==';
- echo base64_decode($str, strict: Encoding\DecodingMode::Lenient);
+ echo Encoding\base64_decode($str, decodingMode: Encoding\DecodingMode::Lenient);
</code>

To incorporate this change, an additional optional vote will be included in the RFC to determine whether the updated ''base64_decode'' signature, supporting both boolean and enum-based decoding modes, should be accepted as part of this proposal.

==== In other Languages ====

=== Go ===

In its standard library, Go supports [[https://pkg.go.dev/encoding@go1.24.4|all RFC 4648 algorithms as well as the ascii85 format]].

=== Python ===

Python has updated its encoding support and now supports [[https://docs.python.org/3/library/base64.html|all RFC 4648 algorithms as well as the ascii85 format]]. Python also has extensive support for many Base85 variants.

=== JavaScript/NodeJS ===

Does not natively support Base32 or Base85.

=== C# ===

Only natively supports Base64 (not Base64 URL).

=== Java ===

Only natively supports Base64.

===== Open questions =====

  * Should we allow users to specify their own alphabet for Base32?
  * Should we allow users to specify their own padding character where applicable?

===== Backward Incompatible Changes =====

The namespace **''Encoding''** is now reserved.

===== Proposed PHP Version(s) =====

The next minor PHP version (PHP 8.5).
===== RFC Impact =====

==== To SAPIs ====

None.

==== To Existing Extensions ====

None.

==== To Opcache ====

None.

===== Implementation =====

Tim Düsterhus has volunteered to do the implementation, but will check whether a constant-time implementation is possible for all combinations of options.

===== Future Scope =====

  * Add support for [[https://en.wikipedia.org/wiki/Ascii85|ascii85]], used in the PDF format and by Git
  * The current functions for Base64 and Base16 can be deprecated at some distant point in time
  * Add ''Base64'' support to PHP ''convert.base64-encode'' and ''convert.base64-decode'' stream filters

===== References =====

  * RFC 4648: https://datatracker.ietf.org/doc/html/rfc4648
  * Douglas Crockford Base32: https://www.crockford.com/base32.html
  * Z-Base32: https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt
  * IMAP Base64: https://datatracker.ietf.org/doc/html/rfc3501#section-5.1.3
  * Base58 Bitcoin: https://bitcoinwiki.org/wiki/base58