Even with a cryptographic hash, concatenating a secret with other strings before hashing carries information disclosure/analysis risks. For example, the following code is weak (it could be vulnerable).
Example 1:
$new_key = hash('sha256', $secret_key . $derivation_key);
HMAC was invented to reduce this risk. HMAC is known to be more secure than the construction in Example 1.
Example 2:
$new_key = hash_hmac('sha256', $secret_key, $derivation_key);
Note that HMAC takes the key as a distinct parameter for better security.
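One of the risks of plain concatenation can be sketched as follows: the boundary between the two inputs is lost, so different input pairs can collapse into the same hash input, whereas hash_hmac() keeps them apart. The string values are made up for illustration only.

```php
<?php
// Two different (secret, derivation) pairs that concatenate to the same string.
$concat_a = hash('sha256', 'abc' . 'def');
$concat_b = hash('sha256', 'abcd' . 'ef');

// The same pairs passed to HMAC as distinct data/key parameters.
$hmac_a = hash_hmac('sha256', 'abc', 'def');
$hmac_b = hash_hmac('sha256', 'abcd', 'ef');

var_dump($concat_a === $concat_b); // bool(true): the boundary is lost
var_dump($hmac_a === $hmac_b);     // bool(false): the boundary is preserved
```

This only illustrates the input-boundary problem; it is not a complete list of the weaknesses of hashing concatenated secrets.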
When deriving new keys, developers often need additional application- and context-specific information, such as a protocol number, algorithm identifiers, user identities, etc., to limit the context in which a derived key is valid. For the same reason that Example 1 is weak, the following code is weak.
Example 3:
$new_key = hash_hmac('sha256', $secret_key, $derivation_key . $proto_version . $algo . $user_id);
To reduce the risk, keys and additional information should be passed as separate parameters.
Example 4:
// $proto_version, $algo, $user_id are non-secret.
$info = $proto_version . $algo . $user_id;
// $prk must be cryptographically strong to derive strong $new_key.
$prk = hash_hmac('sha256', $secret_key, $derivation_key);
// $prk is strong and $info is non-secret, therefore $new_key is secured.
$new_key = hash_hmac('sha256', $prk, $info);
This is the basic HKDF (HMAC-based Key Derivation Function) operation. HKDF additionally supports shortening/extending the key length for certain crypto tasks. Therefore, the natural HKDF function signature would be
string HKDF(string $hash_algo, string $secret_key, string $derivation_key, string $info [, int $length]);
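For reference, the extract-then-expand steps of RFC 5869 can be sketched on top of hash_hmac() and compared with the built-in function. hkdf_sketch() is a hypothetical helper, not part of PHP; it takes its parameters in the order of the signature above, while the built-in hash_hkdf() call uses the order merged in PHP 7.1.2 (algo, ikm, length, info, salt). The sketch omits the RFC's upper bound on the output length.

```php
<?php
// Hypothetical helper (not part of PHP): RFC 5869 extract-then-expand via hash_hmac().
function hkdf_sketch(string $algo, string $ikm, string $salt, string $info, int $length = 0): string
{
    $hash_len = strlen(hash($algo, '', true));
    if ($length === 0) {
        $length = $hash_len; // same default as hash_hkdf()
    }
    if ($salt === '') {
        $salt = str_repeat("\0", $hash_len); // RFC 5869 default salt
    }
    // Extract: PRK = HMAC(key = salt, data = IKM)
    $prk = hash_hmac($algo, $ikm, $salt, true);
    // Expand: T(i) = HMAC(key = PRK, data = T(i-1) . info . chr(i))
    $okm = '';
    $block = '';
    for ($i = 1; strlen($okm) < $length; $i++) {
        $block = hash_hmac($algo, $block . $info . chr($i), $prk, true);
        $okm .= $block;
    }
    return substr($okm, 0, $length);
}

$ikm  = random_bytes(32);
$salt = random_bytes(32);
$info = 'v1|aes-256|user:42'; // made-up, non-secret context string

$same = hkdf_sketch('sha256', $ikm, $salt, $info, 80)
    === hash_hkdf('sha256', $ikm, 80, $info, $salt);
var_dump($same); // bool(true)
```

Note that the expand step in RFC 5869 uses the PRK as the HMAC key and info (plus a counter byte) as the data, and that all intermediate values are raw binary.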
The newly introduced HKDF function (hash_hkdf()) has a different signature that is inconsistent with the hash() and hash_hmac() functions, even though hash_hkdf() is a simple extension of hash_hmac().
Change the hash_hkdf() signature from
string hash_hkdf(string $algo, string $ikm [, int $length = 0 [, string $info = '' [, string $salt = '' ]]])
Return value: Binary hash value ONLY.
to
string hash_hkdf(string $algo, string $ikm, string $salt , string $info [, int $length = 0 [, bool $raw_output = FALSE]])
Return value: HEX string hash value by default.
Note: Only changed/added parts are described.
Insecure usage is easily possible with the current signature.
$key = hash_hkdf('sha256', $weak_key); // Generates an insecure $key!! <= This isn't secure at all w/o a strong salt.
$key = hash_hkdf('sha256', $weak_key, 80); // Generates an even more insecure $key!! <= Length does not add strength to the OKM.
$key = hash_hkdf('sha256', $weak_key, 80, 'Admin'); // Generates an even more insecure $key only usable in the 'Admin' context!! <= info does not add strength to the OKM because it is supposed to be non-secret.
$salt is described as a more important parameter than “info” in RFC 5869, which recommends supplying an appropriate $salt whenever possible. $info is what makes HKDF valuable, because when $info can be omitted, the simpler and faster hash_hmac(string $algo , string $data , string $key [, bool $raw_output = false ] ) can be used instead. hash_hkdf() can be viewed as an extension of hash_hmac() (it is indeed an HMAC extension), so the two should have compatible signatures for API consistency.
For example, hash_hmac(), as well as hash(), already has an optional $raw_output parameter.
string hash_hmac(string $algo, string $data, string $key [, bool $raw_output = FALSE ])
Return value: HEX string hash value by default.
A compatible hash_hkdf() signature would be
string hash_hkdf(string $algo, string $ikm, string $salt , string $info [, int $length = 0 [, bool $raw_output = FALSE ]])
Return value: HEX string hash value by default.
For these reasons, $salt and $info are made required parameters, and the return value is made a HEX hash value by default.
hash('sha256', something) and hash_hmac('sha256', something) return HEX. hash_hkdf(), which is HMAC based, returns “binary”; this breaks API consistency and disturbs common usage with PHP.
HKDF (HMAC-based KDF) is a KDF (Key Derivation Function) defined by RFC 5869. Although the RFC describes HKDF in a low-level crypto context, HKDF is designed as a general purpose key derivation function. It derives secure keys for encryption/validation/authentication/etc. from other weak/strong key information such as an encryption key, API key, master key, password, etc. However, the current hash_hkdf() signature is overly optimized for a specialized cryptographic operation (i.e. deriving a “binary” output key with a “specified length” from a “strong input key”). This is hardly the most common use case with PHP. In addition, it violates what the RFC recommends. This PHP RFC proposes a more generic HKDF behavior for use with PHP and a signature consistent with other PHP hash functions.
In short, HKDF is a general purpose hash function designed to create new key(s) from a given key using a cryptographic hash function and HMAC. However, the newly introduced hash_hkdf() is optimized for very limited usage.
hash_hkdf() has already been added to master without a PHP RFC. HKDF is an HMAC-based KDF hash function (an HMAC extension) for general purpose key derivation. However, the signature (and return value) is overly optimized for deriving a key of a specified length from strong IKM (input key material). hash_hkdf() currently has the following signature.
string hash_hkdf(string $algo, string $ikm [, int $length = 0 [, string $info = '' [, string $salt = '']]])
Returns "Binary" hash value ONLY.
To understand what HKDF does, HMAC should be understood. PHP already has an HMAC hash function, hash_hmac().
string hash_hmac ( string $algo , string $data , string $key [, bool $raw_output = false ] )
Returns "HEX" or "Binary" string hash value depending on $raw_output
Please note that $key (a required parameter) in hash_hmac() and $salt (the last optional parameter) in hash_hkdf() have the same task. 1) Also note the inconsistencies in the return value and in the return value option parameter.
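The return value difference can be observed directly. The following is a minimal sketch against the hash_hkdf() signature as merged in PHP 7.1.2; the input strings are placeholders.

```php
<?php
$data = 'input key material'; // placeholder value
$key  = 'salt or key';        // placeholder value

$hex = hash_hmac('sha256', $data, $key);        // hex string by default
$raw = hash_hmac('sha256', $data, $key, true);  // raw binary on request
$okm = hash_hkdf('sha256', $data, 0, '', $key); // raw binary, no hex option

var_dump(strlen($hex), strlen($raw), strlen($okm)); // int(64) int(32) int(32)
var_dump(bin2hex($raw) === $hex);                    // bool(true)
```

With the current signature, callers who want a printable key have to wrap hash_hkdf() in bin2hex() themselves.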
When $ikm (or $data) is strong and $length is equal to the hash size, hash_hkdf() is not needed at all and hash_hmac() is enough for the KDF task. The following are cryptographically equivalent. 2) 3)
$key = hash_hmac('sha256', $ikm, $pre_shared_key, TRUE);
// This is basically the same as the following hash_hkdf()
$key = hash_hkdf('sha256', $ikm, 0, '', $pre_shared_key); // Key as the least important. It returns "Binary" hash value always unlike other hash functions.
Not only is hash_hkdf() inconsistent with existing functions, it also encourages insecure/poor usage.
Example: Poor 256-bit (32-byte) AES key derivation from a 128-bit AES key. 4)
// Weak key expansion from 128 bit key to 256 bit key
$key256 = hash_hkdf('sha256', $key128);
// Or even worse
$key256 = hash_hkdf('sha1', $key128, 32);
Users must not do this unless a derivation key (salt) cannot be used 5). Although the above are misuses, it is clear that the current hash_hkdf() encourages poor usage/misuse by ignoring the strong RFC 5869 recommendation. 6)
The correct ways are as follows.
Example: Derive a new strong 256-bit AES key from a 128-bit AES key.
$derivation_key = random_bytes(32); // Create and save a random 256 bit derivation key used to derive new key
// By hash_hmac()
$key256 = hash_hmac('sha256', $key128, $derivation_key, TRUE);
// By hash_hkdf()
$key256 = hash_hkdf('sha256', $key128, 0, '', $derivation_key);
Even though hash_hmac() and the HMAC-based hash_hkdf() have similar uses, they have different signatures and return values. These are unnecessary inconsistencies. Moreover, the current signature encourages insecure usage in many ways.
In general, “salt” (or “key” with hash_hmac(), or “pre-shared key” with RFC 5869) is mandatory, or should be used, for almost all use cases. The importance of “salt” is clear because “salt” is often used as a pre-shared “key”. The RFC states “designers of applications are therefore encouraged to provide salt values to HKDF if such values can be obtained by the application”. PHP internals are the designers. Nonetheless, the current PHP implementation discourages use of the “salt” parameter by making it the last optional parameter.
“length” is mostly optional. A modified output key length always results in a weaker key. It shouldn't be used unless it is necessary for very limited crypto operations. 7)
“info” is what makes HKDF useful (otherwise hash_hmac() is enough), but it is still optional with regard to deriving secure keys. 8)
Although hash_hmac() and hash_hkdf() have equivalent usage, the new hash_hkdf() has an inconsistent signature and a “Binary”-only return value.
It makes little sense to have the most important parameter as the last optional parameter, and doing so encourages insecure usage/misuse. A string return value is also suitable for most hash_hkdf() usage with PHP.
In short, hash_hkdf() has unnecessary inconsistencies with hash()/hash_hmac() and RFC 5869 currently.
If a cryptographic hash function were truly cryptographic, the following hash usage that concatenates $IKM and $key would be safe.
$secure_hex_hash = hash('sha256', $IKM . $key);
However, in the real world, MD5/SHA-1, which were known as cryptographic hash functions, are obsolete, and the above usage is not safe enough. HMAC was made to address this issue by separating $IKM and $key as follows.
$secure_hex_hash = hash_hmac('sha256', $IKM, $key); // Note: $key is individual parameter
While hash_hmac() is good enough for many purposes, there are many cases that require additional non-secret information (e.g. key version, expiration time, applicable user/group) to generate a secure hash. HKDF separates $key and the additional non-secret key information ($info) to keep the generated hash value safe. A natural/consistent function signature/usage would be
$secure_binary_hash = hash_hkdf('sha256', $IKM, $key, $info); // Note: $key and $info are individual parameters
In many use cases, the IKM could be a strong key. However, in the real world, the IKM could be a poor user-defined plain text password.
RFC 5869 “Notes for HKDF Users” states,
3.1. To Salt or not to Salt
HKDF is defined to operate with and without random salt. This is done to accommodate applications where a salt value is not available. We stress, however, that the use of salt adds significantly to the strength of HKDF, ensuring independence between different uses of the hash function, supporting “source-independent” extraction, and strengthening the analytical results that back the HKDF design.
The primary purpose of “salt” is to generate a stronger key from the IKM through the entropy of the “salt”. “Salt” is also often used as a pre-shared key, i.e. the salt is part of the combined final key.
3.2. The 'info' Input to HKDF
While the 'info' value is optional in the definition of HKDF, it is often of great importance in applications. Its main objective is to bind the derived key material to application- and context-specific information.
The primary purpose of “info” is to distinguish key contexts so that a generated key is only usable in a specific context, i.e. users should not use a secret value for “info”.
Although it may seem that IKM and salt are interchangeable, there is an important difference: salt must not be user controllable. Salt and info may also seem interchangeable. However, unlike salt, info is supposed to be non-secret.
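How “info” binds a derived key to a context can be sketched as follows, using the hash_hkdf() signature as merged in PHP 7.1.2. The context labels 'encryption' and 'authentication' are assumptions chosen for illustration, and the IKM is assumed to be strong here.

```php
<?php
$ikm  = random_bytes(32); // secret input key material (assumed strong)
$salt = random_bytes(32); // random salt, kept so the keys can be re-derived

// Same IKM and salt, different non-secret context labels.
$enc_key = hash_hkdf('sha256', $ikm, 32, 'encryption', $salt);
$mac_key = hash_hkdf('sha256', $ikm, 32, 'authentication', $salt);

var_dump($enc_key === $mac_key); // bool(false): each context gets its own key
// Re-deriving with the same inputs reproduces the key.
var_dump($enc_key === hash_hkdf('sha256', $ikm, 32, 'encryption', $salt)); // bool(true)
```

The labels may be public; what keeps the derived keys secret is the IKM (and the salt, where it is kept secret).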
md5 is used here only to obtain a shorter result from hash_hkdf(). In practice, developers should consider SHA2 or better.
[yohgaki@dev PHP-master]$ ./php-bin -r 'var_dump(bin2hex(hash_hkdf("md5","123456")));'
string(32) "1a4f9cd30ab214082d93ba850f1fa2b0"
[yohgaki@dev PHP-master]$ ./php-bin -r 'var_dump(bin2hex(hash_hkdf("md5","123456", 20)));'
string(40) "1a4f9cd30ab214082d93ba850f1fa2b054cfcd49"
[yohgaki@dev PHP-master]$ ./php-bin -r 'var_dump(bin2hex(hash_hkdf("md5","123456", 20, "1")));'
string(40) "d0d1bbee08810d08a1e54f3a401308353cedd30b"
[yohgaki@dev PHP-master]$ ./php-bin -r 'var_dump(bin2hex(hash_hkdf("md5","123456", 20, "1", "1")));'
string(40) "ca16de591ad40f02e599428bf9f50772ebead3ff"
Both the “salt” and “info” parameters affect the hash_hkdf() result. Although hash_hkdf() does several hash calculations (HMAC with the specified hash) to derive a secure key from IKM, salt and info, it could be understood as a simple hash calculation that uses separate parameters in place of hash_hmac(string $algo, string $data, string $key), i.e. from the user's point of view, $key is divided into $salt (secret or non-secret) and $info (non-secret). The “length” parameter works in a way that weakens the derived key.
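The effect of “length” can be checked with a small sketch. It assumes the hash_hkdf() signature as merged in PHP 7.1.2 and reuses the same weak input as the command lines above: a different length only selects more or fewer bytes of the same output stream, so it is not an additional source of strength.

```php
<?php
$okm16 = hash_hkdf('md5', '123456');     // default length: 16 bytes for MD5
$okm20 = hash_hkdf('md5', '123456', 20); // 20 bytes requested
$okm10 = hash_hkdf('md5', '123456', 10); // 10 bytes requested

// Shorter outputs are prefixes of longer ones.
var_dump(substr($okm20, 0, 16) === $okm16); // bool(true)
var_dump(substr($okm16, 0, 10) === $okm10); // bool(true)
```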
Therefore, the following code is equivalent. 10)
// Although the returned values differ due to the algorithm difference, they are equivalent
$key = hash_hmac('sha256', $ikm, $salt);
$key = bin2hex(hash_hkdf('sha256', $ikm, 0, '', $salt)); // Need bin2hex() because hash_hkdf() always returns a binary result
The following would be equivalent if the hash were truly cryptographic, but they aren't, because hash functions have some characteristics that allow analysis.
// Trying to add 'Admin' only context information to derived $key
$key = hash('sha256', $ikm . $salt . 'Admin'); // This should work in theory, but has greater risks than below
$key = hash_hmac('sha256', $ikm, $salt . 'Admin'); // Better, but involves risk by string concatenation
$key = bin2hex(hash_hkdf('sha256', $ikm, 0, 'Admin', $salt)); // More secure than the above because the $info parameter ('Admin') is designed for non-secret data
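One concrete risk of concatenating salt and context can be sketched as follows. The salt values are made up for illustration, and the hash_hkdf() calls use the signature as merged in PHP 7.1.2.

```php
<?php
$ikm = random_bytes(32);

// Concatenation: ('s3cret', 'Admin') and ('s3cretAd', 'min') yield the same HMAC key.
$a = hash_hmac('sha256', $ikm, 's3cret' . 'Admin');
$b = hash_hmac('sha256', $ikm, 's3cretAd' . 'min');

// Separate parameters: the same two (salt, info) pairs yield different keys.
$c = hash_hkdf('sha256', $ikm, 0, 'Admin', 's3cret');
$d = hash_hkdf('sha256', $ikm, 0, 'min', 's3cretAd');

var_dump($a === $b); // bool(true): the salt/context boundary is ambiguous
var_dump($c === $d); // bool(false): salt and info stay distinct
```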
A typical PHP HKDF application can use “salt”. An application can provide better security with “salt”, and a strong salt is mandatory in many cases. There are many PHP HKDF usages that can/should/must use a salt; this PHP RFC describes only 4 examples here. Developers must consider using a salt for better security rather than omitting it without proper consideration. The salt is often a part of the final key, i.e. the combined key that users present to access resources. Developers should use a strong salt whenever possible. When the IKM is weak, developers must use a strong salt to keep the IKM and OKM secure.
There are more usages for low-level crypto, but these would not be common for average PHP developers/applications, so they are not covered. Crypto specialists should be able to use the HKDF hash properly regardless of the examples here.
This usage, like the following 3 examples, would not be the most used with PHP. However, it is one of the most common example usages for HKDF applications, and it will be used with PHP as well. 11)
A user-entered password is an extremely weak key. Therefore, a strong salt is mandatory for security unless such a salt cannot be used. Omitting the salt, or using a non-secret salt, results in an extremely weak encryption key that attackers can crack easily. $info and $length are optional.
This is a second example in which HKDF could be used with a weak IKM (password). Note that this method fundamentally differs from user ID based access control, i.e. no user registration is required and no keys are stored on the server side.
With this method, the system does not have to store each combination of $_GET['salt'], $_GET['timestamp'] (info) and $_POST['password'] (ikm). Except for the IKM, which is weak so that the user can type it, the other keys are cryptographically secure.
Step1: Setting up keys
You may display URL(6) and send password(7) via email.
Step2: Validating keys (Check expiration time before this procedure)
Note: It is better to deploy a countermeasure against password brute force attacks. password_hash()/crypt() is designed for passwords, but developers shouldn't use it unless performance and DoS are not a concern, because password_hash()/crypt() is deliberately designed to be expensive when calculating a password hash.
This is an example that “salt” over “info” results in better design.
Less secure design
Suppose your application had SQL injection vulnerability and your data is stolen including password hash and encrypted user data. Secret encrypted data can be decrypted by attackers.
Better design
Both methods use an application-wide “secret” $ikm. However, there is a notable difference between the two. The former method uses only 1 secret, the $ikm key (master encryption key), and its $info (user ID) is publicly known, so one stolen key allows attackers to decrypt all encrypted data. The latter method requires 2 pieces of secret information to attack (the master encryption $ikm key and the secret $salt, used as a combined key).
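The difference can be sketched as follows. This is only an illustration under stated assumptions: derive_user_key() is a hypothetical helper, the 'user:' prefix is made up, and the per-user salt is assumed to be a random secret kept apart from the encrypted data. It uses the hash_hkdf() signature as merged in PHP 7.1.2.

```php
<?php
// Hypothetical helper: master key as IKM, public user ID as info,
// secret per-user salt as the second piece of secret information.
function derive_user_key(string $master_key, string $user_id, string $user_salt): string
{
    return hash_hkdf('sha256', $master_key, 32, 'user:' . $user_id, $user_salt);
}

$master_key = random_bytes(32); // application-wide secret $ikm
$salt_old   = random_bytes(32); // secret per-user salt
$salt_new   = random_bytes(32); // replacement salt, e.g. after a suspected leak

$key_old = derive_user_key($master_key, '1234', $salt_old);
$key_new = derive_user_key($master_key, '1234', $salt_new);

var_dump($key_old === $key_new); // bool(false): a new salt yields a new key for the same user
```

Because the salt takes part in the derivation, knowing the master key and the public user ID alone is not enough to reproduce a user's key.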
This example shows how HKDF could be used for a CSRF token that has both an expiration time and a context limitation, i.e. a URL-specific CSRF token with expiration that needs no server-side resources.
When the session ID is used as the CSRF token, there is a risk that the session ID leaks to others through saving & sending an HTML page, through malicious web browser plugins that read page content, etc. Therefore, the session ID should not be used as the CSRF token, and the CSRF token should have a much shorter lifetime than the session. 13)
Setting up CSRF token
Generate a strong unique CSRF token seed and store it in $_SESSION ( $_SESSION['CSRF_TOKEN_SEED'] = random_bytes(32) ). This ensures the CSRF token belongs to a certain session.
Verifying CSRF token
A secure CSRF token expiration and context (URL) can be defined with this method regardless of the session ID lifetime. Developers do not need hash_hkdf() when the context (URL) is not required, because this could be done with hash_hmac().
Since this CSRF token is valid only for a specific URL, the token cannot be used for other URLs. The timeout is configurable for each URL, so developers can have more precise access control according to URL importance, i.e. a shorter timeout for important URLs and a longer one for less important URLs.
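A URL-bound, expiring CSRF token along these lines might look like the following sketch. The helper names, the token layout (expiration, salt and key joined by dots) and the use of the URL plus expiration time as “info” are all assumptions made for illustration, not the exact procedure of this RFC. The code uses the hash_hkdf() signature as merged in PHP 7.1.2.

```php
<?php
// Hypothetical helper: $seed is the per-session secret, e.g. $_SESSION['CSRF_TOKEN_SEED'].
function csrf_token_create(string $seed, string $url, int $expires): string
{
    $salt = random_bytes(32);
    // info carries the non-secret context: URL and expiration time.
    $okm = hash_hkdf('sha256', $seed, 32, $url . "\0" . $expires, $salt);
    return $expires . '.' . bin2hex($salt) . '.' . bin2hex($okm);
}

// Hypothetical helper: returns true only for an unexpired token issued for this URL.
function csrf_token_verify(string $seed, string $url, string $token, int $now): bool
{
    $parts = explode('.', $token);
    if (count($parts) !== 3) {
        return false;
    }
    [$expires, $salt_hex, $okm_hex] = $parts;
    if (!ctype_digit($expires) || (int)$expires < $now) {
        return false; // malformed or expired
    }
    if (strlen($salt_hex) !== 64 || !ctype_xdigit($salt_hex)) {
        return false; // salt must be 32 bytes in hex
    }
    $expected = hash_hkdf('sha256', $seed, 32, $url . "\0" . $expires, hex2bin($salt_hex));
    return hash_equals(bin2hex($expected), $okm_hex); // constant-time comparison
}

$seed  = random_bytes(32);
$token = csrf_token_create($seed, '/account/delete', time() + 300);
var_dump(csrf_token_verify($seed, '/account/delete', $token, time())); // bool(true)
var_dump(csrf_token_verify($seed, '/account/view', $token, time()));   // bool(false)
```

Nothing besides the seed has to be stored on the server, since the expiration time and salt travel with the token and are authenticated by the derived key.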
You can create an expiration-enabled URL for limited use with similar steps. This could be used to allow anonymous access to storage objects in a relatively secure manner. An example is the AWS S3 presigned URL.
On Mon, Jan 16, 2017 at 8:16 PM, Andrey Andreev narf@devilix.net wrote:
There's no comment from you on the PR, inline or not, but I can assure you this was not overlooked.
Salt is optional because RFC 5869 allows it to be optional. There's a reason for each of the current defaults work as they do, as well as the parameter positions:
- Length is in no way actually described as optional, and that makes sense as the function's purpose is to create cryptographic keys, which by nature have fixed lengths. The only reason we could make Length optional is because hash functions' output sizes are known values, and matching the desired OKM length with the hash function size makes for better performance.
- Info can be empty, but the algorithm is pretty much meaningless without it. The purpose of HKDF is to derive 2+ outputs from a single input, with the Info parameter serving as the differentiating factor.
- Salt is ... while recommended, the only thing actually optional.
Salt cannot be optional when deriving strong key(s) from a weak IKM. In order to obtain a strong output key, either the input key or the salt must be cryptographically strong, i.e. when the input key is weak, a strong salt is mandatory by the HKDF definition.
While salt could be optional for a strong IKM, the RFC states that “salt adds significantly to the strength of HKDF” and that “designers of applications are therefore encouraged to provide salt values to HKDF if such values can be obtained by the application.” It is “info” that is actually optional. “Salt” should be used whenever possible, as the RFC recommends.
In any case, “info” (context) is at least less important than “salt” (entropy or pre-shared key/combined key). With PHP, length is not needed for most HKDF applications.
On Mon, Jan 16, 2017 at 8:08 PM, Nikita Popov nikita.ppv@gmail.com wrote:
Making the salt required makes no sense to me.
HKDF has a number of different applications:
a) Derive multiple strong keys from strong keying material. Typical case for this is deriving independent encryption and authentication keys from a master key. This requires only specification of $length. A salt is neither necessary nor useful in this case, because you start with strong cryptographic keying material.
b) Generating per-session (or similar) keys from a (strong cryptographic) master key. For this purpose you can specify the $info parameter. again, a salt is neither necessary nor useful in this case. (You could probably also use $salt instead of $info in this case, but the design of the function implies that $info should be used for this purpose.)
c) Extracting strong cryptographic keying material from weak cryptographic keying material. Standard example here is extracting strong keys from DH g^xy values (which are non-uniform) and similar. This is the usage that benefits from a $salt.
d) Combinations thereof.
Remember that HKDF is an extract-and-expand algorithm, and the extract step (which uses the salt) is only necessary if the input keying material is weak. We always include the extract step for compatibility with the overall HKDF construction (per the RFCs recommendation), but it's essentially just an unnecessary operation if you work on strong keying material.
The only thing that we may want to discuss is whether we should swap the $info and the $salt parameters. This depends on which usage (b or c) we consider more likely.
a) When deriving keys, “salt” should in general be supplied whenever possible. Simply deriving another key w/o salt would not be a typical usage with PHP, or at least not a recommended one, because PHP is not used to implement basic cryptographic algorithms, i.e. statement a) is only applicable to specific applications. Unless weaker encryption/etc. is required, e.g. generating a 128 bit AES key from a 256 bit AES key, a modified length results in a weaker output key than it could be, as described in this PHP RFC.
b) If I assume 'user identity' is used for “info”, then the derived key wouldn't be a “per session” key, but a “per user” key. So I assume the session ID is used as “info”. While it works, the RFC describes “info” as non-secret information, i.e. “a protocol number, algorithm identifiers, user identities, etc.”. The session ID is a secret key. We don't have to follow the RFC recommendations always, but storing a secret key in “info” (context) violates the RFC.
For per-session encryption/etc., a simple choice would be a per-session master key stored in $_SESSION as the secret input key (IKM), a random string as the “salt”, which is a part of the final key, and an optional “info” (context) for additional information such as “confidential”, “public”, etc.
We are implementing RFC 5869. Not following the RFC recommendation does not make sense.
c) True, but according to the RFC “salt” is not only good for generating a strong key from a weak key. “salt” can be used as a part of the final key, just like crypt() calculates a password hash from “salt and password”.
d) True, but you seem to have missed the “non secret salt” usage. The 'Other Use Cases' section includes many “non secret salt” values that are used as the final key. Like the examples in this PHP RFC, there are valid use cases with a very weak IKM.
The statement “the extract step (which uses the salt) is only necessary if the input keying material is weak” cannot be true according to the RFC.
What the RFC states is
Yet, even a salt value of less quality (shorter in size or with limited entropy) may still make a significant contribution to the security of the output keying material
It does not say salt is only good for weak input keys, but that the generated output key will have significantly better security.
On Sun, Feb 5, 2017 at 1:20 AM, Tom Worster fsb@thefsb.org wrote:
The salt defends against certain attacks on predictable input key material, i.e. weak passwords. But HKDF should not normally be used for passwords because it is unsuitable.
A strong input key is preferred, but the input key does not have to be strong. For weak input keys, a strong salt should be used though.
There are valid usages with weak input keys, as the examples in this PHP RFC illustrate. A weak input key is perfectly OK when it is used properly.
The following use case examples use the new hash_hkdf() signature. $ikm could be any valid key; a secret master key is assumed for convenience. Generally speaking, a secret master key for all derived keys is difficult to maintain, and developers are better off avoiding it if possible.
As the bad examples show, omitting salt as an optional parameter results in nonoptimal implementations.
The following examples use the proposed hash_hkdf() function signature.
Note: When you provide both the output key and the salt (pre-shared key), you should have a timeout for the IKM. Even if the keys (output key and salt) are supposed to be cryptographically strong, a brute force attack is always possible, e.g. the HMAC-based AWS S3 presigned URL secret key expires within a week.
Bad example first
Although it works, developers shouldn't do this because $info is intended for context which is public information as per the RFC.
Correct way is
Note: “salt” is intended to be secret or non secret as per the RFC, but salt should be secret for this application. Since $ikm is very weak, salt must be strong.
Bad example first
Developers shouldn't do this unless they have no choice, because once the encryption key is stolen, they cannot issue a new encryption key.
Correct way is
This way, developers can issue as many new encryption keys for the user as they need.
The combined key (output key and salt) may be disclosed to the user as the encryption key. Key disclosure is not necessary for many web applications because encryption and decryption are done by servers.
Bad example first
Although it works, developers shouldn't do this because $info is intended for context(public information) as per the RFC.
Correct way is
Bad example first
Developers shouldn't do this unless they are absolutely sure that it is acceptable for the URL to remain accessible with the generated key even when the key is stolen.
Better way is
By keeping track of valid $salt values, developers can control key validity.
Good example only
Any key that has a timeout can be built similarly.
Note: To generate a unique ID, either $ikm or $salt must be unique. The current uniqid() result is not unique enough.
Good example only
Bad example first
Developers shouldn't do this because they cannot issue a new key for the user and cannot revoke keys. Since the input is weak, there must also be a strong salt.
Better way is
Note: this example's salt is partially secret and partially non-secret.
By keeping track of salt values, developers can control key validity.
Note: generated keys could be non-unique.
Bad example first
Developers shouldn't do this because they cannot issue a new key for the user and cannot revoke keys. Since the input is weak, there must also be a strong salt.
Use the output key, ID and key version as the combined key. Developers may revoke keys by key version.
Note: To generate a unique key, either $ikm or $salt must be unique.
It is merged into PHP 7.1.2.
Next PHP 7.x and 7.1.x
PHP 7.1.2/7.1.3 have hash_hkdf().
Please comment if any.
Other than hash_hkdf() signature and return value, nothing is affected.
Please comment if any
Vote start: 2017-03-26 Vote end: 2017-04-07 UTC 23:59:59
TBD