Hash functions
High-level description
Using an insecure hash function can lead to a wide range of business risks, including system tampering, data breaches, unauthorized access, and legal non-compliance.
That’s because cryptographic hash functions are a core building block used throughout cryptographic systems across a wide range of applications.
Their main job is to transform input data of arbitrary length into a short, fixed-length output, known as a hash value or digest. This output is always the same for the same input, but for security purposes it should appear unpredictable or “random-looking”.
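A minimal sketch of this behaviour, using SHA-256 from Python's standard `hashlib` module. The helper name `sha256_hex` and the sample inputs are illustrative assumptions, not part of any guideline.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

short_digest = sha256_hex(b"hello")
long_digest = sha256_hex(b"hello" * 1_000_000)  # about 5 MB of input
changed_digest = sha256_hex(b"hellp")           # last character changed

# Every digest is 256 bits (64 hex characters), whatever the input length.
print(len(short_digest), len(long_digest))
# The same input always gives the same digest.
print(short_digest == sha256_hex(b"hello"))
# A small change in the input gives an unrelated-looking digest.
print(short_digest)
print(changed_digest)
```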
Assets that depend on hash functions include but are not limited to:
- Public-key certificates, such as X.509.
- Secure software updates.
- Digital signatures for document signing, such as PDF signatures.
- Message integrity and authentication for communications and data at rest, for example with HMAC.
- Password hashing, such as scrypt.
Because these applications have different security requirements, the severity of a specific attack against a hash function and its resulting business threat will vary based on the application.
In general, secure hash functions are widely available. Therefore, migrating from a weak to a secure hash function, such as SHA-2 or SHA-3, is strongly recommended.
Detailed description
As the basic building block for many cryptographic protocols, a hash function is used in a variety of ways for different applications. A secure cryptographic hash function behaves like a random function, meaning:
- The output appears random, with no apparent structure.
- It’s hard to find two different inputs that produce the same output (a collision).
- Given the output, it’s hard to determine the input that generated it.
Given these properties, a hash value can be used as a computationally unique fingerprint of a data object, or to generate random-looking values (also called pseudo-random values), such as when deriving cryptographic keys. Hash functions are commonly used in communication protocols for transcript hashing and key derivation, while in digital signature schemes, they’re used for message compression.
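The fingerprint use case can be sketched as follows, assuming SHA-256 and Python's standard library. The function names, the file name `update.bin`, and the payload are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path

def file_fingerprint(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Return the SHA-256 fingerprint of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_fingerprint(path: Path, expected_hex: str) -> bool:
    """Compare a file's fingerprint with a published hex value."""
    return hmac.compare_digest(file_fingerprint(path), expected_hex.lower())

# Demo with a temporary file standing in for a downloaded update.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "update.bin"  # hypothetical file name
    payload = b"example payload" * 10_000
    sample.write_bytes(payload)
    published = hashlib.sha256(payload).hexdigest()
    intact = matches_fingerprint(sample, published)
    sample.write_bytes(b"tampered payload")
    tampered = matches_fingerprint(sample, published)

print(intact, tampered)
```

Reading in chunks keeps memory use constant for large files. The check is only as trustworthy as the channel over which the published fingerprint was obtained.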
Hash functions also have many other applications, such as constructing password hashing functions like scrypt or message authentication codes like HMAC.
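As a rough sketch of these two constructions, Python's standard library exposes HMAC through the `hmac` module and scrypt through `hashlib.scrypt` (which assumes an OpenSSL build with scrypt support). The key, message, password, and cost parameters below are example values, not recommendations.

```python
import hashlib
import hmac
import os

# Message authentication: HMAC-SHA-256 over a message with a shared secret key.
key = os.urandom(32)                     # illustrative random key
message = b"amount=100;recipient=alice"  # hypothetical message
tag = hmac.new(key, message, hashlib.sha256).digest()

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the HMAC tag and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

# Password hashing: scrypt with a per-password random salt.
# n, r, and p are example cost parameters only.
salt = os.urandom(16)
password_hash = hashlib.scrypt(
    b"correct horse", salt=salt, n=2**14, r=8, p=1,
    maxmem=64 * 1024 * 1024, dklen=32,
)

print(verify_tag(key, message, tag))
print(verify_tag(key, message + b"!", tag))
print(password_hash.hex())
```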
Because different uses of hash functions have varying security requirements, the severity of an attack against a hash function can vary depending on the application.
Security properties
A secure hash function must guarantee three key security properties:
- Collision resistance: It’s practically infeasible to find two different inputs X1 ≠ X2 that produce the same hash value, H(X1) = H(X2). In other words, collision resistance prevents attackers from creating two different pieces of data with the same digest. This is important because an attacker who finds such a pair (a collision) could potentially use it to forge a digital signature or to pass an integrity check, such as checksum validation, with tampered data.
- Preimage resistance: Given a hash value Y, it’s computationally infeasible to find an input X such that H(X) = Y. Such an input X is referred to as a preimage. For example, a preimage of a known password hash can be used to bypass a password check.
- Pseudorandom outputs: The output of the hash function should be indistinguishable from a truly random value to anyone who doesn’t know the input. This means that even if someone sees the output, they shouldn’t be able to detect any patterns or predict future outputs. If an attacker can distinguish the hash output from random values, they could potentially break cryptographic keys derived using the hash function, or forge message authentication codes.
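To see why collision resistance depends on a sufficiently long output, the sketch below searches for a collision in a deliberately weakened function: SHA-256 truncated to 24 bits. The truncation, the function names, and the `message-N` inputs are assumptions made for illustration. Nothing here attacks full SHA-256.

```python
import hashlib
from itertools import count

def truncated_sha256(data: bytes, bits: int) -> int:
    """Return the first `bits` bits of SHA-256(data) as an integer."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int) -> tuple[bytes, bytes, int]:
    """Search for two inputs with the same truncated digest.

    Returns both inputs and the number of hashes computed.
    """
    seen: dict[int, bytes] = {}
    for attempts in count(1):
        candidate = f"message-{attempts}".encode()
        value = truncated_sha256(candidate, bits)
        if value in seen:
            return seen[value], candidate, attempts
        seen[value] = candidate

x1, x2, attempts = find_collision(bits=24)
print(x1, x2, attempts)
```

By the birthday bound, a collision on a 24-bit output is expected after a few thousand hashes, far fewer than the 2^24 possible outputs. The same generic search against a 256-bit output would need on the order of 2^128 hashes.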
Security considerations
The security of cryptographic hash functions with respect to the security properties above is established by cryptanalysis. The more cryptographers attempt to break a specific hash function and fail, the more confidence we have in its security. A hash function is considered secure if the resources required to attack it are so enormous that the computation is practically infeasible. Computations that require 2^128 operations are currently considered infeasible. For the highest security level, a cost of 2^256 operations is typically required.
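A back-of-the-envelope calculation gives a feel for these numbers. The attacker speed of 10^18 operations per second is a hypothetical figure chosen for illustration, not a measured capability.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def years_to_compute(log2_operations: int, operations_per_second: float) -> float:
    """Years needed to perform 2**log2_operations operations at the given rate."""
    return 2 ** log2_operations / operations_per_second / SECONDS_PER_YEAR

# Assumed attacker speed: 10**18 operations per second (hypothetical).
for exponent in (64, 80, 128, 256):
    years = years_to_compute(exponent, 1e18)
    print(f"2^{exponent} operations: about {years:.3g} years")
```

Under this assumption, 2^64 operations take seconds, while 2^128 operations take trillions of years.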
A hash function is considered theoretically broken the moment an attack is significantly more efficient than a brute-force attack. While such a theoretical attack does not pose an immediate threat, it demonstrates a critical weakness. In the past, known weaknesses have often been extended into practical attacks within relatively short timeframes. As a result, theoretically broken hash functions are typically deprecated by regulators.
Over the past two decades, cryptanalysis of hash functions has made significant progress, driven by the breaks of MD5 and SHA-1 and the subsequent SHA-3 competition run by the National Institute of Standards and Technology (NIST). This process drove substantial research funding into studying hash functions and, as a result, the security of hash functions is now well understood.
Severity of vulnerabilities
The severity of a hash function vulnerability largely depends on the application of the hash function. A vulnerability that allows for finding collisions in a hash function directly impacts its applicability as a fingerprinting function. For example, a hash function vulnerable to collision attacks must not be used for message compression in digital signatures or as a fingerprint for software downloads. However, uses like message authentication codes, password hashing, or key derivation are not directly threatened.
Even though some applications may not be immediately at risk, a collision attack that performs significantly better than a generic attack weakens the security of the hash function, potentially indicating a broader structural weakness. As a result, it’s common practice to deprecate a hash function whenever a collision attack is discovered, even though some applications are not immediately threatened.
It should be noted that the scientific community around hash functions uses the term “attack” quite loosely. Often, an “attack” refers to a partial or theoretical attack on an artificially weakened version of a hash function. These partial attacks play a crucial role in understanding the security of a hash function but don’t necessarily threaten the actual, full-strength function.
Security recommendations
As hash functions are an integral part of cryptographic systems, virtually all cryptographic guidelines include recommendations for hash functions. The security of a hash function against generic attacks depends on its output (digest) length. Hence, recommendations typically focus on two aspects: the choice of cryptographic hash function and the appropriate digest length.
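The link between digest length and generic attack cost can be sketched as below, assuming an ideal hash function and a classical (non-quantum) attacker. The function name is illustrative, and the figures are the standard birthday and exhaustive-search bounds, not properties of any specific algorithm.

```python
def generic_security_bits(digest_bits: int) -> dict[str, int]:
    """Approximate generic attack cost, as log2 of hash evaluations."""
    return {
        "collision": digest_bits // 2,  # birthday bound
        "preimage": digest_bits,        # exhaustive search
    }

for digest_bits in (128, 160, 256, 384, 512):
    levels = generic_security_bits(digest_bits)
    print(
        f"{digest_bits}-bit digest: collisions ~2^{levels['collision']}, "
        f"preimages ~2^{levels['preimage']}"
    )
```

This is why a 256-bit digest is the minimum for 128-bit collision security, and why 128-bit and 160-bit digests fall short regardless of how well the function is designed.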
Recent guidelines recommend using SHA-2 and SHA-3 with varying output lengths for current and future use. For specific requirements on output length from different organizations, refer to Cryptographic key length and the keylength.com website [1].
Summary tables
The tables below summarize recommendations from NSA [2] (CNSA 2.0), BSI [3], and ANSSI [4] as well as the status of known attacks against the previously widely deployed hash functions SHA-1 and MD5. More detailed explanations follow.
Recommended and approved hash functions
| | CNSA 2.0 | BSI | ANSSI |
|---|---|---|---|
| SHA-2 | SHA-384 / SHA-512 | SHA-256, SHA-512/256, SHA-384, and SHA-512 | SHA-256, SHA-512/256, SHA-384, and SHA-512 |
| SHA-3 | Not approved | SHA3-256, SHA3-384, and SHA3-512 | SHA3-256, SHA3-384, and SHA3-512 |
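One way to make the table above machine-checkable is a small lookup, sketched here. The dictionary simply transcribes the table. The names `APPROVED` and `is_approved` are hypothetical, and any real policy check should be driven by the current versions of the guidelines themselves.

```python
BSI_ANSSI_SET = {
    "SHA-256", "SHA-512/256", "SHA-384", "SHA-512",
    "SHA3-256", "SHA3-384", "SHA3-512",
}

# Transcription of the summary table (guideline -> approved hash functions).
APPROVED = {
    "CNSA 2.0": {"SHA-384", "SHA-512"},
    "BSI": set(BSI_ANSSI_SET),
    "ANSSI": set(BSI_ANSSI_SET),
}

def is_approved(algorithm: str, guideline: str) -> bool:
    """Return True if the table lists the algorithm for the guideline."""
    return algorithm.upper() in APPROVED[guideline]

for name in ("SHA-256", "SHA-384", "SHA3-256", "SHA-1", "MD5"):
    status = {g: is_approved(name, g) for g in APPROVED}
    print(name, status)
```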
Known attack status for SHA-1 and MD5
| | Collisions | Preimages | Pseudorandomness |
|---|---|---|---|
| SHA-1 | Practical attacks | No attacks known | No attacks known |
| MD5 | Practical attacks | Theoretical attacks known | Theoretical attacks known |
Note
Recommendations for hash function use will vary depending on your specific use case. Consult the relevant authorities within your organization for your application to ensure secure and appropriate usage.
Overview of commonly used hash function families
SHA-2
SHA-2 [5] is a family of hash functions that includes SHA-224, SHA-256, SHA-384, SHA-512, and the truncated output functions SHA-512/224 and SHA-512/256. The number after “SHA-” (or after the slash for the truncated variants) indicates the output (digest) length in bits: 224, 256, 384, or 512.
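The naming convention can be checked against Python's `hashlib`, which always provides the four main SHA-2 functions. The truncated variants may be exposed under names such as `sha512_256` depending on the underlying OpenSSL build. Treat that as an assumption and check `hashlib.algorithms_available` before relying on it.

```python
import hashlib

# Digest length in bits for the SHA-2 functions guaranteed by hashlib.
sha2_bits = {
    name: hashlib.new(name).digest_size * 8
    for name in ("sha224", "sha256", "sha384", "sha512")
}

for name, bits in sha2_bits.items():
    print(f"{name}: {bits}-bit digest")

# Truncated variants are optional and depend on the OpenSSL build.
optional = sorted(n for n in hashlib.algorithms_available if n.startswith("sha512_"))
print("optional truncated variants:", optional)
```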
SHA-2 is widely recommended as a secure hash function by many institutions publishing cryptographic recommendations, including NSA [6], BSI [7], and ANSSI [8]. Requirements for output length vary based on the application, timeframe, and guidelines of the recommending institution.
SHA-3 (and SHAKE)
SHA-3 [9] is a family of hash functions based on the sponge construction, including SHA3-224, SHA3-256, SHA3-384, and SHA3-512, where the number indicates the output length in bits: 224, 256, 384, or 512.
SHA-3 is recommended as a secure hash function by many institutions publishing cryptographic recommendations, including BSI [10] and ANSSI [11]. While it is permitted as part of specific algorithms in CNSA 2.0, it is not approved as a standalone cryptographic hash function under CNSA. Requirements for output length vary based on the application, timeframe, and guidelines of the recommending institution.
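A short sketch of SHA-3 and SHAKE using Python's `hashlib`. SHAKE128 is one of the extendable-output functions standardized alongside SHA-3 in FIPS 202. Unlike the fixed-length SHA-3 functions, the caller chooses the output length. The input and the chosen lengths are illustrative.

```python
import hashlib

data = b"example input"

# Fixed-length SHA-3: the digest length is part of the function name.
sha3_digest = hashlib.sha3_256(data).digest()

# SHAKE128: the output length (in bytes) is passed when reading the digest.
shake_short = hashlib.shake_128(data).digest(16)
shake_long = hashlib.shake_128(data).digest(64)

print(len(sha3_digest) * 8, "bits from SHA3-256")
print(len(shake_short) * 8, "and", len(shake_long) * 8, "bits from SHAKE128")
# A shorter SHAKE output is a prefix of a longer one for the same input.
print(shake_long.startswith(shake_short))
```

Because shorter SHAKE outputs are prefixes of longer ones, the output length alone does not separate different uses of the same input.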
SHA-1 (broken)
SHA-1 [12] is an older hash function that was previously widely used and is the predecessor of SHA-2. However, practical collision attacks have been demonstrated [13], leading to the deprecation of SHA-1.
Applications using SHA-1 as a cryptographic fingerprint, such as in digital signatures, certificates, software updates, entity authentication, or file integrity, should be migrated immediately to a secure hash function like SHA-2 or SHA-3, as attacks are now possible (although they still require significant computational resources).
For other uses of SHA-1, migrating to a secure hash function is recommended, based on your application and security risk.
MD5 (broken)
MD5 [14] was once a widely used hash function, but collision attacks have been demonstrated and even observed in the wild [15]. As a result, the use of MD5 is now deprecated. Applications that use MD5 for cryptographic fingerprints, such as in digital signatures, certificates, software updates, entity authentication, or file integrity, should be migrated immediately to a secure hash function like SHA-2 or SHA-3, as practical attacks are possible.
Even applications that rely solely on the one-way or pseudo-random properties of MD5 should be migrated to a secure hash function quickly, since MD5 has also shown weaknesses in resisting preimage attacks [16].
BLAKE2
BLAKE2 [17] is the successor of BLAKE, which was one of the finalists in NIST’s SHA-3 competition and has undergone extensive cryptanalysis. BLAKE2 is also used internally by Argon2, the winner of the Password Hashing Competition.
The BLAKE2 RFC describes two variants, BLAKE2b and BLAKE2s, with different maximum output lengths (64 and 32 bytes, respectively). While BLAKE2 is considered secure, it is not included in the recommendations of NSA, BSI, or ANSSI. As a result, there are no specific recommendations for its use by these authorities.
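Both variants are available in Python's `hashlib`. The sketch below shows the default output lengths, a shortened output, and the built-in keyed mode. The input and the key are placeholder values.

```python
import hashlib

data = b"example input"

b2b = hashlib.blake2b(data).digest()  # BLAKE2b, default 64-byte output
b2s = hashlib.blake2s(data).digest()  # BLAKE2s, default 32-byte output

# The output length is a parameter of the hash, not a truncation of the
# default output, so the 32-byte result differs from b2b[:32].
b2b_256 = hashlib.blake2b(data, digest_size=32).digest()

# Keyed mode: BLAKE2 can act as a message authentication code directly.
keyed = hashlib.blake2b(
    data, key=b"hypothetical shared key", digest_size=32
).digest()

print(len(b2b), len(b2s), len(b2b_256), len(keyed))
print(b2b_256 == b2b[:32])
```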
RIPEMD (partially broken)
RIPEMD is a legacy hash function. The 128-bit variant RIPEMD-128 is considered broken because of its short output length, and practical attacks are possible. Applications that use RIPEMD-128 should be migrated immediately to a secure hash function like SHA-2 or SHA-3.
RIPEMD-160, with 160-bit outputs, is also considered weak due to its short output length. It is accepted for legacy use by some institutions but never for future use. Therefore, applications that rely on RIPEMD-160 should be migrated to a secure hash function like SHA-2 or SHA-3.
Further reading
- eHash - Website giving an overview of most known cryptographic hash functions and the status of their cryptanalysis with a special page for the SHA3 competition contenders
- Keylength - Website giving an overview of the recommended cryptographic algorithms according to different authorities and researchers
- [2] NSA. Commercial National Security Algorithm (CNSA) Suite, https://media.defense.gov/2021/Sep/27/2002862527/-1/-1/0/CNSS%20WORKSHEET.PDF, last accessed Oct 16, 2024
- [3] BSI TR-02102-1: “Cryptographic Mechanisms: Recommendations and Key Lengths”, Version: 2024-1, https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/TechGuidelines/TG02102/BSI-TR-02102-1.pdf?__blob=publicationFile&v=7, last accessed Oct 16, 2024
- [4] ANSSI. GUIDE DE SÉLECTION D’ALGORITHMES CRYPTOGRAPHIQUES, https://cyber.gouv.fr/publications/mecanismes-cryptographiques, last accessed Oct 16, 2024
- [6] NSA. Commercial National Security Algorithm (CNSA) Suite, https://media.defense.gov/2021/Sep/27/2002862527/-1/-1/0/CNSS%20WORKSHEET.PDF, last accessed Oct 16, 2024
- [7] BSI TR-02102-1: “Cryptographic Mechanisms: Recommendations and Key Lengths”, Version: 2024-1, https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/TechGuidelines/TG02102/BSI-TR-02102-1.pdf?__blob=publicationFile&v=7, last accessed Oct 16, 2024
- [8] ANSSI. GUIDE DE SÉLECTION D’ALGORITHMES CRYPTOGRAPHIQUES, https://cyber.gouv.fr/publications/mecanismes-cryptographiques, last accessed Oct 16, 2024
- [10] BSI TR-02102-1: “Cryptographic Mechanisms: Recommendations and Key Lengths”, Version: 2024-1, https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/TechGuidelines/TG02102/BSI-TR-02102-1.pdf?__blob=publicationFile&v=7, last accessed Oct 16, 2024
- [11] ANSSI. GUIDE DE SÉLECTION D’ALGORITHMES CRYPTOGRAPHIQUES, https://cyber.gouv.fr/publications/mecanismes-cryptographiques, last accessed Oct 16, 2024
- [13] Stevens, M., Bursztein, E., Karpman, P., Albertini, A., Markov, Y. (2017). The First Collision for Full SHA-1. In: Katz, J., Shacham, H. (eds) Advances in Cryptology – CRYPTO 2017. LNCS, vol 10401. Springer, Cham. https://doi.org/10.1007/978-3-319-63688-7_19. Freely available online: https://eprint.iacr.org/2017/190.pdf
- [16] Sasaki, Y., Aoki, K. (2009). Finding Preimages in Full MD5 Faster Than Exhaustive Search. In: Joux, A. (eds) Advances in Cryptology – EUROCRYPT 2009. LNCS, vol 5479. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01001-9_8. Freely available online: https://iacr.org/archive/eurocrypt2009/54790136/54790136.pdf