CS260: Overview | Saylor Academy

Use this section to extend your understanding of hash function design. This section specifically addresses how to take a useful compression function and iteratively apply it in a way that can result in a collision-resistant hash.

Building a hash function, especially one that accepts inputs of arbitrary length, seems like a challenging task. In this section, we'll see one approach for constructing hash functions, called the Merkle-Damgård construction.

Instead of a full-fledged hash function, imagine that we had a collision-resistant function whose inputs were of a single fixed length, but longer than its outputs. In other words, \(h : {0, 1}^{n+t} → {0, 1}^n\) , where \(t > 0\). We call such an h a compression function. This is not compression in the usual sense of the word - we are not concerned about recovering the input from the output. We call it a compression function because it "compresses" its input by t bits (analogous to how a pseudorandom generator "stretches" its input by some amount).

The following construction is one way to build a full-fledged hash function (supporting inputs of arbitrary length) out of such a compression function:

Construction 11.2 (Merkle-Damgård)

Let \(h : {0, 1}^{n+t} → {0, 1}^n\) be a compression function. Then the Merkle-Damgård transformation of \(h\) is \(MD_h : {0, 1}^∗ → {0, 1}^n\), where:

Merkle-Damgard Construction

The idea of the Merkle-Damgård construction is to split the input x into blocks of size \(t\). The end of the string is filled out with 0s if necessary. A final block called the "padding block" is added, which encodes the (original) length of \(x\) in binary.

Example

Suppose we have a compression function \(h : {0, 1}^{48} → {0, 1}^{32}\), so that \(t = 16\). We build a Merkle-Damgård hash function out of this compression function and wish to compute the hash of the following 5-byte (40-bit) string:

x = 01100011 11001101 01000011 10010111 01010000

We must first pad x appropriately (MDPAD(X)):

Since x is not a multiple of \(t = 16\) bits, we need to add 8 bits to make it so.
Since \(|x| = 40\), we need to add an extra 16-bit block that encodes the number 40 in binary (101000).

After this padding, and splitting the result into blocks of length 16, we have the following:

\( \underbrace{01100011 \; 11001101}_{x_1} \; \; \underbrace{01000011 \; 10010111}_{x_2} \; \; \underbrace{0101000 \; 000000}_{x_3} \; \; \underbrace{00000000 \; 00101000}_{x_4} \)

The final hash of x is computed as follows:

Merkle-Damgard Construction

We are presenting a simplified version, in which \(MD_h\) accepts inputs whose maxi- mum length is \(2^t − 1\) bits (the length of the input must fit into t bits). By using multiple padding blocks (when necessary) and a suitable encoding of the original string length, the construction can be made to accommodate inputs of arbitrary length (see the exercises).

The value y₀ is called the initialization vector (IV), and it is a hard-coded part of the algorithm.

As discussed above, we will not be making provable security claims using the library- style definitions. However, we can justify the Merkle-Damgård construction with the following claim:

Claim 11.3

Suppose \(h\) is a compression function and \(MD_h\) is the Merkle-Damgård construction applied to \(h\). Given a collision \(x, x′\) in \(MD_h\) , it is easy to find a collision in \(h\). In other words, if it is hard to find a collision in \(h\), then it must also be hard to find a collision in \(MD_h\).

Proof

Suppose that \(x, x′\) are a collision under \(MD_h\). Define the values \(x_1, . . . , x_{k+1}\) and \(y_1, . . . , y_{k+1}\) as in the computation of \(MD_h (x)\). Similarly, define \(x′_1, . . . , x′_{k′+1}\) and \(y'_1, . . . , y′_{k′+1}\) as in the computation of \(MD_h (x′)\). Note that, in general, \(k\) may not equal \(k'\).

Recall that:

\(MD_h (x) = y_{k+1} = h(y_k ‖ x_{k+1})\)

\(MD_h (x′) = y′_{k′+1} = h(y′_{k′} ‖ x′_{k′+1})\)

Since we are assuming \(MD_h (x) = MD_h (x')\), we have \(y_{k+1} = y′_{k′+1}\). We consider two cases:

Case 1: If \(|x| \neq |x'|\), then the padding blocks \(x_{k+1}\) and \(x'_{k′+1}\) which encode |x| and |x'| are not equal. Hence we have \(y_k ‖ x_{k+1} \neq y′_{k′} ‖ x'_{k′+1}\), so \(y_k ‖ x_{k+1}\) and \(y′_{k′} ‖ x'_{k′+1}\) are a collision under h and we are done.

Case 2: If \(|x| = |x'|\), then x and x' are broken into the same number of blocks, so \(k = k'\). Let us work backwards from the final step in the computations of \(MD_h(x)\) and \(MD_h (x')\). We know that:

\(y_{k+1} = h(y_k ‖ x_{k+1})\)

\(y′_{k+1} = h(y′_k ‖ x'_{k+1})\)

If \(y_k ‖ x_{k+1}\) and \(y′_k ‖ x'_{k+1}\) are not equal, then they are a collision under h and we are done. Otherwise, we can apply the same logic again to \(y_k\) and \(y′_k\) , which are equal by our as- sumption.

More generally, if \(y_i = y′_i\) , then either \(y_{i−1} ‖ x_i\) and \(y′_{i−1} ‖ x'_i\) are a collision under h (and we say we are "lucky"), or else \(y_{i−1} = y'_{i−1}\) (and we say we are "unlucky"). We start with the premise that \(y_k = y′_k\) . Can we ever get "unlucky" every time, and not encounter a collision when propagating this logic back through the computations of \(MD_h (x)\) and \(MD_h (x')\)? The answer is no, because encountering the unlucky case every time would imply that \(x_i = x'_i\) for all \(i\). That is, \(x = x'\). But this contradicts our original assumption that \(x \neq x'\). Hence we must encounter some "lucky" case and therefore a collision in \(h\).

Source: Mike Rosulek, https://joyofcryptography.com/pdf/chap11.pdf
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License.

COURSE INTRODUCTION

Course Syllabus

Unit 1: Introduction to Cryptography

Unit 1 Learning Objectives

1.1: Introduction to Encryption and Computational Security

Key Concepts of Cryptography

How Cryptography Works

Key Terms for Network Security

1.2: Block Ciphers vs. Stream Ciphers

Overview

More on Stream Ciphers

1.3: Substitution Cipher

Basic Techniques

Python Exercise

1.4: Transposition Cipher

Basic Techniques

1.5: Random Numbers

The Basics of PRNGs

Python Example

1.6: Perfect Secrecy and the One-Time Pad

Perfect Secrecy a La Shannon

The Main Idea behind the One-Time Pad

More on the One-Time Pad

Unit 1 Assessment

Unit 1 Assessment

Unit 2: Hash Functions

Unit 2 Learning Objectives

2.1: What Is a Hash Function?

Overview

One Way Functions

Hash Functions

Python Exercise

2.2: Collision Resistance

The Importance of Collision Resistance

The Vulnerabilities of MD5

2.3: Merkle-Damgard Construction

Overview

Recipes for Hash Functions

2.4: SHA (Secure Hash Algorithm)

Overview

The Security of SHA

Python Exercise

2.5: Application: Blockchain

Introduction to Blockchain

Proof of Work

Python Exercise

Unit 2 Assessment

Unit 2 Assessment

Unit 3: Symmetric Encryption

Unit 3 Learning Objectives

3.1: Introduction to Symmetric Encryption

Key Concepts

3.2: Feistel Cipher

Combining Transposition and Substitution

The Elegance of the Feistel Cipher

3.3: Data Encryption Standard (DES)

Overview

Applying the Feistel Cipher

3.4: Advanced Encryption Standard (AES)

Beyond DES

Python Implementations

3.5: Triple DES (3DES)

Introduction to 3DES

Contrasting 3DES with AES

A Performance Comparison of DES, AES, and DES

Unit 3 Assessment

Unit 3 Assessment

Unit 4: Asymmetric Encryption

Unit 4 Learning Objectives

4.1: Asymmetric Cryptosystems

Introduction to Public Key Cryptography

4.2: The RSA Cryptosystem

The RSA Cryptosystem

4.3: Integer Factorization

Prime Factorization

Factoring Techniques

Python Exercise

4.4: Congruences Modulo n

Modular Arithmetic

More Immersion into Congruences modulo n