hash vs. MAC vs. digital signature

breaking down the subtle differences

Nov 17, 2024

what is a hash

a hashing function condenses an input of any length into an output (hash) of a fixed length.

to better understand how hashes are commonly utilized, it may help to understand what makes a strong hashing function.

  • a strong hashing algorithm should be able to produce a hash of a fixed length for any kind of digital data in a relatively efficient and simple manner
  • it should always produce the same output for the same input (be deterministic in nature)
  • the output should depend on every bit of the input, i.e. even a slightly altered input should produce an output that is very different from the output for the original input (avalanche effect)
  • it should be computationally infeasible to:
    • find a pair of inputs that produces the same output (collision resistance)
    • find the input corresponding to a known output (pre-image resistance)
    • given a particular input, find a different input that produces the same output (second pre-image resistance). in other words, although collisions must exist (there are infinitely many possible inputs but only a finite number of fixed-length outputs), it should be practically impossible to find one, so each hash can be treated as unique to its input
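to make the fixed-length, deterministic and avalanche properties concrete, here is a small sketch using SHA-256 from Python’s standard `hashlib` module. the choice of SHA-256 and the helper names `sha256_hex` and `differing_bits` are illustrative assumptions. the resistance properties can’t be demonstrated this way, since they are statements about what is infeasible to compute.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# fixed-length output: 256 bits = 64 hex characters, whatever the input size
short_digest = sha256_hex(b"hi")
long_digest = sha256_hex(b"hi" * 1_000_000)  # roughly 2 MB of input
print(len(short_digest), len(long_digest))  # 64 64

# deterministic: the same input always gives the same output
print(sha256_hex(b"hello world") == sha256_hex(b"hello world"))  # True

# avalanche effect: changing one character typically flips around half of the 256 bits
d1 = hashlib.sha256(b"hello world").digest()
d2 = hashlib.sha256(b"hello worle").digest()
print(differing_bits(d1, d2))
```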

common uses for generating hashes (digests/checksums) of digital content

  • checking the integrity of content - since hashes differ drastically when the input is altered in the slightest way, hashes of digital content are commonly used to verify that the content has not been altered from its original state. this is done by generating a hash of the content when needed and comparing it with the hash of the original content.
    • verifying that downloaded software/updates/files have not been corrupted or tampered with (provided the reference hash itself comes from a trusted source)
    • verifying that the files stored in the cloud are intact at the point of retrieval
    • preventing blocks/transactions in a blockchain from being tampered with undetected
    • ensuring that the code within a single commit remains unchanged in version control systems like Git
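  • storing passwords in databases - storing hashes of passwords instead of the passwords themselves is a common security practice, because if the data were compromised, the real passwords could not (easily) be obtained from their hashes. in practice, each password is combined with a random salt and hashed with a deliberately slow algorithm such as bcrypt, scrypt or Argon2, which makes guessing attacks expensive
  • data deduplication - hash-based techniques are used in storage, backup, content delivery and archiving systems to optimize space usage by identifying duplicate data (identical content produces identical hashes) and avoiding storing or transferring it more than once (Dropbox, Google Drive, Cloudflare etc.)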
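a minimal sketch of two of the uses above, assuming Python’s standard library: an integrity check that re-hashes content and compares it with a published reference digest, and salted password hashing. PBKDF2 is used here only because `hashlib.pbkdf2_hmac` ships with Python; the 600,000-iteration default is an assumption worth checking against current guidance, and all function names are hypothetical.

```python
import hashlib
import hmac
import io
import os
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hash a (possibly large) binary stream in chunks and return the hex digest."""
    hasher = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


def content_is_intact(stream: BinaryIO, expected_hex: str) -> bool:
    """Integrity check: recompute the digest and compare it with the reference value."""
    return hmac.compare_digest(sha256_of_stream(stream), expected_hex.strip().lower())


def hash_password(password: str, iterations: int = 600_000) -> tuple[bytes, bytes]:
    """Derive a salted, deliberately slow hash of a password (PBKDF2-HMAC-SHA256)."""
    salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest  # store these (plus the iteration count), never the password


def verify_password(password: str, salt: bytes, stored: bytes, iterations: int = 600_000) -> bool:
    """Re-derive the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, stored)


# io.BytesIO stands in for a downloaded file opened with open(path, "rb")
reference = hashlib.sha256(b"installer bytes").hexdigest()
print(content_is_intact(io.BytesIO(b"installer bytes"), reference))  # True
print(content_is_intact(io.BytesIO(b"installer bytez"), reference))  # False

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("tr0ub4dor&3", salt, stored))  # False
```

the salt means two users with the same password still get different stored hashes, so a precomputed table of common password hashes is of no use to an attacker.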

what is a MAC

generating a MAC (message authentication code) for content meant to be transmitted is a method used to authenticate the origin of the content, in addition to verifying its integrity.

usually a “MAC algorithm” is a function that accepts a message and a secret key as input and produces a MAC. HMAC is a MAC algorithm that uses a hash function under the hood, while others like CMAC and GMAC are built on a symmetric block cipher (typically AES).
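to show what “uses a hash function under the hood” means for HMAC, here is a sketch of the RFC 2104 construction written out by hand over SHA-256 and checked against Python’s built-in `hmac` module. the helper name `hmac_sha256` is hypothetical; in real code you would simply call `hmac.new`.

```python
import hashlib
import hmac

BLOCK_SIZE = hashlib.sha256().block_size  # 64 bytes for SHA-256


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 built by hand: H((K ^ opad) || H((K ^ ipad) || message))."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()  # keys longer than a block are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")  # then zero-padded to the block size
    inner_key = bytes(b ^ 0x36 for b in key)  # key XOR ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # key XOR opad
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()


key = b"shared secret key"
msg = b"transfer 100 coins to bob"
# the hand-written version agrees with the standard library implementation
print(hmac_sha256(key, msg) == hmac.new(key, msg, hashlib.sha256).digest())  # True
```

the two nested hashes are what stop an attacker from taking a valid tag and extending the message, which a naive `sha256(key + message)` would allow.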

now since the input also includes a secret key, the output has the special purpose of helping a recipient verify the authenticity of the message they receive.

  • the sender generates a MAC using a message + a secret key and sends the MAC along with the message to the recipient
  • the recipient, who is also supposed to know the secret key, uses the message and the key to generate a MAC using the same algorithm
  • if the MAC received along with the message is identical to the one generated by the recipient, the recipient can verify that the message is unaltered and that it actually originated from the sender, because apart from the recipient, only the sender knows the key.
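the three steps above can be sketched with Python’s `hmac` module. this assumes HMAC-SHA256 as the MAC algorithm and a key that both parties have already shared out of band; the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# assumption: sender and recipient already share this key
SHARED_KEY = secrets.token_bytes(32)


def sender_prepare(message: bytes, key: bytes) -> tuple[bytes, bytes]:
    """Sender: compute the MAC and send it along with the message."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return message, tag


def recipient_accepts(message: bytes, tag: bytes, key: bytes) -> bool:
    """Recipient: recompute the MAC with the same key and algorithm, then compare."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # compare_digest avoids leaking, through timing, how many leading bytes matched
    return hmac.compare_digest(expected, tag)


message, tag = sender_prepare(b"meet at noon", SHARED_KEY)
print(recipient_accepts(message, tag, SHARED_KEY))  # True
print(recipient_accepts(b"meet at midnight", tag, SHARED_KEY))  # False: message altered
print(recipient_accepts(message, tag, secrets.token_bytes(32)))  # False: wrong key
```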

what is a digital signature

a digital signature is also derived from a hash of a message, but the hash is additionally “encrypted” using asymmetric (public-key) cryptography.

it is important to know how actual asymmetric encryption differs from generating digital signatures.

asymmetric encryption

the purpose of encryption is confidentiality because encrypted content isn’t readable without decryption.

it works by the sender encrypting the data using the recipient’s public key, and the recipient decrypting it using their own private key. this ensures confidentiality, since only the recipient (the holder of the matching private key) can decrypt data that was encrypted with their public key.
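to illustrate the key direction, here is a deliberately insecure “textbook” RSA sketch in plain Python. the primes are well-known Mersenne primes, so anyone can factor the modulus, and there is no padding; both are assumptions made purely to keep the example short. real systems use a vetted library with a padding scheme such as RSA-OAEP. the function and constant names are hypothetical.

```python
# WARNING: toy textbook RSA for illustration only (public primes, no padding)
P = 2**521 - 1  # Mersenne prime
Q = 2**607 - 1  # Mersenne prime
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse, Python 3.8+

RECIPIENT_PUBLIC = (N, E)   # published by the recipient
RECIPIENT_PRIVATE = (N, D)  # known only to the recipient


def encrypt(message: bytes, public_key: tuple[int, int]) -> int:
    """Sender: encrypt with the *recipient's* public key."""
    n, e = public_key
    m = int.from_bytes(message, "big")
    if m >= n:
        raise ValueError("message too long for this toy key")
    return pow(m, e, n)


def decrypt(ciphertext: int, private_key: tuple[int, int]) -> bytes:
    """Recipient: decrypt with their own private key."""
    n, d = private_key
    m = pow(ciphertext, d, n)
    # toy limitation: leading zero bytes of the original message are not preserved
    return m.to_bytes((m.bit_length() + 7) // 8, "big")


ciphertext = encrypt(b"the vault code is 4071", RECIPIENT_PUBLIC)
print(decrypt(ciphertext, RECIPIENT_PRIVATE))  # b'the vault code is 4071'
```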

signature generation

a hash of the data is put through “encryption”, as opposed to the actual data itself.

the sender encrypts (or rather, signs) the hash of the data using their own private key, and the recipient decrypts (or verifies) it using the sender’s public key.

the recipient then generates the hash of the received message and compares it with the decrypted signature. if they match, the recipient can verify both that the message was not altered and that it originated from the holder of the private key, i.e. the sender.
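hash-then-sign can be sketched with a toy textbook RSA key (publicly known Mersenne primes, no padding, insecure by design): the sender applies their private exponent to the SHA-256 hash, and the recipient applies the sender’s public exponent to recover the hash and compares it with a freshly computed one. a real signature scheme such as RSA-PSS or Ed25519 from a maintained library should be used instead; the names here are hypothetical.

```python
import hashlib

# WARNING: toy textbook RSA for illustration only (public primes, no padding)
P = 2**521 - 1  # Mersenne prime
Q = 2**607 - 1  # Mersenne prime
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse, Python 3.8+

SENDER_PUBLIC = (N, E)   # published by the sender
SENDER_PRIVATE = (N, D)  # known only to the sender


def digest_as_int(message: bytes) -> int:
    """SHA-256 hash of the message as an integer (256 bits, always smaller than N)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def sign(message: bytes, private_key: tuple[int, int]) -> int:
    """Sender: apply their own private key to the hash of the message."""
    n, d = private_key
    return pow(digest_as_int(message), d, n)


def verify(message: bytes, signature: int, public_key: tuple[int, int]) -> bool:
    """Recipient: recover the hash with the sender's public key and compare with a fresh hash."""
    n, e = public_key
    return pow(signature, e, n) == digest_as_int(message)


msg = b"I, alice, owe bob 10 coins"
sig = sign(msg, SENDER_PRIVATE)
print(verify(msg, sig, SENDER_PUBLIC))  # True
print(verify(b"I, alice, owe bob 1000 coins", sig, SENDER_PUBLIC))  # False
```

the `False` case shows the integrity part: changing the message changes its hash, so the hash recovered from the signature no longer matches.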

the difference lies in:

  • the goal of the operation - encryption achieves confidentiality, while signing a message ensures integrity, sender authentication and non-repudiation (more on this below)
  • and the key direction -
    • encrypting with the recipient’s public key and decrypting with the recipient’s private key vs
    • signing with the sender’s private key and verifying with the sender’s public key

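in addition to helping verify integrity and authenticity, which can also be achieved by a MAC alone, a digital signature ensures non-repudiation, which means that the signer can’t later deny having signed the message. a MAC can’t offer this, because both the sender and the recipient know the secret key, so either of them could have produced a given MAC.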

in the message exchanging scenario, since only the sender knows their own private key, AND since a signature that verifies with their public key could only have been produced with that private key, the sender can’t deny that they were the one who generated the signature.

why a hash is used instead of signing the actual data

there are a few reasons why it’s standard practice to sign a hash of the data instead of the actual data:

  • performance - asymmetric encryption is an operation that consumes a fair bit of time and resources when performed on large amounts of data. hashing reduces the data to a small and fixed size output, making the process much more efficient
  • ensuring data integrity - since any change to the data changes its hash, signing the hash still protects the integrity of all of the original data
  • achieving a level of confidentiality in some cases, by keeping the actual data out of the signing process (the signer only ever needs to see its hash)

where to find out about cryptographic algorithms that are currently considered secure:
what inspired me to write this:
the highest voted answer to this great question on crypto stack exchange