Hash Function (2024)

An algorithm that converts a message into a hash value

Written by Andrew Loo

What is a Hash Function?

A hash function is a mathematical function or algorithm that takes a variable number of characters (called a “message”) and converts it into a string with a fixed number of characters (called a hash value or simply a hash).

Key Highlights

A hash function is a mathematical function that converts any digital data into an output string with a fixed number of characters. Hashing is the one-way act of converting the data (called a message) into the output (called the hash).
Hashing is useful to ensure the authenticity of a piece of data and that it has not been tampered with since even a small change in the message will create an entirely different hash.
Hash functions are the basic tools of modern cryptography that are used in information security to authenticate transactions, messages, and digital signatures.

The act of hashing is, therefore, running an input through a formula that converts it into an output of fixed length. No matter how many characters long the input is, the output always has the same number of characters, usually written in hexadecimal (the digits 0 to 9 and the letters a to f).

Hashing is useful to ensure the authenticity of a piece of data, as any small change to the message will result in a completely different hash value.

Why Do We Need Hash Functions?

Standard Length

When you hash a message, the hash function takes your file or message of any size, runs it through a mathematical algorithm, and produces an output of a fixed length.

Table 1: Different Hash Functions

In Table 1 above, I have converted the same input message (the letters CFI) into hash values using three different hash functions (MD5, SHA-1, and SHA-256). Each of those hash functions produces an output hash with a fixed number of hexadecimal characters: 32 for MD5, 40 for SHA-1, and 64 for SHA-256.

Table 2: Different Inputs Using the Same Hash Function (SHA-1)

It doesn’t matter what we put in as an input: the same hash function will always produce a hash value that has the same number of characters. In Table 2 above, we change the message each time, but because we use the same hash function (SHA-1 in this case), the output is always 40 hexadecimal characters long.
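The fixed-length behaviour described for Tables 1 and 2 can be checked with a short sketch using Python's standard hashlib module. The helper name hex_digest_lengths is made up for this illustration; only the algorithm names come from the text.

```python
import hashlib


def hex_digest_lengths(data: bytes) -> dict[str, int]:
    """Return how many hexadecimal characters each algorithm outputs."""
    return {
        name: len(hashlib.new(name, data).hexdigest())
        for name in ("md5", "sha1", "sha256")
    }


# Same input, three hash functions (as in Table 1).
lengths = hex_digest_lengths(b"CFI")
print(lengths)

# Different inputs, same hash function (as in Table 2).
for message in (b"CFI", b"Corporate Finance Institute", b"x" * 10_000):
    print(len(hashlib.sha1(message).hexdigest()))
```

The lengths depend only on the algorithm, never on the size of the input.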

Ensure data integrity

Let’s think of an example where you want to send a digital message or document to someone, and you want to make sure that it hasn’t been tampered with along the way. You could send it multiple times and have the recipient verify each copy is the same, but that would not be feasible if the file or message was very large.

It would be much easier if there was a way of having a shorter and set number of characters for the sender and receiver to check. And that’s essentially what a hash function allows two computers to do.

Rather than compare the data in its original (and larger) form, by comparing the two hashes of the data, computers can quickly confirm that the data has not been tampered with and changed.

Hash functions, therefore, serve as a checksum: a way for someone to identify whether digital data has been tampered with after it was created.
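As a minimal sketch of this checksum idea, the sender can publish the hash of the original data and the receiver can recompute it on what arrives. The function names and the sample messages are hypothetical, and SHA-256 is an arbitrary choice for the example.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hash the data with SHA-256 and return the hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(received: bytes, expected_hash: str) -> bool:
    """Recompute the hash of the received data and compare it to the expected one."""
    # compare_digest compares in constant time, which avoids timing leaks.
    return hmac.compare_digest(sha256_hex(received), expected_hash)


original = b"Pay Alice 100 dollars"
published_hash = sha256_hex(original)

print(is_unmodified(original, published_hash))                  # True
print(is_unmodified(b"Pay Alice 900 dollars", published_hash))  # False
```

Note that a plain hash only detects changes if the expected hash itself reaches the receiver through a channel the attacker cannot alter.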

Verify authenticity

For example, if you send out an email, it can be intercepted easily (especially if it is sent over an unsecured WiFi network). If someone alters the contents of the email along the way, which is called a “Man-in-the-Middle” (MitM) attack, the recipient has no way of knowing.

However, if the sender hashes the email contents and signs that hash with their private key to create a digital signature, the receiver can use the signature to confirm that the email contents have not been modified after being signed.

To do this, the receiver “re-generates” the hash value of the received email using the same hash function as the sender, and uses the signer’s public key to check that the signature matches that hash value.

If it matches, no one has altered the message. If it does not, the receiver knows that the contents of the email are not authentic, because even a small change to the message produces a completely different hash.
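Python's standard library has no public-key signature API, so the sketch below swaps in a different technique, HMAC, which binds a hash to a shared secret key rather than to a private/public key pair. It is not a digital signature, but it illustrates the same verify-by-recomputing step. The key value and function names are hypothetical.

```python
import hashlib
import hmac

# Assumption: sender and receiver already share this secret key.
SHARED_KEY = b"hypothetical-shared-secret"


def make_tag(message: bytes, key: bytes = SHARED_KEY) -> str:
    """Compute a keyed hash (HMAC-SHA256) of the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, received_tag: str, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag from the received message and compare."""
    return hmac.compare_digest(make_tag(message, key), received_tag)


email = b"Meeting moved to 3pm"
tag = make_tag(email)

print(verify(email, tag))                    # True
print(verify(b"Meeting moved to 5pm", tag))  # False
```

An attacker who changes the message cannot produce a matching tag without the key, which is the property a real signature provides using the signer's private key.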

How Does a Hash Function Work?

The details depend on the algorithm, but generally, to produce a hash value of a set length, a hash function first divides the input data into fixed-size blocks, called data blocks.

This is because the core of a hash function takes in data of a fixed length at a time. The size of the data block differs from one algorithm to another.

If the last block is not big enough, padding is added to fill it out. However, regardless of which hashing method you use, the output, or hash value, is always the same fixed length for that method.

The core function is then repeated as many times as there are data blocks.
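Python's hashlib exposes each algorithm's internal block size and output size, which gives a quick way to see that block sizes differ between algorithms. The second part is a sketch, with a hypothetical helper name, showing that feeding data in pieces yields the same hash as hashing it in one go.

```python
import hashlib

# Internal block size and digest size, both in bytes.
sizes = {}
for name in ("md5", "sha1", "sha256", "sha512"):
    h = hashlib.new(name)
    sizes[name] = (h.block_size, h.digest_size)
    print(name, h.block_size, h.digest_size)


def hash_in_chunks(data: bytes, chunk_size: int = 64) -> str:
    """Feed the data to SHA-256 one chunk at a time."""
    h = hashlib.sha256()
    for start in range(0, len(data), chunk_size):
        h.update(data[start:start + chunk_size])
    return h.hexdigest()


data = b"a message that is longer than one 64-byte block " * 5
print(hash_in_chunks(data) == hashlib.sha256(data).hexdigest())  # True
```

The library handles the splitting and padding internally, so the chunk size passed to update() does not have to match the algorithm's block size.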

The “Avalanche Effect”

The data blocks are processed one at a time. The output of the first data block is fed as input along with the second data block. The output of the second is then fed along with the third block, and so on.

The final output is therefore the combined value of all the blocks. If you change one bit anywhere in the message, the entire hash value changes. This is called the “avalanche effect.”
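The avalanche effect can be observed by flipping a single bit of the input and counting how many output bits change. This is a small sketch using SHA-256; the helper name bit_difference and the sample message are assumptions made for the example.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which the SHA-256 digests of a and b differ."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")


original = b"CFI"
# Flip the lowest bit of the first byte: "C" (0x43) becomes "B" (0x42).
flipped = bytes([original[0] ^ 0x01]) + original[1:]

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(flipped).hexdigest())
print(bit_difference(original, flipped), "of 256 bits differ")
```

For a well-designed hash function, roughly half of the output bits are expected to change, so the two hex strings look unrelated.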

Uniqueness and Determinism

Hash functions must be deterministic, meaning that every time you put in the same input, it will always create the same output.

In addition, the output, or hash value, should in practice be unique to the exact input. Because the output has a fixed length while the number of possible inputs is unlimited, two different messages with the same hash must exist in principle, but it should be computationally infeasible to find them. When two different pieces of data produce the same output, it is known as a “hash collision,” and an algorithm for which collisions can be found in practice is considered broken for security purposes.
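Because the output has a fixed length while the set of possible inputs is unlimited, collisions become easy to find once the output is short enough. The sketch below uses a deliberately weakened toy hash (SHA-256 truncated to 4 hex characters, an assumption made purely for illustration) to find two different inputs with the same value.

```python
import hashlib
from itertools import count


def short_hash(data: bytes, hex_chars: int = 4) -> str:
    """Toy hash: SHA-256 truncated to a few hex characters (16 bits by default)."""
    return hashlib.sha256(data).hexdigest()[:hex_chars]


def find_collision(hex_chars: int = 4) -> tuple[bytes, bytes]:
    """Hash 0, 1, 2, ... until two inputs share the same truncated hash."""
    seen: dict[str, bytes] = {}
    for i in count():
        message = str(i).encode()
        h = short_hash(message, hex_chars)
        if h in seen:
            return seen[h], message
        seen[h] = message


first, second = find_collision()
print(first, second, short_hash(first))

# Determinism: the same input always gives the same output.
print(short_hash(b"CFI") == short_hash(b"CFI"))  # True
```

With only 65,536 possible outputs a collision is guaranteed within 65,537 tries; with the full 256-bit output the same search is not feasible.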

Irreversibility

Ideally, hash functions should be irreversible. While it is quick and easy to compute the hash if you know the input message, it should be very difficult to go through the process in reverse and compute the input message if you only know the hash value.

Brute Force Search

However, it is possible to find an input that matches a given hash value, although doing so takes a lot of computing power. Working backward in this way is called a “brute force” search: using trial and error, you hash candidate messages one after another until one produces a match.
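A brute force search can be sketched in a few lines when the space of candidate messages is tiny. The example assumes the attacker already knows the message is exactly three lowercase letters; the function name and target are made up for the illustration.

```python
import hashlib
from itertools import product
from string import ascii_lowercase
from typing import Optional


def brute_force(target_hex: str, length: int) -> Optional[str]:
    """Try every lowercase string of the given length until one hashes to the target."""
    for candidate in product(ascii_lowercase, repeat=length):
        guess = "".join(candidate)
        if hashlib.sha256(guess.encode()).hexdigest() == target_hex:
            return guess
    return None  # no candidate of this length matched


target = hashlib.sha256(b"cfi").hexdigest()
print(brute_force(target, 3))  # cfi
```

Three lowercase letters give only 17,576 candidates. Each extra character multiplies the work by 26, which is why the search quickly becomes impractical for long or unpredictable inputs.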

Hash Functions in Cryptography

The most famous cryptocurrency, Bitcoin, uses hash functions in its blockchain. Powerful computers, called miners, race each other in brute force searches to try to solve hashes in order to earn the mining rewards of new Bitcoins, as well as processing fees that users pay to record their transactions on the blockchain.

Solving a hash involves computing a proof-of-work: finding a value called a nonce, or “number used once,” that, when added to the block, causes the block’s hash to begin with a certain number of zeroes. Once a valid proof-of-work is discovered, the block is considered valid and can be added to the blockchain.

Since each block’s hash is created by a cryptographic algorithm (Bitcoin uses the SHA-256 algorithm), the only way to find a valid proof-of-work is to run guesses through the algorithm until a number is found that creates a hash starting with the right number of zeroes. This is what Bitcoin miners are doing: running numbers through a cryptographic algorithm until they guess a valid nonce.
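The guessing loop can be sketched as follows. This is a simplified toy, not Bitcoin's actual procedure: real mining applies SHA-256 twice to a structured block header and compares the result against a numeric target, whereas this sketch hashes arbitrary bytes once and checks for leading zero hex characters. The block contents and the function name mine are hypothetical.

```python
import hashlib
from itertools import count


def mine(block_data: bytes, difficulty: int) -> tuple[int, str]:
    """Find a nonce so that SHA-256(block_data + nonce) starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest


block = b"hypothetical block contents"
nonce, digest = mine(block, difficulty=4)
print(nonce, digest)
```

Each extra required zero multiplies the expected number of guesses by 16, while checking a proposed nonce still takes a single hash. That asymmetry is what makes the result usable as proof of work.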

What are Examples of Common Cryptocurrency Hash Functions?

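The SHA-256 function that Bitcoin uses belongs to the “Secure Hash Algorithm” (SHA) family of standards published by the United States National Institute of Standards and Technology (NIST). The family includes SHA-1 and SHA-2 (a family within a family that includes SHA-224, SHA-256, SHA-384, and SHA-512), both designed by the National Security Agency (NSA), and SHA-3 (SHA3-224, SHA3-256, SHA3-384, and SHA3-512), which was selected through a public competition.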

Other examples of common hashing algorithms include:

Message Digest (MD) Algorithm: MD2, MD4, MD5, and MD6. MD5 was long considered a go-to hashing algorithm, but it is now considered broken because of hash collisions.
Windows NTHash: Also known as a Unicode hash or NTLM hash, this hash is commonly used by Windows systems.
RACE Integrity Primitives Evaluation Message Digest (RIPEMD)
Whirlpool

Generally speaking, the most popular hashing algorithms or functions have a hash length ranging from 160 to 512 bits.

Where Else Do You Find Hash Functions at Work?

Any piece of digital information, like a file on your computer, a photo on your smartphone, or a block on a cryptocurrency blockchain, can be hashed. For practical purposes, each hash is unique to its piece of data: any small change in the underlying information will lead to a completely different hash.

Apart from cryptocurrencies and other blockchain technologies, hash functions can be found throughout public key cryptography in everything from signing new software and verifying digital signatures to securing the website connections in your computer and mobile web browsers.

Hashing vs Encryption

Encryption is the practice of taking data and creating a scrambled message, using an algorithm called a cipher, in a way that only someone with the corresponding key can unscramble and decode it. Encryption is a two-way function, designed to be reversible by anyone who holds the key. So when someone encrypts something, it is done with the intention of decrypting it later.

Hashing uses a formula that converts data of any size to a fixed length. The computing power required to “un-hash” something makes reversal very difficult, so whereas encryption is a two-way function, hashing is generally a one-way function.

Encryption is meant to protect data in transit; hashing is meant to verify that a file or piece of data hasn’t been altered, that is, that it is authentic. So you might liken encryption to putting a piece of data in a safe that opens when the recipient knows the combination; hashing is more like a security tamper seal that indicates if the contents of the data have been altered.

FAQs

What is considered a good hash function?

A good hash function has several main characteristics: 1) the hash value is fully determined by the data being hashed; 2) the hash function uses all of the input data; 3) the hash function distributes the data "uniformly" across the entire set of possible hash values.

What is a hash function?

A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable length output. The values returned by a hash function are called hash values, hash codes, hash digests, digests, or simply hashes.

What is a good hash function complexity?

The time and space complexity for a hash map (or hash table) is not necessarily O(n) for all operations. The typical and desired time complexity for basic operations like insertion, lookup, and deletion in a well-designed hash map is O(1) on average.

What is a minimal hash function?

Minimal perfect hash functions are used for memory efficient storage and fast retrieval of items from a static set, such as reserved words in programming languages, command names in operating systems, commonly used words in natural languages, etc.

What is the optimal number of hash functions?

There is a simple formula that can help you find the optimal number of hash functions for a given Bloom filter size and expected number of elements. The formula is k = (m/n) * ln(2), where m is the size of the bit array (in bits), n is the expected number of elements, and ln(2) is the natural logarithm of 2.
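As a worked example of this formula, the sketch below computes k for a hypothetical Bloom filter of 10,000 bits expected to hold 1,000 elements. Rounding to the nearest whole number (and never going below 1) is an assumption of this sketch, since k must be an integer in practice.

```python
import math


def optimal_hash_count(m_bits: int, n_items: int) -> int:
    """Apply k = (m / n) * ln(2) and round to a usable whole number."""
    k = (m_bits / n_items) * math.log(2)
    return max(1, round(k))


# (10000 / 1000) * ln(2) is about 6.93, so 7 hash functions.
print(optimal_hash_count(10_000, 1_000))  # 7
```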

What is the most famous hash function?

The MD5 algorithm, defined in RFC 1321, is probably the most well-known and widely used hash function. It is the fastest of all the .NET hashing algorithms, but it uses a smaller 128-bit hash value, making it the most vulnerable to attack over the long term.

What is the simplest hash function?

The simplest example of a hash function encodes the input in the same way as the output range and then discards everything that exceeds the output range. For example, if the output range of the hash function is 0–9, then we can interpret every input as a (base 10) integer and discard all but the last digit.
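The last-digit example translates directly into code. This sketch assumes the input is already an integer; the function name is hypothetical.

```python
def last_digit_hash(key: int) -> int:
    """Keep only the last base-10 digit, so the output range is 0 to 9."""
    return abs(key) % 10


print(last_digit_hash(1234))  # 4
print(last_digit_hash(24))    # 4, a collision with 1234
```

With only ten possible outputs, collisions are unavoidable, which is acceptable for a small lookup table but useless for security.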

What is a hash function for dummies?

A hash function takes the input data and applies a series of mathematical operations to it, resulting in a fixed-length string of characters. The hash function ensures that even a small change in the input data produces a significantly different hash value.

What are two common hash functions?

MD5, SHA-1 - Commonly used, used to be secure, but no longer collision resistant.
SHA-2 - Commonly used, secure.
SHA-3 - Standardized by NIST in 2015 and considered secure.
CRC32 - Not secure, but really common as a checksum.

What is the average hash function?

The average hash algorithm first converts the input image to grayscale and then scales it down. In our case, as we want to generate a 64-bit hash, the image is scaled down to 8×8 pixels. Next, the average of all gray values of the image is calculated, and then the pixels are examined one by one from left to right. Each pixel contributes one bit of the hash, depending on whether its gray value is above the average.
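The final steps of the average hash can be sketched in plain Python, assuming the image has already been converted to grayscale and scaled down to 8×8. The sample image is hypothetical, and the rule that a pixel strictly above the average yields a 1 bit is an assumption of this sketch, since implementations differ on how ties are handled.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Build a 64-bit hash from an 8x8 grid of gray values (0 to 255)."""
    flat = [p for row in pixels for p in row]  # row by row, left to right
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        # Assumption: 1 if the pixel is brighter than the average, else 0.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


# Hypothetical 8x8 image: left half black, right half white.
image = [[0] * 4 + [255] * 4 for _ in range(8)]
print(f"{average_hash(image):016x}")  # 0f0f0f0f0f0f0f0f
```

Unlike a cryptographic hash, this kind of hash is meant to change only slightly when the image changes slightly, so similar images give similar values.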

Which is a weak hash function?

Hash functions generate outputs of a fixed length, which determines how many inputs an attacker needs to search to find a collision. If this output length is too short or the hash function contains exploitable flaws, then the hash is no longer secure.

What is a good hash function design?

Rules for choosing a good hash function:

The hash function should be simple to compute.
The number of collisions should be small when placing records in the hash table.
The hash function should produce keys that are distributed uniformly over the array.
The hash function should depend on every bit of the key.

What is a bad hash function?

A lot of obvious hash function choices are bad. For example, if we're mapping names to phone numbers, then hashing each name to its length would be a very poor function, as would a hash function that used only the first name, or only the last name. We want our hash function to use all of the information in the key.
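The name-length example can be made concrete. In this sketch the names are invented, and whole_key_hash is a simple polynomial hash chosen for illustration rather than a recommended production function.

```python
names = ["Alice Smith", "Bruce Jones", "Carol White", "David Brown"]


def length_hash(name: str) -> int:
    """Poor choice: ignores almost all of the information in the key."""
    return len(name)


def whole_key_hash(name: str) -> int:
    """Uses every character of the key (polynomial hash, base 31)."""
    h = 0
    for ch in name:
        h = (h * 31 + ord(ch)) % 2**64
    return h


print([length_hash(n) for n in names])     # every name maps to 11
print([whole_key_hash(n) for n in names])  # four different values
```

All four names have the same length, so the length-based hash sends them to the same slot, while the hash that reads every character keeps them apart.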

What is the perfect hashing problem?

Perfect hashing. We consider the following perfect hashing problem: Given a set S of n keys from a universe U, build a look-up table T of size O(n) such that a membership query (given x ∈ U, is x ∈ S?) can be answered in constant time. We show that a perfect hash table can be built in linear expected time.

What does a perfect hash function require?

The use of O(n) words of information to store the function of Fredman, Komlós & Szemerédi (1984) is near-optimal: any perfect hash function that can be calculated in constant time requires at least a number of bits that is proportional to the size of S.

What is a good hash size?

A good general “rule of thumb” is: the hash table should be an array with length about 1.3 times the maximum number of keys that will actually be in the table, and the size of the hash table array should be a prime number.
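One way to apply this rule of thumb is to take 1.3 times the maximum number of keys and move up to the next prime. The sketch below does that with a simple trial-division primality test; the function names are hypothetical, and rounding up is an assumption.

```python
def is_prime(n: int) -> bool:
    """Trial division, which is fast enough for table sizes."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def suggested_table_size(max_keys: int) -> int:
    """Smallest prime that is at least 1.3 times max_keys."""
    size = (13 * max_keys + 9) // 10  # ceil(1.3 * max_keys) using integers
    while not is_prime(size):
        size += 1
    return size


print(suggested_table_size(100))  # 131
```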