What are hash functions in crypto? A layman’s guide.

More than one user has wondered what hash functions are, since the term is often thrown around in the more technical crypto circles.

The word 'hash' may evoke memories of meat and potatoes, chopped and mixed. However, food is the last thing on the menu. With hash functions, computers take center stage. Hash functions are at work whenever we interact with websites, digital wallets, cryptocurrencies, privacy-enhancing technologies, etc.

If you cannot understand the phrase 'hash function' when you come across it, this article is for you. We will explore everything about hash functions, their importance in the modern digital world, types of hash functions, and properties of a strong hash function.

What are hash functions?

A hash is, in short, a confusing mess. In everyday speech, the word describes what is left when someone makes a mess of something. In computing, a hash is the result produced by a hash function, and it usually looks like letters, numbers, or both jumbled together in a way that makes no sense to the human mind. A hash function is a tool that computers use to scramble data inputs of any length and convert them into compact values of a fixed length.

If you’ve watched enough spy movies, you must have heard of scrambled calls, which shut out unauthorized listeners by muddling up a conversation such that it becomes unintelligible. Hash functions take data of varying lengths and transform it into unintelligible (to humans) numeric or alphanumeric values called hash values, hash codes, or simply hashes. Unlike scrambled calls, which you can unscramble, cryptographic hash functions are unidirectional, i.e., it should be impractical to turn a hash back into the original data.

Simple diagram illustrating a hash function. (Source)

Hash values are effectively unique: with a strong hash function, it is extremely unlikely that two different inputs will produce the same hash, which makes hashes similar to human fingerprints. They are often stored in hash tables for easy access, and are of the same length if produced by the same hash function.

Regardless of the size of input message, a hash function always gives an output of fixed size. (Source)

For instance, if you hash your 16-character Facebook password and a 56-character web link with the SHA-256 hash function, the resulting hash values will be of the same length: 256 bits, usually written as a 64-character hexadecimal fingerprint. Even if you hashed a 10,000-word essay with SHA-256, the hash length would remain unchanged.
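To see the fixed output length in practice, here is a minimal sketch using Python's standard hashlib module. The helper name sha256_hex and the sample inputs are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Illustrative inputs of very different sizes (not taken from the article).
inputs = [
    b"a",                             # a single character
    b"correct horse battery staple",  # a short phrase
    b"word " * 10_000,                # roughly a 10,000-word essay
]

for data in inputs:
    digest = sha256_hex(data)
    print(len(data), "bytes in ->", len(digest), "hex characters out")
```

Whatever the input size, the printed digest length stays at 64 hexadecimal characters, which is 256 bits.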

How do hash functions work?

You may wonder how hash functions scramble data to produce fixed-length hash values or digests. Here's a simple breakdown:

  1. The hash function takes a key as an input.
  2. The key is usually associated with a record and serves as a unique identifier of the record to the data storage and retrieval application.
  3. The key could be of a fixed length, like numbers between 10 and 20, or varying lengths, like names, or the record itself in some cases.
  4. The hash function converts the keys of fixed or varying lengths into fixed-length codes, usually of machine word length. The function employs operators like ADD or XOR to fold the inputs by words or other units and preserve their parity.
  5. The hash function then scrambles the key's bits to distribute the resulting values uniformly over the keyspace.
  6. Once the hash function has scrambled the key, it produces an output, a hash code, which is then used to index a hash table that holds the data or other pointers to the records.
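The steps above can be sketched as a toy, non-cryptographic hash feeding a small hash table. This is only an illustration under assumptions: the names toy_hash and ToyHashTable, the 4-byte folding unit, and the multiplier constant are all choices made for this example, not part of any standard.

```python
def toy_hash(key: str, table_size: int) -> int:
    """Toy hash: fold the key with XOR, scramble the bits, then index."""
    data = key.encode("utf-8")
    folded = 0
    # Step 4: fold the input in 4-byte words using XOR.
    for i in range(0, len(data), 4):
        folded ^= int.from_bytes(data[i:i + 4], "little")
    # Step 5: scramble the bits (multiply by an odd constant, keep 32 bits,
    # then mix the high bits into the low bits).
    mixed = (folded * 2654435761) & 0xFFFFFFFF
    mixed ^= mixed >> 16
    # Step 6: reduce the hash code to an index into the table.
    return mixed % table_size


class ToyHashTable:
    """Minimal hash table that stores (key, record) pairs in buckets."""

    def __init__(self, size: int = 8):
        self.buckets = [[] for _ in range(size)]

    def put(self, key: str, record) -> None:
        bucket = self.buckets[toy_hash(key, len(self.buckets))]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, record)  # overwrite an existing key
                return
        bucket.append((key, record))

    def get(self, key: str):
        for existing_key, record in self.buckets[toy_hash(key, len(self.buckets))]:
            if existing_key == key:
                return record
        return None


table = ToyHashTable()
table.put("alice", {"balance": 10})
print(table.get("alice"))
```

A toy like this is fine for looking up records, but it offers none of the security properties expected of a cryptographic hash function.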

Applications of hash functions

Do you know what happens in the backend when you try to log into an account with your email and password? Password verification is one of the major uses of cryptographic hash functions.

When you sign up for an online platform, your password is usually stored in the platform's database as a hash rather than in plain text. When you enter your password during a login session, the hash function computes the hash of what you typed, and the platform compares it with the saved hash. If both hashes are identical, access will be granted. If not, you’ll get an incorrect password message.
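A login check of this kind might look roughly like the sketch below. It goes slightly beyond the description above by adding a random salt and using PBKDF2, a deliberately slow construction from Python's standard library. The function names and the iteration count are assumptions made for this example, not a recommendation for any particular platform.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration


def hash_password(password: str, salt=None):
    """Return (salt, digest) for the password; a fresh salt is made at signup."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the login attempt and compare it with the stored digest."""
    _, candidate = hash_password(password, salt)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate, stored_digest)


# Signup: only the salt and the digest are stored, never the password itself.
salt, stored = hash_password("my secret password")

# Login attempts.
print(verify_password("my secret password", salt, stored))
print(verify_password("a wrong guess", salt, stored))
```

The platform never needs to keep the original password: it only repeats the same hashing step on each login attempt and compares the results.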

Hash functions are also leveraged for file verification. If you download a file from an untrusted website, you can compute its checksum and check that it matches the checksum published for the original file on a trusted site.
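As a rough sketch of file verification, the snippet below computes a file's SHA-256 checksum in chunks and compares it with a published value. The function names are hypothetical, and SHA-256 is assumed here only because it is a common choice for published checksums.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 checksum of a file without loading it all at once."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Return True if the file's checksum equals the published checksum."""
    return file_sha256(path) == published_hex.strip().lower()
```

If matches_published returns False, the downloaded file differs from the original, whether through corruption or tampering.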

Besides passwords and file integrity verification, hash functions also come in handy in data storage and retrieval applications and have a wide range of uses, including:

  • Signature generation and verification
  • Data or file identifiers
  • Proof-of-work algorithms such as Hashcash, designed to counter email spam and denial-of-service (DoS) attacks
  • Proof-of-work consensus mechanisms such as Bitcoin's, which uses a Hashcash-style scheme
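To give a feel for proof of work, here is a simplified, Hashcash-style sketch. It is not the real Hashcash stamp format or Bitcoin's mining algorithm: the message format, the use of SHA-256, and the function names mine and verify are assumptions made for this example.

```python
import hashlib


def pow_digest(message: str, nonce: int) -> int:
    """Hash the message together with a nonce and return it as an integer."""
    digest = hashlib.sha256(f"{message}:{nonce}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def mine(message: str, difficulty_bits: int) -> int:
    """Search for a nonce whose hash starts with difficulty_bits zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while pow_digest(message, nonce) >= target:
        nonce += 1
    return nonce


def verify(message: str, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution takes a single hash, however long mining took."""
    return pow_digest(message, nonce) < (1 << (256 - difficulty_bits))


nonce = mine("hello@example.com", 12)
print("found nonce:", nonce)
print("valid:", verify("hello@example.com", nonce, 12))
```

The asymmetry is the point: finding a valid nonce takes many attempts on average, while verifying one is nearly free. Each extra difficulty bit roughly doubles the expected work.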

Types of hash functions

There are several types of hash functions, and their distinct modes of operation set them apart. Here are a few of the most significant types of hash functions in computing:

Folding

Folding hash functions utilize a simple two-step method to generate hash codes. First, the key value (k) is split up into a fixed number of parts, each with the same number of digits (the last part can have fewer digits). Then, the sum of all the parts will be the hash value of the input key. The number of digits in each part depends on the hash table size.
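A minimal sketch of the folding method follows. The function name and parameters are made up for illustration, and the final reduction modulo the table size is an added assumption so that the result always fits the table.

```python
def folding_hash(key: int, part_digits: int, table_size: int) -> int:
    """Split the key's digits into parts, add them up, and fit to the table."""
    digits = str(key)
    # Step 1: split into parts of part_digits digits (the last may be shorter).
    parts = [
        int(digits[i:i + part_digits])
        for i in range(0, len(digits), part_digits)
    ]
    # Step 2: sum the parts; the modulo keeps the result inside the table.
    return sum(parts) % table_size


# 12345678 splits into 12, 34, 56, 78, which sum to 180.
print(folding_hash(12345678, part_digits=2, table_size=100))  # -> 80
```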

Mid-Squares

Mid-squares hash functions produce hash values by squaring the input (key) and extracting a fixed number of middle digits or bits from the squared result. This type of hash function usually works when the data to be hashed is a numerical value, and it works well when there are few leading or trailing zeros in the record's key.

Mid-square hashing method – a type of hash function. (Source)
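The mid-square method can be sketched in a few lines. This version works on decimal digits. The function name, the parameter r (the number of middle digits to keep), and the choice to lean left when the middle is ambiguous are assumptions made for this example.

```python
def mid_square_hash(key: int, r: int) -> int:
    """Square the key and return r digits taken from the middle of the square."""
    squared = str(key * key).zfill(r)  # pad so there are at least r digits
    start = (len(squared) - r) // 2    # leftmost index of the middle slice
    return int(squared[start:start + r])


# 60 squared is 3600, and its middle two digits are 60.
print(mid_square_hash(60, 2))    # -> 60
# 3205 squared is 10272025, and its middle two digits are 72.
print(mid_square_hash(3205, 2))  # -> 72
```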

Identity hash function

An identity hash function comes into play when the data to be hashed is small enough to fit within the hash code length used by the programming language you intend to use. When that size condition is met, you can use the input data itself as the hash value, reinterpreted as an integer. This type of hash function can be used to map zip codes to city names.
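The zip code example might be sketched as follows. The table layout and names are assumptions for illustration. The two sample entries are real US ZIP codes.

```python
def identity_hash(key: int) -> int:
    """The key is already a small integer, so it serves as its own hash."""
    return key


TABLE_SIZE = 100_000  # one slot for every five-digit code, 00000 to 99999
cities = [None] * TABLE_SIZE

cities[identity_hash(90210)] = "Beverly Hills"
cities[identity_hash(10001)] = "New York"

print(cities[identity_hash(90210)])
```

Because every key maps to its own slot, there are no collisions, at the cost of reserving a slot for every possible key.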

Division hashing

As one of the easiest ways to generate a hash code, division hashing uses a simple formula: divide the key k by M and take the remainder, i.e., h(k) = k mod M, where k is the input key and M is the hash table size (usually a prime number). Although it is fast and works for any value of M, division hashing can perform poorly because it maps consecutive keys to consecutive hash codes on the hash table.
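Division hashing is commonly implemented by taking the remainder of k divided by M, which is what this small sketch does. The function name and the sample table size are illustrative.

```python
def division_hash(key: int, table_size: int) -> int:
    """Return the remainder of key divided by the table size."""
    return key % table_size


M = 11  # a small prime table size, chosen for illustration

# Consecutive keys land in consecutive slots, as described above.
for k in (100, 101, 102):
    print(k, "->", division_hash(k, M))
```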

Multiplicative hashing

In multiplicative hashing, a constant value A is selected such that A is greater than zero but less than one. Then, we find the product of the input key and A, after which we extract the fractional portion X (all the digits after the decimal point). X is then multiplied by the size of the hash table, and the result, rounded down to a whole number, is the hash code. This type of hashing is well suited to scenarios where the hash table size is a power of two.
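Here is a minimal floating-point sketch of multiplicative hashing. The constant A is an assumption: the value (sqrt(5) - 1) / 2, about 0.618, is a commonly suggested choice, but any constant strictly between zero and one fits the description. The result is rounded down to get a whole-number index.

```python
import math

A = (math.sqrt(5) - 1) / 2  # about 0.618; an assumed, commonly suggested value


def multiplicative_hash(key: int, table_size: int, a: float = A) -> int:
    """Multiply the key by a, keep the fractional part, scale to the table."""
    fractional = (key * a) % 1.0          # the digits after the decimal point
    return int(table_size * fractional)   # round down to a slot index


# A table size that is a power of two (2**14 slots).
print(multiplicative_hash(123456, 16384))  # -> 67
```

Production implementations usually do the same thing with integer multiplication and bit shifts, which is why a power-of-two table size is convenient.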

Properties of strong hash functions

Properties of a strong hash function. (Source)

For any hashing algorithm to pass as strong, it must have the following properties:

Irreversibility

Also known as preimage resistance, irreversibility is an essential property of a strong hash function. With a good hash function, it should be impractical to reverse the hashcode and recover the original input message. Strong hashing functions are one-way and irreversible.

Determinism

A strong hash function should always produce the same output for the same input. This feature is known as determinism. In addition, the output should be of the same size regardless of the input data size: whether you are hashing a first and last name or an entire book, the hash codes produced by a specific hash function must be of the same length.

Collision resistance

Collisions in hashing occur when two different data inputs produce identical hash codes. Because hash codes have a fixed length while inputs can be of any length, collisions must exist in theory; what matters is that nobody can find one in practice. Once collisions can be found for a hashing function, it is considered broken and unsafe, since malicious actors could exploit them to manipulate data. A robust hashing algorithm should be resistant to collisions.
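To make collisions concrete, the sketch below compares a deliberately weak hash with SHA-256. The weak_hash function is invented for this example and simply adds up the bytes of the input, so any two anagrams collide.

```python
import hashlib


def weak_hash(text: str) -> int:
    """A deliberately poor hash: the sum of the input bytes, modulo 256."""
    return sum(text.encode("utf-8")) % 256


def strong_hash(text: str) -> str:
    """SHA-256 digest of the text, as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a, b = "listen", "silent"  # anagrams: same bytes in a different order

print("weak hashes equal:  ", weak_hash(a) == weak_hash(b))
print("strong hashes equal:", strong_hash(a) == strong_hash(b))
```

Finding a collision for weak_hash takes no effort at all, whereas no practical method is publicly known for producing two inputs with the same SHA-256 digest.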

Avalanche effect

A strong hashing function must produce an avalanche effect when there is even the slightest change to the input key. If something as simple as a comma is added to the input, the output hashing value will change significantly.
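The avalanche effect can be observed by counting how many output bits change when a comma is added to the input. This is a small sketch using SHA-256, and the helper name differing_bits is made up for the example.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where the SHA-256 digests of a and b differ."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")  # XOR marks the differing bits


changed = differing_bits(b"Hello world", b"Hello, world")
print(changed, "of 256 bits differ")
```

For a strong hash function, the count should land somewhere near half of the 256 output bits, even though the inputs differ by a single character.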

Hash speed

A robust hashing algorithm is expected to compute hash values swiftly. Hash speed, however, is somewhat subjective, depending on how you intend to use the hashing algorithm. In some cases, like website connections, faster hashing algorithms are best suited to the job. When hashing passwords, it is best to use deliberately slow hashing functions so that brute-forcing a password takes much longer. This makes it impractical to brute-force a large database of passwords.
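The trade-off between fast and slow hashing can be measured with a rough sketch like the one below. It compares a single SHA-256 call with PBKDF2, which repeats the underlying hash many times. The function names and the iteration count are assumptions, and actual timings depend on the machine.

```python
import hashlib
import time


def fast_hash(password: bytes) -> bytes:
    """One pass of SHA-256: quick, suited to integrity checks."""
    return hashlib.sha256(password).digest()


def slow_hash(password: bytes, salt: bytes, iterations: int = 200_000) -> bytes:
    """PBKDF2 repeats the hash many times to make each guess expensive."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)


def seconds_taken(fn, *args) -> float:
    """Return how long a single call to fn(*args) takes, in seconds."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


print("fast:", seconds_taken(fast_hash, b"password"))
print("slow:", seconds_taken(slow_hash, b"password", b"example-salt"))
```

A legitimate login pays the slow cost once, while an attacker guessing millions of passwords pays it for every single guess.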

Applicability

A strong hash function should be applicable in various situations and support varying input string values, hash table sizes, and seed types. For instance, a hash function that only allows specific table sizes, seed types, and limited string lengths is less valuable than one you can use in various situations.

Uniformity

Strong hash functions must ensure uniformity by mapping input values evenly across their output range. Uniformity, in this case, means that each hash code in the output range should be generated with roughly equal probability. Uniform distributions keep collisions minimal, regardless of input size, which is vital for efficient hashing.
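One informal way to check uniformity is to hash many keys into a handful of buckets and count how many land in each. The sketch below does this with SHA-256. The key format, the bucket count, and the function name are assumptions made for this example.

```python
import hashlib
from collections import Counter


def bucket_of(key: str, buckets: int) -> int:
    """Map a key to a bucket using the first 8 bytes of its SHA-256 digest."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


# Hash 10,000 made-up keys into 10 buckets and count each bucket's share.
counts = Counter(bucket_of(f"user-{i}", 10) for i in range(10_000))

for bucket in sorted(counts):
    print(bucket, counts[bucket])
```

With a uniform hash function, each bucket should receive close to 1,000 of the 10,000 keys, with only small random variation.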

You can now answer anyone asking “what are hash functions?”... but it doesn’t stop there

Hash functions are without a doubt a backbone of the modern Internet age. From password verification, signature generation, and data identifiers to blockchain networks, cryptographic hash functions are being used to make the digital world more secure and verifiable, and to enhance users’ privacy.
As you can imagine, the rabbit hole of hash functions goes deep, and as you increase your technical knowledge you can continue to explore it. We hope that this superficial overview has been enlightening and achieved its primary goal of capturing your attention. If it has, please let us know through Twitter!

About Panther

Panther is a decentralized protocol that enables interoperable privacy in DeFi using zero-knowledge proofs.

Users can mint fully-collateralized, composable tokens called zAssets, which can be used to execute private, trusted DeFi transactions across multiple blockchains.

Panther helps investors protect their personal financial data and trading strategies, and provides financial institutions with a clear path to compliantly participate in DeFi.

Stay connected: Telegram | Twitter | LinkedIn | Website