More than one user has wondered what hash functions are, since the term is often thrown around in the more technical crypto circles.
The word 'hash' may evoke memories of meat and potatoes, chopped and mixed. However, food is the last thing on the menu. In this case, computers take center stage. This class of functions is all around us whenever we interact with websites, digital wallets, cryptocurrencies, digital signatures, privacy-enhancing technologies, etc.
If the term leaves you puzzled whenever you come across it, this article is for you. We will explore the concept, its importance in the modern digital world, existing variations, and the properties of a strong hashing algorithm.
What are hash functions?
In everyday speech, a hash is a confusing mess: the word describes what you get when someone makes a mess of something. In computing, a hash (sometimes referred to as a message digest) is the result produced by a hash function, and it usually looks like letters, numbers, or both jumbled together in a way that makes no sense to the human mind. A hash function is a storage-saving tool that computers use to scramble data inputs and convert them into numerical values, usually of a fixed length.
If you’ve watched enough spy movies, you must have heard of scrambled calls, which shut out unauthorized listeners by muddling up a conversation until it becomes unintelligible. Message digest functions take data of varying lengths and transform it into numeric or alphanumeric values that are unintelligible to humans, called hash values, hash codes, or simply hashes. Unlike scrambled calls, which you can unscramble, cryptographic hash functions are unidirectional, i.e., they must produce hashes that cannot practically be unscrambled into the original data.
With that overview in place, it is worth distinguishing hash functions from hash values. A hash function is the algorithm, while a hash value is the output it produces for a particular input. With a strong algorithm, hash values are unique for all practical purposes, which makes them similar to human fingerprints: two different inputs are extremely unlikely to share one. Hash values are often stored in hash tables for easy access, and they are all the same length if produced by the same hash algorithm.
For instance, if you hash your 16-character Facebook password or a 56-character web link using the Secure Hash Algorithm (SHA) 256, the resulting hash values will be the same length: a 64-character fingerprint. Even if you hashed a 10,000-word essay with SHA-256, the hash length would remain unchanged. The same would happen if you used SHA-512, which turns text of any length into a fixed-size 128-character string. Because the output string is written in hexadecimal, each character encodes 4 bits, so the 128 characters represent exactly 512 bits (and the 64 characters of SHA-256 represent 256 bits).
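As a quick check of the fixed-length claim, here is a minimal Python sketch using the standard `hashlib` module. The sample inputs are made-up placeholders, not real passwords or links.

```python
import hashlib

# Hypothetical sample inputs of very different sizes.
samples = {
    "16-character password": b"correct horse 16",
    "56-character link": b"x" * 56,
    "10,000-word essay": b"word " * 10_000,
}

def hex_lengths(data: bytes) -> tuple:
    """Return the hex digest lengths produced by SHA-256 and SHA-512."""
    sha256_hex = hashlib.sha256(data).hexdigest()
    sha512_hex = hashlib.sha512(data).hexdigest()
    return len(sha256_hex), len(sha512_hex)

for label, data in samples.items():
    # Input size varies a lot; the digest lengths do not.
    print(label, len(data), hex_lengths(data))
```

Every line should report 64 and 128 characters, which at 4 bits per hexadecimal character corresponds to 256 and 512 bits.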
How do hash functions work?
The key to understanding how hash functions work is the mechanism by which they scramble data to produce a fixed-length hash value or message digest. Here's a simple breakdown:
- The function takes a key as an input.
- The key is usually associated with a record and serves as a unique identifier of the record to the data storage and retrieval application.
- The key could be of a fixed length, like numbers between 10 and 20, or varying lengths, like names, or the record itself in some cases.
- The hash function converts keys of fixed or varying lengths into fixed-length codes, usually the length of a machine word. To do so, it folds the input by words or other units using a parity-preserving operator such as ADD or XOR.
- It then scrambles the key's bits to distribute the resulting values uniformly over the keyspace.
- When the hash function scrambles the record, it produces an output, a hash code, which is then used to index a hash table that holds the data or other pointers to the records.
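The steps above can be sketched in a few lines of Python. This is an illustrative toy, not a production hash function: the names `fold`, `scramble`, and `TinyHashTable` are hypothetical, and the mixing constant is just one commonly used odd multiplier.

```python
def fold(key: str, word_bytes: int = 4) -> int:
    """Fold a key of any length into one fixed-size word by XOR-ing its chunks."""
    data = key.encode("utf-8")
    folded = 0
    for i in range(0, len(data), word_bytes):
        folded ^= int.from_bytes(data[i:i + word_bytes], "little")
    return folded

def scramble(word: int) -> int:
    """Mix the bits so that similar keys spread out (illustrative mixer)."""
    word = (word * 2654435761) & 0xFFFFFFFF  # multiply by an odd constant, keep 32 bits
    return word ^ (word >> 16)

def hash_code(key: str) -> int:
    """Fold first, then scramble, giving a 32-bit hash code."""
    return scramble(fold(key))

class TinyHashTable:
    """A minimal hash table that resolves collisions with per-bucket lists."""

    def __init__(self, size: int = 8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key: str) -> int:
        # The hash code is reduced to a valid bucket index.
        return hash_code(key) % len(self.buckets)

    def put(self, key: str, record) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, record)  # overwrite an existing record
                return
        bucket.append((key, record))

    def get(self, key: str):
        for existing_key, record in self.buckets[self._index(key)]:
            if existing_key == key:
                return record
        return None

table = TinyHashTable()
table.put("alice", {"balance": 10})
print(table.get("alice"))
```

Looking up a key only inspects one bucket, which is why a well-distributed hash code makes storage and retrieval fast.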
You now know how hash functions work, but what about their applications?
Do you know what happens in the backend when you try to log into an account with your email and password? Password verification is one of the major uses of cryptographic hash functions.
When you sign up for an online platform, your password is typically stored in the platform's database as a hash code rather than as plain text. When you enter your password during a login session, the hash function computes the hash of what you typed and compares it to the saved hash. If both hashes are identical, access will be granted. If not, you’ll get an incorrect password message.
The applications of hashing in crypto are numerous. To name one of the simplest, in networks such as Bitcoin and Ethereum an address (what you provide to someone who will send tokens or coins to you) is obtained by hashing the public key, which is itself derived from the private key.
Hashes are also leveraged for file verification. If you download a file from an untrusted website and the original author has published its checksum on a trusted site, you can compute the checksum of your download and check that it matches the published one.
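The login flow described above might look like the following Python sketch. It makes two assumptions that go beyond the text: it adds a random per-user salt, and it uses PBKDF2 from the standard library as the hash function. The function names and the iteration count are illustrative only.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash from the password and salt with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def sign_up(password: str) -> dict:
    """Return what the platform would store: the salt and the hash, never the password."""
    salt = os.urandom(16)
    return {"salt": salt, "hash": hash_password(password, salt)}

def log_in(stored: dict, attempt: str) -> bool:
    """Hash the attempt with the stored salt and compare it to the stored hash."""
    candidate = hash_password(attempt, stored["salt"])
    return hmac.compare_digest(candidate, stored["hash"])

record = sign_up("hunter2")
print(log_in(record, "hunter2"))  # correct password
print(log_in(record, "hunter3"))  # wrong password
```

The comparison uses `hmac.compare_digest` so that the time taken does not reveal how many leading bytes matched.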
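A file check of this kind can be sketched as follows, assuming the author published a SHA-256 checksum as a hexadecimal string. The helper names are hypothetical, and the file is read in chunks so that large downloads do not need to fit in memory.

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file, reading it chunk by chunk."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def matches_published_checksum(path: str, published_hex: str) -> bool:
    """Compare the file's digest with the published one, ignoring case and whitespace."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

For example, calling `matches_published_checksum` with the path of a downloaded installer and the checksum copied from the author's site returns `True` only if the download is byte-for-byte identical to the file the author hashed.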
Besides passwords and file integrity verification, hashes also come in handy in data storage and retrieval applications and have a wide range of other uses, including:
- Signature generation and verification
- Data or file identifiers
- Proof-of-work algorithms such as Hashcash, designed to counter email spam and denial-of-service (DoS) attacks
- Hashcash-style proof of work, which is also leveraged in the consensus mechanisms of blockchains such as Bitcoin
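To make the proof-of-work idea concrete, here is a simplified Hashcash-style sketch in Python. It is not the real Hashcash stamp format or Bitcoin's block hashing; it only illustrates the shared principle of searching for a nonce whose hash starts with a required number of zero bits. The function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def work_hash(message: bytes, nonce: int) -> bytes:
    """Hash the message together with a candidate nonce."""
    return hashlib.sha256(message + str(nonce).encode("ascii")).digest()

def mine(message: bytes, difficulty_bits: int) -> int:
    """Try nonces 0, 1, 2, ... and return the first one that meets the difficulty."""
    for nonce in count():
        if leading_zero_bits(work_hash(message, nonce)) >= difficulty_bits:
            return nonce

def verify(message: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution costs one hash, however long the search took."""
    return leading_zero_bits(work_hash(message, nonce)) >= difficulty_bits

nonce = mine(b"hello", 12)
print(nonce, verify(b"hello", nonce, 12))
```

Each extra difficulty bit roughly doubles the expected search time, while verification stays a single hash. That asymmetry is what makes the scheme costly for spammers and cheap for recipients.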
Types of hash functions and how they work
There are several types of hash functions, and their distinct modes of operation set them apart. Here are a few of the most significant in computing. Note that these are simple, non-cryptographic techniques used mainly for hash tables:
The folding method in hashing utilizes a simple two-step method to generate hash codes. First, the key value (k) is split up into a fixed number of parts, each with the same number of digits (the last part can have fewer digits). Then, the sum of all the parts will be the hash value of the input key. The number of digits in each part depends on the hash table size.
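A minimal Python sketch of the folding method follows. One assumption goes beyond the description: the sum is reduced modulo the table size so that it always lands on a valid table index.

```python
def folding_hash(key: int, part_digits: int, table_size: int) -> int:
    """Split the key's decimal digits into parts, add the parts, and fit the sum to the table."""
    digits = str(key)
    parts = [
        int(digits[i:i + part_digits])
        for i in range(0, len(digits), part_digits)  # the last part may be shorter
    ]
    return sum(parts) % table_size

print(folding_hash(12345678, 2, 100))
```

Here the key 12345678 is split into 12, 34, 56 and 78, which sum to 180, and 180 modulo a table size of 100 gives the hash value 80.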
Mid-squares hashing produces hash values by squaring the input (key) and extracting a fixed number of middle digits or bits from the result. This type of hash function is typically used when the data to be hashed is a numerical value, and it works well when there are few leading or trailing zeros in the record's key.
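The mid-square idea can be sketched in Python as follows, working on decimal digits. The parameter name `r` (how many middle digits to keep) is an assumption for illustration.

```python
def mid_square_hash(key: int, r: int) -> int:
    """Square the key and return the r middle decimal digits of the result."""
    squared = str(key * key).zfill(r)  # pad so there are always at least r digits
    start = (len(squared) - r) // 2
    return int(squared[start:start + r])

print(mid_square_hash(3205, 4))
```

For the key 3205, the square is 10272025, and its four middle digits give the hash value 2720.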
Identity hash function
An identity hash function comes into play when the data to be hashed is small enough to fit within the hash code length of the programming language you intend to use. When that size condition is met, you can use the input data itself as the hash value, reinterpreted as an integer. This type of hash function can be used, for example, to map zip codes to city names.
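The zip code example might be sketched like this in Python, using a directly indexed list as the table. The names and the table size are illustrative, and the sketch treats zip codes as integers, so a code with leading zeros is stored under its numeric value.

```python
def identity_hash(zip_code: int) -> int:
    """Use the key itself as the hash code."""
    return zip_code

TABLE_SIZE = 100_000  # enough slots for every five-digit zip code
city_by_zip = [None] * TABLE_SIZE

def add_city(zip_code: int, city: str) -> None:
    city_by_zip[identity_hash(zip_code)] = city

def find_city(zip_code: int):
    return city_by_zip[identity_hash(zip_code)]

add_city(10001, "New York")
add_city(94105, "San Francisco")
print(find_city(10001))
```

Because every key maps to its own slot, there are no collisions, at the cost of reserving a slot for every possible key.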
As one of the easiest ways to generate a hash code, division hashing uses a simple formula: the hash code is the remainder left after dividing k by M (k mod M), where k is the input key and M is the hash table size (usually a prime number). Although it is fast and works for any value of M, division hashing has a weakness: it maps consecutive keys to consecutive hash codes, so clustered keys end up clustered in the hash table.
In multiplicative hashing, a constant value A is selected such that A is greater than zero but less than one. Then, we find the product of the input key and A, after which we extract the fractional portion X (all the digits after the decimal point). X is then multiplied by the size of the hash table, and the result, rounded down to a whole number, is the hash code. This type of hashing is best suited to scenarios where the hash table size is a power of two.
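Division hashing is commonly implemented as the remainder k mod M, which takes one line of Python. The small demonstration below, with an arbitrarily chosen prime table size of 13, also shows how consecutive keys land on consecutive slots.

```python
def division_hash(key: int, table_size: int) -> int:
    """Return the remainder of key divided by the table size."""
    return key % table_size

M = 13  # illustrative prime table size
print([division_hash(k, M) for k in range(100, 105)])
```

The keys 100 through 104 map to slots 9, 10, 11, 12 and 0, so a run of neighbouring keys fills a run of neighbouring slots.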
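A floating-point sketch of multiplicative hashing is shown below. The choice of A as the fractional part of the golden ratio is a popular suggestion rather than something the method requires, and the final step rounds down so that the result is a valid table index.

```python
import math

A = (math.sqrt(5) - 1) / 2  # about 0.618; a commonly suggested constant

def multiplicative_hash(key: int, table_size: int, a: float = A) -> int:
    """Multiply by a, keep the fractional part, scale by the table size, round down."""
    fractional = (key * a) % 1.0
    return math.floor(table_size * fractional)

print(multiplicative_hash(123456, 16384))
```

Real implementations usually avoid floating point and use integer multiplication with bit shifts instead, which is one reason power-of-two table sizes are convenient.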
Properties of strong hashing algorithms
For any hashing algorithm to pass as strong, it must have the following properties:
Also known as preimage resistance, irreversibility is an essential property of a strong hash function. With a good algorithm, it should be impractical to reverse the hashcode and recover the original input message. Strong hashing functions are one-way and irreversible.
A strong hash function should always produce an output of the same size regardless of the input data size. This feature is known as fixed-length output, and it goes hand in hand with determinism, which means that the same input always produces the same hash code. Whether you are hashing a first and last name or an entire book, the hash codes produced by a specific hash function must be of the same size.
Collisions in hashing occur when two different data inputs produce identical hash codes. Because a fixed-length output can take only a finite number of values while the possible inputs are unlimited, collisions must exist in principle. What matters is that finding one should be computationally infeasible. A practical way to produce collisions shows that a hashing function is broken and unsafe, since malicious actors could use it to manipulate data. A robust hashing algorithm should be resistant to collisions.
A strong hashing function must produce an avalanche effect when there is even the slightest change to the input. If something as simple as a comma is added to the input, the output hash value will change significantly.
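The avalanche effect can be observed with a short Python sketch that counts how many of the 256 output bits of SHA-256 differ between two inputs. The two sample strings are arbitrary and differ only by a comma.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where the SHA-256 digests of a and b differ."""
    digest_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(digest_a ^ digest_b).count("1")

print(differing_bits(b"Hello world", b"Hello, world"))
```

With a strong hash function, roughly half of the output bits (around 128 of 256) are expected to flip, even though the inputs differ by a single character.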
A robust hashing algorithm is expected to compute hash values swiftly. The right speed, however, depends on how you intend to use the hashing algorithm. In some cases, like website connections, faster hashing algorithms are best suited to the job, as the responsiveness contributes to a better user experience. When hashing passwords, it is best to use deliberately slow hashing functions so that every brute-force guess costs an attacker more time, which makes it impractical to brute-force a large password database.
A strong hashing algorithm should be applicable in various situations and support varying input string values, hash table sizes, and seed types. For instance, a hash function that only allows specific table sizes, seed types, and limited string lengths is less valuable than one you can use in various situations.
Strong hashing algorithms must ensure uniformity by mapping input values evenly across their output range. Uniformity, in this case, demands that each hash code in the output range is generated with roughly the same probability. Uniform distributions keep collisions minimal, regardless of input size, which is vital for efficient hashing.
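One rough way to see uniformity in practice is to hash many keys into a small number of buckets and count how many land in each. The Python sketch below does this with SHA-256. The key format, the bucket count, and the number of keys are arbitrary choices for illustration.

```python
import hashlib
from collections import Counter

def bucket(key: str, buckets: int) -> int:
    """Map a key to a bucket using the first 8 bytes of its SHA-256 digest."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets

counts = Counter(bucket("user-%d" % i, 16) for i in range(16_000))
print(sorted(counts.items()))
```

With 16,000 keys and 16 buckets, each bucket should hold close to 1,000 keys, even though the keys themselves are highly regular.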
You can now confidently answer anyone asking “what are hash functions?” or “how do hash functions work?”... but it doesn’t stop there.
Hashes are without a doubt a backbone of the modern Internet age. From password verification, signature generation, and data identifiers to blockchain networks, cryptographic hash functions are being used to make the digital world more secure and verifiable, and to enhance users’ privacy.
As you can imagine, the hashing rabbit hole goes deep, and even though you may now know what hash functions are and how they work, you can continue to explore it as you increase your technical knowledge. We hope that this superficial overview has been enlightening and achieved its primary goal of capturing your attention. If it did, please let us know through Twitter!
Panther is a decentralized protocol that enables interoperable privacy in DeFi using zero-knowledge proofs.
Users can mint fully-collateralized, composable tokens called zAssets, which can be used to execute private, trusted DeFi transactions across multiple blockchains.
Panther helps investors protect their personal financial data and trading strategies, and provides financial institutions with a clear path to compliantly participate in DeFi.