\boxed{ \text{In a Nutshell: } \text{Key} \to \text{Hash Code} \to \text{Hash Index} }
Why?: Hashing lets us turn search keys into integer indices that can be accessed at O(1)
- Hashing is a data-dependent problem, there’s no one “right” way to do it.
Hash Function: Takes a search key and produces the integer index of an element in the hash table.
Remember: Override the Object class’s default
hashcode
when necessary!
- The default
hashcode
function just returns an object’s memory address.- Example: The
String
andInteger
class already implement their ownhashcode
function.
Typical Hashing:
Hash Code v.s. Hash Index
- Hash Code: The integer result of the hash function.
- Hash Index: The hash code % Table Length
- (This is what we use as the search key.)
- aka: Compressing
Good Hash Functions
Note: Data Structure for Hash Tables
- The only data structure compatible with hash tables are arrays, because random accesses are O(1)
Example: Ideal Hashing
Function add(key,value) { index=h(key); hashTable[index]=value; }
- Supposing that we have perfect hashing (no collision).
Example: Hashing a Unicode string
- Naive Approach: Sum every character’s Unicode integer value.
- Another Approach: Multiply each character’s integer value by its position in the String and sum the values. u_0 g^{n-1} + u_1 g^{n-2} + u^{n-2} g + u_{n-1}
int hash = 0; int n = s.length(); for (int i = 0; i < n; i++) { = g * hash + s.charAt(i); hash }
- (The g constant was arrived at statistically)
Hash Code for Primitives:
int
Byte
, short
, char
: Cast to int
After getting a hash code, we need to compress it into a usable hash index.
\text{Scaling Method: Hash Code} \% n
private int getHashIndex(K key) {
int hashIndex = key.hashCode() % hashTable.length;
if (hashIndex < 0) {
= hashIndex + hashTable.length;
hashIndex }
return hashIndex;
}
Important: Resizing a Hash Table
- Because the hash index is dependent on the size of the hash table, resizing the hash table isn’t a trivial copy-by-value to be bigger array—you need to re-hash the values.
Reducing Collisions:
Handling Collisions:
Three Kinds of Locations in a Hash Table:
null
and always has been.Linear Probing: Resolves collision during hashing by examining consecutive locations in hash table (looping back around if we reach the end)
Note: This resolves collision during additions, but complicates removals.
Clustering: Collisions resolved with linear probing cause groups of consecutive locations in hash table to be occupied, called clusters.
Quadratic Probing: Considers the locations at indices k + j^j, rather than going consecutively from k.
Double Hashing: Use a second hash function to compute increments.
Separate Chaining: Track collisions with linked list.
Load Factor: Measure of cost of collision resolution
\lambda = \frac{\text{Num. of entries in dictionary}}{\text{Num. of locations in hash table}}
Maintaining Performance:
Rehashing:
Example: Another private inner-class to encapsulate key, value pairs along with state.
private static class TableEntry<K,V> { private K key; private V value; private States state; private enum States { CURRENT, REMOVED } private TableEntry(K key, V value) { this.key = key; this.value = value; this.state = States.CURRENT; } }
- Not necessary if we’re doing linear probing.
HashMap
HashSet
java.util.Set
interfaceHashMap