Hashing

Hashing

\boxed{ \text{In a Nutshell: } \text{Key} \to \text{Hash Code} \to \text{Hash Index} }

Why?: Hashing lets us turn search keys into integer indices that can be accessed at O(1)

Hash Function: Takes a search key and produces the integer index of an element in the hash table.

Remember: Override the Object class’s default hashcode when necessary!

Real-world Example: CRC

Hash Code and Hash Index

Typical Hashing:

  1. Convert search key to integer (hash code)
  2. Compress code into range of indices for hash table.

Hash Code v.s. Hash Index

Good Hash Functions

  1. Minimize Collisions
  2. Are fast to compute.

Note: Data Structure for Hash Tables

Example: Ideal Hashing

Function add(key,value) {
  index=h(key);
  hashTable[index]=value;
}

Computing Hash Codes

Example: Hashing a Unicode string

int hash = 0;
int n = s.length();
for (int i = 0; i < n; i++) {
  hash = g * hash + s.charAt(i);
}

Hash Code for Primitives:

Compressing a Hash Code (Scaling Method)

After getting a hash code, we need to compress it into a usable hash index.

\text{Scaling Method: Hash Code} \% n

private int getHashIndex(K key) {
    int hashIndex = key.hashCode() % hashTable.length;
    if (hashIndex < 0) {
        hashIndex = hashIndex + hashTable.length;
    }
    return hashIndex;
}

Important: Resizing a Hash Table

Handling Collisions

Reducing Collisions:

  1. Improve the hash function
  2. Increase table size.

Handling Collisions:

  1. Use another location in the hash table
  2. Change the structure of the hash tale so that each location can represent more than one value.

Three Kinds of Locations in a Hash Table:

  1. Occupied
  2. Empty
  3. Available

A. Linear Probing

Linear Probing: Resolves collision during hashing by examining consecutive locations in hash table (looping back around if we reach the end)

Note: This resolves collision during additions, but complicates removals.

Clustering: Collisions resolved with linear probing cause groups of consecutive locations in hash table to be occupied, called clusters.

B. Open Addressing with Quadratic Probing

Quadratic Probing: Considers the locations at indices k + j^j, rather than going consecutively from k.

C. Open Addressing with Double Hashing

Double Hashing: Use a second hash function to compute increments.

D. Separate Chaining

Separate Chaining: Track collisions with linked list.

More on Implementation

Load Factor (\lambda)

Load Factor: Measure of cost of collision resolution

\lambda = \frac{\text{Num. of entries in dictionary}}{\text{Num. of locations in hash table}}

Maintaining Performance:

Rehashing:

More Advanced Entries

Example: Another private inner-class to encapsulate key, value pairs along with state.

private static class TableEntry<K,V> {
  private K key;
  private V value;
  private States state;
  private enum States { CURRENT, REMOVED }
  private TableEntry(K key, V value) {
      this.key = key;
      this.value = value;
      this.state = States.CURRENT;
  }
}

Java Class Library: The Class HashMap

Java Class Library: The Class HashSet