Published by , 2016-05-06 06:54:03

Dictionaries 1/19/2005 11:37 PM

Dictionaries and Hash Tables

[Figure: hash table cells 0 to 4 holding 025-612-0001, 981-101-0002, and 451-229-0004; ∅ marks an empty cell]

Hash Functions and Hash Tables (§2.5.2)

A hash function h maps keys of a given type to integers in a fixed interval [0, N − 1]
Example: h(x) = x mod N is a hash function for integer keys
The integer h(x) is called the hash value of key x
A hash table for a given key type consists of
• Hash function h
• Array (called table) of size N
When implementing a dictionary with a hash table, the goal is to store item (k, o) at index i = h(k)

Dictionary ADT (§2.5.1)

The dictionary ADT models a searchable collection of key-element items
The main operations of a dictionary are searching, inserting, and deleting items
Multiple items with the same key are allowed
Applications:
• address book
• credit card authorization
• mapping host names (e.g., cs16.net) to internet addresses (e.g., 128.148.34.101)
Dictionary ADT methods:
• findElement(k): if the dictionary has an item with key k, returns its element, else returns the special element NO_SUCH_KEY
• insertItem(k, o): inserts item (k, o) into the dictionary
• removeElement(k): if the dictionary has an item with key k, removes it from the dictionary and returns its element, else returns the special element NO_SUCH_KEY
• size(), isEmpty()
• keys(), elements()

Example

We design a hash table for a dictionary storing items (SSN, Name), where SSN (social security number) is a nine-digit positive integer
Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x

Cell   Contents
0      ∅
1      025-612-0001
2      981-101-0002
3      ∅
4      451-229-0004
…
9997   ∅
9998   200-751-9998
9999   ∅
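
The example above can be tried out with a short Python sketch. It assumes the keys are written as hyphenated strings, as in the figure, and that no two keys share their last four digits. The helper names `parse_key` and `build_table` and the stored names are invented for illustration.

```python
N = 10_000  # table size from the example


def parse_key(text: str) -> int:
    """Turn a hyphenated key such as '025-612-0001' into an integer."""
    return int(text.replace("-", ""))


def h(x: int) -> int:
    """Hash function from the example: the last four digits of x."""
    return x % N


def build_table(items):
    """Store each item (k, o) at index h(k); this sketch assumes no collisions."""
    table = [None] * N  # None stands in for the empty marker
    for key_text, name in items:
        k = parse_key(key_text)
        table[h(k)] = (k, name)
    return table


# The names are placeholders invented for this sketch.
items = [
    ("025-612-0001", "name A"),
    ("981-101-0002", "name B"),
    ("451-229-0004", "name C"),
    ("200-751-9998", "name D"),
]
table = build_table(items)
occupied = [i for i, cell in enumerate(table) if cell is not None]
print(occupied)  # [1, 2, 4, 9998]
```

Only four of the 10,000 cells are used here, which is why the figure elides the cells between 4 and 9997.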

Log File (§2.5.1)

A log file is a dictionary implemented by means of an unsorted sequence
• We store the items of the dictionary in a sequence (based on a doubly-linked list or a circular array), in arbitrary order
Performance:
• insertItem takes O(1) time since we can insert the new item at the beginning or at the end of the sequence
• findElement and removeElement take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key
The log file is effective only for dictionaries of small size or for dictionaries on which insertions are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation)

Hash Functions (§2.5.3)

A hash function is usually specified as the composition of two functions:
• Hash code map: h1: keys → integers
• Compression map: h2: integers → [0, N − 1]
The hash code map is applied first, and the compression map is applied next on the result, i.e., h(x) = h2(h1(x))
The goal of the hash function is to “disperse” the keys in an apparently random way
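
The log file described above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' implementation. It swaps in a built-in Python list (a dynamic array) for the doubly-linked list or circular array, so insertion at the end is O(1) amortized rather than worst case. The class name `LogFile` and the snake_case method names are assumptions.

```python
NO_SUCH_KEY = object()  # sentinel returned when a key is absent


class LogFile:
    """Dictionary stored as an unsorted sequence of (key, element) items."""

    def __init__(self):
        self._items = []

    def insert_item(self, k, o):
        # Append at the end of the sequence: no search is needed.
        self._items.append((k, o))

    def find_element(self, k):
        # Linear scan: O(n) in the worst case (key not present).
        for key, element in self._items:
            if key == k:
                return element
        return NO_SUCH_KEY

    def remove_element(self, k):
        # Linear scan, then delete the first item with a matching key.
        for index, (key, element) in enumerate(self._items):
            if key == k:
                del self._items[index]
                return element
        return NO_SUCH_KEY

    def size(self):
        return len(self._items)

    def is_empty(self):
        return not self._items


log = LogFile()
log.insert_item("alice", "login 09:00")
log.insert_item("bob", "login 09:05")
print(log.find_element("bob"))                   # login 09:05
print(log.find_element("carol") is NO_SUCH_KEY)  # True
```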


Hash Code Maps (§2.5.3)

Memory address:
• We reinterpret the memory address of the key object as an integer (default hash code of all Java objects)
• Good in general, except for numeric and string keys
Integer cast:
• We reinterpret the bits of the key as an integer
• Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java)
Component sum:
• We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows)
• Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in Java)

Collision Handling (§2.5.5)

Collisions occur when different elements are mapped to the same cell
Chaining: let each cell in the table point to a linked list of elements that map there
Chaining is simple, but requires additional memory outside the table

[Figure: table cells 0 to 4 with chained lists; the keys shown are 025-612-0001, 451-229-0004, and 981-101-0004; ∅ marks an empty cell]
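
Chaining as described above can be sketched as follows. This is a hedged illustration that uses a Python list per cell in place of a linked list and assumes integer keys with h(k) = k mod N. The class name `ChainedHashTable` is hypothetical.

```python
NO_SUCH_KEY = object()  # sentinel returned when a key is absent


class ChainedHashTable:
    """Hash table in which each cell holds a chain of colliding items."""

    def __init__(self, n=5):
        self._n = n
        # One chain per cell; the chains are the memory "outside the table".
        self._cells = [[] for _ in range(n)]

    def _h(self, k):
        return k % self._n

    def insert_item(self, k, o):
        self._cells[self._h(k)].append((k, o))

    def find_element(self, k):
        for key, element in self._cells[self._h(k)]:
            if key == k:
                return element
        return NO_SUCH_KEY

    def remove_element(self, k):
        chain = self._cells[self._h(k)]
        for index, (key, element) in enumerate(chain):
            if key == k:
                del chain[index]
                return element
        return NO_SUCH_KEY

    def chain_length(self, cell):
        return len(self._cells[cell])


t = ChainedHashTable(5)
t.insert_item(4, "a")
t.insert_item(9, "b")   # 9 mod 5 = 4, so this collides with key 4
t.insert_item(1, "c")
print(t.chain_length(4))   # 2
print(t.find_element(9))   # b
```

A search only walks the chain of one cell, so its cost depends on how many keys collided there rather than on the total number of items.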

Hash Code Maps (cont.)

Polynomial accumulation:
• We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits) $a_0 a_1 \ldots a_{n-1}$
• We evaluate the polynomial $p(z) = a_0 + a_1 z + a_2 z^2 + \ldots + a_{n-1} z^{n-1}$ at a fixed value z, ignoring overflows
• Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words)
Polynomial p(z) can be evaluated in O(n) time using Horner’s rule:
• The following polynomials are successively computed, each from the previous one in O(1) time
  $p_0(z) = a_{n-1}$
  $p_i(z) = a_{n-i-1} + z \, p_{i-1}(z)$   (i = 1, 2, …, n − 1)
• We have $p(z) = p_{n-1}(z)$

Linear Probing (§2.5.5)

Open addressing: the colliding item is placed in a different cell of the table
Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell
Each table cell inspected is referred to as a “probe”
Colliding items lump together, causing future collisions to cause a longer sequence of probes
Example:
• h(x) = x mod 13
• Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

Cell  0   1   2   3   4   5   6   7   8   9   10  11  12
Key   ∅   ∅   41  ∅   ∅   18  44  59  32  22  31  73  ∅
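
The polynomial accumulation scheme and Horner's rule above can be checked with a small Python sketch. It assumes the components are the 8-bit bytes of a UTF-8 string and emulates "ignoring overflows" by keeping only the low 32 bits. Both choices, and the function names, are assumptions made for illustration.

```python
def polynomial_hash_code(key: str, z: int = 33, bits: int = 32) -> int:
    """Evaluate p(z) = a0 + a1*z + ... + a(n-1)*z^(n-1) with Horner's rule."""
    mask = (1 << bits) - 1          # keep only the low `bits` bits
    p = 0
    # Horner's rule starts from the last component a(n-1) and works back to a0.
    for a in reversed(key.encode("utf-8")):
        p = (a + z * p) & mask      # p_i = a(n-i-1) + z * p_(i-1)
    return p


def naive_hash_code(key: str, z: int = 33, bits: int = 32) -> int:
    """Same polynomial, evaluated term by term, for comparison."""
    mask = (1 << bits) - 1
    return sum(a * z**i for i, a in enumerate(key.encode("utf-8"))) & mask


print(polynomial_hash_code("ab"))  # 97 + 98 * 33 = 3331
print(polynomial_hash_code("hash") == naive_hash_code("hash"))  # True
```

Horner's rule uses one multiplication and one addition per component, whereas the term-by-term version recomputes a power of z for every term.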

Compression Maps (§2.5.4)

Division:
• h2(y) = y mod N
• The size N of the hash table is usually chosen to be a prime
• The reason has to do with number theory and is beyond the scope of this course
Multiply, Add and Divide (MAD):
• h2(y) = (ay + b) mod N
• a and b are nonnegative integers such that a mod N ≠ 0
• Otherwise, every integer would map to the same value b

Search with Linear Probing

Consider a hash table A that uses linear probing
findElement(k)
• We start at cell h(k)
• We probe consecutive locations until one of the following occurs
  ◦ An item with key k is found, or
  ◦ An empty cell is found, or
  ◦ N cells have been unsuccessfully probed

Algorithm findElement(k)
    i ← h(k)
    p ← 0
    repeat
        c ← A[i]
        if c = ∅
            return NO_SUCH_KEY
        else if c.key() = k
            return c.element()
        else
            i ← (i + 1) mod N
            p ← p + 1
    until p = N
    return NO_SUCH_KEY
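
The two compression maps above are short enough to write out directly. The sketch below uses hypothetical function names and arbitrary sample values a = 3, b = 5, N = 13. It also shows why the condition a mod N ≠ 0 matters.

```python
def division_map(y: int, n: int) -> int:
    """Division compression map: h2(y) = y mod N."""
    return y % n


def mad_map(y: int, n: int, a: int, b: int) -> int:
    """Multiply, Add and Divide compression map: h2(y) = (a*y + b) mod N."""
    if a < 0 or b < 0 or a % n == 0:
        raise ValueError("a and b must be nonnegative with a mod N != 0")
    return (a * y + b) % n


N = 13
print([division_map(y, N) for y in range(4)])       # [0, 1, 2, 3]
print([mad_map(y, N, a=3, b=5) for y in range(4)])  # [5, 8, 11, 1]

# With a mod N == 0 the multiplier vanishes and every y maps to b.
degenerate = [(26 * y + 5) % N for y in range(4)]
print(degenerate)                                   # [5, 5, 5, 5]
```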


Updates with Linear Probing

To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements
removeElement(k)
• We search for an item with key k
• If such an item (k, o) is found, we replace it with the special item AVAILABLE and we return element o
• Else, we return NO_SUCH_KEY
insertItem(k, o)
• We throw an exception if the table is full
• We start at cell h(k)
• We probe consecutive cells until one of the following occurs
  ◦ A cell i is found that is either empty or stores AVAILABLE, or
  ◦ N cells have been unsuccessfully probed
• We store item (k, o) in cell i

Performance of Hashing

In the worst case, searches, insertions and removals on a hash table take O(n) time
The worst case occurs when all the keys inserted into the dictionary collide
The load factor α = n/N affects the performance of a hash table
Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is 1 / (1 − α)
The expected running time of all the dictionary ADT operations in a hash table is O(1)
In practice, hashing is very fast provided the load factor is not close to 100%
Applications of hash tables:
• small databases
• compilers
• browser caches
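
The linear probing operations above (search, insertion, and removal with the AVAILABLE marker) can be put together in one Python sketch. It assumes integer keys and h(k) = k mod N with N = 13, as in the linear probing example. The class name, the `layout` helper, and the explicit skip over AVAILABLE cells during a search are assumptions of this sketch, not something the slides spell out.

```python
NO_SUCH_KEY = object()  # returned when a key is absent
AVAILABLE = object()    # marks a cell whose item was removed


class LinearProbingTable:
    def __init__(self, n=13):
        self.n = n
        self.cells = [None] * n  # None plays the role of the empty cell
        self.count = 0

    def _h(self, k):
        return k % self.n

    def find_element(self, k):
        i = self._h(k)
        for _ in range(self.n):            # at most N probes
            c = self.cells[i]
            if c is None:                  # an empty cell ends the search
                return NO_SUCH_KEY
            if c is not AVAILABLE and c[0] == k:
                return c[1]
            i = (i + 1) % self.n           # next cell, circularly
        return NO_SUCH_KEY

    def insert_item(self, k, o):
        if self.count == self.n:
            raise OverflowError("table is full")
        i = self._h(k)
        while self.cells[i] is not None and self.cells[i] is not AVAILABLE:
            i = (i + 1) % self.n           # terminates: a free cell exists
        self.cells[i] = (k, o)
        self.count += 1

    def remove_element(self, k):
        i = self._h(k)
        for _ in range(self.n):
            c = self.cells[i]
            if c is None:
                return NO_SUCH_KEY
            if c is not AVAILABLE and c[0] == k:
                self.cells[i] = AVAILABLE  # keep later probe chains intact
                self.count -= 1
                return c[1]
            i = (i + 1) % self.n
        return NO_SUCH_KEY

    def layout(self):
        """Keys by cell, with None for empty and 'AVAILABLE' for removed."""
        return [None if c is None else "AVAILABLE" if c is AVAILABLE else c[0]
                for c in self.cells]


table = LinearProbingTable(13)
for key in [18, 41, 22, 44, 59, 32, 31, 73]:
    table.insert_item(key, str(key))
print(table.layout())
# [None, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, None]
```

Replacing a removed item with AVAILABLE instead of emptying the cell is what lets a later search for 32 continue past cell 6 after 44 has been removed.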

Double Hashing

Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series (i + jd(k)) mod N for j = 0, 1, … , N − 1
The secondary hash function d(k) cannot have zero values
The table size N must be a prime to allow probing of all the cells
Common choice of compression map for the secondary hash function: d2(k) = q − k mod q where
• q < N
• q is a prime
The possible values for d2(k) are 1, 2, … , q

Universal Hashing (§2.5.6)

A family of hash functions is universal if, for any 0 ≤ j, k ≤ M − 1 with j ≠ k, Pr(h(j) = h(k)) ≤ 1/N.
Choose p as a prime between M and 2M.
Randomly select 0 < a < p and 0 ≤ b < p, and define h(k) = (ak + b mod p) mod N
Theorem: The set of all functions, h, as defined here, is universal.
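
The double hashing probe series described above can be sketched as follows. The parameters are the ones the slides use in their double hashing example (N = 13, h(k) = k mod 13, d(k) = 7 − k mod 7). The function name and the returned probe log are assumptions made for illustration.

```python
def double_hash_insert(keys, n=13, q=7):
    """Insert integer keys with double hashing; return (table, probes per key)."""
    table = [None] * n
    probes = {}
    for k in keys:
        i = k % n            # primary hash h(k)
        d = q - k % q        # secondary hash d(k), always in 1..q
        sequence = []
        for j in range(n):   # series (i + j*d(k)) mod N for j = 0..N-1
            cell = (i + j * d) % n
            sequence.append(cell)
            if table[cell] is None:
                table[cell] = k
                break
        else:
            raise OverflowError("no free cell found")
        probes[k] = sequence
    return table, probes


table, probes = double_hash_insert([18, 41, 22, 44, 59, 32, 31, 73])
print(table)
# [31, None, 41, None, None, 18, 32, 59, 73, 22, 44, None, None]
print(probes[44], probes[31])  # [5, 10] [5, 9, 0]
```

Keys 18, 44, and 31 all hash to cell 5, but because their secondary hash values differ they follow different probe series instead of lumping together.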

Example of Double Hashing

Consider a hash table storing integer keys that handles collision with double hashing
• N = 13
• h(k) = k mod 13
• d(k) = 7 − k mod 7
Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

k    h(k)  d(k)  Probes
18   5     3     5
41   2     1     2
22   9     6     9
44   5     5     5, 10
59   7     4     7
32   6     3     6
31   5     4     5, 9, 0
73   8     4     8

Cell  0   1   2   3   4   5   6   7   8   9   10  11  12
Key   31  ∅   41  ∅   ∅   18  32  59  73  22  44  ∅   ∅

Proof of Universality (Part 1)

Let f(k) = ak + b mod p
Let g(k) = k mod N
So h(k) = g(f(k)).
f causes no collisions:
• Let f(k) = f(j).
• Suppose k < j. Then
  $aj + b - \lfloor \frac{aj+b}{p} \rfloor p = ak + b - \lfloor \frac{ak+b}{p} \rfloor p$
  $a(j-k) = \left( \lfloor \frac{aj+b}{p} \rfloor - \lfloor \frac{ak+b}{p} \rfloor \right) p$
So a(j − k) is a multiple of p
But both a and j − k are less than p, and p is prime
So a(j − k) = 0. I.e., j = k. (contradiction)
Thus, f causes no collisions.


Proof of Universality (Part 2)

If f causes no collisions, only g can make h cause collisions.

Fix a number x. Of the p integers y = f(k), different from x, the number such that g(y) = g(x) is at most $\lceil p/N \rceil - 1$

Since there are p choices for x, the number of h’s that will cause a collision between j and k is at most
$p(\lceil p/N \rceil - 1) \le p(p-1)/N$

There are p(p − 1) functions h. So probability of collision is at most
$\frac{p(p-1)/N}{p(p-1)} = \frac{1}{N}$

Therefore, the set of possible h functions is universal.
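
The bound in the proof can be checked by brute force on a small instance. The sketch below assumes M = 10, p = 11 (a prime between M and 2M), and N = 4. These values and the function names are chosen only for illustration. It counts, for every pair of distinct keys, how many of the p(p − 1) functions make them collide.

```python
from itertools import combinations


def h(k, a, b, p, n):
    """One member of the family: h(k) = ((a*k + b) mod p) mod N."""
    return ((a * k + b) % p) % n


def worst_collision_count(m, p, n):
    """Largest number of (a, b) choices that make some pair j != k collide."""
    total = p * (p - 1)  # a ranges over 1..p-1 and b over 0..p-1
    worst = 0
    for j, k in combinations(range(m), 2):
        collisions = sum(
            1
            for a in range(1, p)
            for b in range(p)
            if h(j, a, b, p, n) == h(k, a, b, p, n)
        )
        worst = max(worst, collisions)
    return worst, total


M, P, N = 10, 11, 4
worst, total = worst_collision_count(M, P, N)
# Universality requires worst / total <= 1 / N for every pair of keys.
print(worst * N <= total)  # True
```

An exhaustive check like this is only practical for tiny p, but it exercises exactly the quantity the proof bounds: the number of functions h that cause a collision between j and k.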
