from here
Hash table and binary search tree are two fundamental data structures
in CS. When would you want to use one over another one? What is the
difference? This is a common interview question that I had most
frequently been asked.
Well, it is not easy to answer by one or two sentences. The main
difference between hash table and trees is on two aspects:
Implementation details and behaviors and performance under different
circumstance.
Let me start with implementation. Hash table uses hash function to
assign index to input data and put data into array under corresponding
index. If the size of hash table is 100, hash function can generate 0~99
to input data and store them into hash table. Theoretically, time
complexity of insertion and looking-up is constant time (O(1)).
Binary search tree is implemented as the rule that all left
children’s values are less than root, while all right children’s value
are greater than it. If the tree is balanced, it always takes O(log(n))
time to insert a new node or look up.O(log(n)) is not as fast as
constant time but rather fast. n is the total number in tree and log(n)
is usually depth of tree. Notice that I mentioned if it is balanced
tree, however there are sophisticated algorithm (e.g., RB tree) to
maintain tree balanced.
It seems we can prefer hash table over tree, but it is not always the case. Hash table has significant drawbacks:
1. As more data input comes, there is huge probability that collision
shows up (hash function maps different data to same index). There are
two ways to handle collision. First is linear probing that implement
hash table as array of linked list. In this case, worst time for
insertion or retrieve or deletion is O(n) that all input data are mapped
to same index. Besides, hash table need more space than number of input
data. Second way is open addressing. It would not consume more space
than input data, but at worst case insertion and retrieve is still O(n),
which is extremely slow than constant time.
2. You have to know approximate size of input data before
initializing hash table. Otherwise you need to resize hash table which
is a very time-consuming operation. For example, your hash table size is
100 and then you want to insert the 101st element. Not only the size of
hash table is enlarged to 150, all element in hash table have to be
rehashed. This insertion operation takes O(n).
3. The elements stored in hash table are unsorted. In certain
circumstance, we want data to be stored with sorted order, like contacts
in cell phone.
However, binary search tree performs well against hash table:
1. Binary search tree never meets collision, which means binary
search tree can guarantee insertion, retrieve and deletion are
implemented in O(log(n)), which is hugely fast than linear time.
Besides, space needed by tree is exactly same as size of input data.
2. You do not need to know size of input in advance.
3. all elements in tree are sorted that in-order traverse takes O(n) time.
OK, let me make a summary.
If you know how many data to maintain, and have enough space to store
hash table and do not need data to be sorted, hash table is always good
choice. Because, hash table provides constant time operation for
insertion, retrieve and deletion. On the other hand, if items will be
consistently added, binary search tree’s O(log(n)) operation is
acceptable, comparing with rehashing operation during running time.
Besides, If you actually do not know size of input items, but after
inserting, most operations are item looking up, hash table is preferred
due to constant retrieve time. However, if items are continuously added
or removed, tree’s O(log(n)) insertion and deletion time are more
suitable in this condition.
In a word, there is no one answer that hash table or tree is better.
All we need to know is pros and cons of hash table and tree in different
conditions. Best decision can be made with knowledge of benefits
and trade-offs of these two structures.
No comments:
Post a Comment