So first let’s try the Google farmhash hash function, using a few fast hash table implementations.
We will do a similar benchmark as the one we did for integers a while back, but with strings. Actually, let’s add an integer version as well.
We will use a dictionary to create random string and integer keys, insert these into a hash table, shuffle them around, and look them up.
for (j = 0; j < n; j ++) m->insert(std::make_pair(data[j], j)); shuffle(data, n); t1 = ntime(); for (j = 0; j < n; j ++) sum += (*m)[data[j]]; t2 = ntime();
The tests are run on a Google Compute Engine (n1-highmem-2), running an Arch Linux toolbox inside CoreOS, and compiled with g++ 4.9.2.
- std::unordered_map – C++11 standard library hash table
- google::dense_hash_map – Google hash table implementation
- libdynamic – C hash table implementation
- ulib – High performance C/C++ library
- tommy_hashdyn – High performance C library
Rate is measured in lookup operations per second where size is the number of entries in the hash table. All string versions use the Google farmhash hash function.
- The string hash table performance is similar, but specific implementation is still of importance, with libdynamic being close to twice as fast as the standard C++ library
- Using integer keys is an order of magnitude faster than using string keys
- In the integer hash table performance where we use the trivial f(x)=x identity hash function the choice of implementation is more important, with libdynamic being around 6x faster than the standard C++ library
- tommy_hashdyn is the only candidate so far using chaining (where the others use open addressing) which the benchmark seems to indicate makes it more favorable for smaller sized hash tables