Benchmarking string hash tables

So first, let’s try the Google farmhash hash function together with a few fast hash table implementations.

We will run a benchmark similar to the one we did for integers a while back, but with strings. Actually, let’s add an integer version as well.

The benchmark

We will use a dictionary to create random string and integer keys, insert them into a hash table, shuffle the key order, and then look them up.

// insert n key/value pairs; data[] holds the keys, m is the table under test
for (j = 0; j < n; j++)
    m->insert(std::make_pair(data[j], j));

// randomize the lookup order
shuffle(data, n);

// time n lookups; summing the values keeps the loop from being optimized away
t1 = ntime();
for (j = 0; j < n; j++)
    sum += (*m)[data[j]];
t2 = ntime();
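
The snippet relies on an ntime() nanosecond timer and a shuffle() helper that are not shown; a minimal sketch of what they might look like (my assumptions based on how they are used above, the actual implementations may differ):

#include <cstdint>
#include <cstdlib>
#include <ctime>

// Hypothetical nanosecond monotonic timer, inferred from its use above.
static uint64_t ntime(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

// Hypothetical Fisher-Yates shuffle of the key array.
template <typename T>
static void shuffle(T *data, size_t n)
{
    for (size_t i = n - 1; i > 0; i--)
    {
        size_t j = (size_t) rand() % (i + 1);
        T tmp = data[i];
        data[i] = data[j];
        data[j] = tmp;
    }
}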

System

The tests are run on a Google Compute Engine instance (n1-highmem-2), inside an Arch Linux toolbox running on CoreOS, and the code is compiled with g++ 4.9.2.

Candidates

Results

Rate is measured in lookup operations per second, and size is the number of entries in the hash table. All string versions use the Google farmhash hash function.

[Figure: string hash table benchmark, lookup rate vs. table size]

[Figure: integer hash table benchmark, lookup rate vs. table size]
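
As an illustration of how farmhash is wired into the string versions, here is a minimal sketch of a hasher functor for std::unordered_map, assuming farmhash’s util::Hash64() entry point (the functor and typedef names are mine):

#include <string>
#include <unordered_map>
#include <farmhash.h>

// Wrap farmhash as a functor so std::unordered_map can use it for strings.
struct farmhash_hasher
{
    size_t operator()(const std::string &s) const
    {
        return util::Hash64(s.data(), s.size());
    }
};

typedef std::unordered_map<std::string, int, farmhash_hasher> str_map;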

Observations

  • String hash table performance is similar across implementations, but the specific implementation still matters, with libdynamic being close to twice as fast as the standard C++ library
  • Using integer keys is an order of magnitude faster than using string keys
  • For the integer hash tables, where the trivial identity hash function f(x) = x is used, the choice of implementation matters more, with libdynamic being around 6x faster than the standard C++ library (see the sketch after this list)
  • tommy_hashdyn is the only candidate so far using chaining (the others use open addressing), which the benchmark seems to indicate makes it more favorable for smaller hash tables
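
For reference, the identity hash mentioned above is trivial to express as a functor; a minimal sketch (the functor name is mine, not taken from any of the libraries):

#include <cstddef>

// Identity hash f(x) = x: the key itself is used as the hash value.
struct identity_hash
{
    size_t operator()(unsigned x) const { return x; }
};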

8 thoughts on “Benchmarking string hash tables”

  1. Hi Fredrik,

    Author of TommyDS here.

    Thanks for the benchmark! But please be aware that you are using tommy-hashdyn outside its intended scope.

    tommy-hashdyn is an intrusive data structure, which requires its node to be embedded in the object you store in the hashtable.
    In your benchmark this doesn’t happen, and you pay an extra malloc() call for each value inserted, which obviously kills the performance.

    So, for storing ints, tommy-hashdyn is just not a good match, as it doesn’t make sense to have a node bigger than the object itself. For strings, it could work, but only by embedding the node together with the string.

    Another aspect you are not evaluating is that you are testing under the best conditions for an open addressing hash table: only insertions and no deletions.
    This always results in a table without any “deleted” entries, which lets every search run under the best conditions.
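
    To illustrate with a generic open addressing lookup (just a sketch, not taken from any of the implementations): a probe can stop at an empty slot, but has to step over every “deleted” slot, so tombstones lengthen all the probe chains:

    enum slot_state { EMPTY, OCCUPIED, DELETED };

    struct slot
    {
        enum slot_state state;
        unsigned key;
        unsigned value;
    };

    /* linear probing lookup; capacity is assumed to be a power of two */
    int lookup(const struct slot *table, unsigned capacity, unsigned key, unsigned *value)
    {
        unsigned i = key & (capacity - 1);
        unsigned n;

        for (n = 0; n < capacity; n++, i = (i + 1) & (capacity - 1))
        {
            if (table[i].state == EMPTY)
                return 0; /* end of probe chain: miss */
            if (table[i].state == OCCUPIED && table[i].key == key)
            {
                *value = table[i].value; /* hit */
                return 1;
            }
            /* DELETED: cannot stop here, keep probing */
        }
        return 0;
    }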

    Anyway, if you are interested, I’ve adapted the TommyDS benchmark for your libdynamic implementation, and you can find the resulting graphs at:

    http://tommyds.sourceforge.net/beta/libdynamic/

    Note that you will likely want to adjust how your implementation reallocates, as it seems to have a very slow worst case at some specific sizes just a little smaller than a power of 2.

    For example, see the random_change graph.

    The problem seems to be that if the table size is near the realloc limit, a short sequence of inserts and deletes triggers a reallocation, but the new size chosen is the same as before, so another reallocation follows shortly after.

    You can see the benchmark source at:

    https://github.com/amadvance/tommyds/tree/libdynamic

    Ciao,
    Andrea

    1. Hi Andrea,

      tommy-hashdyn is an intrusive data structure, which requires its node to be embedded in the object you store in the hashtable.
      In your benchmark this doesn’t happen, and you pay an extra malloc() call for each value inserted, which obviously kills the performance.

      Here we are only plotting lookup performance, so the malloc() associated with an insert should not affect the results. Even so, if we were to benchmark insert performance, I’m not sure I agree that persisting state for a mapping entry should be excluded, intrusive structure or not. To me that would seem an unfair comparison, since all implementations are burdened with this task in one way or another.

      So, for storing ints, tommy-hashdyn is just not a good match, as it doesn’t make sense to have a node bigger than the object itself. For strings, it could work, but only by embedding the node together with the string.

      If you say using tommy_hashdyn for an int:int mapping is not a good idea, I can of course just remove it. Could you give an example of how you would suggest using it for a char *:int mapping?

      Another aspect you are not evaluating is that you are testing under the best conditions for an open addressing hash table: only insertions and no deletions.
      This always results in a table without any “deleted” entries, which lets every search run under the best conditions.

      True, in this microbenchmark I’m only measuring lookups that result in a hit, in a map without deletes. I would perhaps argue, as above, that a delete should include depersisting the mapping state for that entry. I suppose there will always be special cases that favour a specific algorithm, but it would be interesting to look closer at this one.

      Anyway, if you are interested, I’ve adapted the TommyDS benchmark for your libdynamic implementation, and you can find the resulting graphs at: http://tommyds.sourceforge.net/beta/libdynamic/

      Interesting. I will have a closer look at this.

  2. Hi Fredrik,

    About strings, what could be done is to have an object like:

    struct object {
        tommy_node node; /* intrusive node, embedded in the object */
        char str[1];     /* string stored inline, in the same allocation */
    };

    and allocating them like:

    struct object* obj = malloc(sizeof(struct object) + strlen(str)); /* str[1] already accounts for the terminator */
    strcpy(obj->str, str);

    This removes the extra malloc() call and gains data locality, because the node and the string will likely end up in the same cache line.
    Obviously, this is not fair with respect to other data structures that handle allocation transparently, but it’s even more unfair to use an intrusive data structure in a non-intrusive way 🙂
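
    Complete usage would then look something like this (a sketch; the compare callback is mine, and I use tommy_strhash_u32() for hashing):

    #include <stdlib.h>
    #include <string.h>
    #include "tommyds/tommyhashdyn.h"
    #include "tommyds/tommyhash.h"

    struct object
    {
        tommy_node node;
        char str[1];
    };

    /* search callback: returns 0 when the argument matches the object */
    static int object_compare(const void *arg, const void *obj)
    {
        return strcmp((const char *) arg, ((const struct object *) obj)->str);
    }

    struct object *store(tommy_hashdyn *hashdyn, const char *str)
    {
        struct object *obj = (struct object *) malloc(sizeof(struct object) + strlen(str));

        strcpy(obj->str, str);
        tommy_hashdyn_insert(hashdyn, &obj->node, obj, tommy_strhash_u32(0, obj->str));
        return obj;
    }

    struct object *find(tommy_hashdyn *hashdyn, const char *str)
    {
        return (struct object *) tommy_hashdyn_search(hashdyn, object_compare, str,
                                                      tommy_strhash_u32(0, str));
    }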

    PS:
    To fix the worst case issue in your libdynamic, you likely just need to change how map_int_rehash() computes the new size, from:

    capacity = map_int_roundup_size(capacity_requested);

    to:

    capacity = map_int_roundup_size(capacity_requested + map->deleted);

    This removes the bottleneck.

    Ciao,
    Andrea

    1. Hi Andrea,

      This removes the extra malloc() call and gains data locality, because the node and the string will likely end up in the same cache line.
      Obviously, this is not fair with respect to other data structures that handle allocation transparently, but it’s even more unfair to use an intrusive data structure in a non-intrusive way 🙂

      Hm, yes, I suppose it’s justified if the gain is large enough; let’s find out.

      I suppose there are two general use cases: either object state is already kept independently of the hash table, or you use the table to keep the state. I see your point, but of course the comparison with a non-intrusive data structure becomes problematic in some ways.

      capacity = map_int_roundup_size(capacity_requested + map->deleted);
      This removes the bottleneck.

      Yes, I did something quite similar as a workaround, but I need to rethink my strategy for some edge cases regardless.

      In your use of map_int, you should change the following:

      unsigned key = INSERT[i];
      unsigned hash_key = hash(key);
      struct libdynamic_object* obj = &LIBDYNAMIC[i];
      obj->value = key;
      map_int_insert(libdynamic, hash_key, &obj);

      to

      unsigned key = INSERT[i];
      struct libdynamic_object* obj = &LIBDYNAMIC[i];
      obj->value = key;
      map_int_insert(libdynamic, key, &obj);

      The key argument for …_insert() and …_at() is not a hash of the key, but the key itself. This has different implications. The hash used is the identity hash f(x) = x, which is not uncommon, and this is by design; it changes things quite a bit. Ending up with degenerate cases when large parts of the table are deleted is still a problem, though. Forward/random hits should come out on top, however.

      (the numbers below are with a similar workaround for the insert/remove edge case, and with the correct key argument)

      # ./tommybench -n 1000000 -d tommy-hashdyn
      Tommy benchmark program.
      1000000 tommy-hashdyn forward
      forward, insert, tommy-hashdyn, 165 [ns]
      forward, change, tommy-hashdyn, 172 [ns]
      forward, hit, tommy-hashdyn, 40 [ns]
      forward, miss, tommy-hashdyn, 38 [ns]
      forward, size, tommy-hashdyn, 48 [byte]
      forward, remove, tommy-hashdyn, 93 [ns]
      1000000 tommy-hashdyn random
      random, insert, tommy-hashdyn, 160 [ns]
      random, change, tommy-hashdyn, 296 [ns]
      random, hit, tommy-hashdyn, 58 [ns]
      random, miss, tommy-hashdyn, 33 [ns]
      random, size, tommy-hashdyn, 48 [byte]
      random, remove, tommy-hashdyn, 186 [ns]
      OK
      # ./tommybench -n 1000000 -d libdynamic
      Tommy benchmark program.
      1000000 libdynamic forward
      forward, insert, libdynamic, 32 [ns]
      forward, change, libdynamic, 46 [ns]
      forward, hit, libdynamic, 3 [ns] <=== !!!
      forward, miss, libdynamic, 269 [ns] <=== still not very good
      forward, remove, libdynamic, 8 [ns]
      1000000 libdynamic random
      random, insert, libdynamic, 63 [ns]
      random, change, libdynamic, 288 [ns]
      random, hit, libdynamic, 48 [ns] <=== also better than others
      random, miss, libdynamic, 21 [ns]
      random, remove, libdynamic, 73 [ns]

  3. Hi Fredrik,

    Sorry, but I cannot change my test to avoid using a hash function for libdynamic when all the other hashtables are using one.
    No doubt it’s faster without hashing, but the same would be true for the other hashtables as well.

    Also, a hashtable without a hash function doesn’t really make sense. We cannot even call it a “hashtable” if there isn’t a hash 😉

    Just think about what would happen if the lowest bits of the keys were always 0: all the keys would land in a single bucket, making the table extremely slow.
    The hash function is necessary to ensure that performance is independent of the distribution of the keys.
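
    As a generic illustration (assuming the usual power-of-two table with the bucket taken from the low bits of the hash):

    #include <stdio.h>

    int main(void)
    {
        unsigned capacity = 1024; /* power-of-two bucket count */
        unsigned key;

        /* keys whose low bits are all zero all map to the same bucket */
        for (key = 0; key < 8 * 65536; key += 65536)
            printf("key %u -> bucket %u\n", key, key & (capacity - 1)); /* always bucket 0 */
        return 0;
    }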

    Ciao,
    Andrea

    1. Hi Andrea,

      Technically it is a hash function: the identity function f(x) = x. It’s used by default by std::unordered_map, google::dense_hash_map and khash (and many others), of course for a good reason. It is a trade-off, but one that is often worth making when using open addressing, and one that some of the sub-algorithms involved even depend on. As with all trade-offs, it has its drawbacks as well. 🙂

      I’m not really asking you to change the test though; I’m just saying that you are breaking the API if you use the hash as the key. 🙂 Hash collisions are possible and will break the functionality. (I believe khash is broken in the same way in your benchmark.)

      Are you using the same hash function everywhere? I might be missing something here, but I see references to both hash() and tommy_inthash_u32().

      Regards,
      Fredrik
