Benchmarking string hash tables

So first let’s try the Google farmhash hash function, using a few fast hash table implementations.

We will do a similar benchmark as the one we did for integers a while back, but with strings. Actually, let’s add an integer version as well.

Continue reading Benchmarking string hash tables

Benchmarking hash functions

It turns out that for integer hash tables implementations the identity function (f(x)=x) is preferred in most cases. This is a trade-off between speed and desirable properties such as uniform distribution that works well for integers. In essence the speed of attempted lookups is so fast that we can afford the increased number of collisions a naive hash function will give us.

Continue reading Benchmarking hash functions

Test-driven C – Autotools, cmocka


It seems Autotools is more often than not spoken about in negative terms. I believe this is because approaching the framework can be a bit daunting and that the work flow can seem bureaucratic. As a typical C programmer I’m a control freak that want to understand everything that goes on under the hood, and as soon as I see a cryptic macro hiding a lot of functionality in a way that is not clear, I tend to shy away instinctively.

Continue reading Test-driven C – Autotools, cmocka

Benchmarking dynamic array implementations

Continuing looking at the performance of data structures and different implementations and languages, let’s look at a simpler but very common container, the dynamic array.

The interesting performance properties of a dynamic array is constant, O(1), lookup time and amortized constant insertion and deletion at the end of the array. Random insertion on the other hand is linear, O(n), and typically something you want to avoid.

Continue reading Benchmarking dynamic array implementations

libdynamic – vector

I think the lack of simple and basic dynamic data structures is a major reason why people sometimes struggle with programming in C.

To take a simple example the naive task of splitting a string into words is near trivial in C++ using the standard library, while in C you are challenged with how to dimension the resulting array. In most cases likely the solution here would be to declare a fixed size array with a MAX_WORDS constant size, and later when the parsing is done and you know the number of words, declare a new array into which you copy the result, and then return. Hardly an elegant solution, and you end up with making difficult assumptions regarding how many words a line realistically can contain.

Continue reading libdynamic – vector

Benchmarking hash table implementations – Java

Java implementations

It was pointed out that some Java hash table implementations were supposed to be faster than the native HashMap. This test was run enabling the Java process access to all 8 virtual cores, which was not the case in the previous benchmarks.


  • Java native HashMap – Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
  • Java Colt – v1.0.2
  • Java Trove – v3.0.3

Continue reading Benchmarking hash table implementations – Java

Benchmarking hash table implementations

A first iteration of benchmarks of different hash table implementations.


Tested implementations

  • C++11 unordered_map – Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
  • C++ Google DenseHash – Google SparseHash v2.0.2
  • Java native HashMap – Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
  • C khash – Above clang environment with khash v0.2.8
  • Lua builtin – LuaJIT v2.0.3

Continue reading Benchmarking hash table implementations