Benchmarking dynamic array implementations

Continuing looking at the performance of data structures and different implementations and languages, let’s look at a simpler but very common container, the dynamic array.

The interesting performance properties of a dynamic array is constant, O(1), lookup time and amortized constant insertion and deletion at the end of the array. Random insertion on the other hand is linear, O(n), and typically something you want to avoid.

Another important property is the ability to grow dynamically, simplifying the need to make non-trivial assumptions when developing code. How many entries should an exec() argument array be able to hold?  How many MIME fields should be allowed in a HTTP header?  Since there is no obvious correct answer the actual limits will often vary between different programs and environments, creating problems and complicating life for users and developers.

Of course, however, all resources are finite and some limits will be imposed in some way or another.

The benchmark

In this microbenchmark we will simply push 100 million integers, in turn, to the end of a dynamic array,  measuring the time needed to do so. All candidates now use 64 bit number representations on 64 bit platforms.

In C++ the code will look something similar to the below, where n is 100M and ntime() returns some timing information, here in nanoseconds. We get the initial time plus 100 time measurements and plot the result.

uint64_t i, j, t, m[101];
std::vector<uint64_t>; v;
t = ntime();
m[0] = 0;
for (i = 0; i < n; i += n / 100)
  {
    for (j = 0; j < n / 100; j ++)
      {
        v.push_back(j);
      }
    m[i / (n / 100) + 1] = ntime() - t;
  }

System

The tests are done using

  • Arch Linux 3.16.3-1-ARCH (64 bit)
  • 1 CPU (restricted) on an Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
  • 6 GB RAM

Candidates

  • Java Vector<Long> – OpenJDK1.7.0_65/IcedTea 2.5.2
  • Java ArrayList<Long> – OpenJDK1.7.0_65/IcedTea 2.5.2
  • Java Trove TLongArrayList – OpenJDK1.7.0_65/IcedTea 2.5.2 + Trove v3.0.3
  • Rust std::vec with jemalloc (default) – Rust 0.12.0-nightly 2014-09-24
  • Rust std::vec with glibc allocator – Rust 0.12.0-nightly 2014-09-26 (configured with –disable-jemalloc)
  • C++ std::vector – gcc 4.9.1 20140903
  • C libdynamic vector with glibc allocator (default) – gcc 4.9.1 20140903 + libdynamic
  • C libdynamic vector with jemalloc – gcc 4.9.1 20140903 + libdynamic
  • LuaJIT table.insert() – LuaJIT 2.0.3
  • LuaJIT vector assign – LuaJIT 2.0.3

Disclaimer

  • The intention is not to compare programming languages and try to conclude if language X is faster/better than language Y
  • Results are likely vary depending on hardware, OS, compilers, and other parameters

Results

vector_grow

vector_grow_linear

Candidate Time
1 C libdynamic vector (glibc allocator) 1.92 ns/insert
2 Rust std::vec (glibc allocator) 2.38 ns/insert
3 C libdynamic vector (jemalloc) 4.47 ns/insert
4 Rust std::vec (jemalloc) 4.91 ns/insert
5 C++ std::vector 5.01 ns/insert
6 LuaJIT vector assign 6.82 ns/insert
7 Java Trove TLongArrayList 14.6 ns/insert
8 Java ArrayList<Long> 190 ns/insert
9 Java Vector<Long> 230 ns/insert
8 LuaJIT table.insert() 368 ns/insert

Observations

  • Assuming an implementation with low overhead, the most important factor (as noted by Daniel Micay) is the memory allocator, with jemalloc here being more than twice as slow as the glibc allocator (jemalloc/jemalloc#126)
  • After adjusting for memory allocator , C/C++ and Rust perform very similarly
  • Using the natural table.insert() in LuaJIT scales very badly and simply breaks all performance ambitions
  • In LuaJIT the somewhat less elegant workaround of using a separate index and a vector[index] assignment  speeds things up by a factor of 50x in this test
  • Java Vector<Long> and ArrayList<Long> perform very poorly, perhaps indicating that the code is not properly optimized at runtime
  • Java Trove TIntArrayList however still does well performing 10x faster than the other Java candidates
Advertisements

3 thoughts on “Benchmarking dynamic array implementations”

  1. After a comment from Raymond W Ko, and Mike Pall, the LuaJIT version was improved by several orders of magnitude by using an indexed vector assignment instead of the apparently very slow table.insert()

  2. Oh come on. Please just let go of your attachment to Lua table.insert(). It is decidedly not the “natural” or idiomatic way to append to a table. It’s primary purpose and usefulness is that it can insert in the middle of an array and push all the other elements up by 1 position. If you only want to *append* to an array, you use t[len+1], as you were already told multiple times. I repeat, there is nothing “natural”, or logical or sensible about using table.insert to append to an array.

    1. With natural I mean the suggested way in tutorials and manuals.

      For example here (http://lua-users.org/wiki/TablesTutorial)

      There are two ways to add an item to the end of an array:

      > t = {}
      > table.insert(t, 123)
      > t[#t+1] = 456
      > = t[1]
      123
      > = t[2]
      456

      Both these ways suffer from the performance degradation.

      Or here: http://www.lua.org/pil/19.2.html

      Not being able to have self-contained data containers for performance reasons could be seen as an issue but the point here is merely to note this, and not to bash Lua as a language.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s