Why another benchmark
I’ve spent a lot of time benchmarking over the last few years while building distributed systems and working with cost/performance optimization: everything from infrastructure, hardware and storage to database and application solutions.
One insight that gradually comes to you along the way is that benchmarking is actually very difficult. That is to say, running a benchmark is easy, but producing valuable data is not. The results seem to range from “at least gives an indication” at best to “totally useless in real-life scenarios”, with the latter typically being the more common case.
The reason is of course that there are simply too many variables and unknowns, and to add to the difficulty, some of them are quite complicated to simulate realistically. To be able to produce any data at all we make a lot of assumptions, simplify as much as possible and keep maybe one or two variables, hoping that the result will at least to some degree reflect what we want to see.
Keeping this in mind, it is of course obvious that you can’t put a lot of trust in a single benchmark, and even more obvious that you likely can’t trust someone who benchmarks with an agenda at all. Lying with benchmarks is as easy as lying with statistics: you just pick the set of assumptions and fixed variables where you perform at your best and your opposition at their worst. Knowing who has an agenda can be difficult, but someone who is benchmarking their own product, well, maybe has one…
That being said, I spent some time looking at the web proxy Varnish this summer, and since I was curious about the potential performance gain I did some benchmarks and decided to share them. I will actually redo them to make them a bit more up to date, and I will probably skip Varnish itself since it is a somewhat different solution than a pure web server.
So this will be just another benchmark of web distribution of static content. If nothing else it will be an additional, and for a brief period the most recent, indication of the performance of web server daemons running on Linux. Hopefully there will be a few valuable thoughts along the way.
I will benchmark
- Apache v2.2.21 – The old work horse
- Nginx v1.1.5 – Probably the most common Linux alternative to Apache
- Cherokee v1.2.99 – “The fastest free Web Server out there”
- Lighttpd v1.4.29 – Claims it will “scale several times better with the same hardware than with alternative web-servers”
- G-WAN v2.10.6 – According to the vendor, the silver bullet that makes all other software regardless of purpose obsolete (and will cure disease and solve the world’s conflicts along the way)
Of these, G-WAN is the one that sticks out by not being open source. This obviously has implications that will discourage some, but that subject is out of scope for this benchmark. (Note: the vendor is commercial and sells services such as support around the product, but the software itself is free of charge.)
As to having an agenda, I’m writing this as a private individual without any commercial interests. I’ve obviously used Apache, who hasn’t, but have over the last few years worked much more with Nginx. I’ve spent some time looking at Lighttpd, and much less at G-WAN and Cherokee. I will try to be as unbiased as I can.
So, a few assumptions and simplifications
- I will use a single DELL PowerEdge 1950 with dual Xeon(R) 5130 CPUs @ 2.00GHz (dual core) and 16GB of RAM, running an updated Arch Linux (which finally convinced me to leave the OpenBSD world) with a 3.0.6 Linux kernel
- I will benchmark concurrent downloads of a single file with random content of sizes 1kB, 1MB and 50MB
- I will, at least initially, look at non-keep-alive requests
- I will, at least initially, simulate load using 100 concurrent clients
- I will, at least initially, use default configurations for operating system and applications
- I will measure pure throughput in requests-per-second or Mbps
As you can see this will not be a full-fledged final benchmark, as I am more interested in looking at the process of benchmarking than in finally resolving the issue of which web server software is “the best and finest”. I had to lower the max file size to 50MB since I’m using XFS on all partitions except for /boot, which is small, and G-WAN promptly refuses to run on XFS…
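Just to make the setup a bit more concrete, here is roughly what preparing the test files and a baseline run could look like. The file names, document root and request count below are examples for illustration, not my exact commands:

    # create test files with random content (paths and names are just examples)
    dd if=/dev/urandom of=/srv/http/test-1k.bin  bs=1K count=1
    dd if=/dev/urandom of=/srv/http/test-1m.bin  bs=1M count=1
    dd if=/dev/urandom of=/srv/http/test-50m.bin bs=1M count=50

    # 100 concurrent clients, non-keep-alive (the ab default), against localhost
    ab -n 10000 -c 100 http://127.0.0.1/test-1k.bin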
What can we, with a critical eye, say about these assumptions? Well, probably a lot…
- The hardware is quite old, with only 4 cores in total, and different software will scale differently depending on the number of cores
- The benchmarking tool will run in a single process, without scaling over the available cores, to be comparable with “ab”
- We will run clients and servers on the same hardware over the local network, sharing resources
- All clients will use the exact same implementation, will run with virtually zero latency, without packet loss, congestion, or any other network issues
- We only look at static content, and all clients will download the same meaningless single file
- Changes in performance or stability over time will not be visible
- How the different software uses resources such as CPU, memory, etc. to produce the result will not be presented
- We could go on and on…
To summarize, it’s hard to see how the scenario could even come close to being called relevant. In any real-life production environment we would probably have a very large number of different files, the bottleneck could very likely be something completely different such as disk or network I/O, and parameters such as compliance, reliability, security and stability over time would be more or less decisive. But let’s move on anyway.
Ok, maybe too much information… Time for some pictures to liven this up.
Results (using ab, average of 8 runs)
So how boring does this look on a graph? There is almost nothing to tell from the results. G-WAN is somewhat faster with small files, and Cherokee, slightly to my surprise, wins the race for 1MB files.
One thing that comes to mind is the possibility that we’re not actually measuring the web daemon at all, but maybe instead the operating system or the benchmark tool itself.
When I did the benchmarks this summer it seemed to me that everyone was using the “ab” benchmark tool, and I failed to find an alternative. So I wrote my own, called “pounce”, which I will release somewhere right after this benchmark. I recently learned that Lighttpd has its own tool, called “weighttp”, so let’s try these and see if they make any difference.
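For reference, weighttp takes roughly the same parameters as “ab”, so a run matching the setup above could look like this (file name and request count again just examples):

    # weighttp with the same request count and concurrency as the ab example,
    # single-threaded unless told otherwise
    weighttp -n 10000 -c 100 http://127.0.0.1/test-1k.bin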
Results (using ab/weighttp/pounce, average of 8 runs)
(ab is the first set of benchmarks, weighttp the second, pounce the third)
- After talking to the weighttp author it became clear that the program is intended to be run with the multithreading option enabled, so this result is not representative and should be taken with a grain of salt (see the example right after this list)
- The benchmark tool implementation evidently matters here, with “pounce” performing clearly better on small files and much better on large ones
- The result seems to depend on the combination of tool and daemon, with Apache, for example, coming out on top for 50MB files using “weighttp”, indicating that the fact that the benchmark tool and the web daemon share resources is indeed an issue
- Relying on “pounce”, G-WAN wins the race here by a small margin
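For completeness, the multithreaded mode the weighttp author refers to is enabled with the -t option, which spreads the client work over several worker threads; something like this would use all four cores (parameters are purely illustrative):

    # weighttp with 4 worker threads instead of the single-threaded default
    weighttp -n 10000 -c 100 -t 4 http://127.0.0.1/test-1k.bin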
So are we done here? Probably not, but this post has become long enough. There are other things to consider, and I’ll write another post about them later on.