For a project I am working on at school, we wanted to compare a thread-per-connection server against a single-threaded, event-driven server. This subject has been studied to death (e.g. [1, 2, 3, 4, 5]), but it is still controversial. Somewhat recently I came across an online discussion claiming that threads are faster than Java NIO (second source). However, the benchmark discussed there is primarily throughput limited. Our application processes many tiny messages, so I expected the results could be different. Hence, this benchmark tests a simple echo server. The server reads a client request (a length-prefixed blob of bytes), then echoes that request back. We wrote four versions, covering C++ and Java, each with an epoll-style event loop and with threads.
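To make the protocol concrete, here is a minimal sketch of the per-connection logic a thread-per-connection echo server could run. It is not taken from the benchmark source: the class name BlockingEcho, the method names, and the 4-byte big-endian length prefix are all assumptions for illustration.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch, not the benchmark's actual code.
final class BlockingEcho {
    /**
     * Reads one length-prefixed message and writes it back unchanged.
     * Returns false if the stream ends while reading the length prefix.
     */
    static boolean echoOnce(DataInputStream in, DataOutputStream out) throws IOException {
        int length;
        try {
            length = in.readInt(); // assumed framing: 4-byte big-endian length
        } catch (EOFException e) {
            return false; // peer closed the connection
        }
        if (length < 0) {
            throw new IOException("negative length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // blocks until the whole payload has arrived
        out.writeInt(length);
        out.write(payload);
        out.flush();
        return true;
    }

    /** Echoes messages until the peer closes the stream. */
    static void serve(InputStream rawIn, OutputStream rawOut) throws IOException {
        DataInputStream in = new DataInputStream(rawIn);
        DataOutputStream out = new DataOutputStream(rawOut);
        while (echoOnce(in, out)) {
            // keep echoing
        }
    }
}
```

In a threaded server, each accepted socket would get its own thread calling serve(socket.getInputStream(), socket.getOutputStream()). The blocking readFully call is what keeps this version simple: the thread just sleeps until the rest of the message arrives.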
Each client sends one request, then waits for the response. When it gets the response, it immediately sends another request. Each trial ran for 30 seconds, with 5 seconds of warm-up. The graph shows the average of 10 trials, with the 95% confidence interval. The tests were run over localhost on an 8-core machine (two 1.8GHz Core2 Quad CPUs), running Linux 2.6.25 in 64-bit mode. Each server was pinned to a single core using numactl, in order to avoid giving the threaded servers more resources than the single-threaded servers. Java was run with the -server and -XX:+UseSerialGC flags, as they were found to improve performance.
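The post does not say how the confidence intervals were computed. One common approach for a small number of trials is a Student's t interval, sketched below. The class TrialStats and its method names are hypothetical, and the constant 2.262 is the two-sided 95% critical value for 9 degrees of freedom, so this sketch only applies to exactly 10 trials.

```java
// Hypothetical sketch: mean and 95% confidence half-width for 10 trials.
final class TrialStats {
    // Two-sided 95% Student's t critical value for 9 degrees of freedom.
    static final double T_95_DF9 = 2.262;

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double sumSq = 0.0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return Math.sqrt(sumSq / (xs.length - 1));
    }

    /** Half-width of the 95% interval; the interval is mean +/- this value. */
    static double halfWidth95(double[] tenTrials) {
        if (tenTrials.length != 10) {
            throw new IllegalArgumentException("critical value assumes 10 trials");
        }
        return T_95_DF9 * sampleStdDev(tenTrials) / Math.sqrt(tenTrials.length);
    }
}
```

Each element of the input array would be the throughput measured in one 30-second trial, so the plotted point is mean(trials) and the error bar extends halfWidth95(trials) in each direction.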
The results are not very surprising. C++ performs better than Java, but only by a small margin. I think that the C++ version makes fewer copies than the Java code, so the "C++ versus Java" comparison here is unfair. Overall, the two languages scale and perform very similarly. The single-threaded servers performed better than the multithreaded servers, likely because they avoid expensive context switches between threads.
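The price of avoiding those switches is that a single-threaded server cannot block waiting for the rest of a message, so it has to buffer partial reads per connection. The sketch below shows one way to do that with a ByteBuffer. It is an illustration rather than the benchmark's code: the class FrameAccumulator, its feed method, and the 4-byte big-endian length prefix are assumptions.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: per-connection buffering for a non-blocking server.
final class FrameAccumulator {
    // Kept in "write mode" (ready for put) between calls to feed().
    private ByteBuffer buffer = ByteBuffer.allocate(64);

    /** Appends newly read bytes and returns every message that is now complete. */
    List<byte[]> feed(byte[] chunk) {
        ensureCapacity(chunk.length);
        buffer.put(chunk);
        buffer.flip(); // switch to read mode
        List<byte[]> messages = new ArrayList<>();
        while (buffer.remaining() >= 4) {
            buffer.mark();
            int length = buffer.getInt(); // assumed 4-byte big-endian prefix
            if (length < 0) {
                throw new IllegalStateException("negative length: " + length);
            }
            if (buffer.remaining() < length) {
                buffer.reset(); // payload incomplete: rewind to the prefix
                break;
            }
            byte[] payload = new byte[length];
            buffer.get(payload);
            messages.add(payload);
        }
        buffer.compact(); // keep leftover bytes, return to write mode
        return messages;
    }

    /** Grows the buffer if the next chunk would not fit. */
    private void ensureCapacity(int extra) {
        if (buffer.remaining() >= extra) {
            return;
        }
        int needed = buffer.position() + extra;
        ByteBuffer bigger = ByteBuffer.allocate(Math.max(buffer.capacity() * 2, needed));
        buffer.flip();
        bigger.put(buffer);
        buffer = bigger;
    }
}
```

In a Java NIO server, one such object could be attached to each connection's SelectionKey with key.attach(...), and the event loop would call feed with whatever bytes the channel returned, then write back each completed message. Note that this sketch copies every payload into a fresh array, which is exactly the kind of extra copying that can put Java at a disadvantage.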
Hopefully this code will serve as a good example of how to write threaded or event-based servers in both Java and C++. The source archive is also the Mercurial repository I used to develop it, so feel free to send me patches if you have any improvements.
Source code: javanetperf.tar.bz2
Update 2009-09-21: I've written another article about NIO, byte[] and ByteBuffer performance.