Blog

The latest round of our ongoing Framework Benchmarks project is now available! Round 9 updates several test implementations and introduces a new test environment using modern server hardware.

Since the first round, we have known that the highest-performance frameworks and platforms have been network-limited by our gigabit Ethernet. For Round 9, Peak Hosting has provided a high-performance environment equipped with 10-gigabit Ethernet at one of their data centers.

With ample network bandwidth, the highest-performing frameworks are further differentiated. Additionally, the Peak test environment uses servers with 40 HT cores across two Xeon E5 processors, and the results illustrate how well each framework scales to high parallelism on a single server node.

View Round 9 results now.

Round 9 notes and observations

In previous rounds, JSON serialization results for the highest-performance frameworks on our in-house i7 hardware converged on approximately 200,000 requests per second. Although some response headers are normalized, we believe that the small variations beyond 200,000 RPS are caused by how each framework uses the available bandwidth of our gigabit Ethernet.

For example, a larger number of response headers might mean slightly fewer responses sent per second. However, this is acceptable as we want each implementation to be production-grade and typical for the framework, not tuned to include only the response headers we require. Thorough normalization of behavior is not a goal. (More importantly, we recommend against making decisions based on such small variations. 200,000 versus 20,000 is notable; 210,000 versus 200,000 is probably not.)
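The effect of extra response headers on a network-limited test can be sketched with simple arithmetic. The example below is a rough illustration only and is not part of the benchmark toolset. It assumes the link is the sole bottleneck and ignores request traffic, per-packet overhead, and latency, so real ceilings are lower. The per-response sizes are hypothetical, and the class and method names are ours.

```java
/** Rough ceiling on responses per second when the network link is the only bottleneck. */
class LinkCeiling {
    /** Gigabit Ethernet: 1,000,000,000 bits per second, i.e. 125,000,000 bytes per second. */
    static final double GIGABIT_BYTES_PER_SEC = 1_000_000_000.0 / 8.0;

    /**
     * @param linkBytesPerSec  usable link capacity in bytes per second
     * @param bytesPerResponse bytes the server sends per response (hypothetical input)
     * @return the most responses per second the link could carry
     */
    static double maxResponsesPerSec(double linkBytesPerSec, double bytesPerResponse) {
        return linkBytesPerSec / bytesPerResponse;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: a lean response versus one carrying 50 more bytes of headers.
        double lean = maxResponsesPerSec(GIGABIT_BYTES_PER_SEC, 150);
        double heavier = maxResponsesPerSec(GIGABIT_BYTES_PER_SEC, 200);
        System.out.printf("lean:    %.0f responses/sec%n", lean);
        System.out.printf("heavier: %.0f responses/sec%n", heavier);
    }
}
```

The absolute numbers matter less than the direction: once the link is saturated, every extra header byte lowers the ceiling in proportion.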

Figure: Top JSON results on i7 versus Peak. Top ten frameworks at JSON serialization; i7 on left, Peak on right.

With 10-gigabit Ethernet, the results from the new Peak environment show a larger spread between top-performing frameworks. Of course, other variables are at play, such as significantly more HT processor cores--40 versus 8 for our i7 tests--and a different system architecture, NUMA.

The NUMA architecture presented a variety of challenges, including unexpected difficulty scaling the database servers to utilize all available CPU cores. For example, we are using MySQL 5.5 in our database tests, but later versions of MySQL are reportedly better suited for NUMA. For this reason, we may migrate all MySQL tests to a more recent version in a future round. Having reviewed the results, we plan to use differently equipped machines for Round 10. The hardware specifications used in Round 9 were not well optimized for the web-app use case our project exercises.

As a result of the concessions made for NUMA, the performance delta between our i7 workstations and the 40-core Xeon servers is not as pronounced for the database tests as for the non-database tests. We'd like to see this situation improve in future rounds. We would be happy to receive input and advice from readers with NUMA database deployment expertise.

Similarly, we expect that some application platforms are better suited for NUMA than others. For example, platforms that use process-level concurrency scaled to a large number of processor cores quite well. For JSON serialization, node.js performance in the Peak environment is about 3.3x that of i7. Meanwhile, Go's JSON serialization performance on Peak is only 1.6x that of i7. Even more interesting: Go's database performance is slightly lower on Peak than i7. (Yes, the Go tests use all CPU cores. Putting aside scaling as CPU cores increase, Go is quicker in the JSON serialization test than node.js in both environments.)
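One way to read these ratios is as a crude scaling efficiency: the throughput ratio divided by the core-count ratio (40 HT cores on Peak versus 8 on i7). The sketch below is our own illustration, not a metric the project reports. It ignores differences in clock speed, CPU generation, and network between the two environments, so treat the result as a rough indicator only.

```java
/** Crude scaling efficiency: throughput gain relative to the gain in core count. */
class ScalingEfficiency {
    /**
     * @param throughputRatio throughput on the larger machine divided by the smaller one
     * @param baselineCores   HT cores on the smaller machine
     * @param scaledCores     HT cores on the larger machine
     * @return 1.0 for perfectly linear scaling, lower for sub-linear scaling
     */
    static double efficiency(double throughputRatio, int baselineCores, int scaledCores) {
        double coreRatio = (double) scaledCores / baselineCores;
        return throughputRatio / coreRatio;
    }

    public static void main(String[] args) {
        // Throughput ratios quoted in the text; core counts are 8 (i7) and 40 (Peak).
        System.out.printf("node.js JSON: %.2f%n", efficiency(3.3, 8, 40));
        System.out.printf("Go JSON:      %.2f%n", efficiency(1.6, 8, 40));
    }
}
```

By this measure node.js lands at roughly 0.66 and Go at roughly 0.32 for JSON serialization, even though Go remains faster in absolute terms in both environments.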

C++ continues to dominate virtually all tests. If you are a web developer using C++, you win this particular bragging-rights game. With 6,738,911 plaintext RPS from this single server, the remarkable 188,585 Fortunes RPS might be overlooked at first, although on consideration it may be the more impressive result.

For Round 9, we added validation checks that confirm each implementation is behaving as we expect prior to measuring its performance. These checks are rigid--failing implementations for improper JSON response schema, for example--but ultimately a valuable tool to ensure fairness. We've fixed up many of the implementations to pass validation, but several remain to be fixed. If your favorite framework is showing up as "did not complete" at the bottom of the charts, we'd appreciate your help in correcting that for the next round. Visit the GitHub repository if you can lend a hand.
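To give a flavor of what such a check might look like, here is a minimal sketch. It is not the project's actual validation code, which is not shown in this post. It assumes a hypothetical schema in which the JSON test must return an `application/json` response whose body is an object with a single string field named `message`. The class name, method name, and regular expression are ours.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical pre-measurement check for a JSON test response. */
class JsonResponseCheck {
    // Assumed schema: a JSON object with exactly one string field named "message".
    private static final Pattern MESSAGE_OBJECT = Pattern.compile(
        "^\\s*\\{\\s*\"message\"\\s*:\\s*\"(?:[^\"\\\\]|\\\\.)*\"\\s*\\}\\s*$");

    /** Returns true only if both the content type and the body shape are as expected. */
    static boolean passes(String contentType, String body) {
        if (contentType == null || body == null) {
            return false;
        }
        boolean jsonType = contentType.toLowerCase(Locale.ROOT).startsWith("application/json");
        return jsonType && MESSAGE_OBJECT.matcher(body).matches();
    }

    public static void main(String[] args) {
        String good = "{\"message\":\"Hello, World!\"}";
        String wrongField = "{\"msg\":\"Hello, World!\"}";
        System.out.println(passes("application/json", good));
        System.out.println(passes("application/json", wrongField));
        System.out.println(passes("text/html", good));
    }
}
```

A rigid check like this fails an implementation for a misnamed field or a wrong content type, which is the kind of strictness described above.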

Busy schedules delayed Round 9. We apologize for the delay and thank you for your continued interest and patience! We welcome you to join in for Round 10.

Thanks!

As always, thank you to all of the contributors to this project, especially those who helped us address validation errors for this latest round.

An extra special thanks to Peak Hosting for providing a no-joking-around, no-holds-barred, genuine server environment. It has been a treat to see these servers set massive new records.

If you have questions, comments, criticism, or would like to contribute a new test or an improvement to an existing one, please join our Google Group or visit the project at GitHub.

About TechEmpower

We provide web and mobile application development services and are passionate about application performance. We are presently looking for a generalist web/mobile developer to expand our team. If this sounds interesting to you, we'd love to hear from you.

As we and our collaborators prepare Round 9 of our Framework Benchmarks project, we have had an epiphany:

With high-performance software, a single modern server processes over 1 million HTTP requests per second.

Five months ago, Google talked about load-balancing to achieve 1 million requests per second. We understand their excitement is about the performance of their load balancer [1]. Part of what we do is performance consulting, so we are routinely deep in request-per-second data, and we recognized a million requests per second as an impressive milestone.

But fast-forward to today, where we see the same response rate from a single server. We had been working with virtual servers and our modest workstations for so long that these data were a bit of a surprise.

The mind immediately begins painting a world of utter simplicity, where our applications' scores of virtual servers are rendered obsolete. Especially appealing is the reduced architectural complexity that an application can reap if its performance requirement can be satisfied by a single server. You probably still want at least two servers for resilience, but even after accounting for resilience, your architecture will likely remain simpler than one built on hundreds of instances.

Our project's new hardware

For Round 9 of our benchmarks project, Peak Hosting has generously provided us with a number of Dell R720xd servers, each powered by dual Xeon E5-2660 v2 CPUs and equipped with 10-gigabit Ethernet. Loaded up with disks, these servers are around $8,000 apiece direct from Dell. Not cheap.

But check out what they can do:

    techempower@lg01:~$ wrk -d 30 -c 256 -t 40 http://10.0.3.2:8080/byte
    Running 30s test @ http://10.0.3.2:8080/byte
      40 threads and 256 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   247.05us    3.52ms 624.37ms   99.90%
        Req/Sec    27.89k     6.24k   50.22k    71.15%
      31173283 requests in 29.99s, 3.83GB read
      Socket errors: connect 0, read 0, write 0, timeout 9
    Requests/sec: 1039305.27
    Transfer/sec:    130.83MB

This is output from wrk testing a single server running Undertow under conditions similar to Google's test (1-byte response body, no HTTP pipelining, no special request headers). The result: 1.039 million requests per second.
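A little arithmetic on the wrk figures shows why 10-gigabit Ethernet matters here. The sketch below is our own back-of-the-envelope check, not part of the benchmark toolset. It assumes that wrk's "MB" means 2^20 bytes and that the transfer figure counts only bytes read from the server. The class and method names are ours.

```java
/** Back-of-the-envelope arithmetic on the wrk output for the 1-byte test. */
class WrkArithmetic {
    // Figures copied from the wrk output.
    static final double REQUESTS_PER_SEC = 1_039_305.27;
    static final double TRANSFER_MB_PER_SEC = 130.83;

    /** Bytes read per second, assuming wrk's "MB" is 2^20 bytes. */
    static double bytesPerSec() {
        return TRANSFER_MB_PER_SEC * 1024 * 1024;
    }

    /** Average bytes read per response, headers included. */
    static double bytesPerResponse() {
        return bytesPerSec() / REQUESTS_PER_SEC;
    }

    /** Response traffic expressed in gigabits per second. */
    static double gigabitsPerSec() {
        return bytesPerSec() * 8 / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        System.out.printf("bytes per response: %.1f%n", bytesPerResponse());
        System.out.printf("response traffic:   %.2f Gbit/s%n", gigabitsPerSec());
    }
}
```

Under these assumptions each response is about 132 bytes once headers are included, and the response traffic alone is about 1.1 Gbit/s. That is more than a gigabit link can carry, so this result would not have been reachable on our gigabit network regardless of server speed.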

Obviously there are myriad variables that make direct comparison to Google's achievement an impossibility. Nevertheless, achieving a million HTTP requests per second over a network without pipelining to a single server says something about the capacity of modern hardware.

It's possible even higher numbers would be reported had we tested a purpose-built static web server such as nginx. Undertow is the lightweight Java web application server used in WildFly. It just happens to be quite quick at HTTP. Here's the code we used for this test:

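    import io.undertow.server.HttpHandler;
    import io.undertow.server.HttpServerExchange;
    import io.undertow.util.Headers;

    public class ByteHandler implements HttpHandler {
      // The 1-byte response body.
      private static final String aByte = "a";

      @Override
      public void handleRequest(HttpServerExchange exchange) throws Exception {
        // TEXT_PLAIN is a constant defined elsewhere in the test code (not shown here).
        exchange.getResponseHeaders().put(
            Headers.CONTENT_TYPE, TEXT_PLAIN);
        exchange.getResponseSender().send(aByte);
      }
    }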

In Round 9 (coming soon, we swear!), you'll be able to see the other test types on Peak's hardware alongside our i7 workstations and the EC2 instances we've tested in all previous rounds. Spoiler: I feel bad for our workstations.

Incidentally, if you think $8,000 is not cheap, you might want to run the monthly numbers on 200 virtual server instances. Yes, on-demand capacity and all the usual upsides of cloud deployments are real. But the simplified system architecture and cost advantage of high-performance options deserve some time in the sun.
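Running those monthly numbers is straightforward. The sketch below is illustrative only: the hourly rate is a hypothetical placeholder, not a quoted price from any provider, and the calculation ignores hosting, power, bandwidth, and staffing costs on the dedicated-server side. The class and method names are ours.

```java
/** Illustrative cost comparison: one purchased server versus many rented instances. */
class BreakEven {
    /** Monthly bill for a fleet of identical on-demand instances. */
    static double monthlyCloudCost(int instances, double hourlyRateUsd, double hoursPerMonth) {
        return instances * hourlyRateUsd * hoursPerMonth;
    }

    /** Months of cloud spend that equal the one-time server purchase price. */
    static double monthsToBreakEven(double serverCostUsd, double monthlyCloudCostUsd) {
        return serverCostUsd / monthlyCloudCostUsd;
    }

    public static void main(String[] args) {
        double hypotheticalHourlyRateUsd = 0.10; // placeholder, substitute a real price
        double hoursPerMonth = 730;              // roughly 24 hours times 365 days over 12 months
        double monthly = monthlyCloudCost(200, hypotheticalHourlyRateUsd, hoursPerMonth);
        System.out.printf("200 instances per month: $%.2f%n", monthly);
        System.out.printf("months to match $8,000:  %.2f%n", monthsToBreakEven(8_000, monthly));
    }
}
```

Substitute your own provider's rate and instance count. The point is only that a fleet's recurring cost can overtake a one-time hardware purchase quickly.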

1. Not only that, Google expressly said they were not using this exercise to demonstrate the capacity of their instances but rather to showcase their load balancer's performance. However, the scenario they created achieved massive request-per-second scale by load balancing hundreds of instances. We are simply providing a counter-point that the massive scale achieved by hundreds of instances can be trivially mimicked by a single modern server with modern tools. The capacity of a single server may not be surprising to some, but it may come as a surprise to others.

December 17, 2013

Framework Benchmarks Round 8

Merry Christmas web framework performance aficionados! What better way to celebrate the holidays than by cheering on your favorites as they race through a variety of application fundamentals in the biggest web platform grudge match of the season? We certainly can't think of anything more festive.

Now at 90 frameworks and 230 permutations (variations on configuration), Round 8 has something for everyone. And if it doesn't have what you want, you can join the party! We have fruitcake and egg nog. Or maybe not. But we enjoy pull requests; they're almost as good as egg nog.

A veritable rainbow of holiday cheer awaits!

View Round 8 results now.

Round 8 notes and observations

  • Go, always the scrappy competitor, flexes some performance muscle and lands a razor-thin victory in JSON serialization on i7 hardware. But be aware that the highest-performance frameworks are network-limited in the JSON serialization and Plaintext tests. Anything in this "200k club" is sure to keep overhead at a bare minimum, leaving maximum headroom for your custom application logic.
  • Round 7 was missing some of the Go frameworks due to configuration problems. Those problems have been resolved, and the Go frameworks have returned in Round 8 to reaffirm that Go is a viable performance rival to the JVM.
  • HHVM (the HipHop PHP VM), with no framework and thanks in part to the MySQL driver for PHP, wields dominion over the Updates test. HHVM is impressive in the multiple query test as well. However, HHVM presently trails plain PHP in the Fortunes test. Implementation details may be at play here. If you're interested in testing HHVM with popular PHP frameworks, we would be happy to receive a pull request.
  • Vert.x and Netty have wrested the Plaintext crown from Undertow, but this rivalry isn't yet settled. Rumor has it they have more improvements in store for Round 9. Meanwhile, a newcomer named Plain (which may rival Go as the most in need of a more search-friendly name; though the irony of Go makes it uncontested champion) is right behind the leaders. Most interestingly, Plain demonstrates the highest Windows performance we've seen by a massive margin (reaching 611,095 pipelined plaintext requests per second on i7). Once again, bear in mind that these tests are network-bound by our gigabit Ethernet.
  • On EC2, the Netty and Vert.x upgrades have paid huge dividends with Netty now breaking 200,000 pipelined plaintext responses per second on a humble m1.large instance.
  • Grizzly performance on JSON serialization is off from its Round 7 showing, but unfortunately, we have not yet determined the cause.
  • The Plaintext test requirements were clarified. It is not necessary to copy the bytes of the small response payload per request. Using a pre-rendered byte buffer for the body is acceptable as long as that is conventional for the platform or framework being tested and response headers are composed normally.
  • The maximum query and update performance for Mongo on EC2 is substantially higher than MySQL. When looking at that chart in particular, consider filtering by your preferred data store to maintain a useful perspective. Related: a late change to the Mongo "schema" intended to replace "id" with "_id" caused some challenges. We further postponed Round 8 to re-run Mongo tests with a schema that provides both columns to allow all tests to complete. We want to normalize the implementations for Round 9.
  • We were targeting early December for Round 8 and we're off by about two weeks. There is still room for improvement toward our goal of a monthly cycle. We will target mid-January for Round 9.

Thanks!

A big thank-you to all of the contributors who have added and improved existing test implementations for Round 8.

The contributors for Round 8 are, in no particular order: @lhotari, @methane, @pseudonom, @lucassp, @aualin, @weltermann17, @kpacha, @nareshv, @martin-g, @bclozel, @ijl, @bbrowning, @sbordet, @purplefox, @stuartwdouglas, @normanmaurer, @kardianos, @hamiltont, and @julienschmidt.

If you have questions, comments, criticism, or would like to contribute a new test or an improvement to an existing one, please join our Google Group or visit the project at GitHub.

About TechEmpower

We provide web and mobile application development services and are passionate about application performance. Read more about what we do.