We're happy to announce that Round 17 of the TechEmpower Framework Benchmarks project is now available. Since the adoption of Continuous Benchmarking, the creation of an official Round is a fairly simple process:

  1. Try to reduce errors in framework implementations. We want an official Round to have a solid showing by as many frameworks as feasible given limited personnel bandwidth.
  2. Select a continuous run on the physical hardware (named "Citrine") that looks good and identify its Git commit.
  3. Run the same commit on cloud (Azure).
  4. Write a blog entry and post the Round.

For weeks, we have been stuck at step 4 waiting on me to write something, so I'm going to keep it short and sweet to get this out before Halloween.

Stratified database results

As you review Round 17 results, you'll notice that Postgres database tests are stratified—there are groups of test implementations that seem implausibly faster than other test implementations.

The underlying cause of this is use of a Postgres protocol feature we have characterized as "query pipelining" because it is conceptually similar to HTTP pipelining. We call it pipelining, but you could also call it multiplexing. It's a feature of the "Extended Query" protocol in Postgres. Query pipelining allows database clients to send multiple queries without needing to wait for each response before sending the next. It's similar to batching but provided invisibly in the driver. Thereotically (and we believe in practice) short queries could be sent together in the same network packets.

Importantly (at least for us), the client application's business logic is unchanged. This is not batching of queries implemented by the application or web frameworks, but rather an optimization provided invisibly by the database driver.

We discussed the pros and cons of permitting this optimization in our tests. While we have disallowed several performance-optimizing tactics elsewhere as violations of the rules or spirit of our tests, we ultimately arrived at permitting this optimization for two reasons. First, it can be applied "seamlessly" to application code. The application code does not need to be aware this is happening. And second, because this isn't a trick applied at the application tier, but rather an optimization that potentially benefits all applications (once implemented by drivers), we feel this is precisely the sort of improvement this project should encourage.

In short, we think this feature in the Postgres wire protocol is great. We suspect that over time, more platforms will support the "pipelining" capability, gradually re-balancing the stratification we're seeing.

The effect of this feature is most emphatic on the Multi-query and Updates tests, both of which execute a tremendous number of queries per second. We are also considering adding an attribute to indicate and filter test implementations using traditional database connections versus those using pipelining.

Other updates

  • Round 17 is now measuring 179 frameworks.
  • New languages such as F# are now represented.
  • The results web site is a bit wider in response to feedback.

In the works

We are presently working on a few things that we hope to share with the community soon. These include:
  • Potential changes to the physical environment's network to allow further differentiation of ultra high-performance frameworks on the Plaintext test in particular.
  • Badges for framework maintainers to share and celebrate their performance tier, similar to popular continuous integration badges seen in GitHub repositories.


Thank you to all contributors and fans of this project. As always, we really appreciate your continued interest, feedback, and patience!

Round 17 is composed of:

Now in its fifth year, the TechEmpower Framework Benchmarks project has another official round of results available. Round 16 is a real treat for anyone who likes big numbers. Not just in measured results per second (several metric crap tonne), but in number of tests measured (~1830), number of framework permutations tested (~464), number of languages included (26), and total execution time of the test suite (67 hours, or 241 billion microseconds to make that sound properly enormous). Take a look at the results and marvel in the magnitude of the numbers.

Recent months have been a very exciting time for this project. Most importantly, the community has been contributing some amazing test implementations and demonstrating the fun and utility of some good-natured performance competition. More on that later. This is a longer-than-average TFB round announcement blog entry, but there is a lot to share, so bear with me.

Dockerification… Dockerifying… Docking?

After concluding Round 15, we took on the sizeable challenge of converting the full spectrum of ~460 test implementations from our home-brew quasi-sandboxed (only mostly sandboxed) configuration to a stupefying array of Docker containers. It took some time, but The Great Dockerification has yielded great benefits.

Most importantly, thanks to Dockerizing, the reproducibility and consistency of our measurements is considerably better than previous rounds. Combined with our continuous benchmarking, we now see much lower variability between each run of the full suite.

Across the board, our sanity checking of performance metrics has indicated Docker's overhead is immeasurably minute. It's lost in the noise. And in any event, whatever overhead Docker incurs is uniformly applicable as all test implementations are required to be Dockered.

Truly, what we are doing with this project is a perfect fit for Docker. Or Docker is a perfect fit for this. Whichever. The only regret is not having done this earlier (if only someone had told us about Docker!). That and not knowing what the verb form of Docker is.


New hardware

As we mentioned in March, we have a new physical hardware environment for Round 16. Nicknamed the “Citrine” environment, it is three homogeneous Dell R440 servers, each equipped with a Xeon Gold 5120 CPU. Characterized as entry- or mid-tier servers, these are nevertheless turning out to be impressive when paired with a 10-gigabit Ethernet switch.

Being freshly minted by Dell and Cisco, this new environment is notably quicker than equipment we have used in previous rounds. We have not produced a “difference” view between Round 15 and Round 16 because there are simply too many variables—most importantly this new hardware and Docker—to make a comparison remotely relevant. But in brief, Round 16 results are higher than Round 15 by a considerable margin.

In some cases, the throughput is so high that we have a new challenge from our old friend, “network saturation.” We last were acquainted with this adversary in Round 8, in the form of a Giant Sloar, otherwise known as one-gigabit Ethernet. Now The Destructor comes to us laughing about 10-gigabit Ethernet. But we have an idea for dealing with Gozer.

(Thanks again to Server Central for the previous hardware!)

Convergence in Plaintext and JSON serialization results

In Round 16, and in the continuous benchmarking results gathered prior to finalizing the round, we observed that the results for the Plaintext and JSON serialization tests were converging on theoretical maximums for 10-gigabit networking.

Yes, that means that there are several frameworks and platforms that—when allowed to use HTTP pipelining—can completely saturate ten gigabit per second with ~140-byte response payloads using relatively cheap commodity servers.

To remove the network bottleneck for future rounds, we are concocting a plan to cross the streams, in a manner of speaking: use lasers and the fiberoptic QSFP28 ports on our Cisco switch to bump the network capacity up a notch.

Expect to hear more about this as the plan develops during Round 17.

Continuous benchmarking

Introduced prior to Round 16, the continuous benchmarking platform really came into a fully-realized state in the past several months. Combined with the Great Dockening, we now see good (not perfect, but good) results materializing automatically every 67 hours or thereabouts.

Some quick points to make here:

  • We don't expect to have perfection in the results. Perfection would imply stability of code and implementations and that is not at all what we have in mind for this project. Rather, we expect and want participants to frequently improve their frameworks and contribute new implementations. We also want the ever-increasing diversity of options for web development to be represented. So expecting perfection is incongruous with tolerating and wanting dynamism.
  • A full suite run takes 67 hours today. This fluctuates over time as implementation permutations are added (or deleted).
  • Total execution time will also increase when we add more test types in the future. And we are still considering increasing the duration of each individual benchmarking exercise (the duration we run the load generator for to gather a single result datum). That is the fundamental unit of time for this project, so increasing that will approximately linearly increase the total execution time.
  • We have already seen tremendous social adoption of the continuous benchmarking results. For selfish reasons, we want to continue creating and posting official rounds such as today's Round 16 periodically. (Mostly so that we can use the opportunity to write a blog entry and generate hype!) We ask that you humor us and treat official rounds as the super interesting and meaningful events that they are.
  • Jokes aside, the continuous results are intended for contributors to the project. The official rounds are less-frequent snapshots suitable for everyone else who may find the data interesting.

Social media presence

As hinted above, we created a Twitter account for the TechEmpower Framework Benchmarks project: @TFBenchmarks. Don't don't @ us.

Engaging with the community this way has been especially rewarding during Round 16 because it coincided with significant performance campaigns from framework communities. Rust has blasted onto the server-side performance scene with several ultra high-performance options that are competing alongside C, C++, Go, Java, and C#.

Speaking of C#, a mainstream C# framework from a scrappy startup named Microsoft has been taking huge leaps up the charts. ASP.NET Core is not your father's ASP.NET.

Warming our hearts with performance

There is no single reason we created this project over five years ago. It was a bunch of things: frustration with slow web apps; a desire to quantify the strata of high-water marks across platforms; confirming or dispelling commonly-held hunches or prevailing wisdom about performance.View Round 16 results

But most importantly, I think, we created the project with a hopeful mindset of “perhaps we can convince some people to invest in performance for the better of all web-app developers.”

With expectations set dutifully low from the start, we continue to be floored by statements that warm our heart by directly or indirectly suggesting an impact of this project.

When asked about this project, I (Brian) have often said that I believe that performance improvements are best made in platforms and frameworks because they have the potential to benefit the entire space of application developers using those platforms. I argue that if you raise the framework's performance ceiling, application developers get the headroom—which is a type of luxury—to develop their application more freely (rapidly, brute-force, carefully, carelessly, or somewhere in between). In large part, they can defer the mental burden of worrying about performance, and in some cases can defer that concern forever. Developers on slower platforms often have so thoroughly internalized the limitations of their platform that they don't even recognize the resulting pathologies: Slow platforms yield premature architectural complexity as the weapons of “high-scale” such as message queues, caches, job queues, worker clusters, and beyond are introduced at load levels that simply should not warrant the complexity.

So when we see developers upgrade to the latest release of their favorite platform and rejoice over a performance win, we celebrate a victory. We see a developer kicking performance worry further down the road with confidence and laughter.

I hope all of the participants in this project share in this celebration. And anyone else who cares about fast software.

On to Round 17!

Technical notes

Round 16 is composed of:

We have retired the hardware environment provided by Server Central for our Web Framework Benchmarks project. We want to sincerely thank Server Central for having provided servers from their lab environment to our project.

Their contribution allowed us to continue testing on physical hardware with 10-gigabit Ethernet. Ten-gigabit Ethernet gives the highest-performing frameworks opportunity to shine. We were particularly impressed at Server Central's customer service and technical support, which was responsive and helpful in troubleshooting configuration issues even though we were using their servers free of charge. (And since the advent of our Continuous Benchmarking, we were essentially using the servers at full load around the clock.)

Thank you, Server Central!

New hardware for Round 16 and beyond

For Round 16 and beyond, we are happy to announce that Microsoft has provided three Dell R440 servers and a Cisco 10-gigabit switch. These three servers are homogeneous, each configured with an Intel Xeon Gold 5120 CPU (14/28 cores at 2.2/3.2 GHz), 32 GB of memory, and an enterprise SSD.

If your contributed framework or platform performs best with hand-tuning based on cores, please send us a pull request to adjust the necessary parameters.

These servers together compose a hardware environment we've named "Citrine" and are visible on the TFB Results Dashboard. Initial results are impressive, to say the least.

Adopting Docker for Round 16

Concurrent to the change in hardware, we are hard at work converting all test implementations and the test suite to use Docker. The are several upsides to this change, the most important being better isolation. Our past home-brew mechanisms to clean up after each framework were, at times, akin to whack-a-mole as we encountered new and fascinating ways in which software may refuse to stop after being subjected to severe levels of load.

Docker will be used uniformly—across all test implementations—so any impact will be imparted on all platforms and frameworks equally. Our measurements indicate trivial performance impact versus bare metal: on the order of less than 1%.

As you might imagine, the level of effort to convert all test implementations to Docker is not small. We are making steady progress. But we would gladly accept contributions from the community. If you would like to participate in the effort, please see GitHub issue #3296.