Web Framework Benchmarks

In the following tests, we have measured the performance of several web application platforms, full-stack frameworks, and micro-frameworks (collectively, "frameworks"). For more information, read the introduction, motivation, and latest environment details.

Classification
We classify frameworks as follows:
  • Full-stack, meaning a framework that provides wide feature coverage including server-side templates, database connectivity, form processing, and so on.
  • Micro, meaning a framework that provides request routing and some simple plumbing.
  • Platform, meaning a raw server (not actually a framework at all). Good luck! You're going to need it.
Language
The principal programming language used by the framework.
Platform
The platform is the low-level software or API used to host web applications for the framework; the platform provides an implementation of the HTTP fundamentals.
Application operating system
The operating system hosting the application server and web server, if applicable.
Front-end server
The front-end server ("web server") paired with the application framework, if applicable. Some frameworks provide a built-in web server.
Database server
The database server paired with the application framework. Not every framework has tests for every database server.
Database operating system
The operating system hosting the database server.
Object-relational mapper (ORM) classification
We classify object-relational mappers as follows:
  • Full, meaning an ORM that provides wide functionality, possibly including a query language.
  • Micro, meaning a less comprehensive abstraction of the relational model.
  • Raw, meaning no ORM is used at all; the platform's raw database connectivity is used.
Implementation approach
Implementation approach describes the test's design disposition.
  • A realistic implementation approach uses the framework with most out-of-the-box functionality enabled. We consider this realistic because most applications built with the framework will leave these features enabled.
  • A stripped implementation approach removes or outright avoids implementing features that are unnecessary for the particulars of this benchmark exercise. This might illuminate the marginal improvement available in fine-tuning a framework to your application's use-case.
Stripped implementations are hidden by default. See the Motivation and Question section for more information.
In addition to filtering frameworks using the attributes above, you may hide frameworks one-by-one by clicking their names below.
Test types

WARNING: Preliminary data!
You are viewing a partial set of results. This run is incomplete.

Requirements summary: Single query

In this test, each request is processed by fetching a single row from a simple database table. That row is then serialized as a JSON response.

Example response:

HTTP/1.1 200 OK
Content-Length: 32
Content-Type: application/json
Server: Example
Date: Wed, 17 Apr 2013 12:00:00 GMT

{"id":3217,"randomNumber":2149}

For a more detailed description of the requirements, see the Source Code and Requirements section.
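To make the single-query flow concrete, here is a minimal Python sketch of the handler logic only (no HTTP server). It assumes Python's built-in `sqlite3` module as a stand-in for the real database server, plus a hypothetical `world` table of 10,000 rows with `id` and `randomNumber` columns. The table layout, row count, and function names are illustrative assumptions, not the official requirements.

```python
import json
import random
import sqlite3

ROW_COUNT = 10_000  # assumed size of the hypothetical `world` table


def make_world_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for the benchmark's world table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE world (id INTEGER PRIMARY KEY, randomNumber INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO world (id, randomNumber) VALUES (?, ?)",
        ((i, random.randint(1, 10_000)) for i in range(1, ROW_COUNT + 1)),
    )
    conn.commit()
    return conn


def single_query(conn: sqlite3.Connection) -> str:
    """Fetch one randomly chosen row and serialize it as the JSON response body."""
    row_id = random.randint(1, ROW_COUNT)
    row = conn.execute(
        "SELECT id, randomNumber FROM world WHERE id = ?", (row_id,)
    ).fetchone()
    return json.dumps({"id": row[0], "randomNumber": row[1]}, separators=(",", ":"))


if __name__ == "__main__":
    db = make_world_db()
    print(single_query(db))  # one object with randomly chosen values
```

The compact `separators` argument produces the same no-whitespace shape as the example response.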

Requirements summary: Multiple queries

In this test, each request is processed by fetching multiple rows from a simple database table and serializing these rows as a JSON response. The test is run multiple times: testing 1, 5, 10, 15, and 20 queries per request. All tests are run at 512 concurrency.

Example response for 10 queries:

HTTP/1.1 200 OK
Content-Length: 315
Content-Type: application/json
Server: Example
Date: Wed, 17 Apr 2013 12:00:00 GMT

[{"id":4174,"randomNumber":331},{"id":51,"randomNumber":6544},{"id":4462,"randomNumber":952},{"id":2221,"randomNumber":532},{"id":9276,"randomNumber":3097},{"id":3056,"randomNumber":7293},{"id":6964,"randomNumber":620},{"id":675,"randomNumber":6601},{"id":8414,"randomNumber":6569},{"id":2753,"randomNumber":4065}]

For a more detailed description of the requirements, see the Source Code and Requirements section.
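A sketch of the multiple-queries logic under similar assumptions: `sqlite3` stands in for the database, and the `world` table of 10,000 rows is hypothetical. It also assumes the count arrives as a raw query-string value that is clamped to 1..500, with missing or non-numeric values treated as 1; treat those bounds as an assumption and check the full requirements for the exact rules. Each row is fetched with its own `SELECT`, so database round trips grow with the requested count.

```python
import json
import random
import sqlite3
from typing import Optional

ROW_COUNT = 10_000  # assumed size of the hypothetical `world` table
MAX_QUERIES = 500   # assumed upper bound on queries per request


def make_world_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for the benchmark's world table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE world (id INTEGER PRIMARY KEY, randomNumber INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO world (id, randomNumber) VALUES (?, ?)",
        ((i, random.randint(1, 10_000)) for i in range(1, ROW_COUNT + 1)),
    )
    conn.commit()
    return conn


def parse_query_count(raw: Optional[str]) -> int:
    """Turn the raw request parameter into a query count within 1..MAX_QUERIES."""
    try:
        count = int(raw)
    except (TypeError, ValueError):
        return 1  # missing or non-numeric input falls back to one query
    return max(1, min(count, MAX_QUERIES))


def multiple_queries(conn: sqlite3.Connection, raw_count: Optional[str]) -> str:
    """Run one SELECT per requested row and serialize the rows as a JSON array."""
    worlds = []
    for _ in range(parse_query_count(raw_count)):
        row_id = random.randint(1, ROW_COUNT)
        row = conn.execute(
            "SELECT id, randomNumber FROM world WHERE id = ?", (row_id,)
        ).fetchone()
        worlds.append({"id": row[0], "randomNumber": row[1]})
    return json.dumps(worlds, separators=(",", ":"))
```

For example, `multiple_queries(make_world_db(), "10")` returns a JSON array of ten objects shaped like the example response.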

Requirements summary: Cached queries

In this test, each request is processed by fetching multiple cached objects from an in-memory database (the cache having been populated from a database table either as needed or prior to testing) and serializing these objects as a JSON response. The test is run multiple times: testing 1, 5, 10, 15, and 20 cached object fetches per request. All tests are run at 512 concurrency. Conceptually, this is similar to the multiple-queries test except that it uses a caching layer.

Example response for 10 object fetches:

HTTP/1.1 200 OK
Content-Length: 315
Content-Type: application/json
Server: Example
Date: Wed, 17 Apr 2013 12:00:00 GMT

[{"id":4174,"randomNumber":331},{"id":51,"randomNumber":6544},{"id":4462,"randomNumber":952},{"id":2221,"randomNumber":532},{"id":9276,"randomNumber":3097},{"id":3056,"randomNumber":7293},{"id":6964,"randomNumber":620},{"id":675,"randomNumber":6601},{"id":8414,"randomNumber":6569},{"id":2753,"randomNumber":4065}]

For a more detailed description of the requirements, see the Source Code and Requirements section.
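The caching variant can be sketched the same way. Here a plain Python dictionary plays the role of the in-memory cache. It is filled from a hypothetical `sqlite3` `world` table before any requests arrive (the "prior to testing" option above). A real implementation might use a framework cache or an external store instead. The 1..500 bound on the fetch count is an assumption.

```python
import json
import random
import sqlite3

ROW_COUNT = 10_000  # assumed size of the hypothetical `world` table


def make_world_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for the benchmark's world table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE world (id INTEGER PRIMARY KEY, randomNumber INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO world (id, randomNumber) VALUES (?, ?)",
        ((i, random.randint(1, 10_000)) for i in range(1, ROW_COUNT + 1)),
    )
    conn.commit()
    return conn


class WorldCache:
    """Hypothetical cache, populated from the table once, before testing."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        rows = conn.execute("SELECT id, randomNumber FROM world")
        self._objects = {
            row_id: {"id": row_id, "randomNumber": number} for row_id, number in rows
        }

    def get(self, row_id: int) -> dict:
        return self._objects[row_id]


def cached_queries(cache: WorldCache, count: int) -> str:
    """Fetch `count` random objects from the cache and serialize them as JSON."""
    count = max(1, min(count, 500))  # assumed bounds
    objects = [cache.get(random.randint(1, ROW_COUNT)) for _ in range(count)]
    return json.dumps(objects, separators=(",", ":"))
```

Only the cache is consulted per request; the database is touched once, when `WorldCache` is built.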

Requirements summary: Data updates

This test exercises database writes. Each request is processed by fetching multiple rows from a simple database table, converting the rows to in-memory objects, modifying one attribute of each object in memory, updating each associated row in the database individually, and then serializing the list of objects as a JSON response. The test is run multiple times: testing 1, 5, 10, 15, and 20 updates per request. Note that the number of statements per request is twice the number of updates since each update is paired with one query to fetch the object. All tests are run at 512 concurrency.

The response is analogous to the multiple-query test. Example response for 10 updates:

HTTP/1.1 200 OK
Content-Length: 315
Content-Type: application/json
Server: Example
Date: Wed, 17 Apr 2013 12:00:00 GMT

[{"id":4174,"randomNumber":331},{"id":51,"randomNumber":6544},{"id":4462,"randomNumber":952},{"id":2221,"randomNumber":532},{"id":9276,"randomNumber":3097},{"id":3056,"randomNumber":7293},{"id":6964,"randomNumber":620},{"id":675,"randomNumber":6601},{"id":8414,"randomNumber":6569},{"id":2753,"randomNumber":4065}]

For a more detailed description of the requirements, see the Source Code and Requirements section.
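A minimal sketch of the update flow, assuming `sqlite3` and a hypothetical 10,000-row `world` table. Each object costs one `SELECT` and one `UPDATE`, which is where the doubled statement count comes from. The count is taken as an already-validated integer here, and new random values are assumed to fall in 1..10,000.

```python
import json
import random
import sqlite3

ROW_COUNT = 10_000  # assumed size of the hypothetical `world` table


def make_world_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for the benchmark's world table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE world (id INTEGER PRIMARY KEY, randomNumber INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO world (id, randomNumber) VALUES (?, ?)",
        ((i, random.randint(1, 10_000)) for i in range(1, ROW_COUNT + 1)),
    )
    conn.commit()
    return conn


def update_worlds(conn: sqlite3.Connection, count: int) -> str:
    """Fetch `count` random rows, change each randomNumber, write each row back."""
    worlds = []
    for _ in range(count):
        row_id = random.randint(1, ROW_COUNT)
        row = conn.execute(
            "SELECT id, randomNumber FROM world WHERE id = ?", (row_id,)
        ).fetchone()  # statement 1 of 2 for this object
        worlds.append({"id": row[0], "randomNumber": row[1]})
    for world in worlds:
        world["randomNumber"] = random.randint(1, 10_000)  # modify in memory
        conn.execute(
            "UPDATE world SET randomNumber = ? WHERE id = ?",
            (world["randomNumber"], world["id"]),
        )  # statement 2 of 2: one UPDATE per object
    conn.commit()
    return json.dumps(worlds, separators=(",", ":"))
```

Committing once at the end keeps the sketch short; how a real implementation manages transactions is framework-specific.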

Requirements summary: Fortunes

In this test, the framework's ORM is used to fetch all rows from a database table containing an unknown number of Unix fortune cookie messages (the table has 12 rows, but the code cannot have foreknowledge of the table's size). An additional fortune cookie message is inserted into the list at runtime and then the list is sorted by the message text. Finally, the list is delivered to the client using a server-side HTML template. The message text must be considered untrusted and properly escaped and the UTF-8 fortune messages must be rendered properly.

Whitespace is optional and may comply with the framework's best practices.

Example response:

HTTP/1.1 200 OK
Content-Length: 1196
Content-Type: text/html; charset=UTF-8
Server: Example
Date: Wed, 17 Apr 2013 12:00:00 GMT

<!DOCTYPE html><html><head><title>Fortunes</title></head><body><table><tr><th>id</th><th>message</th></tr><tr><td>11</td><td>&lt;script&gt;alert(&quot;This should not be displayed in a browser alert box.&quot;);&lt;/script&gt;</td></tr><tr><td>4</td><td>A bad random number generator: 1, 1, 1, 1, 1, 4.33e+67, 1, 1, 1</td></tr><tr><td>5</td><td>A computer program does what you tell it to do, not what you want it to do.</td></tr><tr><td>2</td><td>A computer scientist is someone who fixes things that aren&apos;t broken.</td></tr><tr><td>8</td><td>A list is only as strong as its weakest link. — Donald Knuth</td></tr><tr><td>0</td><td>Additional fortune added at request time.</td></tr><tr><td>3</td><td>After enough decimal places, nobody gives a damn.</td></tr><tr><td>7</td><td>Any program that runs right is obsolete.</td></tr><tr><td>10</td><td>Computers make very fast, very accurate mistakes.</td></tr><tr><td>6</td><td>Emacs is a nice operating system, but I prefer UNIX. — Tom Christaensen</td></tr><tr><td>9</td><td>Feature: A bug with seniority.</td></tr><tr><td>1</td><td>fortune: No such file or directory</td></tr><tr><td>12</td><td>フレームワークのベンチマーク</td></tr></table></body></html>

For a more detailed description of the requirements, see the Source Code and Requirements section.
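The fortunes steps (fetch every row, add one message at request time, sort by message text, escape while rendering) could look like this in Python. The `fortune` table layout and the demo rows are assumptions, and `sqlite3` stands in for the real database and ORM. `html.escape` handles the untrusted text. Python's default string ordering compares code points, which yields the same order as the example response above. One difference: `html.escape` writes an apostrophe as `&#x27;` rather than `&apos;`, though both are character references for the same character.

```python
import html
import sqlite3

EXTRA_FORTUNE = (0, "Additional fortune added at request time.")


def fortunes_page(conn: sqlite3.Connection) -> str:
    """Render all fortunes, plus one added at request time, as escaped HTML."""
    # Fetch every row; nothing here assumes how many rows the table holds.
    fortunes = list(conn.execute("SELECT id, message FROM fortune"))
    fortunes.append(EXTRA_FORTUNE)
    fortunes.sort(key=lambda fortune: fortune[1])  # sort by message text
    rows = "".join(
        f"<tr><td>{fortune_id}</td><td>{html.escape(message)}</td></tr>"
        for fortune_id, message in fortunes
    )
    return (
        "<!DOCTYPE html><html><head><title>Fortunes</title></head><body>"
        "<table><tr><th>id</th><th>message</th></tr>"
        f"{rows}</table></body></html>"
    )


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical demo table holding a few of the example messages."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fortune (id INTEGER PRIMARY KEY, message TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO fortune (id, message) VALUES (?, ?)",
        [
            (1, "fortune: No such file or directory"),
            (11, '<script>alert("This should not be displayed in a browser alert box.");</script>'),
            (12, "フレームワークのベンチマーク"),
            (4, "A bad random number generator: 1, 1, 1, 1, 1, 4.33e+67, 1, 1, 1"),
        ],
    )
    return conn
```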

Requirements summary: JSON serialization

In this test, each response is a JSON serialization of a freshly-instantiated object that maps the key message to the value Hello, World!

Example response:

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 28
Server: Example
Date: Wed, 17 Apr 2013 12:00:00 GMT

{"message":"Hello, World!"}

For a more detailed description of the requirements, see the Source Code and Requirements section.
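A sketch of the JSON serialization step in Python. The `Message` dataclass is a hypothetical response type. The point it illustrates is that a new object is built and serialized on every call, instead of returning a precomputed byte string.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Message:
    """Hypothetical response type; a fresh instance is created per request."""
    message: str


def json_body() -> bytes:
    # Instantiate a new object for this request rather than reusing a cached one.
    payload = Message(message="Hello, World!")
    return json.dumps(asdict(payload), separators=(",", ":")).encode("utf-8")
```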

Requirements summary: Plaintext

In this test, the framework responds with the simplest of responses: a "Hello, World" message rendered as plain text. The size of the response is kept small so that gigabit Ethernet is not the limiting factor for all implementations. HTTP pipelining is enabled and higher client-side concurrency levels are used for this test (see the "Data table" view).

Example response:

HTTP/1.1 200 OK
Content-Length: 15
Content-Type: text/plain; charset=UTF-8
Server: Example
Date: Wed, 17 Apr 2013 12:00:00 GMT

Hello, World!

For a more detailed description of the requirements, see the Source Code and Requirements section.
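For illustration only, the plaintext endpoint can be served with nothing but Python's standard `http.server` module. The `/plaintext` path is an assumption, and the `Example` server name mirrors the example response above. This single-process server is nowhere near benchmark-grade; it only shows the response shape. Setting `protocol_version` to HTTP/1.1 keeps connections open between requests, which pipelining depends on.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BODY = b"Hello, World!"


class PlaintextHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive, so one connection serves many requests
    server_version = "Example"     # hypothetical Server header prefix

    def do_GET(self) -> None:
        if self.path != "/plaintext":  # assumed endpoint path
            self.send_error(404)
            return
        self.send_response(200)  # also emits the Server and Date headers
        self.send_header("Content-Type", "text/plain; charset=UTF-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format: str, *args) -> None:
        pass  # request logging off, as in the benchmark runs


if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", 8080), PlaintextHandler).serve_forever()
```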

Test implementations do not match
Frameworks and test implementations change over time, and our composite score weights are only computed for official rounds. For the composite scoring shown below to be meaningful, results should be gathered using implementation versions that correspond with the most recent official round.
The test implementations from Round 18 are at commit ID 12b68023e5d406680af745d34b2984741bc7c198 .
The results rendered here were generated using commit ID . Although this is a mismatch, the composite weights for are used below as a placeholder.
Hardware environment name & description
Score unavailable
TechEmpower Performance Rating (TPR-3)
TPR-3 is a composite hardware environment score for a three-machine configuration, derived from all test types for TPR-tagged frameworks. (More about TPR frameworks and TPR-1 versus TPR-3)
Why is the environment's score unavailable?
Environment scores are only available for rounds or ad-hoc runs that include the full suite of TPR-tagged frameworks. If any TPR-tagged frameworks are missing from a run, we aren't able to compute a fair environment score. This run is missing the following TPR-tagged frameworks:
Framework Tests Missing
Average of TPR-tagged frameworks
JSON Serialization
Single query
Multiple queries
Data Updates
Round -- test type weights
JSON Serialization
Single query
Multiple queries
Data Updates
Weighted Average
Net Score (0.1%)
Composite Framework Scores
Each framework's peak performance in each test type (shown in the colored columns below) is multiplied by the weights shown above. The results are then summed to yield a weighted score. Only frameworks that implement all test types are included.
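The composite calculation described above is a weighted sum. Here is a minimal Python sketch with hypothetical test-type keys and invented weights and peak results; the actual round weights are not reproduced here.

```python
from typing import Mapping, Optional


def composite_score(peaks: Mapping[str, float], weights: Mapping[str, float]) -> Optional[float]:
    """Sum of peak result times weight over all weighted test types.

    Returns None when the framework lacks any weighted test type, since only
    frameworks that implement every test type receive a composite score.
    """
    if any(test_type not in peaks for test_type in weights):
        return None
    return sum(peaks[test_type] * weight for test_type, weight in weights.items())


# Hypothetical example: invented weights and peaks, not real round data.
EXAMPLE_WEIGHTS = {"json": 1.0, "single-query": 2.0}
EXAMPLE_PEAKS = {"json": 100.0, "single-query": 50.0}
```

Here `composite_score(EXAMPLE_PEAKS, EXAMPLE_WEIGHTS)` evaluates to 100 × 1.0 + 50 × 2.0 = 200.0, while a framework missing the single-query result gets `None`.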


Frameworks flagged with a icon are part of the TechEmpower Performance Rating (TPR) measurement for hardware environments. The TPR rating for a hardware environment is visible on the Composite scores tab, if available.

If you have any comments about this round, please post at the Framework Benchmarks GitHub Discussion.

Framework details

Author / Sponsor
Framework Source
Test Source

Results sharing



If you are hosting a results.json file somewhere public and would like to share a visualization of it, paste its URL below to generate a shareable visualization URL.


  • Your host must include correct CORS headers so that this origin can fetch the results.json file.
  • Your results.json must include test metadata, which has been included automatically by the TFB toolset since 2020.
  • If you cannot host the results file yourself, you may upload it here and receive an auto-generated shareable visualization URL. Note that we may periodically delete files uploaded to our service as storage limits dictate.
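If you host results.json yourself, the CORS requirement amounts to sending an `Access-Control-Allow-Origin` header with the file. Here is a minimal sketch using Python's `http.server`. It assumes you are willing to allow every origin (`*`). Naming the results viewer's origin explicitly is the stricter alternative, but that origin is not given here.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that lets pages on other origins fetch results.json."""

    def end_headers(self) -> None:
        # "*" allows any origin; a specific origin could be named here instead.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_server(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Serve the files in `directory`, including results.json, with CORS enabled."""
    handler = functools.partial(CORSRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    make_server(".").serve_forever()
```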

Generate a shareable visualization URL by pasting your results.json URL:


This is a performance comparison of many web application frameworks executing fundamental tasks such as JSON serialization, database access, and server-side template composition. Each framework is operating in a realistic production configuration. Results are captured on cloud instances and on physical hardware. The test implementations are largely community-contributed and all source is available at the GitHub repository.

Note: We use the word "framework" loosely to refer to platforms, micro-frameworks, and full-stack frameworks.

In a March 2013 blog entry, we published the results of comparing the performance of several web application frameworks executing simple but representative tasks: serializing JSON objects and querying databases. Since then, community input has been tremendous. We—speaking now for all contributors to the project—have been regularly updating the test implementations, expanding coverage, and capturing results in semi-regular updates that we call "rounds."


View the latest results from Round 20. Or check out the previous rounds.

Making improvements

We expect that all frameworks' tests could be improved with community input. For that reason, we are extremely happy to receive pull requests from fans of any framework. We would like our tests for every framework to perform optimally, so we invite you to please join in.

What's to come

Feedback has been continuous and we plan to keep updating the project in several ways, such as:

  • Coverage of more frameworks. Thanks to community contributions to date, the number of frameworks covered has already grown quite large. We're happy to add more if you submit a pull request.
  • Additional test types.
  • Tests on more types of hardware.
  • Enhancements to this results web site.

Additional resources

FAQ: Frequently asked questions and the story behind why we do this.
GitHub: Source code for the project at GitHub. Pull requests welcome!
GitHub Discussion: Forum for questions, comments, recommendations, criticisms, or any other form of feedback.
@TFBenchmarks: Tweet at us or follow us on Twitter for quick updates.
TFB-Status: Continuous Benchmarking results (most useful for contributors).

Current and Previous Rounds

Round 20 Round 20 immediately follows the end of 2020 with the project at an all-time count of 5,200 pull requests processed.

Round 19 Representing the result of processing over 4,600 pull requests at GitHub, Round 19 also introduces composite scores (scores reflecting the results from all test types) and hardware environment performance ratings.

Round 18 This round included several requirements clarifications, such as specifying how often implementations are required to recompute the response Date header, and introduced stricter validation.

Round 17 Another Continuous Benchmarking run promoted to an official round, Round 17 now includes 179 frameworks. In this round, we permitted Postgres query pipelining, which has created a stratification of database tests. We are optimistic that over time, more test implementations will be able to leverage this capability.

Round 16 Now Dockerized and running on a new 10-gigabit powered hardware environment, Round 16 of the Framework Benchmarks project brings new performance highs and increased stability.

Round 15 The project exceeded 3,000 stars on GitHub and has processed nearly 2,500 pull requests. Continuous benchmarking results are now available on the Results dashboard.

Round 14 Adoption of the mention-bot from Facebook has proven useful in notifying project participants of changes to their contributions. Continuous benchmarking provided a means for several community previews in this round, and we expect that to continue going forward. Note that this round was conducted only on physical hardware within the ServerCentral environment; tests on the cloud environment will return for Round 15.

Round 13 Microsoft's ASP.NET team delivers the most impressive improvement we've seen in this project: an 85,000% increase in plaintext results for ASP.NET Core, making it a top-performing framework at the fundamentals of HTTP request routing. Round 13 also sees new hardware and cloud environments from ServerCentral and Microsoft Azure.

Round 12 Marking the last round on the Peak environment, Round 12 sees some especially high Plaintext scores.

Round 11 26 more frameworks, three more languages, and the volume cranked to 11.

Round 10 Significant restructuring of the project's infrastructure, including re-organization of the project's directory structure and integration with Travis CI for rapid review of pull requests, and the addition of numerous frameworks.

Round 9 Thanks to the contribution of a 10-gigabit testing environment by Peak Hosting, the network barrier that frustrated top-performing frameworks in previous rounds has been removed. The Dell R720xd servers in this new environment feature dual Xeon E5-2660 v2 processors and illustrate how frameworks across the spectrum scale to forty processor cores.

Round 8 Six more frameworks contributed by the community take the total count to 90 frameworks and 230 permutations (variations of configuration). Meanwhile, several implementations have been updated and the highest-performance platforms jockey for the top spot on each test's charts.

Round 7 After a several-month hiatus, another large batch of frameworks has been added by the community. Even after consolidating a few, Round 7 counts 84 frameworks and over 200 test permutations! This round was also the first to use a community-review process. Future rounds will see roughly one week of preview and review by the community prior to release to the public here.

Round 6 Still more tests were contributed by the developer community, bringing the number of frameworks to 74! Round 6 also introduces a "plaintext" test type that exercises HTTP pipelining and higher client-side concurrency levels.

Round 5 The developer community comes through with the addition of ASP.NET tests ready to run on Windows. This round is the first with Windows tests, and we seek assistance from Windows experts to apply additional tuning to bring the results to parity with the Linux tests. Round 5 also introduces an "update" test type to exercise ORM and database writes.

Round 4 With 57 frameworks in the benchmark suite, we've added a filter control allowing you to narrow your view to only the frameworks you want to see. Round 4 also introduces the "Fortune" test to exercise server-side templates and collections.

Round 3 We created this stand-alone site for comparing the results data captured across many web application frameworks. Even more frameworks have been contributed by the community and the testing methodology was changed slightly thanks to enhancements to the testing tool named Wrk.

Round 2 In April, we published a follow-up blog entry named "Frameworks Round 2" where we incorporated changes suggested and contributed by the community.

Round 1 In a March 2013 blog entry, we published the results of comparing the performance of several web application frameworks executing simple but representative tasks: serializing JSON objects and querying databases. The community reaction was terrific. We are flattered by the volume of feedback. We received dozens of comments, suggestions, questions, criticisms, and most importantly, GitHub pull requests at the repository we set up for this project.

Unofficial Results

We operate a continuously-running benchmarking environment. You can see unofficial results as they are collected at the TFB Results Dashboard.

Join the conversation


Choosing a web application framework involves evaluation of many factors. While comparatively easy to measure, performance is frequently given little consideration. We hope to help change that.

Application performance can be directly mapped to hosting dollars, and for companies both large and small, hosting costs can be a pain point. Weak performance can also cause premature and costly scale pain by requiring earlier optimization efforts and increased architectural complexity. Finally, slow applications yield poor user experience and may suffer penalties levied by search engines.

What if building an application on one framework meant that at the very best your hardware is suitable for one tenth as much load as it would be had you chosen a different framework? The differences aren't always that extreme, but in some cases, they might be. Especially with several modern high-performance frameworks offering respectable developer efficiency, it's worth knowing what you're getting into.



Terminology

framework
We use the word framework loosely to refer to any HTTP server implementation upon which you could build a web application: a full-stack framework, a micro-framework, or even a web platform such as Rack, Servlet, or plain PHP.
platform
For us, platforms are broadly defined as anything situated between the programming language and the web framework (examples are Servlet, Netty, and Rack). By comparison to a full-stack framework or micro-framework, a platform may include a bare-bones HTTP server implementation with rudimentary request routing and virtually none of the higher-order functionality of frameworks such as form validation, input sanitization, templating, JSON serialization, and database connectivity. Frameworks are often built on top of platforms. To be thorough, and to compute framework overhead, we test several platforms as if they were frameworks.
permutation
A combination of attributes that compose a full technology stack being tested. Take node.js, for example: we might have one permutation with MongoDB and another with MySQL. Some frameworks have seen many permutations contributed by the community; others only one or a few.
test type
One of the workloads we exercise, such as JSON serialization, single-query, multiple-query, fortunes, data updates, and plaintext.
test
An individual test is a measurement of the performance of a permutation's implementation of a test type. For example, a test might be measuring Wicket paired with MySQL running the single-query test type.
implementation
Sometimes called "test implementations," these are the bodies of code and configuration created to test permutations according to the requirements. These are frequently contributed by fans, advocates, or the maintainers of frameworks. Together with the toolset, test implementations are the meat of this project.
toolset
A set of Python scripts that run our tests.
run
An execution of the benchmark toolset across the suite of test implementations, either in full or in part, in order to capture results for any purpose.
preview
A capture of data from a run used by project participants to sanity-check prior to an official round.
round
A posting of "official" results on this web site. This is mostly for ease of consumption by readers and good-spirited & healthy competitive bragging rights. For in-depth analysis, we encourage you to examine the source code and run the tests on your own hardware.

Expected questions

We expect that you might have a bunch of questions. Here are some that we're anticipating. But please contact us if you have a question we're not dealing with here or just want to tell us we're doing it wrong.

Frameworks and configuration

  1. "You call x a framework, but it's a platform." See the terminology section above. We are using the word "framework" loosely to refer to anything found on the spectrum ranging from full-stack frameworks, micro-frameworks, to platforms. If it's used to build web applications, it probably qualifies. That said, we understand that comparing a full-stack framework versus platforms or vice-versa is unusual. We feel it's valuable to be able to compare these, for example to understand the performance overhead of additional abstraction. You can use the filters in the results viewer to adjust the rows you see in the charts.
  2. "You configured framework x incorrectly, and that explains the numbers you're seeing." Whoops! Please let us know how we can fix it, or submit a GitHub pull request, so we can get it right.
  3. "Why include this Gemini framework I've never heard of?" We have included our in-house Java web framework, Gemini, in our tests. We've done so because it's of interest to us. You can consider it a stand-in for any relatively lightweight minimal-locking Java framework. While we're proud of how it performs among the well-established field, this exercise is not about Gemini.
  4. "Why don't you test framework X?" We'd love to, if we can find the time. Even better, craft the test implementation yourself and submit a GitHub pull request so we can get it in there faster!
  5. "Some frameworks use process-level concurrency; have you accounted for that?" Yes, we've attempted to use production-grade configuration settings for all frameworks, including those that rely on process-level concurrency. For the EC2 tests, for example, such frameworks are configured to utilize the two virtual cores provided on a c3.large (in previous rounds, m1.large) instance. For the i7 tests, they are configured to use the eight hyper-threading cores of our hardware's i7 CPUs.
  6. "Have you enabled APC for the PHP tests?" Yes, the PHP tests run with APC and PHP-FPM on nginx.
  7. "Why are you using a (slightly) old version of framework X?" It's nothing personal! With so many frameworks we have a never-ending game of whack-a-mole. If you think an update will affect the results, please let us know (or better yet, submit a GitHub pull request) and we'll get it updated!
  8. "It's unfair and possibly even incorrect to compare X and Y!" It may be alarming at first to see the full results table, where one may evaluate frameworks vs platforms; MySQL vs Postgres; Go vs Python; ORM vs raw database connectivity; and any number of other possibly irrational comparisons. Many readers desire the ability to compare these and other permutations. If you prefer to view an unpolluted subset, you may use the filters available at the top of the results page. We believe that comparing frameworks with plausible and diverse technology stacks, despite the number of variables, is precisely the value of this project. With sufficient time and effort, we hope to continuously broaden the test permutations. But we recommend against ignoring the data on the basis of concerns about multi-variable comparisons. Read more opinion on this at Brian Hauer's personal blog.
  9. "If you are testing production deployments, why is logging disabled?" At present, we have elected to run tests with logging features disabled. Although this is not consistent with production deployments, we avoid a few complications related to logging, most notably disk capacity and consistent granularity of logging across all test implementations. In spot tests, we have not observed significant performance impact from logging when enabled. If there is strong community consensus that logging is necessary, we will reconsider this.
  10. "Tell me about the Windows configuration." We are very thankful to the community members who have contributed Windows tests. In fact, nearly the entirety of the Windows configuration has been contributed by subject-matter experts from the community. Thanks to their effort, we now have tests covering both Windows paired with Linux databases and Windows paired with Microsoft SQL Server. As with all aspects of this project, we welcome continued input and tuning by other experts. If you have advice on better tuning the Windows tests, please submit GitHub issues or pull requests.

The tests

  1. "Framework X has in-memory caching, why don't you use that?" In-memory caching, as provided by some frameworks, yields higher performance than repeatedly hitting a database, but isn't available in all frameworks, so we omitted in-memory caching from these tests. Cache tests are planned for later rounds.
  2. "What about other caching approaches, then?" Remote-memory or near-memory caching, as provided by Memcached and similar solutions, also improves performance and we would like to conduct future tests simulating a more expensive query operation versus Memcached. However, curiously, in spot tests, some frameworks paired with Memcached were conspicuously slower than other frameworks directly querying the authoritative MySQL database (recognizing, of course, that MySQL had its entire data-set in its own memory cache). For simple "get row ID n" and "get all rows" style fetches, a fast framework paired with MySQL may be faster and easier to work with versus a slow framework paired with Memcached.
  3. "Why doesn't your test include more substantial algorithmic work?" Great suggestion. We hope to in the future!
  4. "What about reverse proxy options such as Varnish?" We are expressly not using reverse proxies on this project. There are other benchmark projects that evaluate the performance of reverse proxy software. This project measures the performance of web applications in any scenario where requests reach the application server. Given that objective, allowing the web application to avoid doing the work thanks to a reverse proxy would invalidate the results. If it's difficult to conceptualize the value of measuring performance beyond the reverse proxy, imagine a scenario where every response provides user-specific and varying data. It's also notable that some platforms respond with sufficient performance to potentially render a reverse proxy unnecessary.
  5. "Do all the database tests use connection pooling?" Yes, our expectation is that all tests use connection pooling.
  6. "How is each test run?" Each test is executed as follows:
    1. Restart the database servers.
    2. Start the platform and framework using their start-up mechanisms.
    3. Run a 5-second primer at 8 client-concurrency to verify that the server is in fact running. These results are not captured.
    4. Run a 15-second warmup at 256 client-concurrency to allow lazy-initialization to execute and just-in-time compilation to run. These results are not captured.
    5. Run a 15-second captured test for each of the concurrency levels (or iteration counts) exercised by the test type. Concurrency-variable test types are tested at 16, 32, 64, 128, 256, and 512 client-side concurrency. The high-concurrency plaintext test type is tested at 256, 1,024, 4,096, and 16,384 client-side concurrency.
    6. Stop the platform and framework.
  7. "Hold on, 15 seconds is not enough to gather useful data." This is a reasonable concern. But in examining the data, we have seen no evidence that the results have changed by reducing the individual test durations from 60 seconds to 15 seconds. The duration reduction was made necessary by the growing number of test permutations and a target that the full suite complete in less than one day. With additional effort, we aim to build a continuously-running test environment that will pull the latest source and begin a new run as soon as a previous run completes. When we have such an environment ready, we will be comfortable with multi-day execution times, so we plan to extend the duration of each test when that happens.
  8. "Also, a 15-second warmup is not sufficient." On the contrary, we have not yet seen evidence suggesting that any additional warmup time is beneficial to any framework. In fact, for frameworks based on JIT platforms such as the Java Virtual Machine (JVM), spot tests show that the JIT has already completed its work after just the primer, before the warmup starts: the warmup (256-concurrency) and real 256-concurrency tests yield results that are separated only by test noise. However, as with test durations, we intend to increase the duration of the warmup when we have a continuously-running test environment.
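The per-test sequence in item 6 can be written down as a small schedule. This Python sketch models only the client-load phases (primer, warmup, captured runs) for test types that vary concurrency, not those that vary iteration counts. The names are hypothetical, and it assumes the same primer and warmup apply to every test type.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Phase:
    """One load-generation phase of a single test (hypothetical model)."""
    name: str
    seconds: int
    concurrency: int
    captured: bool  # whether the results are recorded


STANDARD_LEVELS = [16, 32, 64, 128, 256, 512]
PLAINTEXT_LEVELS = [256, 1_024, 4_096, 16_384]


def load_phases(levels: List[int]) -> List[Phase]:
    """Primer, warmup, then one 15-second captured run per concurrency level."""
    phases = [
        Phase("primer", 5, 8, captured=False),     # confirms the server is up
        Phase("warmup", 15, 256, captured=False),  # lets lazy init and the JIT settle
    ]
    phases += [Phase(f"capture-{level}", 15, level, captured=True) for level in levels]
    return phases


def seconds_under_load(phases: List[Phase]) -> int:
    """Total client-load time, excluding restarts and server start-up."""
    return sum(phase.seconds for phase in phases)
```

With these numbers, a concurrency-variable test spends 5 + 15 + 6 × 15 = 110 seconds under load and the plaintext test 5 + 15 + 4 × 15 = 80 seconds, before counting database restarts and framework start-up.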


  1. "What is Wrk?" Although many web performance tests use ApacheBench from Apache to generate HTTP requests, we now use Wrk for this project. ApacheBench is single-threaded, so in higher-performance test scenarios ApacheBench itself becomes the limiting factor. Wrk is a multithreaded tool that provides a similar function, allowing tests to run for a prescribed amount of time (rather than for a fixed number of requests) and reporting result data such as total requests completed and latency information.
  2. "Doesn't benchmarking on Amazon EC2 invalidate the results?" In our opinion, benchmarking on EC2 measures precisely what we're trying to test: the performance of web applications within realistic production environments. Selecting EC2 as a platform also allows anyone interested to readily verify the tests. However, we've also executed tests on our Core i7 (Sandy Bridge) workstations running Ubuntu as a non-virtualized comparison. Doing so confirmed our suspicion that the ranked order and relative performance across frameworks are mostly consistent between EC2 and physical hardware. That is, while the EC2 instances were slower than the physical hardware, they were slower by roughly the same proportion across the spectrum of frameworks.
  3. "Tell me about your physical hardware." For the tests we refer to as "i7" tests, we're using our office workstations. They use Intel i7-2600K processors, which makes them a little antiquated, to be honest, and they are connected via an unmanaged, low-cost gigabit Ethernet switch. In previous rounds, we used a two-machine configuration in which the load-generation and database roles shared one machine. Although the two roles were not crowding each other out (neither was starved for CPU time), as of Round 7 we use a three-machine configuration for the physical hardware tests. The machine roles are:
    • Application server, which hosts the application code and web server, where applicable.
    • Database server, which hosts the common databases. Starting with Round 5, we equipped the database server with a Samsung 840 Pro SSD.
    • Load generator, which makes HTTP requests to the Application server via the Wrk load generation tool.
  4. "What is Resin? Why aren't you using Tomcat for the Java frameworks?" Resin is a Java application server. The GPL version that we used for our tests is a relatively lightweight Servlet container. We tested on Tomcat as well but ultimately dropped Tomcat from our tests because Resin was slightly faster across all Servlet-based frameworks.
  5. "Do you run any warmups before collecting results data?" Yes. See "how is each test run" above. Every captured test is preceded by a warmup and a brief cooldown of several seconds.
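The EC2 versus physical-hardware comparison described above comes down to two checks: whether every framework slows down by about the same factor, and whether the ranked order is preserved. A minimal sketch of those checks follows. The function names, and the use of the coefficient of variation and Spearman rank correlation as yardsticks, are illustrative assumptions rather than the analysis actually performed for this project. Inputs are per-framework requests-per-second figures from each environment.

```python
from statistics import mean, pstdev


def ranking(results: dict[str, float]) -> list[str]:
    """Framework names ordered from highest to lowest requests per second."""
    return sorted(results, key=results.get, reverse=True)


def compare_environments(cloud: dict[str, float],
                         physical: dict[str, float]) -> tuple[dict[str, float], float, float]:
    """Compare per-framework throughput between two environments.

    Returns (ratios, spread, rho):
      ratios  cloud / physical throughput for each framework present in both
      spread  coefficient of variation of the ratios (0.0 = identical slowdown)
      rho     Spearman rank correlation (1.0 = identical ranked order)
    Assumes at least two shared frameworks and no tied throughput values.
    """
    common = sorted(cloud.keys() & physical.keys())
    ratios = {name: cloud[name] / physical[name] for name in common}
    values = list(ratios.values())
    spread = pstdev(values) / mean(values)

    # Rank positions (0 = fastest) of each framework in each environment.
    rank_cloud = {name: i for i, name in enumerate(ranking({n: cloud[n] for n in common}))}
    rank_phys = {name: i for i, name in enumerate(ranking({n: physical[n] for n in common}))}

    # Spearman's rho for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    n = len(common)
    d_squared = sum((rank_cloud[x] - rank_phys[x]) ** 2 for x in common)
    rho = 1 - 6 * d_squared / (n * (n * n - 1))
    return ratios, spread, rho
```

A spread near 0.0 together with a rho near 1.0 would correspond to the observation above; exact agreement is not expected, since the ranked order is described as only mostly consistent.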


  1. "I am about to start a new web application project; how should I interpret these results?" Most importantly, recognize that performance data should be only one part of your decision-making process. High-performance web applications reduce hosting costs and improve user experience. Additionally, recognize that while we have aimed to select test types representing workloads common to web applications, nothing beats conducting performance tests yourself for the specific workload of your application. In addition to performance, consider other requirements such as your language and platform preference; your invested knowledge in one or more of the frameworks we've tested; and the documentation and support provided by the framework's community. Combined with an examination of the source code, the results seen here should help you identify a platform and framework that is high-performance while still meeting your other requirements.
  2. "Why are the leaderboards for JSON Serialization and Plaintext so different on EC2 versus i7?" Put briefly, for fast frameworks on our i7 physical hardware, the limiting factor for the JSON and Plaintext tests is our gigabit Ethernet, whereas on EC2 the limit is the CPU. Assuming the proper response headers are provided, the network is saturated at approximately 200,000 non-pipelined or 550,000 pipelined responses per second and above.
  3. "Where did earlier rounds go?" To better capture HTTP errors reported by Wrk, we have restructured the format of our results.json file. The test tool changed at Round 2, and some framework IDs were changed at Round 3. As a result, the results.json files for Rounds 1 and 2 would have required manual editing, so we opted to simply remove those rounds from this site. You can still see those rounds at our blog: Round 1, Round 2.
  4. "What does 'Did not complete' mean?" Starting with Round 9, we have added validation checks to confirm that implementations are behaving as we have specified in the requirements section of this site. An implementation that does not return the correct results, bypasses some of the requirements, or even formats the results in a manner inconsistent with the requirements will be marked as "Did not complete." We have solicited corrections from prior contributors and have attempted to address many of these, but it will take more time for all implementations to be correct. If you are a project participant and your contribution is marked as "Did not complete," please help us resolve this by contacting us at the GitHub repository. We may ultimately need a pull request from you, but we'd be happy to help you understand what specifically is triggering a validation error with your implementation.
  5. "Why are Stripped test implementations hidden by default?" Since the introduction of Stripped test implementations, we have debated whether they should be included at all. A Stripped test implementation is one that is specially crafted to excel at our benchmark. By comparison, a "Realistic" test implementation should be demonstrative of the general-purpose, best-practices compliant, and production-class approach for the given framework. We have decided to hide Stripped tests by default because we feel that while their results have some value, that value is exceedingly low for the vast majority of consumers of the data. You may still view the results for Stripped tests by enabling the Stripped Implementation Approach in the filters control panel.
  6. "What exactly causes a test implementation to be classified as Stripped?" It's not possible to paint an exact picture, but conceptually, a Stripped implementation is characterized by being configured or engineered expressly to the requirements of our benchmark tests. By comparison, a Realistic implementation will use a production-grade configuration of a general-purpose web application framework that meets our requirements. When we first introduced the notion of Stripped, it referred to configurations of otherwise normal software that had been stripped of some normal behaviors, for example, Rails or Django with some middleware removed. However, we have broadened the definition of Stripped to also cover bespoke software crafted expressly to meet our test types' requirements.
  7. "I have collected results from my own test environment; can I visualize them in a manner similar to this web site?" Yes, use the test results visualization feature to visualize results you've gathered. Be aware that you need to specify the per-test duration setting (in seconds) and that only known frameworks will be rendered.
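To see why roughly 200,000 responses per second can saturate gigabit Ethernet, divide the link's byte rate by the response rate to get the average on-the-wire budget per response. This is a back-of-the-envelope sketch, not a measurement: it assumes the nominal 1 Gbit/s line rate is fully available to responses (Ethernet is full duplex, so requests travel in the other direction), and the function name is hypothetical.

```python
GIGABIT_BITS_PER_SECOND = 1_000_000_000  # nominal gigabit Ethernet line rate


def bytes_per_response_at_saturation(responses_per_second: float,
                                     link_bits_per_second: float = GIGABIT_BITS_PER_SECOND) -> float:
    """Average bytes each response may occupy on the wire before the link is full.

    The budget must cover everything sent per response: Ethernet, IP, and TCP
    framing, HTTP headers, and the body. Assumes the whole line rate is usable.
    """
    return link_bits_per_second / 8 / responses_per_second


if __name__ == "__main__":
    for label, rps in [("non-pipelined", 200_000), ("pipelined", 550_000)]:
        print(f"{label}: ~{bytes_per_response_at_saturation(rps):.0f} bytes per response")
```

With the figures quoted above, the budget works out to 625 bytes per non-pipelined response and about 227 bytes per pipelined response. A plausible reading is that pipelining packs several responses into each TCP segment, so per-response framing overhead shrinks. Either way, once a framework can produce responses faster than these rates, the gigabit link, rather than the CPU, becomes the ceiling.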


  1. "Do you accept contributions?" Absolutely! Please visit the project's GitHub repository to join the project. In fact, the majority of the test implementations are community-contributed.
  2. "You emphasize production-grade software. Do you accept early builds of frameworks or toy projects?" Actually, yes, we are quite liberal in accepting contributions, including those that don't meet the target of production-grade. However, in many cases, such implementations will be marked as "Stripped," meaning they are not recommended for use in real-world production projects. They will still be measured and can be made visible by enabling Stripped in the filters control panel.
  3. "I am a contributor and my framework hasn't shown up in an official round yet. How do I know how my contribution will perform in your test environment?" We have a benchmark environment that is continuously running the benchmark suite, which we imaginatively named "Continuous Benchmarking™" (not actually trademarked). The continuous benchmark runner posts results to the TFB Results Dashboard. If you have made a contribution, you should see its results posted to the dashboard within a few days to a week, when the next regular continuous run completes.

Simulating production environments

We aim to configure every framework according to the best practices for production deployments gleaned from documentation and popular community opinion, and we ask contributors to apply the same rule of thumb. We want each test implementation (see "Terminology" section) to approximate a sensible production deployment as accurately as possible. We also want this project to be as transparent as possible, so we have posted our test suites on GitHub.

Environment details

This project measures performance in two common deployment scenarios: cloud instances and physical hardware. To date, each round has used a single representative environment for each of these scenarios. The particular specifications of the environments have evolved over time, as shown below.

Cloud environments

Azure (rounds 13 onward)
Microsoft Azure D3v2 instances; switched gigabit Ethernet.
AWS (rounds 1 through 12)
Amazon EC2 c3.large instances (2 vCPU each); switched gigabit Ethernet (m1.large was used through Round 9).

Physical hardware environments

Citrine (rounds 16 onward)
Three homogeneous Dell R440 servers each equipped with an Intel Xeon Gold 5120 CPU, 32 GB of memory, and an enterprise SSD. Dedicated Cisco 10-gigabit Ethernet switch. Provided by Microsoft.
ServerCentral (rounds 13 through 15)
Dell R910 (4x 10-Core Intel Xeon E7-4850 CPUs) application server; Dell R710 (2x 4-Core Intel Xeon E5520 CPUs) database server; switched 10-gigabit Ethernet. Provided by ServerCentral.
Peak (rounds 9 through 12)
Dell R720xd dual Intel Xeon E5-2660 v2 (40 HT cores) with 32 GB memory; database servers equipped with SSDs in RAID; switched 10-gigabit Ethernet. Provided by Peak Hosting.
i7 (rounds 1 through 8)
In-house Intel Sandy Bridge Core i7-2600K workstations with 8 GB memory (early 2011 vintage); database server equipped with Samsung 840 Pro SSD; switched gigabit Ethernet.

Join the conversation