Framework Benchmarks Round 20

February 8, 2021

Nate Brady

 

Today we announce the results of the twentieth official round of the TechEmpower Framework Benchmarks project.

Now in its eighth year, this project measures the high-water mark performance of server-side web application frameworks and platforms using predominantly community-contributed test implementations. The project has processed more than 5,200 pull requests from contributors.

Round 20 Updates from our contributors

In the months between Round 19 and Round 20, about four hundred pull requests were processed. Some highlights shared by our contributors:

(Please reach out if you are a contributor and didn’t yet get a chance to share your updates. We’ll get them added here.)

Notes

Thanks again to contributors and fans of this project! As always, we really appreciate your continued interest, feedback, and patience!

Round 20 is composed of:

Framework Benchmarks Round 19

May 28, 2020

Nate Brady

 

Round 19 of the TechEmpower Framework Benchmarks project is now available!

This project measures the high-water mark performance of server-side web application frameworks and platforms using predominantly community-contributed test implementations. Since its inception as an open source project in 2013, community contributions have been numerous and continuous. Today, at the launch of Round 19, the project has processed more than 4,600 pull requests!

We can also measure the breadth of the project using time. We continuously run the benchmark suite, and each full run now takes approximately 111 hours (4.6 days) to execute the current suite of 2,625 tests. And that number continues to grow steadily as we receive further test implementations.

Composite scores and TPR

Round 19 introduces two new features in the results web site: Composite scores and a hardware environment score we’re calling the TechEmpower Performance Rating (TPR). Both are available on the Composite scores tab for Rounds 19 and beyond.

Composite scores

Frameworks for which we have full test coverage will now have composite scores, which reflect an overall performance score across the project’s test types: JSON serialization, Single-query, Multi-query, Updates, Fortunes, and Plaintext. For each round, we normalize results for each test type and then apply subjective weights for each (e.g., we have given Fortunes a higher weight than Plaintext because Fortunes is a more realistic test type).
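
To make the mechanics concrete, here is a minimal sketch of that normalize-then-weight idea in Java. The method and weight values are illustrative only; the GitHub wiki documents the project’s actual formula.

import java.util.Map;

class CompositeScore {
  // Hypothetical inputs: requests-per-second results for one framework,
  // the round's best result per test type, and subjective weights.
  static double compute(Map<String, Double> results,
                        Map<String, Double> bestPerTestType,
                        Map<String, Double> weights) {
    double score = 0.0;
    for (Map.Entry<String, Double> e : results.entrySet()) {
      String testType = e.getKey();
      // Normalize against the best result for this test type (0.0 to 1.0),
      // then apply the test type's subjective weight.
      double normalized = e.getValue() / bestPerTestType.get(testType);
      score += weights.get(testType) * normalized;
    }
    return score;
  }
}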

When additional test types are added, frameworks will need to include implementations of these test types to be included in the composite score chart.

You can read more about composite scores at the GitHub wiki.

TechEmpower Performance Rating (TPR)

With the composite scores described above, we are now able to use web application frameworks to measure the performance of hardware environments. This is an exploration of a new use-case for this project that is unrelated to the original goal of improving software performance. We believe this could be an interesting measure of hardware environment performance because it’s a holistic test of compute and network capacity, and based on a wide spectrum of software platforms and frameworks used in the creation of real-world applications. We look forward to your feedback on this feature.

Right now, the only hardware environments being measured are our Citrine physical hardware environment and Azure D3v2 instances. However, we are implementing a means for users to contribute and visualize results from other hardware environments for comparison.

Hardware performance measurements must use the specific commit for a round (such as 801ee924 for Round 19) to be comparable, since the test implementations continue to evolve over time.

Because a hardware performance measurement shouldn’t take 4.6 days to complete, we use a subset of the project’s immense number of frameworks when measuring hardware performance. We’ve selected and flagged frameworks that represent the project’s diversity of technology platforms. Any results files that include this subset can be used for measuring hardware environment performance.

The set of TPR-flagged frameworks will evolve over time, especially if we receive further input from the community. Our goal is to constrain a run intended for hardware performance measurement to several hours of execution time rather than several days. As a result, we want to keep the total number of flagged frameworks somewhere between 15 to 25.

You can read more about TPR at the GitHub wiki.

Other Round 19 Updates

Once again, Nate Brady tracked interesting changes since the previous round at the GitHub repository for the project. In summary:

Notes

Thanks again to contributors and fans of this project! As always, we really appreciate your continued interest, feedback, and patience!

Round 19 is composed of:

Framework Benchmarks Round 18

July 9, 2019

Nate Brady

 

Round 18 of the TechEmpower Framework Benchmarks project is now available!

When we posted the previous round in late 2018, the project had processed about 3,250 pull requests. Today, with Round 18 just concluded, the project is closing in on 4,000 pull requests. We are repeatedly surprised and delighted by the contributions and interest from the community. The project is immensely fun and useful for us and we’re happy it is useful for so many others as well!

Notable for Round 18

Nate Brady tracked interesting changes since the previous round at the GitHub repository for the project. Several of these are clarifications of requirements for test implementations. In summary:

  • Thanks to An Tao (@an-tao), we clarified that the “Date” header in HTTP responses must be accurate. It is acceptable for it to be recomputed by the platform or framework once per second, and cached as a string or byte buffer for the duration of that second. (A sketch of this tactic follows this list.)
  • To keep frameworks from breaking the test environments by consuming too much memory, the toolset now limits the amount of memory provided to the containers used by test implementations.
  • The requirements for the Updates test were clarified to permit a single update. We are still considering whether to classify test implementations by whether they use this tactic.
  • The requirements were clarified to specify that caching or memoization of the output of JSON serialization is not permitted.
  • The toolset now more strictly validates that responses provide the correct JSON serialization.
  • Cloud tests in Azure are using Azure’s accelerated networking feature.
  • Postgres has been upgraded to version 11.
  • Nikolay Kim (@fafhrd91) explained the tactics used by Actix to achieve record performance on the Fortunes test.
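
As a sketch of that once-per-second Date header tactic (illustrative only; the class and field names here are not from any particular framework):

import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

final class DateHeaderCache {
  private volatile long cachedSecond = -1;
  private volatile String cachedValue = "";

  String dateHeader() {
    long nowSecond = System.currentTimeMillis() / 1000;
    if (nowSecond != cachedSecond) {
      // Recompute at most once per second; the header stays accurate.
      // (A benign race: concurrent threads may recompute the same value.)
      cachedValue = DateTimeFormatter.RFC_1123_DATE_TIME
          .format(ZonedDateTime.now(ZoneOffset.UTC));
      cachedSecond = nowSecond;
    }
    return cachedValue;
  }
}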

Other updates

  • Round 18 now includes just over two hundred test implementations (which we call “frameworks” for simplicity).
  • The results web site now includes a dark theme, because dark themes are the new hotness. Select it at the bottom right of the window when viewing results.

Notes

Thank you to all contributors and fans of this project! As always, we really appreciate your continued interest, feedback, and patience!

Round 18 is composed of:

Framework Benchmarks Round 17

October 30, 2018

Nate Brady

 

We’re happy to announce that Round 17 of the TechEmpower Framework Benchmarks project is now available. Since the adoption of Continuous Benchmarking, the creation of an official Round is a fairly simple process:

  1. Try to reduce errors in framework implementations. We want an official Round to have a solid showing by as many frameworks as feasible given limited personnel bandwidth.
  2. Select a continuous run on the physical hardware (named “Citrine”) that looks good and identify its Git commit.
  3. Run the same commit on cloud (Azure).
  4. Write a blog entry and post the Round.

For weeks, we have been stuck at step 4 waiting on me to write something, so I’m going to keep it short and sweet to get this out before Halloween.

Stratified database results

As you review Round 17 results, you’ll notice that Postgres database tests are stratified—there are groups of test implementations that seem implausibly faster than other test implementations.

The underlying cause of this is use of a Postgres protocol feature we have characterized as “query pipelining” because it is conceptually similar to HTTP pipelining. We call it pipelining, but you could also call it multiplexing. It’s a feature of the “Extended Query” protocol in Postgres. Query pipelining allows database clients to send multiple queries without needing to wait for each response before sending the next. It’s similar to batching but provided invisibly in the driver. Theoretically (and we believe in practice) short queries could be sent together in the same network packets.
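
To make the idea concrete, here is a conceptual sketch. PipelinedConnection and Row are hypothetical stand-ins, not the API of any real Postgres driver; the point is that all queries are written to the socket without waiting, and responses are matched up as they arrive.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

interface PipelinedConnection {
  // Hypothetical: writes the query immediately and returns a future result.
  CompletableFuture<Row> query(String sql, Object... params);
}

final class Row {
  final int id;
  final int randomNumber;
  Row(int id, int randomNumber) { this.id = id; this.randomNumber = randomNumber; }
}

final class WorldRepository {
  List<Row> fetchWorlds(PipelinedConnection conn, int[] ids) {
    List<CompletableFuture<Row>> pending = new ArrayList<>();
    for (int id : ids) {
      // No round-trip wait between sends: the driver packs these queries
      // together on the wire. The business logic is unchanged.
      pending.add(conn.query("SELECT id, randomNumber FROM World WHERE id = $1", id));
    }
    return pending.stream().map(CompletableFuture::join).collect(Collectors.toList());
  }
}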

Importantly (at least for us), the client application’s business logic is unchanged. This is not batching of queries implemented by the application or web frameworks, but rather an optimization provided invisibly by the database driver.

We discussed the pros and cons of permitting this optimization in our tests. While we have disallowed several performance-optimizing tactics elsewhere as violations of the rules or spirit of our tests, we ultimately arrived at permitting this optimization for two reasons. First, it can be applied “seamlessly” to application code. The application code does not need to be aware this is happening. And second, because this isn’t a trick applied at the application tier, but rather an optimization that potentially benefits all applications (once implemented by drivers), we feel this is precisely the sort of improvement this project should encourage.

In short, we think this feature in the Postgres wire protocol is great. We suspect that over time, more platforms will support the “pipelining” capability, gradually re-balancing the stratification we’re seeing.

The effect of this feature is most emphatic on the Multi-query and Updates tests, both of which execute a tremendous number of queries per second. We are also considering adding an attribute to indicate and filter test implementations using traditional database connections versus those using pipelining.

Other updates

  • Round 17 is now measuring 179 frameworks.
  • New languages such as F# are now represented.
  • The results web site is a bit wider in response to feedback.

In the works

We are presently working on a few things that we hope to share with the community soon. These include:

  • Potential changes to the physical environment’s network to allow further differentiation of ultra high-performance frameworks on the Plaintext test in particular.
  • Badges for framework maintainers to share and celebrate their performance tier, similar to popular continuous integration badges seen in GitHub repositories.

Notes

Thank you to all contributors and fans of this project. As always, we really appreciate your continued interest, feedback, and patience!

Round 17 is composed of:

Framework Benchmarks Round 16

June 6, 2018

Nate Brady

 

Now in its fifth year, the TechEmpower Framework Benchmarks project has another official round of results available. Round 16 is a real treat for anyone who likes big numbers. Not just in measured results per second (several metric crap tonne), but in number of tests measured (~1830), number of framework permutations tested (~464), number of languages included (26), and total execution time of the test suite (67 hours, or 241 billion microseconds to make that sound properly enormous). Take a look at the results and marvel in the magnitude of the numbers.

Recent months have been a very exciting time for this project. Most importantly, the community has been contributing some amazing test implementations and demonstrating the fun and utility of some good-natured performance competition. More on that later. This is a longer-than-average TFB round announcement blog entry, but there is a lot to share, so bear with me.

Dockerification… Dockerifying… Docking?

After concluding Round 15, we took on the sizeable challenge of converting the full spectrum of ~460 test implementations from our home-brew quasi-sandboxed (only mostly sandboxed) configuration to a stupefying array of Docker containers. It took some time, but The Great Dockerification has yielded great benefits.

Most importantly, thanks to Dockerizing, the reproducibility and consistency of our measurements is considerably better than previous rounds. Combined with our continuous benchmarking, we now see much lower variability between each run of the full suite.

Across the board, our sanity checking of performance metrics has indicated Docker’s overhead is immeasurably minute. It’s lost in the noise. And in any event, whatever overhead Docker incurs is uniformly applicable as all test implementations are required to be Dockered.

Truly, what we are doing with this project is a perfect fit for Docker. Or Docker is a perfect fit for this. Whichever. Our only regret is not having done this earlier (if only someone had told us about Docker!). That and not knowing what the verb form of Docker is.

Dockerificationization.

New hardware

As we mentioned in March, we have a new physical hardware environment for Round 16. Nicknamed the “Citrine” environment, it is three homogeneous Dell R440 servers, each equipped with a Xeon Gold 5120 CPU. Characterized as entry- or mid-tier servers, these are nevertheless turning out to be impressive when paired with a 10-gigabit Ethernet switch.

Being freshly minted by Dell and Cisco, this new environment is notably quicker than equipment we have used in previous rounds. We have not produced a “difference” view between Round 15 and Round 16 because there are simply too many variables—most importantly this new hardware and Docker—to make a comparison remotely relevant. But in brief, Round 16 results are higher than Round 15 by a considerable margin.

In some cases, the throughput is so high that we have a new challenge from our old friend, “network saturation.” We last were acquainted with this adversary in Round 8, in the form of a Giant Sloar, otherwise known as one-gigabit Ethernet. Now The Destructor comes to us laughing about 10-gigabit Ethernet. But we have an idea for dealing with Gozer.

(Thanks again to Server Central for the previous hardware!)

Convergence in Plaintext and JSON serialization results

In Round 16, and in the continuous benchmarking results gathered prior to finalizing the round, we observed that the results for the Plaintext and JSON serialization tests were converging on theoretical maximums for 10-gigabit networking.

Yes, that means that there are several frameworks and platforms that—when allowed to use HTTP pipelining—can completely saturate ten gigabits per second with ~140-byte response payloads using relatively cheap commodity servers. (Back of the envelope: 10 gigabits per second is 1.25 gigabytes per second, which works out to roughly nine million such responses per second, before accounting for protocol overhead.)

To remove the network bottleneck for future rounds, we are concocting a plan to cross the streams, in a manner of speaking: use lasers and the fiberoptic QSFP28 ports on our Cisco switch to bump the network capacity up a notch.

Expect to hear more about this as the plan develops during Round 17.

Continuous benchmarking

Introduced prior to Round 16, the continuous benchmarking platform really came into a fully-realized state in the past several months. Combined with the Great Dockening, we now see good (not perfect, but good) results materializing automatically every 67 hours or thereabouts.

Some quick points to make here:

  • We don’t expect to have perfection in the results. Perfection would imply stability of code and implementations and that is not at all what we have in mind for this project. Rather, we expect and want participants to frequently improve their frameworks and contribute new implementations. We also want the ever-increasing diversity of options for web development to be represented. So expecting perfection is incongruous with tolerating and wanting dynamism.
  • A full suite run takes 67 hours today. This fluctuates over time as implementation permutations are added (or deleted).
  • Total execution time will also increase when we add more test types in the future. And we are still considering increasing the duration of each individual benchmarking exercise (the duration for which we run the load generator to gather a single result datum). That is the fundamental unit of time for this project, so increasing that will approximately linearly increase the total execution time.
  • We have already seen tremendous social adoption of the continuous benchmarking results. For selfish reasons, we want to continue creating and posting official rounds such as today’s Round 16 periodically. (Mostly so that we can use the opportunity to write a blog entry and generate hype!) We ask that you humor us and treat official rounds as the super interesting and meaningful events that they are.
  • Jokes aside, the continuous results are intended for contributors to the project. The official rounds are less-frequent snapshots suitable for everyone else who may find the data interesting.

Social media presence

As hinted above, we created a Twitter account for the TechEmpower Framework Benchmarks project: @TFBenchmarks. Don’t don’t @ us.

Engaging with the community this way has been especially rewarding during Round 16 because it coincided with significant performance campaigns from framework communities. Rust has blasted onto the server-side performance scene with several ultra high-performance options that are competing alongside C, C++, Go, Java, and C#.

Speaking of C#, a mainstream C# framework from a scrappy startup named Microsoft has been taking huge leaps up the charts. ASP.NET Core is not your father’s ASP.NET.

Warming our hearts with performance

There is no single reason we created this project over five years ago. It was a bunch of things: frustration with slow web apps; a desire to quantify the strata of high-water marks across platforms; confirming or dispelling commonly-held hunches or prevailing wisdom about performance.

But most importantly, I think, we created the project with a hopeful mindset of “perhaps we can convince some people to invest in performance for the better of all web-app developers.”

With expectations set dutifully low from the start, we continue to be floored by statements that warm our heart by directly or indirectly suggesting an impact of this project.

When asked about this project, I have often said that I believe that performance improvements are best made in platforms and frameworks because they have the potential to benefit the entire space of application developers using those platforms. I argue that if you raise the framework’s performance ceiling, application developers get the headroom—which is a type of luxury—to develop their application more freely (rapidly, brute-force, carefully, carelessly, or somewhere in between). In large part, they can defer the mental burden of worrying about performance, and in some cases can defer that concern forever. Developers on slower platforms often have so thoroughly internalized the limitations of their platform that they don’t even recognize the resulting pathologies: Slow platforms yield premature architectural complexity as the weapons of “high-scale” such as message queues, caches, job queues, worker clusters, and beyond are introduced at load levels that simply should not warrant the complexity.

So when we see developers upgrade to the latest release of their favorite platform and rejoice over a performance win, we celebrate a victory. We see a developer kicking performance worry further down the road with confidence and laughter.

I hope all of the participants in this project share in this celebration. And anyone else who cares about fast software.

On to Round 17!

Technical notes

Round 16 is composed of:

Framework Benchmarks Hardware Update

March 13, 2018

Nate Brady

 

We have retired the hardware environment provided by Server Central for our Web Framework Benchmarks project. We want to sincerely thank Server Central for having provided servers from their lab environment to our project.

Their contribution allowed us to continue testing on physical hardware with 10-gigabit Ethernet. Ten-gigabit Ethernet gives the highest-performing frameworks opportunity to shine. We were particularly impressed by Server Central’s customer service and technical support, which was responsive and helpful in troubleshooting configuration issues even though we were using their servers free of charge. (And since the advent of our Continuous Benchmarking, we were essentially using the servers at full load around the clock.)

Thank you, Server Central!

New hardware for Round 16 and beyond

For Round 16 and beyond, we are happy to announce that Microsoft has provided three Dell R440 servers and a Cisco 10-gigabit switch. These three servers are homogeneous, each configured with an Intel Xeon Gold 5120 CPU (14/28 cores at 2.2/3.2 GHz), 32 GB of memory, and an enterprise SSD.

If your contributed framework or platform performs best with hand-tuning based on cores, please send us a pull request to adjust the necessary parameters.

These servers together compose a hardware environment we’ve named “Citrine” and are visible on the TFB Results Dashboard. Initial results are impressive, to say the least.

Adopting Docker for Round 16

Concurrent to the change in hardware, we are hard at work converting all test implementations and the test suite to use Docker. There are several upsides to this change, the most important being better isolation. Our past home-brew mechanisms to clean up after each framework were, at times, akin to whack-a-mole as we encountered new and fascinating ways in which software may refuse to stop after being subjected to severe levels of load.

Docker will be used uniformly—across all test implementations—so any impact will be imparted on all platforms and frameworks equally. Our measurements indicate trivial performance impact versus bare metal: on the order of less than 1%.

As you might imagine, the level of effort to convert all test implementations to Docker is not small. We are making steady progress. But we would gladly accept contributions from the community. If you would like to participate in the effort, please see GitHub issue #3296.

Framework Benchmarks Round 15

February 14, 2018

Nate Brady

As of 2018-03-13, Azure results for Round 15 have been posted. These were not available when Round 15 was originally published.

What better day than Valentine’s Day to renew one’s vow to create high-performance web applications? Respecting the time of your users is a sure way to earn their love and loyalty. And the perfect start is selecting high-performance platforms and frameworks.

Results from Round 15 of the Web Framework Benchmarks project are now available! Round 15 includes results from the physical hardware environment at Server Central and cloud results from Microsoft Azure.

We ❤️ Performance

High-performance software warms our hearts like a Super Bowl ad about water or an NBC Olympics athlete biography.

But really, who doesn’t love fast software? No one wants to wait for computers. There are more important things to do in life than wait for a server to respond. For programmers, few things are as rewarding as seeing delighted users, and respecting users’ time is a key element of achieving that happiness.

Among the many effects of this project, one of which we are especially proud is how it encourages platforms and frameworks to be fast—to elevate the high-water marks of performance potential. When frameworks and platforms lift their performance ceiling upward, application developers enjoy the freedom and peace of mind of knowing they control their applications’ performance fate. Application developers can work rapidly or methodically; they can write a quick implementation or squeeze their algorithms to economize on milliseconds; they can choose to optimize early or later. This flexibility is made possible when the framework and platform aren’t boxing out the application—preemptively consuming the performance pie—leaving only scraps for the application developer. High-performance frameworks take but a small slice and give the bulk of the pie to the application developer to do with as they please.

This Valentine’s Day, respect yourself as a developer, own your application’s performance destiny, and fall in love with a high-performance framework. Your users will love you back.

Love from the Community

Community contributions to the project continue to amaze us. As of Round 15, we have processed nearly 2,500 pull requests and the project has over 3,000 stars on GitHub. We are honored by the community’s feedback and participation.

We are routinely delighted to see the project referenced elsewhere, such as this project that monitors TCP connections, which used our project to measure overhead, or the hundreds of GitHub issues discussing the project within other repositories. We love knowing others receive value from this project!

More Immediate Results for Contributors

When you are making contributions to this project, you want to see the result of your effort so you can measure and observe performance improvements. You also need log files when things don’t go as expected. To help accelerate the process, we have made the output of our continuous benchmarking platform available as a results dashboard. Our hardware test environment is continuously running, so new results are available every few days (at this time, a full run takes approximately 90 hours). As each run completes, a raw results.json file will be posted as well as zipped log files and direct links to log files for frameworks that encountered significant testing errors. We hope this will streamline the process of troubleshooting contributions.

We used run ed713ee9 from Server Central and run a1110174 from Azure.

In Progress

We are working to update the entire suite to Ubuntu 16 LTS and aim to be able to migrate to Ubuntu 18 LTS soon after it’s available. This update will allow us to keep up with several features in both hardware and cloud environments, such as Azure’s Accelerated Networking. Watch the GitHub project for more updates on this as they arrive!

Thank You!

Thank you so much to all of the contributors! Check out Round 15 and if you are a contributor to the project or just keenly interested, keep an eye on continuous results.

Framework Benchmarks Round 14

May 10, 2017

Nate Brady

 

Results from Round 14 of the Web Framework Benchmarks project are now available! This round’s results are limited to the physical hardware environment only, but cloud results will be included again in the next round.

Recent improvements

Our efforts during Round 14 focused on improvements that help us manage the project, mostly by removing some of our manual work.

Continuous Benchmarking

When we are not running one-off tests or modifying the toolset, the dedicated physical hardware environment at Server Central is continuously running the full benchmark suite. We call this “Continuous Benchmarking.” As Round 14 was wrapping up, Continuous Benchmarking allowed us to more rapidly deploy multiple preview rounds for review by the community than we have done in previous rounds.

Going forward, we expect Continuous Benchmarking to facilitate immediate progression into community-facing previews of Round 15. We hope to have the first Round 15 preview within a few days.

Paired with the continuous benchmarker is an internally-facing dashboard that shows us how things are progressing. We plan to eventually evolve this into an externally-facing interface for project contributors.

Differences

Contributors and the project’s community will have seen several renderings of the differences between Round 13 and Round 14. The final capture of differences from Round 13 to Round 14 is an example. These help us confirm changes that are planned or expected and also identify unexpected changes or volatility.

We have, in fact, observed volatility with a small number of frameworks and aim to investigate and address each as time permits. Although the benchmarking suite includes two phases of warmup prior to gathering data for each test, we might find that some frameworks or platforms require additional warmup time to be consistent across multiple measurements.

Mention-bot

We added Facebook’s mention-bot into the project’s GitHub repository. This has helped keep past contributors in the loop if and when changes are made to their prior contributions. For example, if a contributor updates the Postgres JDBC driver for the full spectrum of JVM frameworks, the original contributors of those frameworks will be notified by mention-bot. This allows for widespread changes such as a driver update while simultaneously allowing each contributor to override changes according to their framework’s best practices.

Previously, we had to either manually notify people or do a bit of testing on our own to determine if the update made sense. In practice, this often meant not bothering to update the driver, which isn’t what we want. (Have you seen the big performance boost in the newer Postgres JDBC drivers?)

Community contributions

This project includes a large amount of community-contributed code. Community contributions are up recently and we believe that is thanks to mention-bot. We expect to pass the milestone of 2,000 Pull Requests processed within a week or two. That is amazing.

Thank you so much to all of the contributors! Check out Round 14 and then on to Round 15!

EnumSet and EnumMap

February 14, 2017

Michael Hixson

This article discusses java.util.EnumSet and java.util.EnumMap from Java’s standard libraries.

What are they?

EnumSet and EnumMap are compact, efficient implementations of the Set and Map interfaces. They have the constraint that their elements/keys come from a single enum type.

Like HashSet and HashMap, they are modifiable.
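
For example, both are created and used much like their Hash{Set,Map} counterparts:

import java.time.DayOfWeek;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

Set<DayOfWeek> workdays = EnumSet.range(DayOfWeek.MONDAY, DayOfWeek.FRIDAY);
workdays.remove(DayOfWeek.FRIDAY); // modifiable, like HashSet

Map<DayOfWeek, String> abbreviations = new EnumMap<>(DayOfWeek.class);
abbreviations.put(DayOfWeek.SATURDAY, "Sat"); // modifiable, like HashMap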

In contrast to HashSet, EnumSet:

  • Consumes less memory, usually.
  • Is faster at all the things a Set can do, usually.
  • Iterates over elements in a predictable order (the declaration order of the element type’s enum constants).
  • Rejects null elements.

In contrast to HashMap, EnumMap:

  • Consumes less memory, usually.
  • Is faster at all the things a Map can do, usually.
  • Iterates over entries in a predictable order (the declaration order of the key type’s enum constants).
  • Rejects null keys.

If you’re wondering how this is possible, I encourage you to look at the source code:

  • EnumSet

    A bit vector of the ordinals of the elements in the Set. This is an abstract superclass of RegularEnumSet and JumboEnumSet.

  • RegularEnumSet

    An EnumSet whose bit vector is a single primitive long, which is enough to handle all enum types having 64 or fewer constants.

  • JumboEnumSet

    An EnumSet whose bit vector is a long[] array, which is allocated however many slots are necessary for the given enum type. Two slots are allocated for 65 to 128 constants, three slots for 129 to 192 constants, etc.

  • EnumMap

    A flat array of the Map‘s values indexed by the ordinals of their keys.

EnumSet and EnumMap cheat! They use privileged code like this:

/**
 * Returns all of the values comprising E.
 * The result is uncloned, cached, and shared by all callers.
 */
private static <E extends Enum<E>> E[] getUniverse(Class<E> elementType) {
    return SharedSecrets.getJavaLangAccess()
                        .getEnumConstantsShared(elementType);
}

If you want all the Month constants, you might call Month.values(), giving you a Month[] array. There is a single backing array instance of those Month constants living in memory somewhere (a private field in the Class object for Month), but it wouldn’t be safe to pass that array directly to every caller of values(). Imagine if someone modified that array! Instead, values() creates a fresh clone of the array for each caller.
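
You can see that cloning step for yourself:

Month[] a = Month.values();
Month[] b = Month.values();
System.out.println(a == b); // false: each call returns a fresh clone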

EnumSet and EnumMap get to skip that cloning step. They have direct access to the backing array.

Effectively, no third-party versions of these classes can be as efficient. Third-party libraries that provide enum-specialized collections tend to delegate to EnumSet and EnumMap. It’s not that the library authors are lazy or incapable; delegating is the correct choice for them.

When should they be used?

Historically, Enum{Set,Map} were recommended as a matter of safety, taking better advantage of Java’s type system than the alternatives.

Prefer enum types and Enum{Set,Map} over int flags.

Effective Java goes into detail about this use case for Enum{Set,Map} and enum types in general. If you write a lot of Java code, then you should read that book and follow its advice.

Before enum types existed, people would declare flags as int constants. Sometimes the flags would be powers of two and combined into sets using bitwise arithmetic:

static final int OVERLAY_STREETS  = 1 << 0;
static final int OVERLAY_ELECTRIC = 1 << 1;
static final int OVERLAY_PLUMBING = 1 << 2;
static final int OVERLAY_TERRAIN  = 1 << 3;

void drawCityMap(int overlays) { ... }

drawCityMap(OVERLAY_STREETS | OVERLAY_PLUMBING);

Other times the flags would start at zero and count up by one, and they would be used as array indexes:

static final int MONSTER_SLIME    = 0;
static final int MONSTER_GHOST    = 1;
static final int MONSTER_SKELETON = 2;
static final int MONSTER_GOLEM    = 3;

int[] kills = getMonstersSlain();

if (kills[MONSTER_SLIME] >= 10) { ... }

These approaches got the job done for many people, but they were somewhat error-prone and difficult to maintain.

When enum types were introduced to the language, Enum{Set,Map} came with them. Together they were meant to provide better tooling for problems previously solved with int flags. We would say, “Don’t use int flags, use enum constants. Don’t use bitwise arithmetic for sets of flags, use EnumSet. Don’t use arrays for mappings of flags, use EnumMap.” This was not because the enum-based solutions were faster than int flags — they were probably slower — but because the enum-based solutions were easier to understand and implement correctly.

Fast forward to today, I don’t see many people using int flags anymore (though there are notable exceptions). We’ve had enum types in the language for more than a decade. We’re all using enum types here and there, we’re all using the collections framework. At this point, while Effective Java‘s advice regarding Enum{Set,Map} is still valid, I think most people will never have a chance to put it into practice.

Today, we’re using enum types in the right places, but we’re forgetting about the collection types that came with them.

Prefer Enum{Set,Map} over Hash{Set,Map} as a performance optimization.

  • Prefer EnumSet over HashSet when the elements come from a single enum type.
  • Prefer EnumMap over HashMap when the keys come from a single enum type.

Should you refactor all of your existing code to use Enum{Set,Map} instead of Hash{Set,Map}? No.

Your code that uses Hash{Set,Map} isn’t wrong. Migrating to Enum{Set,Map} might make it faster. That’s it.

If you’ve ever used primitive collection libraries like fastutil or Trove, then it may help to think of Enum{Set,Map} like those primitive collections. The difference is that Enum{Set,Map} are specialized for enum types, not primitive types, and you can use them without depending on any third-party libraries.

Enum{Set,Map} don’t have identical semantics to Hash{Set,Map}, so please don’t make blind, blanket replacements in your existing code.

Instead, try to remember these classes for next time. If you can make your code more efficient for free, then why not go ahead and do that, right?

If you use IntelliJ IDEA, you can have it remind you to use Enum{Set,Map} with inspections:

  • Analyze – Run inspection by name – “Set replaceable with EnumSet” or “Map replaceable with EnumMap”

…or…

  • File – Settings – Editor – Inspections – Java – Performance issues – “Set replaceable with EnumSet” or “Map replaceable with EnumMap”

SonarQube can also remind you to use Enum{Set,Map}:

  • S1641: “Sets with elements that are enum values should be replaced with EnumSet”
  • S1640: “Maps with keys that are enum values should be replaced with EnumMap”

For immutable versions of Enum{Set,Map}, see the following methods from Guava:

  • Sets.immutableEnumSet(elements) and the collector Sets.toImmutableEnumSet()
  • Maps.immutableEnumMap(map) and the collector Maps.toImmutableEnumMap(keyFunction, valueFunction)

If you don’t want to use Guava, then wrap the modifiable Enum{Set,Map} instances in Collections.unmodifiableSet(set) or Collections.unmodifiableMap(map) and throw away the direct references to the modifiable collections.
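
For example:

EnumSet<Month> summer = EnumSet.of(Month.JUNE, Month.JULY, Month.AUGUST);
Set<Month> immutableView = Collections.unmodifiableSet(summer);
// Let the "summer" reference go out of scope so only the view escapes.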

The resulting collections may be less efficient when it comes to operations like containsAll and equals than their counterparts in Guava, which may in turn be less efficient than the raw modifiable collections themselves.

Could the implementations be improved?

Since they can’t be replaced by third-party libraries, Enum{Set,Map} had better be as good as possible! They’re good already, but they could be better.

Enum{Set,Map} have missed out on potential upgrades since Java 8. New methods were added in Java 8 to Set and Map (or higher-level interfaces like Collection and Iterable). While the default implementations of those methods are correct, we could do better with overrides in Enum{Set,Map}.

This issue is tracked as JDK-8170826.

Specifically, these methods should be overridden:

  • {Regular,Jumbo}EnumSet.forEach(action)
  • {Regular,Jumbo}EnumSet.iterator().forEachRemaining(action)
  • {Regular,Jumbo}EnumSet.spliterator()
  • EnumMap.forEach(action)
  • EnumMap.{keySet,values,entrySet}().forEach(action)
  • EnumMap.{keySet,values,entrySet}().iterator().forEachRemaining(action)
  • EnumMap.{keySet,values,entrySet}().spliterator()

I put sample implementations on GitHub in case you’re curious what these overrides might look like. They’re all pretty straightforward.

Rather than walk through each implementation in detail, I’ll share some high-level observations about them.

  • The optimized forEach and forEachRemaining methods are roughly 50% better than the defaults (in terms of operations per second).
  • EnumMap.forEach(action) benefits the most, becoming twice as fast as the default implementation.
  • The iterable.forEach(action) method is popular. Optimizing it tends to affect a large audience, which increases the likelihood that the optimization (even if small) is worthwhile. (I’d claim that iterable.forEach(action) is too popular, and I’d suggest that the traditional enhanced for loop should be preferred over forEach except when the argument to forEach can be written as a method reference. That’s a topic for another discussion, though.)
  • The iterator.forEachRemaining(action) method is more important than it seems. Few people use it directly, but many people use it indirectly through streams. The default spliterator() delegates to the iterator(), and the default stream() delegates to the spliterator(). In the end, stream traversal may delegate to iterator().forEachRemaining(...). Given the popularity of streams, optimizing this method is a good idea!
  • The iterable.spliterator() method is critical when it comes to stream performance, but writing a custom Spliterator from scratch is a non-trivial task. I recommend this approach:
    • Check whether the characteristics of the default spliterator are correct for your collection (often the defaults are too conservative — for example, EnumSet‘s spliterator is currently missing the ORDERED, SORTED, and NONNULL characteristics). If they’re not correct, then provide a trivial override of the spliterator that uses Spliterators.spliterator(collection, characteristics) to define the correct characteristics; see the sketch after this list.
    • Don’t go further than that until you’ve read through the implementation of that spliterator, and you understand how it works, and you’re confident that you can do better. In particular, your tryAdvance(action) and trySplit() should both be better. Write a benchmark afterwards to confirm your assumptions.
  • The map.forEach(action) method is extremely popular and is almost always worth overriding. This is especially true for maps like EnumMap that create their Entry objects on demand.
  • It’s usually possible to share code across the forEach and forEachRemaining methods. If you override one, you’re already most of the way there to overriding the others.
  • I don’t think it’s worthwhile to override collection.removeIf(filter) in any of these classes. For RegularEnumSet, where it seemed most likely to be worthwhile, I couldn’t come up with a faster implementation than the default.
  • Enum{Set,Map} could provide faster hashCode() implementations than the ones they currently inherit from AbstractSet and AbstractMap, but I don’t think that would be worthwhile. In general, I don’t think optimizing the hashCode() of collections is worthwhile unless it can somehow become a constant-time (O(1)) operation, and even then it is questionable. Collection hash codes aren’t used very often.
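
As a sketch of that first, trivial step (not the JDK’s actual patch): keep the default iterator-backed spliterator, but declare the characteristics that are actually true for EnumSet.

@Override
public Spliterator<E> spliterator() {
  // Encounter order follows enum declaration order (ORDERED, SORTED by
  // natural order), and elements are distinct and never null. SIZED and
  // SUBSIZED are added automatically by this factory.
  return Spliterators.spliterator(
      this,
      Spliterator.ORDERED | Spliterator.SORTED |
          Spliterator.DISTINCT | Spliterator.NONNULL);
}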

Could the APIs be improved?

The implementation-level changes I’ve described are purely beneficial. There is no downside other than a moderate increase in lines of code, and the new lines of code aren’t all that complicated. (Even if they were complicated, this is java.util! Bring on the micro-optimizations.)

Since the existing code is already so good, though, changes of this nature have limited impact. Cutting one third or one half of the execution time from an operation that’s already measured in nanoseconds is a good thing but not game-changing. I suspect that those changes will cause exactly zero users of the JDK to write their applications differently.

The more tantalizing, meaningful, and dangerous changes are the realm of the APIs.

I think that Enum{Set,Map} are chronically underused. They have a bit of a PR problem. Some developers don’t know these classes exist. Other developers know about these classes but don’t bother to reach for them when the time comes. It’s just not a priority for them. That’s totally understandable, but… There’s avoiding premature optimization and then there’s throwing away performance for no reason — performance nihilism? Maybe we can win their hearts with API-level changes.

No one should have to go out of their way to use Enum{Set,Map}. Ideally it should be easier than using Hash{Set,Map}. The EnumSet.allOf(elementType) method is a great example. If you want a Set containing all the enum constants of some type, then EnumSet.allOf(elementType) is the best solution and the easiest solution.

The high-level JDK-8145048 tracks a couple of ideas for improvements in this area. In the following sections, I expand on these ideas and discuss other API-level changes.

Add immutable Enum{Set,Map} (maybe?)

In a recent conversation on Twitter about JEP 301: Enhanced Enums, Joshua Bloch and Brian Goetz referred to theoretical immutable Enum{Set,Map} types in the JDK.

Joshua Bloch also discussed the possibility of an immutable EnumSet in Effective Java:

“The one real disadvantage of EnumSet is that it is not, as of release 1.6, possible to create an immutable EnumSet, but this will likely be remedied in an upcoming release. In the meantime, you can wrap an EnumSet with Collections.unmodifiableSet, but conciseness and performance will suffer.”

When he said “performance will suffer”, he was probably referring to the fact that certain bulk operations of EnumSet won’t execute as quickly when inside a wrapper collection (tracked as JDK-5039214). Consider RegularEnumSet.equals(object):

public boolean equals(Object o) {
    if (!(o instanceof RegularEnumSet))
        return super.equals(o);

    RegularEnumSet<?> es = (RegularEnumSet<?>)o;
    if (es.elementType != elementType)
        return elements == 0 && es.elements == 0;

    return es.elements == elements;
}

It’s optimized for the case that the argument is another instance of RegularEnumSet. In that case the equality check boils down to a comparison of two primitive long values. Now that’s fast!

If the argument to equals(object) was not a RegularEnumSet but instead a Collections.unmodifiableSet wrapper, that code would fall back to its slow path.

Guava’s approach is similar to the Collections.unmodifiableSet one, although Guava does a bit better in terms of unwrapping the underlying Enum{Set,Map} and delegating to the super-fast optimized paths.

If your application deals exclusively with Guava’s immutable Enum{Set,Map} wrappers, you should get the full benefit of those optimized paths from the JDK. If you mix and match Guava’s collections with the JDK’s though, the results won’t be quite as good. (RegularEnumSet doesn’t know how to unwrap Guava’s ImmutableEnumSet, so a comparison in that direction would invoke the slow path.)

If immutable Enum{Set,Map} had full support in the JDK, however, it would not have those same limitations. RegularEnumSet and friends can be changed.

What should be done in the JDK?

I spent a long time and tested a lot of code trying to come up with an answer to this. Sadly the end result is:

I don’t know.

Personally, I’m content to use Guava for this. I’ll share some observations I made along the way.

Immutable Enum{Set,Map} won’t be faster than mutable Enum{Set,Map}.

The current versions of Enum{Set,Map} are really, really good. They’ll be even better once they override the defaults from Java 8.

Sometimes, having to support mutability comes with a tax on efficiency. I don’t think this is the case with Enum{Set,Map}. At best, immutable versions of these classes will be exactly as efficient as the mutable ones.

The more likely outcome is that immutable versions will come with a small penalty to performance by expanding the Enum{Set,Map} ecosystem.

Take RegularEnumSet.equals(object) for example. Each time we create a new type of EnumSet, are we going to change that code to add a new instanceof check for our new type? If we add the check, we make that code worse at handling everything except our new type. If we don’t add the check, we… still make that code worse! It’s less effective than it used to be; more EnumSet instances trigger the slow path.

Classes like Enum{Set,Map} have a userbase that is more sensitive to changes in performance than average users. If adding a new type causes some call site to become megamorphic, we might have thrown their carefully-crafted assumptions regarding performance out the window.

If we decide to add immutable Enum{Set,Map}, we should do so for reasons unrelated to performance.

As an exception to the rule, an immutable EnumSet containing all constants of a single enum type would be really fast.

RegularEnumSet sets such a high bar for efficiency. There is almost no wiggle room in Set operations like contains(element) for anyone else to be faster. Here’s the source code for RegularEnumSet.contains(element):

public boolean contains(Object e) {
    if (e == null)
        return false;
    Class<?> eClass = e.getClass();
    if (eClass != elementType && eClass.getSuperclass() != elementType)
        return false;

    return (elements & (1L << ((Enum<?>)e).ordinal())) != 0;
}

If you can’t do contains(element) faster than that, you’ve already lost. Your EnumSet is probably worthless.

There is a worthy contender, which I’ll call FullEnumSet. It is an EnumSet that (always) contains every constant of a single enum type. Here is one way to write that class:

// NOTE: FullEnumSet extends EnumSet, whose constructor and its elementType
// and universe fields are package-private, so this sample only compiles
// inside the java.util package.
package java.util;

import java.util.function.Consumer;
import java.util.function.Predicate;

class FullEnumSet<E extends Enum<E>> extends EnumSet<E> {

  // TODO: Add a static factory method somewhere.
  FullEnumSet(Class<E> elementType, Enum<?>[] universe) {
    super(elementType, universe);
  }

  @Override
  @SuppressWarnings("unchecked")
  public Iterator<E> iterator() {
    // TODO: Avoid calling Arrays.asList.
    //       The iterator class can be shared and used directly.
    return Arrays.asList((E[]) universe).iterator();
  }

  @Override
  public Spliterator<E> spliterator() {
    return Spliterators.spliterator(
        universe,
        Spliterator.ORDERED |
            Spliterator.SORTED |
            Spliterator.IMMUTABLE |
            Spliterator.NONNULL |
            Spliterator.DISTINCT);
  }

  @Override
  public int size() {
    return universe.length;
  }

  @Override
  public boolean contains(Object e) {
    if (e == null)
      return false;

    Class<?> eClass = e.getClass();
    return eClass == elementType || eClass.getSuperclass() == elementType;
  }

  @Override
  public boolean containsAll(Collection<?> c) {
    if (!(c instanceof EnumSet))
      return super.containsAll(c);

    EnumSet<?> es = (EnumSet<?>) c;
    return es.elementType == elementType || es.isEmpty();
  }

  @Override
  @SuppressWarnings("unchecked")
  public void forEach(Consumer<? super E> action) {
    int i = 0, n = universe.length;
    if (i >= n) {
      Objects.requireNonNull(action);
      return;
    }
    do action.accept((E) universe[i]);
    while (++i < n);
  }

  @Override void addAll()               {throw uoe();}
  @Override void addRange(E from, E to) {throw uoe();}
  @Override void complement()           {throw uoe();}

  @Override public boolean add(E e)                          {throw uoe();}
  @Override public boolean addAll(Collection<? extends E> c) {throw uoe();}
  @Override public void    clear()                           {throw uoe();}
  @Override public boolean remove(Object e)                  {throw uoe();}
  @Override public boolean removeAll(Collection<?> c)        {throw uoe();}
  @Override public boolean removeIf(Predicate<? super E> f)  {throw uoe();}
  @Override public boolean retainAll(Collection<?> c)        {throw uoe();}

  private static UnsupportedOperationException uoe() {
    return new UnsupportedOperationException();
  }

  // TODO: Figure out serialization.
  //       Serialization should preserve these qualities:
  //         - Immutable
  //         - Full
  //         - Singleton?
  //       Maybe it's a bad idea to extend EnumSet?
  private static final long serialVersionUID = 0;
}

FullEnumSet has many desirable properties. Of note:

  • contains(element) only needs to check the type of the argument to know whether it’s a member of the set.
  • containsAll(collection) is extremely fast when the argument is an EnumSet (of any kind); it boils down to comparing the element types of the two sets. It follows that equals(object) is just as fast in that case, since equals delegates the hard work to containsAll.
  • Since all the elements are contained in one flat array with no empty spaces, conditions are ideal for iterating and for splitting (splitting efficiency is important in the context of parallel streams).
  • It beats RegularEnumSet in all important metrics:
    • Query speed (contains(element), etc.)
    • Iteration speed
    • Space consumed

Asking for the full set of enum constants of some type is a very common operation. See: every user of values(), elementType.getEnumConstants(), and EnumSet.allOf(elementType). I bet the vast majority of those users do not modify (their copy of) that set of constants. A class that is specifically tailored to that use case has a good chance of being worthwhile.

Since it’s immutable, the FullEnumSet of each enum type could be a lazy-initialized singleton.

Should immutable Enum{Set,Map} reuse existing code, or should they be rewritten from scratch?

As I said earlier, the immutable versions of these classes aren’t going to be any faster. If they’re built from scratch, that code is going to look near-identical to the existing code. There would be a painful amount of copy and pasting, and I would not envy the people responsible for maintaining that code in the future.

Suppose we want to reuse the existing code. I see two general approaches:

  1. Do what Guava did, basically. Create unmodifiable wrappers around modifiable Enum{Set,Map}. Both the wrappers and the modifiable collections should be able to unwrap intelligently to take advantage of the existing optimizations for particular Enum{Set,Map} types (as in RegularEnumSet.equals(object)).
  2. Extend the modifiable Enum{Set,Map} classes with new classes that override modifier methods to throw UnsupportedOperationException. Optimizations that sniff for particular Enum{Set,Map} types (as in RegularEnumSet.equals(object)) remain exactly as effective as before without changes.

Of those two, I prefer the Guava-like approach. Extending the existing classes raises some difficult questions about the public API, particularly with respect to serialization.

What’s the public API for immutable Enum{Set,Map}? What’s the immutable version of EnumSet.of(e1, e2, e3)?

Here’s where I gave up.

  • Should we add public java.util.ImmutableEnum{Set,Map} classes?
  • If not, where do we put the factory methods, and what do we name them? EnumSet.immutableOf(e1, e2, e3)? EnumSet.immutableAllOf(Month.class)? Yuck! (Clever synonyms like “having” and “universeOf” might be even worse.)
  • Are the new classes instances of Enum{Set,Map} or do they exist in an unrelated class hierarchy?
  • If the new classes do extend Enum{Set,Map}, how is serialization affected? Do we add an “isImmutable” bit to the current serialized forms? Can that be done without breaking backwards compatibility?

Good luck to whoever has to produce the final answers to those questions.

That’s enough about this topic. Let’s move on.

Add factory methods

JDK-8145048 mentions the possibility of adding factory methods in Enum{Set,Map} to align them with Java 9’s Set and Map factories. EnumSet already has a varargs EnumSet.of(...) factory method, but EnumMap has nothing like that.

It would be nice to be able to declare EnumMap instances like this, for some reasonable number of key-value pairs:

Map<DayOfWeek, String> dayNames =
    EnumMap.of(
        DayOfWeek.MONDAY,    "lunes",
        DayOfWeek.TUESDAY,   "martes",
        DayOfWeek.WEDNESDAY, "miércoles",
        DayOfWeek.THURSDAY,  "jueves",
        DayOfWeek.FRIDAY,    "viernes",
        DayOfWeek.SATURDAY,  "sábado",
        DayOfWeek.SUNDAY,    "domingo");

Users could use EnumMap‘s copy constructor in conjunction with Java 9’s Map factory methods to achieve the same result less efficiently…

Map<DayOfWeek, String> dayNames =
    new EnumMap<>(
        Map.of(
            DayOfWeek.MONDAY,    "lunes",
            DayOfWeek.TUESDAY,   "martes",
            DayOfWeek.WEDNESDAY, "miércoles",
            DayOfWeek.THURSDAY,  "jueves",
            DayOfWeek.FRIDAY,    "viernes",
            DayOfWeek.SATURDAY,  "sábado",
            DayOfWeek.SUNDAY,    "domingo"));

…but the more we give up efficiency like that, the less EnumMap makes sense in the first place. A reasonable person might start to question why they should bother with EnumMap at all — just get rid of the new EnumMap<>(...) wrapper and use Map.of(...) directly.

Speaking of that EnumMap(Map) copy constructor, the fact that it may throw IllegalArgumentException when provided an empty Map leads people to use this pattern instead:

Map<DayOfWeek, String> copy = new EnumMap<>(DayOfWeek.class);
copy.putAll(otherMap);

We could give them a shortcut:

Map<DayOfWeek, String> copy = new EnumMap<>(DayOfWeek.class, otherMap);

Similarly, to avoid an IllegalArgumentException from EnumSet.copyOf(collection), I see code like this:

Set<Month> copy = EnumSet.noneOf(Month.class);
copy.addAll(otherCollection);

We could give them a shortcut too:

Set<Month> copy = EnumSet.copyOf(Month.class, otherCollection);

Existing code may define mappings from enum constants to values as standalone functions. Maybe the users of that code would like to view those (function-based) mappings as Map objects.

To that end, we could give people the means to generate an EnumMap from a Function:

Locale locale = Locale.forLanguageTag("es-MX");

Map<DayOfWeek, String> dayNames =
    EnumMap.map(DayOfWeek.class,
                day -> day.getDisplayName(TextStyle.FULL, locale));

// We could interpret the function returning null to mean that the
// key is not present.  That would allow this method to support
// more than the "every constant is a key" use case while dropping
// support for the "there may be present null values" use case,
// which is probably a good trade.

We could provide a similar factory method for EnumSet, accepting a Predicate instead of a Function:

Set<Month> shortMonths =
    EnumSet.filter(Month.class,
                   month -> month.minLength() < 31);

This functionality could be achieved less efficiently and more verbosely with streams. Again, the more we give up efficiency like that, the less sense it makes to use Enum{Set,Map} in the first place. I acknowledge that there is a cost to making API-level changes like the ones I’m discussing, but I feel that we are solidly in the “too little API-level support for Enum{Set,Map}” part of the spectrum and not even close to approaching the opposite “API bloat” end.
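For illustration, here is the stream-based equivalent of the proposed EnumSet.filter using only existing JDK methods (a sketch):

Set<Month> shortMonths =
    Arrays.stream(Month.values())
        .filter(month -> month.minLength() < 31)
        .collect(Collectors.toCollection(
            () -> EnumSet.noneOf(Month.class)));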

I don’t mean to belittle streams. There should also be more support for Enum{Set,Map} in the stream API.

Add collectors

Code written for Java 8+ will often produce collections using streams and collectors rather than invoking collection constructors or factory methods directly. I don’t think it would be outlandish to estimate that one third of collections are produced by collectors. Some of these collections will be (or could be) Enum{Set,Map}, and more could be done to serve that use case.

Collectors with these signatures should exist somewhere in the JDK:

public static <T extends Enum<T>>
Collector<T, ?, EnumSet<T>> toEnumSet(
    Class<T> elementType)

public static <T, K extends Enum<K>, U>
Collector<T, ?, EnumMap<K, U>> toEnumMap(
    Class<K> keyType,
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper)

public static <T, K extends Enum<K>, U>
Collector<T, ?, EnumMap<K, U>> toEnumMap(
    Class<K> keyType,
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper,
    BinaryOperator<U> mergeFunction)

Similar collectors can be obtained from the existing collector factories in the Collectors class (specifically toCollection(collectionSupplier) and toMap(keyMapper, valueMapper, mergeFunction, mapSupplier)) or by using Collector.of(...), but that requires a little more effort on the users’ part, adding a little bit of extra friction to using Enum{Set,Map} that we don’t need.
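For example, the proposed toEnumMap can be approximated today like this (a sketch; the explicit merge function and map supplier are exactly the friction the dedicated collectors would remove):

Map<Month, Integer> maxLengths =
    Arrays.stream(Month.values())
        .collect(Collectors.toMap(
            month -> month,                      // keyMapper
            Month::maxLength,                    // valueMapper
            (a, b) -> b,                         // mergeFunction (unused here)
            () -> new EnumMap<>(Month.class)));  // mapSupplier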

I referenced these collectors from Guava earlier in this article: Sets.toImmutableEnumSet and Maps.toImmutableEnumMap.

They do not require the Class object argument, making them easier to use than the collectors that I proposed. The reason the Guava collectors can do this is that they produce ImmutableSet and ImmutableMap, not EnumSet and EnumMap. One cannot create an Enum{Set,Map} instance without having the Class object for that enum type. In order to have a collector that reliably produces Enum{Set,Map} (even when the stream contains zero input elements to grab the Class object from), the Class object must be provided up front.
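For comparison, here is a usage sketch of the Guava set collector (assuming Guava’s Sets.toImmutableEnumSet; no Class object is needed because an empty result is simply an empty ImmutableSet):

ImmutableSet<Month> shortMonths =
    Arrays.stream(Month.values())
        .filter(month -> month.minLength() < 31)
        .collect(Sets.toImmutableEnumSet());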

We could provide similar collectors in the JDK that would produce immutable Set and Map instances. For streams with no elements, the collectors would produce Collections.emptySet() or Collections.emptyMap(). For streams with at least one element, the collectors would produce an Enum{Set,Map} instance wrapped by Collections.unmodifiable{Set,Map}.

The signatures would look like this:

public static <T extends Enum<T>>
Collector<T, ?, Set<T>> toImmutableEnumSet()

public static <T, K extends Enum<K>, U>
Collector<T, ?, Map<K, U>> toImmutableEnumMap(
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper)

public static <T, K extends Enum<K>, U>
Collector<T, ?, Map<K, U>> toImmutableEnumMap(
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper,
    BinaryOperator<U> mergeFunction)
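A minimal sketch of how the set variant could satisfy that contract, built from existing JDK pieces (my illustration, not a proposed final implementation):

public static <T extends Enum<T>>
Collector<T, ?, Set<T>> toImmutableEnumSet() {
  return Collectors.collectingAndThen(
      Collectors.toCollection(ArrayList::new),
      list -> {
        if (list.isEmpty()) {
          return Collections.emptySet();  // zero elements: no Class needed
        }
        // EnumSet.copyOf requires a non-empty source to learn the type.
        return Collections.unmodifiableSet(EnumSet.copyOf(list));
      });
}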

I’m not sure that those collectors are worthwhile. I might never recommend them over their counterparts in Guava.

The StreamEx library also provides a couple of interesting enum-specialized collectors.

They’re interesting because they are potentially short-circuiting. With MoreCollectors.toEnumSet(elementType), when the collector can determine that it has encountered all of the elements of that enum type (which is easy — the set of already-collected elements can be compared to EnumSet.allOf(elementType)), it stops collecting. These collectors may be well-suited for streams having a huge number of elements (or having elements that are expensive to compute) mapping to a relatively small set of enum constants.
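A usage sketch, assuming StreamEx’s StreamEx.of factory and the MoreCollectors.toEnumSet collector described above (endlessDays is a hypothetical stand-in for any large Stream<DayOfWeek>, and the short-circuiting takes effect when collecting through a StreamEx stream):

Set<DayOfWeek> daysSeen =
    StreamEx.of(endlessDays)
        .collect(MoreCollectors.toEnumSet(DayOfWeek.class));
// Stops consuming input once all seven DayOfWeek constants
// have been collected.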

I don’t know how feasible it is to port these StreamEx collectors to the JDK. As I understand it, the concept of short-circuiting collectors is not supported by the JDK. Adding support may necessitate other changes to the stream and collector APIs.

Be navigable? (No)

Over the years, many people have suggested that Enum{Set,Map} should implement the NavigableSet and NavigableMap interfaces. Every enum type is Comparable, so it’s technically possible. Why not?

I think the Navigable{Set,Map} interfaces are a poor fit for Enum{Set,Map}.

Those interfaces are huge! Implementing Navigable{Set,Map} would bloat the size of Enum{Set,Map} by 2-4x (in terms of lines of code). It would distract them from their core focus and strengths. Supporting the navigable API would most likely come with a non-zero penalty to runtime performance.

Have you ever looked closely at the specified behavior of methods like subSet and subMap, specifically when they might throw IllegalArgumentException? Those contracts impose a great deal of complexity for what seems like undesirable behavior. Enum{Set,Map} could take a stance on those methods similar to Guava’s ImmutableSortedSet and ImmutableSortedMap: acknowledge the contract of the interface but do something else that is more reasonable instead…

I say forget about it. If you want navigable collections, use TreeSet and TreeMap (or their thread-safe cousins, ConcurrentSkipListSet and ConcurrentSkipListMap). The cross-section of people who need the navigable API and the efficiency of enum-specialized collections must be very small.

There are few cases where the Comparable nature of enum types comes into play at all. In practice, I expect that the ordering of most enum constants is arbitrary (with respect to intended behavior).

I’ll go further than that; I think that making all enum types Comparable in the first place was a mistake.

  • Which ordering of Collector.Characteristics is “natural”, [CONCURRENT,UNORDERED] or [UNORDERED,CONCURRENT]?
  • Which is the “greater” Thread.State, WAITING or TIMED_WAITING?
  • FileVisitOption.FOLLOW_LINKS is “comparable” — to what? (There is no other FileVisitOption.)
  • How many instances of RoundingMode are in the “range” from FLOOR to CEILING?

    import java.math.RoundingMode;
    import java.util.EnumSet;
    import java.util.Set;

    class RangeTest {
      public static void main(String[] args) {
        Set<RoundingMode> range =
            EnumSet.range(RoundingMode.FLOOR,
                          RoundingMode.CEILING);
        System.out.println(range.size());
      }
    }

    // java.lang.IllegalArgumentException: FLOOR > CEILING

There are other enum types where questions like that actually make sense, and those should be Comparable.

  • Is Month.JANUARY “before” Month.FEBRUARY? Yes.
  • Is TimeUnit.HOURS “larger” than TimeUnit.MINUTES? Yes.

Implementing Comparable or not should have been a choice for authors of individual enum types. To serve people who really did want to sort enum constants by declaration order for whatever reason, we could have automatically provided a static Comparator from each enum type:

Comparator<JDBCType> c = JDBCType.declarationOrder();
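For what it’s worth, an equivalent comparator can already be assembled by hand, since ordinal() exposes the declaration index. A sketch:

// Orders constants by declaration position; for enums this matches
// their natural order today.
Comparator<JDBCType> byDeclaration = Comparator.comparingInt(Enum::ordinal);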

It’s too late for that now. Let’s not double down on the original mistake by making Enum{Set,Map} navigable.

Conclusion

EnumSet and EnumMap are cool collections, and you should use them!

They’re already great, but they can become even better with changes to their private implementation details. I propose some ideas here. If you want to find out what happens in the JDK, the changes (if there are any) should be noted in JDK-8170826.

API-level changes are warranted as well. New factory methods and collectors would make it easier to obtain instances of Enum{Set,Map}, and immutable Enum{Set,Map} could be better-supported. I propose some ideas here, but if there are any actual changes made then they should be noted in JDK-8145048.

Efficient multiple-stream concatenation in Java

October 19, 2016

Michael Hixson

I want to combine the elements of multiple Stream instances into a single Stream. What’s the best way to do this?

This article compares a few different solutions.

Stream.concat(a, b)

The JDK provides Stream.concat(a, b) for concatenating two streams.

void exampleConcatTwo() {
  Stream<String> a = Stream.of("one", "two");
  Stream<String> b = Stream.of("three", "four");
  Stream<String> out = Stream.concat(a, b);
  out.forEach(System.out::println);
  // Output:
  // one
  // two
  // three
  // four
}

What if we have more than two streams?

We could use Stream.concat(a, b) multiple times. With three streams we could write Stream.concat(Stream.concat(a, b), c).

To me that approach is depressing at three streams, and it rapidly gets worse as we add more streams.

Reduce

Alternatively, we can use reduce to perform the multiple incantations of Stream.concat(a, b) for us. The code adapts elegantly to handle any number of input streams.

void exampleReduce() {
  Stream<String> a = Stream.of("one", "two");
  Stream<String> b = Stream.of("three", "four");
  Stream<String> c = Stream.of("five", "six");
  Stream<String> out = Stream.of(a, b, c)
      .reduce(Stream::concat)
      .orElseGet(Stream::empty);
  out.forEach(System.out::println);
  // Output:
  // one
  // two
  // three
  // four
  // five
  // six
}

Be careful using this pattern! Note the warning in the documentation of Stream.concat(a, b):

Use caution when constructing streams from repeated concatenation. Accessing an element of a deeply concatenated stream can result in deep call chains, or even StackOverflowError.

It takes quite a few input streams to trigger this problem, but it is trivial to demonstrate:

void exampleStackOverflow() {
  List<Stream<String>> inputs = new AbstractList<Stream<String>>() {
    @Override
    public Stream<String> get(int index) {
      return Stream.of("one", "two");
    }

    @Override
    public int size() {
      return 1_000_000; // try changing this number
    }
  };
  Stream<String> out = inputs.stream()
      .reduce(Stream::concat)
      .orElseGet(Stream::empty);
  long count = out.count(); // probably throws
  System.out.println("count: " + count); // probably never reached
}

On my workstation, this method throws StackOverflowError after several seconds of churning.

What’s going on here?

We can think of the calls to Stream.concat(a, b) as forming a binary tree. At the root is the concatenation of all the input streams. At the leaves are the individual input streams. Let’s look at the trees for up to five input streams as formed by our reduce operation.

Two streams: concat(a, b)
Three streams: concat(concat(a, b), c)
Four streams: concat(concat(concat(a, b), c), d)
Five streams: concat(concat(concat(concat(a, b), c), d), e)

The trees are perfectly unbalanced! Each additional input stream adds one layer of depth to the tree and one layer of indirection to reach all the other streams. This can have a noticeable negative impact on performance. With enough layers of indirection we’ll see a StackOverflowError.

Balance

If we’re worried that we’ll concatenate a large number of streams and run into the aforementioned problems, we can balance the tree. This is as if we’re optimizing an O(n) algorithm into an O(log n) one. We won’t totally eliminate the possibility of StackOverflowError, and there may be other approaches that perform even better, but this should be quite an improvement over the previous solution.

void exampleBalance() {
  Stream<String> a = Stream.of("one", "two");
  Stream<String> b = Stream.of("three", "four");
  Stream<String> c = Stream.of("five", "six");
  Stream<String> out = concat(a, b, c);
  out.forEach(System.out::println);
  // Output:
  // one
  // two
  // three
  // four
  // five
  // six
}

@SafeVarargs
static <T> Stream<T> concat(Stream<T>... in) {
  return concat(in, 0, in.length);
}

static <T> Stream<T> concat(Stream<T>[] in, int low, int high) {
  switch (high - low) {
    case 0: return Stream.empty();
    case 1: return in[low];
    default:
      int mid = (low + high) >>> 1;
      Stream<T> left = concat(in, low, mid);
      Stream<T> right = concat(in, mid, high);
      return Stream.concat(left, right);
  }
}

Flatmap

There is another way to concatenate streams that is built into the JDK, and it does not involve Stream.concat(a, b) at all. It is flatMap.

void exampleFlatMap() {
  Stream<String> a = Stream.of("one", "two");
  Stream<String> b = Stream.of("three", "four");
  Stream<String> c = Stream.of("five", "six");
  Stream<String> out = Stream.of(a, b, c).flatMap(s -> s);
  out.forEach(System.out::println);
  // Output:
  // one
  // two
  // three
  // four
  // five
  // six
}

This generally outperforms the solutions based on Stream.concat(a, b) when each input stream contains fewer than 32 elements. Beyond 32 elements per stream, flatMap performs comparatively worse and worse as the element count rises.

flatMap avoids the StackOverflowError issue but it comes with its own set of quirks. For example, it interacts poorly with infinite streams. Calling findAny on the concatenated stream may cause the program to enter an infinite loop, whereas the other solutions would terminate almost immediately.

void exampleInfiniteLoop() {
  Stream<String> a = Stream.generate(() -> "one");
  Stream<String> b = Stream.generate(() -> "two");
  Stream<String> c = Stream.generate(() -> "three");
  Stream<String> out = Stream.of(a, b, c).flatMap(s -> s);
  Optional<String> any = out.findAny(); // infinite loop
  System.out.println(any); // never reached
}

(The infinite loop is an implementation detail. This could be fixed in the JDK without changing the contract of flatMap.)

Also, flatMap forces its input streams into sequential mode even if they were originally parallel. The outermost concatenated stream can still be made parallel, and we will be able to process elements from distinct input streams in parallel, but the elements of each individual input stream must all be processed sequentially.

Analysis

Let me share a few trends that I’ve noticed when dealing with streams and stream concatenation in general, having written a fair amount of code in Java 8 by now.

  • There have been maybe one dozen cases where I’ve needed to concatenate streams. That’s not all that many, so no matter how good the solution is, it’s not going to have much of an impact for me.
  • In all but one of those cases, I needed to concatenate exactly two streams, so Stream.concat(a, b) was sufficient.
  • In the remaining case, I needed to concatenate exactly three streams. I was not even close to the point where StackOverflowError would become an issue. Stream.concat(Stream.concat(a, b), c) would have worked just fine, although I went with flatMap because I felt that it was easier to read.
  • I have never needed to concatenate streams in performance-critical sections of code.
  • I use infinite streams very rarely. When I do use them, it is obvious in context that they are infinite. And so concatenating infinite streams together and then asking a question like findAny on the result is just not something that I would be tempted to do. That particular issue with flatMap seems like one that I’ll never come across.
  • I use parallel streams very rarely. I think I’ve only used them twice in production code. It is almost never the case that going parallel improves performance, and even when it might improve performance, it is unlikely that processing them in the singleton ForkJoinPool.commonPool() is how I will want to manage that work. The issue with flatMap forcing the input streams to be sequential seems very unlikely to be a real problem for me.
  • Let’s suppose that I do want to concatenate parallel streams and have them processed in parallel. If I have eight input streams on an eight core machine, and each stream has roughly the same number of elements, the fact that flatMap forces the individual streams to be sequential will not degrade performance for me at all. All eight cores will be fully utilized, each core processing one of the eight input streams. If I have seven input streams on that same machine, I will see only slightly degraded performance. With six, slightly more degraded, and so on.

What’s the takeaway from all this? Here is my advice:

For two input streams, use:
Stream.concat(a, b)

For more than two input streams, use:
Stream.of(a, b, c, ...).flatMap(s -> s)

That solution is good enough…

Overboard

…but what if we’re not satisfied with “good enough”? What if we want a solution that’s really fast no matter the size and shape of the input and doesn’t have any of the quirks of the other solutions?

The full implementation is a bit much to inline in a blog article, so take a look at StreamConcatenation.java for the source code.

This implementation is similar to Stream.concat(a, b) in that it uses a custom Spliterator, except this implementation handles any number of input streams.
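To give a flavor of the approach, here is a rough sketch of such a spliterator (my illustration, not the actual StreamConcatenation.java; it walks an array of input spliterators in order, and trySplit hands off whole inputs one at a time):

import java.util.Spliterator;
import java.util.function.Consumer;

final class ConcatSpliterator<T> implements Spliterator<T> {
  private final Spliterator<T>[] inputs;
  private int index;

  ConcatSpliterator(Spliterator<T>[] inputs) {
    this.inputs = inputs;
  }

  @Override
  public boolean tryAdvance(Consumer<? super T> action) {
    while (index < inputs.length) {
      if (inputs[index].tryAdvance(action)) {
        return true;
      }
      index++; // current input exhausted; move on
    }
    return false;
  }

  @Override
  public Spliterator<T> trySplit() {
    // Hand off one whole input; null when only one remains.
    return index < inputs.length - 1 ? inputs[index++] : null;
  }

  @Override
  public long estimateSize() {
    long size = 0;
    for (int i = index; i < inputs.length; i++) {
      size += inputs[i].estimateSize();
      if (size < 0) {
        return Long.MAX_VALUE; // saturate on overflow
      }
    }
    return size;
  }

  @Override
  public int characteristics() {
    return 0; // conservative: report no characteristics in this sketch
  }
}

A production version would also need to build the stream via StreamSupport.stream(...), report accurate characteristics, and close every input when the concatenated stream is closed.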

It performs quite well. It does not outperform every other solution in every scenario (flatMap is generally better for very small input streams), but it never performs much worse and it scales nicely with the number and size of the input streams.

Benchmark

I wrote a JMH benchmark to compare the four solutions discussed in this article. The benchmark uses each solution to concatenate a variable number of input streams with a variable number of elements per stream, then iterates over the elements of the concatenated stream. Here is the raw JMH output from my workstation and a prettier visualization of the benchmark results.