Blog

Results from Round 14 of the Web Framework Benchmarks project are now available! This round's results are limited to the physical hardware environment only, but cloud results will be included again in the next round.

Recent improvements

Our efforts during Round 14 focused on improvements that help us manage the project, mostly by removing some of our manual work.

Continuous Benchmarking

When we are not running one-off tests or modifying the toolset, the dedicated physical hardware environment at ServerCentral is continuously running the full benchmark suite. We call this "Continuous Benchmarking." As Round 14 was wrapping up, Continuous Benchmarking allowed us to deploy multiple preview rounds for community review more rapidly than in previous rounds.

Going forward, we expect Continuous Benchmarking to let us proceed immediately into community-facing previews of Round 15. We hope to have the first Round 15 preview available within a few days.

Paired with the continuous benchmarker is an internally-facing dashboard that shows us how things are progressing. We plan to eventually evolve this into an externally-facing interface for project contributors.

Differences

Contributors and the project's community will have seen several renderings of the differences between Round 13 and Round 14. The final capture of differences between Round 13 and Round 14 is an example. These help us confirm changes that are planned or expected and also identify unexpected changes or volatility.

We have, in fact, observed volatility with a small number of frameworks and aim to investigate and address each as time permits. Although the benchmarking suite includes two phases of warmup prior to gathering data for each test, we might find that some frameworks or platforms require additional warmup time to be consistent across multiple measurements.

Mention-bot

We added Facebook's mention-bot into the project's GitHub repository. This has helped keep past contributors in the loop if and when changes are made to their prior contributions. For example, if a contributor updates the Postgres JDBC driver for the full spectrum of JVM frameworks, the original contributors of those frameworks will be notified by mention-bot. This allows for widespread changes such as a driver update while simultaneously allowing each contributor to override changes according to their framework's best practices.

Previously, we had to either manually notify people or do a bit of testing on our own to determine if the update made sense. In practice, this often meant not bothering to update the driver, which isn't what we want. (Have you seen the big performance boost in the newer Postgres JDBC drivers?)

Community contributions

This project includes a large amount of community-contributed code. Community contributions are up recently and we believe that is thanks to mention-bot. We expect to pass the milestone of 2,000 Pull Requests processed within a week or two. That is amazing.

Thank you so much to all of the contributors! Check out Round 14 and then on to Round 15!

February 14, 2017

EnumSet and EnumMap

This article discusses java.util.EnumSet and java.util.EnumMap from Java's standard libraries.

What are they?

EnumSet and EnumMap are compact, efficient implementations of the Set and Map interfaces. They have the constraint that their elements/keys come from a single enum type.

Like HashSet and HashMap, they are modifiable.

In contrast to HashSet, EnumSet:

  • Consumes less memory, usually.
  • Is faster at all the things a Set can do, usually.
  • Iterates over elements in a predictable order (the declaration order of the element type's enum constants).
  • Rejects null elements.

In contrast to HashMap, EnumMap:

  • Consumes less memory, usually.
  • Is faster at all the things a Map can do, usually.
  • Iterates over entries in a predictable order (the declaration order of the key type's enum constants).
  • Rejects null keys.
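The predictable iteration order is easy to see in a short snippet. DayOfWeek declares its constants MONDAY through SUNDAY, so no matter what order elements or keys go in, they come back out in declaration order:

```java
import java.time.DayOfWeek;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class EnumCollections {
  public static void main(String[] args) {
    // Elements may be added in any order...
    Set<DayOfWeek> weekend =
        EnumSet.of(DayOfWeek.SUNDAY, DayOfWeek.SATURDAY);

    // ...but iteration follows the declaration order of the constants.
    System.out.println(weekend); // [SATURDAY, SUNDAY]

    // The same holds for EnumMap keys.
    Map<DayOfWeek, String> abbreviations = new EnumMap<>(DayOfWeek.class);
    abbreviations.put(DayOfWeek.WEDNESDAY, "Wed");
    abbreviations.put(DayOfWeek.MONDAY, "Mon");
    System.out.println(abbreviations); // {MONDAY=Mon, WEDNESDAY=Wed}
  }
}
```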

If you're wondering how this is possible, I encourage you to look at the source code:

  • EnumSet

    A bit vector of the ordinals of the elements in the Set. This is an abstract superclass of RegularEnumSet and JumboEnumSet.

  • RegularEnumSet

    An EnumSet whose bit vector is a single primitive long, which is enough to handle all enum types having 64 or fewer constants.

  • JumboEnumSet

    An EnumSet whose bit vector is a long[] array, which is allocated however many slots are necessary for the given enum type. Two slots are allocated for 128 or fewer constants, three slots for 192 or fewer constants, etc.

  • EnumMap

    A flat array of the Map's values indexed by the ordinals of their keys.
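The bit-vector idea can be sketched in a few lines. This is a simplification for illustration, not the JDK's actual code: a single long is the whole set, where bit i being set means "the constant with ordinal i is present."

```java
import java.time.Month;

public class BitVectorSketch {
  public static void main(String[] args) {
    long elements = 0L;

    // "Add" an element by setting the bit at its ordinal.
    elements |= 1L << Month.MARCH.ordinal();
    elements |= 1L << Month.JULY.ordinal();

    // "Contains" is a single mask-and-test.
    boolean hasMarch = (elements & (1L << Month.MARCH.ordinal())) != 0;
    boolean hasApril = (elements & (1L << Month.APRIL.ordinal())) != 0;

    // "Size" is a population count of the set bits.
    int size = Long.bitCount(elements);

    System.out.println(hasMarch + " " + hasApril + " " + size); // true false 2
  }
}
```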

EnumSet and EnumMap cheat! They use privileged code like this:

/**
 * Returns all of the values comprising E.
 * The result is uncloned, cached, and shared by all callers.
 */
private static <E extends Enum<E>> E[] getUniverse(Class<E> elementType) {
  return SharedSecrets.getJavaLangAccess()
                      .getEnumConstantsShared(elementType);
}

If you want all the Month constants, you might call Month.values(), giving you a Month[] array. There is a single backing array instance of those Month constants living in memory somewhere (a private field in the Class object for Month), but it wouldn't be safe to pass that array directly to every caller of values(). Imagine if someone modified that array! Instead, values() creates a fresh clone of the array for each caller.

EnumSet and EnumMap get to skip that cloning step. They have direct access to the backing array.
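You can observe the defensive copying directly: each call to values() returns a distinct array holding the same constants, so mutating one copy is harmless.

```java
import java.time.Month;

public class ValuesCloning {
  public static void main(String[] args) {
    // Each call to values() returns a fresh defensive copy...
    Month[] a = Month.values();
    Month[] b = Month.values();
    System.out.println(a == b);       // false: distinct array instances
    System.out.println(a[0] == b[0]); // true: same constants inside

    // ...which is why scribbling on the returned array cannot
    // corrupt anyone else's view of the enum.
    a[0] = Month.DECEMBER;
    System.out.println(Month.values()[0]); // JANUARY
  }
}
```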

Effectively, no third-party versions of these classes can be as efficient. Third-party libraries that provide enum-specialized collections tend to delegate to EnumSet and EnumMap. It's not that the library authors are lazy or incapable; delegating is the correct choice for them.

When should they be used?

Historically, Enum{Set,Map} were recommended as a matter of safety, taking better advantage of Java's type system than the alternatives.

Prefer enum types and Enum{Set,Map} over int flags.

Effective Java goes into detail about this use case for Enum{Set,Map} and enum types in general. If you write a lot of Java code, then you should read that book and follow its advice.

Before enum types existed, people would declare flags as int constants. Sometimes the flags would be powers of two and combined into sets using bitwise arithmetic:

static final int OVERLAY_STREETS  = 1 << 0;
static final int OVERLAY_ELECTRIC = 1 << 1;
static final int OVERLAY_PLUMBING = 1 << 2;
static final int OVERLAY_TERRAIN  = 1 << 3;

void drawCityMap(int overlays) { ... }

drawCityMap(OVERLAY_STREETS | OVERLAY_PLUMBING);

Other times the flags would start at zero and count up by one, and they would be used as array indexes:

static final int MONSTER_SLIME    = 0;
static final int MONSTER_GHOST    = 1;
static final int MONSTER_SKELETON = 2;
static final int MONSTER_GOLEM    = 3;

int[] kills = getMonstersSlain();

if (kills[MONSTER_SLIME] >= 10) { ... }

These approaches got the job done for many people, but they were somewhat error-prone and difficult to maintain.

When enum types were introduced to the language, Enum{Set,Map} came with them. Together they were meant to provide better tooling for problems previously solved with int flags. We would say, "Don't use int flags, use enum constants. Don't use bitwise arithmetic for sets of flags, use EnumSet. Don't use arrays for mappings of flags, use EnumMap." This was not because the enum-based solutions were faster than int flags — they were probably slower — but because the enum-based solutions were easier to understand and implement correctly.
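For illustration, the two int-flag examples above might be rewritten with enum types, EnumSet, and EnumMap along these lines (the type and method names are invented for this sketch):

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class EnumFlags {
  // The power-of-two flags become plain enum constants...
  enum Overlay { STREETS, ELECTRIC, PLUMBING, TERRAIN }

  // ...and a bitwise combination of flags becomes an EnumSet.
  static void drawCityMap(Set<Overlay> overlays) { /* ... */ }

  // The array-index flags become keys of an EnumMap.
  enum Monster { SLIME, GHOST, SKELETON, GOLEM }

  public static void main(String[] args) {
    drawCityMap(EnumSet.of(Overlay.STREETS, Overlay.PLUMBING));

    Map<Monster, Integer> kills = new EnumMap<>(Monster.class);
    kills.put(Monster.SLIME, 12);
    if (kills.getOrDefault(Monster.SLIME, 0) >= 10) { /* ... */ }
  }
}
```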

Fast forward to today, I don't see many people using int flags anymore (though there are notable exceptions). We've had enum types in the language for more than a decade. We're all using enum types here and there, we're all using the collections framework. At this point, while Effective Java's advice regarding Enum{Set,Map} is still valid, I think most people will never have a chance to put it into practice.

Today, we're using enum types in the right places, but we're forgetting about the collection types that came with them.

Prefer Enum{Set,Map} over Hash{Set,Map} as a performance optimization.

  • Prefer EnumSet over HashSet when the elements come from a single enum type.
  • Prefer EnumMap over HashMap when the keys come from a single enum type.

Should you refactor all of your existing code to use Enum{Set,Map} instead of Hash{Set,Map}? No.

Your code that uses Hash{Set,Map} isn't wrong. Migrating to Enum{Set,Map} might make it faster. That's it.

If you've ever used primitive collection libraries like fastutil or Trove, then it may help to think of Enum{Set,Map} like those primitive collections. The difference is that Enum{Set,Map} are specialized for enum types, not primitive types, and you can use them without depending on any third-party libraries.

Enum{Set,Map} don't have identical semantics to Hash{Set,Map}, so please don't make blind, blanket replacements in your existing code.

Instead, try to remember these classes for next time. If you can make your code more efficient for free, then why not go ahead and do that, right?

If you use IntelliJ IDEA, you can have it remind you to use Enum{Set,Map} with inspections:

  • Analyze - Run inspection by name - "Set replaceable with EnumSet" or "Map replaceable with EnumMap"

...or...

  • File - Settings - Editor - Inspections - Java - Performance issues - "Set replaceable with EnumSet" or "Map replaceable with EnumMap"

SonarQube can also remind you to use Enum{Set,Map}:

  • S1641: "Sets with elements that are enum values should be replaced with EnumSet"
  • S1640: "Maps with keys that are enum values should be replaced with EnumMap"

For immutable versions of Enum{Set,Map}, see Guava's Sets.immutableEnumSet(iterable) and Maps.immutableEnumMap(map) factory methods.

If you don't want to use Guava, then wrap the modifiable Enum{Set,Map} instances in Collections.unmodifiableSet(set) or Collections.unmodifiableMap(map) and throw away the direct references to the modifiable collections.

The resulting collections may be less efficient when it comes to operations like containsAll and equals than their counterparts in Guava, which may in turn be less efficient than the raw modifiable collections themselves.
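A minimal sketch of that wrapping pattern, where the unwrapped EnumSet never escapes:

```java
import java.time.Month;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

public class UnmodifiableEnumSet {
  public static void main(String[] args) {
    // Build the modifiable EnumSet inline and keep only the wrapper,
    // so no direct reference to the modifiable set survives.
    Set<Month> summer = Collections.unmodifiableSet(
        EnumSet.of(Month.JUNE, Month.JULY, Month.AUGUST));

    System.out.println(summer.contains(Month.JULY)); // true

    try {
      summer.add(Month.MAY);
    } catch (UnsupportedOperationException expected) {
      System.out.println("modification rejected");
    }
  }
}
```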

Could the implementations be improved?

Since they can't be replaced by third-party libraries, Enum{Set,Map} had better be as good as possible! They're good already, but they could be better.

Enum{Set,Map} have missed out on potential upgrades since Java 8. New methods were added in Java 8 to Set and Map (or higher-level interfaces like Collection and Iterable). While the default implementations of those methods are correct, we could do better with overrides in Enum{Set,Map}.

This issue is tracked as JDK-8170826.

Specifically, these methods should be overridden:

  • {Regular,Jumbo}EnumSet.forEach(action)
  • {Regular,Jumbo}EnumSet.iterator().forEachRemaining(action)
  • {Regular,Jumbo}EnumSet.spliterator()
  • EnumMap.forEach(action)
  • EnumMap.{keySet,values,entrySet}().forEach(action)
  • EnumMap.{keySet,values,entrySet}().iterator().forEachRemaining(action)
  • EnumMap.{keySet,values,entrySet}().spliterator()

I put sample implementations on GitHub in case you're curious what these overrides might look like. They're all pretty straightforward.

Rather than walk through each implementation in detail, I'll share some high-level observations about them.

  • The optimized forEach and forEachRemaining methods are roughly 50% better than the defaults (in terms of operations per second).
  • EnumMap.forEach(action) benefits the most, becoming twice as fast as the default implementation.
  • The iterable.forEach(action) method is popular. Optimizing it tends to affect a large audience, which increases the likelihood that the optimization (even if small) is worthwhile. (I'd claim that iterable.forEach(action) is too popular, and I'd suggest that the traditional enhanced for loop should be preferred over forEach except when the argument to forEach can be written as a method reference. That's a topic for another discussion, though.)
  • The iterator.forEachRemaining(action) method is more important than it seems. Few people use it directly, but many people use it indirectly through streams. The default spliterator() delegates to the iterator(), and the default stream() delegates to the spliterator(). In the end, stream traversal may delegate to iterator().forEachRemaining(...). Given the popularity of streams, optimizing this method is a good idea!
  • The iterable.spliterator() method is critical when it comes to stream performance, but writing a custom Spliterator from scratch is a non-trivial task. I recommend this approach:
    • Check whether the characteristics of the default spliterator are correct for your collection (oftentimes the defaults are too conservative; for example, EnumSet's spliterator is currently missing the ORDERED, SORTED, and NONNULL characteristics). If they're not correct, then provide a trivial override of the spliterator that uses Spliterators.spliterator(collection, characteristics) to define the correct characteristics.
    • Don't go further than that until you've read through the implementation of that spliterator, and you understand how it works, and you're confident that you can do better. In particular, your tryAdvance(action) and trySplit() should both be better. Write a benchmark afterwards to confirm your assumptions.
  • The map.forEach(action) method is extremely popular and is almost always worth overriding. This is especially true for maps like EnumMap that create their Entry objects on demand.
  • It's usually possible to share code across the forEach and forEachRemaining methods. If you override one, you're already most of the way there to overriding the others.
  • I don't think it's worthwhile to override collection.removeIf(filter) in any of these classes. For RegularEnumSet, where it seemed most likely to be worthwhile, I couldn't come up with a faster implementation than the default.
  • Enum{Set,Map} could provide faster hashCode() implementations than the ones they currently inherit from AbstractSet and AbstractMap, but I don't think that would be worthwhile. In general, I don't think optimizing the hashCode() of collections is worthwhile unless it can somehow become a constant-time (O(1)) operation, and even then it is questionable. Collection hash codes aren't used very often.

Could the APIs be improved?

The implementation-level changes I've described are purely beneficial. There is no downside other than a moderate increase in lines of code, and the new lines of code aren't all that complicated. (Even if they were complicated, this is java.util! Bring on the micro-optimizations.)

Since the existing code is already so good, though, changes of this nature have limited impact. Cutting one third or one half of the execution time from an operation that's already measured in nanoseconds is a good thing but not game-changing. I suspect that those changes will cause exactly zero users of the JDK to write their applications differently.

The more tantalizing, meaningful, and dangerous changes are the realm of the APIs.

I think that Enum{Set,Map} are chronically underused. They have a bit of a PR problem. Some developers don't know these classes exist. Other developers know about these classes but don't bother to reach for them when the time comes. It's just not a priority for them. That's totally understandable, but... There's avoiding premature optimization and then there's throwing away performance for no reason — performance nihilism? Maybe we can win their hearts with API-level changes.

No one should have to go out of their way to use Enum{Set,Map}. Ideally it should be easier than using Hash{Set,Map}. The EnumSet.allOf(elementType) method is a great example. If you want a Set containing all the enum constants of some type, then EnumSet.allOf(elementType) is the best solution and the easiest solution.

The high-level JDK-8145048 tracks a couple of ideas for improvements in this area. In the following sections, I expand on these ideas and discuss other API-level changes.

Add immutable Enum{Set,Map} (maybe?)

In a recent conversation on Twitter about JEP 301: Enhanced Enums, Joshua Bloch and Brian Goetz referred to theoretical immutable Enum{Set,Map} types in the JDK.

Joshua Bloch:

"Not sure I see the point. ImmutableEnumSet (a library change) seems far more valuable and easier to implement. Am I missing something?"

Brian Goetz:

"@joshbloch a) not either/or; b) not solving same prob; c) gen enum easier than you think; d) we'd gladly take contrib of IES!"

Brian Goetz:

"@joshbloch Still open to IE{S,M} contributions!"

Joshua Bloch also discussed the possibility of an immutable EnumSet in Effective Java:

"The one real disadvantage of EnumSet is that it is not, as of release 1.6, possible to create an immutable EnumSet, but this will likely be remedied in an upcoming release. In the meantime, you can wrap an EnumSet with Collections.unmodifiableSet, but conciseness and performance will suffer."

When he said "performance will suffer", he was probably referring to the fact that certain bulk operations of EnumSet won't execute as quickly when inside a wrapper collection (tracked as JDK-5039214). Consider RegularEnumSet.equals(object):

public boolean equals(Object o) {
  if (!(o instanceof RegularEnumSet))
    return super.equals(o);

  RegularEnumSet<?> es = (RegularEnumSet<?>)o;
  if (es.elementType != elementType)
    return elements == 0 && es.elements == 0;

  return es.elements == elements;
}

It's optimized for the case that the argument is another instance of RegularEnumSet. In that case the equality check boils down to a comparison of two primitive long values. Now that's fast!

If the argument to equals(object) was not a RegularEnumSet but instead a Collections.unmodifiableSet wrapper, that code would fall back to its slow path.

Guava's approach is similar to the Collections.unmodifiableSet one, although Guava does a bit better in terms of unwrapping the underlying Enum{Set,Map} and delegating to the super-fast optimized paths.

If your application deals exclusively with Guava's immutable Enum{Set,Map} wrappers, you should get the full benefit of those optimized paths from the JDK. If you mix and match Guava's collections with the JDK's though, the results won't be quite as good. (RegularEnumSet doesn't know how to unwrap Guava's ImmutableEnumSet, so a comparison in that direction would invoke the slow path.)

If immutable Enum{Set,Map} had full support in the JDK, however, it would not have those same limitations. RegularEnumSet and friends can be changed.

What should be done in the JDK?

I spent a long time and tested a lot of code trying to come up with an answer to this. Sadly the end result is:

I don't know.

Personally, I'm content to use Guava for this. I'll share some observations I made along the way.

Immutable Enum{Set,Map} won't be faster than mutable Enum{Set,Map}.

The current versions of Enum{Set,Map} are really, really good. They'll be even better once they override the defaults from Java 8.

Sometimes, having to support mutability comes with a tax on efficiency. I don't think this is the case with Enum{Set,Map}. At best, immutable versions of these classes will be exactly as efficient as the mutable ones.

The more likely outcome is that immutable versions will come with a small penalty to performance by expanding the Enum{Set,Map} ecosystem.

Take RegularEnumSet.equals(object) for example. Each time we create a new type of EnumSet, are we going to change that code to add a new instanceof check for our new type? If we add the check, we make that code worse at handling everything except our new type. If we don't add the check, we... still make that code worse! It's less effective than it used to be; more EnumSet instances trigger the slow path.

Classes like Enum{Set,Map} have a userbase that is more sensitive to changes in performance than average users. If adding a new type causes some call site to become megamorphic, we might have thrown their carefully-crafted assumptions regarding performance out the window.

If we decide to add immutable Enum{Set,Map}, we should do so for reasons unrelated to performance.

As an exception to the rule, an immutable EnumSet containing all constants of a single enum type would be really fast.

RegularEnumSet sets such a high bar for efficiency. There is almost no wiggle room in Set operations like contains(element) for anyone else to be faster. Here's the source code for RegularEnumSet.contains(element):

public boolean contains(Object e) {
  if (e == null)
    return false;
  Class<?> eClass = e.getClass();
  if (eClass != elementType && eClass.getSuperclass() != elementType)
    return false;

  return (elements & (1L << ((Enum<?>)e).ordinal())) != 0;
}

If you can't do contains(element) faster than that, you've already lost. Your EnumSet is probably worthless.

There is a worthy contender, which I'll call FullEnumSet. It is an EnumSet that (always) contains every constant of a single enum type. Here is one way to write that class:

package java.util;

import java.util.function.Consumer;
import java.util.function.Predicate;

class FullEnumSet<E extends Enum<E>> extends EnumSet<E> {

  // TODO: Add a static factory method somewhere.
  FullEnumSet(Class<E> elementType, Enum<?>[] universe) {
    super(elementType, universe);
  }

  @Override
  @SuppressWarnings("unchecked")
  public Iterator<E> iterator() {
    // TODO: Avoid calling Arrays.asList.
    //       The iterator class can be shared and used directly.
    return Arrays.asList((E[]) universe).iterator();
  }

  @Override
  public Spliterator<E> spliterator() {
    return Spliterators.spliterator(
        universe,
        Spliterator.ORDERED |
            Spliterator.SORTED |
            Spliterator.IMMUTABLE |
            Spliterator.NONNULL |
            Spliterator.DISTINCT);
  }

  @Override
  public int size() {
    return universe.length;
  }

  @Override
  public boolean contains(Object e) {
    if (e == null)
      return false;

    Class<?> eClass = e.getClass();
    return eClass == elementType || eClass.getSuperclass() == elementType;
  }

  @Override
  public boolean containsAll(Collection<?> c) {
    if (!(c instanceof EnumSet))
      return super.containsAll(c);

    EnumSet<?> es = (EnumSet<?>) c;
    return es.elementType == elementType || es.isEmpty();
  }

  @Override
  @SuppressWarnings("unchecked")
  public void forEach(Consumer<? super E> action) {
    int i = 0, n = universe.length;
    if (i >= n) {
      Objects.requireNonNull(action);
      return;
    }
    do action.accept((E) universe[i]);
    while (++i < n);
  }

  @Override void addAll()               {throw uoe();}
  @Override void addRange(E from, E to) {throw uoe();}
  @Override void complement()           {throw uoe();}

  @Override public boolean add(E e)                          {throw uoe();}
  @Override public boolean addAll(Collection<? extends E> c) {throw uoe();}
  @Override public void    clear()                           {throw uoe();}
  @Override public boolean remove(Object e)                  {throw uoe();}
  @Override public boolean removeAll(Collection<?> c)        {throw uoe();}
  @Override public boolean removeIf(Predicate<? super E> f)  {throw uoe();}
  @Override public boolean retainAll(Collection<?> c)        {throw uoe();}
  
  private static UnsupportedOperationException uoe() {
    return new UnsupportedOperationException();
  }

  // TODO: Figure out serialization.
  //       Serialization should preserve these qualities:
  //         - Immutable
  //         - Full
  //         - Singleton?
  //       Maybe it's a bad idea to extend EnumSet?
  private static final long serialVersionUID = 0;
}

FullEnumSet has many desirable properties. Of note:

  • contains(element) only needs to check the type of the argument to know whether it's a member of the set.
  • containsAll(collection) is extremely fast when the argument is an EnumSet (of any kind); it boils down to comparing the element types of the two sets. It follows that equals(object) is just as fast in that case, since equals delegates the hard work to containsAll.
  • Since all the elements are contained in one flat array with no empty spaces, conditions are ideal for iterating and for splitting (splitting efficiency is important in the context of parallel streams).
  • It beats RegularEnumSet in all important metrics:
    • Query speed (contains(element), etc.)
    • Iteration speed
    • Space consumed

Asking for the full set of enum constants of some type is a very common operation. See: every user of values(), elementType.getEnumConstants(), and EnumSet.allOf(elementType). I bet the vast majority of those users do not modify (their copy of) that set of constants. A class that is specifically tailored to that use case has a good chance of being worthwhile.

Since it's immutable, the FullEnumSet of each enum type could be a lazy-initialized singleton.
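One way to sketch that lazy, per-type singleton today is with a ClassValue cache. This sketch hands out an unmodifiable EnumSet.allOf rather than the hypothetical FullEnumSet, and the helper name fullSetOf is invented for illustration:

```java
import java.time.Month;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

public class FullSets {
  // Per-type lazy cache: the full, unmodifiable set of constants is
  // created on first request for a given enum type and shared
  // thereafter.
  private static final ClassValue<Set<?>> CACHE = new ClassValue<Set<?>>() {
    @Override
    @SuppressWarnings({"unchecked", "rawtypes"})
    protected Set<?> computeValue(Class<?> type) {
      return Collections.unmodifiableSet(EnumSet.allOf((Class) type));
    }
  };

  @SuppressWarnings("unchecked")
  static <E extends Enum<E>> Set<E> fullSetOf(Class<E> type) {
    return (Set<E>) CACHE.get(type);
  }

  public static void main(String[] args) {
    Set<Month> all = fullSetOf(Month.class);
    // Same instance on every call: a cached singleton per enum type.
    System.out.println(all.size() == 12 && all == fullSetOf(Month.class));
  }
}
```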

Should immutable Enum{Set,Map} reuse existing code, or should they be rewritten from scratch?

As I said earlier, the immutable versions of these classes aren't going to be any faster. If they're built from scratch, that code is going to look near-identical to the existing code. There would be a painful amount of copy and pasting, and I would not envy the people responsible for maintaining that code in the future.

Suppose we want to reuse the existing code. I see two general approaches:

  1. Do what Guava did, basically. Create unmodifiable wrappers around modifiable Enum{Set,Map}. Both the wrappers and the modifiable collections should be able to unwrap intelligently to take advantage of the existing optimizations for particular Enum{Set,Map} types (as in RegularEnumSet.equals(object)).
  2. Extend the modifiable Enum{Set,Map} classes with new classes that override modifier methods to throw UnsupportedOperationException. Optimizations that sniff for particular Enum{Set,Map} types (as in RegularEnumSet.equals(object)) remain exactly as effective as before without changes.

Of those two, I prefer the Guava-like approach. Extending the existing classes raises some difficult questions about the public API, particularly with respect to serialization.

What's the public API for immutable Enum{Set,Map}? What's the immutable version of EnumSet.of(e1, e2, e3)?

Here's where I gave up.

  • Should we add public java.util.ImmutableEnum{Set,Map} classes?
  • If not, where do we put the factory methods, and what do we name them? EnumSet.immutableOf(e1, e2, e3)? EnumSet.immutableAllOf(Month.class)? Yuck! (Clever synonyms like "having" and "universeOf" might be even worse.)
  • Are the new classes instances of Enum{Set,Map} or do they exist in an unrelated class hierarchy?
  • If the new classes do extend Enum{Set,Map}, how is serialization affected? Do we add an "isImmutable" bit to the current serialized forms? Can that be done without breaking backwards compatibility?

Good luck to whoever has to produce the final answers to those questions.

That's enough about this topic. Let's move on.

Add factory methods

JDK-8145048 mentions the possibility of adding factory methods in Enum{Set,Map} to align them with Java 9's Set and Map factories. EnumSet already has a varargs EnumSet.of(...) factory method, but EnumMap has nothing like that.

It would be nice to be able to declare EnumMap instances like this, for some reasonable number of key-value pairs:

Map<DayOfWeek, String> dayNames =
    EnumMap.of(
        DayOfWeek.MONDAY,    "lunes",
        DayOfWeek.TUESDAY,   "martes",
        DayOfWeek.WEDNESDAY, "miércoles",
        DayOfWeek.THURSDAY,  "jueves",
        DayOfWeek.FRIDAY,    "viernes",
        DayOfWeek.SATURDAY,  "sábado",
        DayOfWeek.SUNDAY,    "domingo");

Users could use EnumMap's copy constructor in conjunction with Java 9's Map factory methods to achieve the same result less efficiently...

Map<DayOfWeek, String> dayNames =
    new EnumMap<>(
        Map.of(
            DayOfWeek.MONDAY,    "lunes",
            DayOfWeek.TUESDAY,   "martes",
            DayOfWeek.WEDNESDAY, "miércoles",
            DayOfWeek.THURSDAY,  "jueves",
            DayOfWeek.FRIDAY,    "viernes",
            DayOfWeek.SATURDAY,  "sábado",
            DayOfWeek.SUNDAY,    "domingo"));

...but the more we give up efficiency like that, the less EnumMap makes sense in the first place. A reasonable person might start to question why they should bother with EnumMap at all — just get rid of the new EnumMap<>(...) wrapper and use Map.of(...) directly.

Speaking of that EnumMap(Map) copy constructor, the fact that it may throw IllegalArgumentException when provided an empty Map leads people to use this pattern instead:

Map<DayOfWeek, String> copy = new EnumMap<>(DayOfWeek.class);
copy.putAll(otherMap);

We could give them a shortcut:

Map<DayOfWeek, String> copy = new EnumMap<>(DayOfWeek.class, otherMap);

Similarly, to avoid an IllegalArgumentException from EnumSet.copyOf(collection), I see code like this:

Set<Month> copy = EnumSet.noneOf(Month.class);
copy.addAll(otherCollection);

We could give them a shortcut too:

Set<Month> copy = EnumSet.copyOf(Month.class, otherCollection);
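A user-side helper with the same behavior can be written today; the copyOf method below is a hypothetical utility, not a JDK method:

```java
import java.time.Month;
import java.util.Collection;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class EnumSetCopy {
  // Hypothetical helper: behaves like EnumSet.copyOf(collection), but
  // also handles empty collections, because the element type is
  // passed explicitly rather than sniffed from the first element.
  static <E extends Enum<E>> EnumSet<E> copyOf(
      Class<E> elementType, Collection<E> c) {
    EnumSet<E> set = EnumSet.noneOf(elementType);
    set.addAll(c);
    return set;
  }

  public static void main(String[] args) {
    Set<Month> empty = copyOf(Month.class, List.<Month>of());
    Set<Month> some  = copyOf(Month.class, List.of(Month.MAY, Month.JUNE));
    System.out.println(empty.size() + " " + some.size()); // 0 2
  }
}
```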

Existing code may define mappings from enum constants to values as standalone functions. Maybe the users of that code would like to view those (function-based) mappings as Map objects.

To that end, we could give people the means to generate an EnumMap from a Function:

Locale locale = Locale.forLanguageTag("es-MX");

Map<DayOfWeek, String> dayNames =
    EnumMap.map(DayOfWeek.class,
                day -> day.getDisplayName(TextStyle.FULL, locale));

// We could interpret the function returning null to mean that the
// key is not present.  That would allow this method to support
// more than the "every constant is a key" use case while dropping
// support for the "there may be present null values" use case,
// which is probably a good trade.
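Until a factory like that exists, the same map can be built by looping over values(); this sketch also skips null results, matching the proposed semantics:

```java
import java.time.DayOfWeek;
import java.time.format.TextStyle;
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

public class EnumMapFromFunction {
  public static void main(String[] args) {
    Locale locale = Locale.forLanguageTag("es-MX");

    // Hand-rolled equivalent of the proposed EnumMap.map(type, function):
    // apply the function to every constant, treating a null result as
    // "this key is not present".
    Map<DayOfWeek, String> dayNames = new EnumMap<>(DayOfWeek.class);
    for (DayOfWeek day : DayOfWeek.values()) {
      String name = day.getDisplayName(TextStyle.FULL, locale);
      if (name != null) {
        dayNames.put(day, name);
      }
    }

    // Prints the localized name for Monday ("lunes" with CLDR data).
    System.out.println(dayNames.get(DayOfWeek.MONDAY));
  }
}
```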

We could provide a similar factory method for EnumSet, accepting a Predicate instead of a Function:

Set<Month> shortMonths =
    EnumSet.filter(Month.class,
                   month -> month.minLength() < 31);

This functionality could be achieved less efficiently and more verbosely with streams. Again, the more we give up efficiency like that, the less sense it makes to use Enum{Set,Map} in the first place. I acknowledge that there is a cost to making API-level changes like the ones I'm discussing, but I feel that we are solidly in the "too little API-level support for Enum{Set,Map}" part of the spectrum and not even close to approaching the opposite "API bloat" end.
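Here is what the stream-based version looks like today, for comparison:

```java
import java.time.Month;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Set;
import java.util.stream.Collectors;

public class EnumSetFilter {
  public static void main(String[] args) {
    // Stream-based equivalent of the proposed EnumSet.filter: stream
    // the full universe, filter, and collect into a fresh EnumSet.
    Set<Month> shortMonths =
        Arrays.stream(Month.values())
              .filter(month -> month.minLength() < 31)
              .collect(Collectors.toCollection(
                  () -> EnumSet.noneOf(Month.class)));

    // February, April, June, September, November.
    System.out.println(shortMonths.size()); // 5
  }
}
```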

I don't mean to belittle streams. There should also be more support for Enum{Set,Map} in the stream API.

Add collectors

Code written for Java 8+ will often produce collections using streams and collectors rather than invoking collection constructors or factory methods directly. I don't think it would be outlandish to estimate that one third of collections are produced by collectors. Some of these collections will be (or could be) Enum{Set,Map}, and more could be done to serve that use case.

Collectors with these signatures should exist somewhere in the JDK:

public static <T extends Enum<T>>
Collector<T, ?, EnumSet<T>> toEnumSet(
    Class<T> elementType)

public static <T, K extends Enum<K>, U>
Collector<T, ?, EnumMap<K, U>> toEnumMap(
    Class<K> keyType,
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper)

public static <T, K extends Enum<K>, U>
Collector<T, ?, EnumMap<K, U>> toEnumMap(
    Class<K> keyType,
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper,
    BinaryOperator<U> mergeFunction)

Similar collectors can be obtained from the existing collector factories in the Collectors class (specifically toCollection(collectionSupplier) and toMap(keyMapper, valueMapper, mergeFunction, mapSupplier)) or by using Collector.of(...), but that requires a little more effort from users, adding friction to using Enum{Set,Map} that we don't need.
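For illustration, here is what users must write today with those existing factories; the wrapper class name is mine, and the rest is existing JDK API.

```java
import java.time.Month;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ExistingCollectorFactories {
  public static void main(String[] args) {
    // toCollection with an explicit EnumSet supplier:
    EnumSet<Month> set =
        Stream.of(Month.JANUARY, Month.JUNE)
              .collect(Collectors.toCollection(
                  () -> EnumSet.noneOf(Month.class)));

    // toMap with an explicit EnumMap supplier; note that this overload
    // forces us to supply a merge function even when no key collisions
    // are expected:
    EnumMap<Month, Integer> map =
        Stream.of(Month.JANUARY, Month.JUNE)
              .collect(Collectors.toMap(
                  month -> month,
                  Month::getValue,
                  (a, b) -> b,
                  () -> new EnumMap<>(Month.class)));

    System.out.println(set);
    System.out.println(map);
  }
}
```

The proposed toEnumSet(elementType) and toEnumMap(keyType, ...) factories would let callers state their intent directly instead of threading suppliers and merge functions through the general-purpose overloads.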

I referenced these collectors from Guava earlier in this article:

They do not require the Class object argument, making them easier to use than the collectors that I proposed. The reason the Guava collectors can do this is that they produce ImmutableSet and ImmutableMap, not EnumSet and EnumMap. One cannot create an Enum{Set,Map} instance without having the Class object for that enum type. In order to have a collector that reliably produces Enum{Set,Map} (even when the stream contains zero input elements to grab the Class object from), the Class object must be provided up front.

We could provide similar collectors in the JDK that would produce immutable Set and Map instances. For streams with no elements, the collectors would produce Collections.emptySet() or Collections.emptyMap(). For streams with at least one element, the collectors would produce an Enum{Set,Map} instance wrapped by Collections.unmodifiable{Set,Map}.

The signatures would look like this:

public static <T extends Enum<T>>
Collector<T, ?, Set<T>> toImmutableEnumSet()

public static <T, K extends Enum<K>, U>
Collector<T, ?, Map<K, U>> toImmutableEnumMap(
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper)

public static <T, K extends Enum<K>, U>
Collector<T, ?, Map<K, U>> toImmutableEnumMap(
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper,
    BinaryOperator<U> mergeFunction)
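A user-land approximation of the set variant can already be assembled from existing pieces. This is only a sketch of the mechanics, and the class and method names are mine; unlike the proposed no-argument toImmutableEnumSet(), it must take the Class object up front because it cannot grab one from the first stream element.

```java
import java.time.Month;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ImmutableEnumSetCollector {
  // Sketch: collect into an EnumSet, then wrap it, substituting the
  // shared empty set when no elements arrived.
  static <T extends Enum<T>> Collector<T, ?, Set<T>>
      toImmutableEnumSet(Class<T> elementType) {
    return Collectors.collectingAndThen(
        Collectors.toCollection(() -> EnumSet.noneOf(elementType)),
        s -> s.isEmpty() ? Collections.emptySet()
                         : Collections.unmodifiableSet(s));
  }

  public static void main(String[] args) {
    Set<Month> none = Stream.<Month>empty()
        .collect(toImmutableEnumSet(Month.class));
    Set<Month> some = Stream.of(Month.MAY)
        .collect(toImmutableEnumSet(Month.class));
    System.out.println(none.isEmpty());       // true
    System.out.println(some.contains(Month.MAY));  // true
  }
}
```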

I'm not sure that those collectors are worthwhile. I might never recommend them over their counterparts in Guava.

The StreamEx library also provides a couple of interesting enum-specialized collectors:

They're interesting because they are potentially short-circuiting. With MoreCollectors.toEnumSet(elementType), when the collector can determine that it has encountered all of the elements of that enum type (which is easy — the set of already-collected elements can be compared to EnumSet.allOf(elementType)), it stops collecting. These collectors may be well-suited for streams having a huge number of elements (or having elements that are expensive to compute) mapping to a relatively small set of enum constants.
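To illustrate the payoff, here is a sketch of the short-circuiting idea itself, written as a plain loop rather than a collector (this is not StreamEx's actual implementation): it stops consuming input as soon as every constant has been seen.

```java
import java.util.EnumSet;
import java.util.concurrent.TimeUnit;

class ShortCircuitSketch {
  public static void main(String[] args) {
    // Simulate a million-element input that maps onto TimeUnit's
    // constants; a short-circuiting collector could stop as soon as
    // the collected set equals EnumSet.allOf(TimeUnit.class).
    TimeUnit[] constants = TimeUnit.values();
    EnumSet<TimeUnit> collected = EnumSet.noneOf(TimeUnit.class);
    EnumSet<TimeUnit> all = EnumSet.allOf(TimeUnit.class);
    long consumed = 0;
    for (long i = 0; i < 1_000_000; i++) {
      collected.add(constants[(int) (i % constants.length)]);
      consumed++;
      if (collected.equals(all)) {
        break;  // every constant seen; ignore the remaining input
      }
    }
    // Only as many inputs as there are constants were consumed,
    // not a million.
    System.out.println(consumed);
  }
}
```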

I don't know how feasible it is to port these StreamEx collectors to the JDK. As I understand it, the concept of short-circuiting collectors is not supported by the JDK. Adding support may necessitate other changes to the stream and collector APIs.

Be navigable? (No)

Over the years, many people have suggested that Enum{Set,Map} should implement the NavigableSet and NavigableMap interfaces. Every enum type is Comparable, so it's technically possible. Why not?

I think the Navigable{Set,Map} interfaces are a poor fit for Enum{Set,Map}.

Those interfaces are huge! Implementing Navigable{Set,Map} would bloat the size of Enum{Set,Map} by 2-4x (in terms of lines of code). It would distract them from their core focus and strengths. Supporting the navigable API would most likely come with a non-zero penalty to runtime performance.

Have you ever looked closely at the specified behavior of methods like subSet and subMap, specifically when they might throw IllegalArgumentException? Those contracts impose a great deal of complexity for what seems like undesirable behavior. Enum{Set,Map} could take a stance on those methods similar to Guava's ImmutableSortedSet and ImmutableSortedMap: acknowledge the contract of the interface but do something more reasonable instead.

I say forget about it. If you want navigable collections, use TreeSet and TreeMap (or their thread-safe cousins, ConcurrentSkipListSet and ConcurrentSkipListMap). The intersection of people who need the navigable API and people who need the efficiency of enum-specialized collections must be very small.

There are few cases where the Comparable nature of enum types comes into play at all. In practice, I expect that the ordering of most enum constants is arbitrary (with respect to intended behavior).

I'll go further than that; I think that making all enum types Comparable in the first place was a mistake.

  • Which ordering of Collector.Characteristics is "natural", [CONCURRENT,UNORDERED] or [UNORDERED,CONCURRENT]?
  • Which is the "greater" Thread.State, WAITING or TIMED_WAITING?
  • FileVisitOption.FOLLOW_LINKS is "comparable" — to what? (There is no other FileVisitOption.)
  • How many instances of RoundingMode are in the "range" from FLOOR to CEILING?
    import java.math.RoundingMode;
    import java.util.EnumSet;
    import java.util.Set;
    
    class RangeTest {
      public static void main(String[] args) {
        Set<RoundingMode> range =
            EnumSet.range(RoundingMode.FLOOR,
                          RoundingMode.CEILING);
        System.out.println(range.size());
      }
    }
    
    // java.lang.IllegalArgumentException: FLOOR > CEILING
    

There are other enum types where questions like that actually make sense, and those should be Comparable.

  • Is Month.JANUARY "before" Month.FEBRUARY? Yes.
  • Is TimeUnit.HOURS "larger" than TimeUnit.MINUTES? Yes.

Implementing Comparable or not should have been a choice for authors of individual enum types. To serve people who really did want to sort enum constants by declaration order for whatever reason, we could have automatically provided a static Comparator from each enum type:

Comparator<JDBCType> c = JDBCType.declarationOrder();
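For what it's worth, a rough equivalent is already expressible today with existing API, though it must be spelled out at each use site; the class name below is mine.

```java
import java.sql.JDBCType;
import java.util.Comparator;

class DeclarationOrderToday {
  public static void main(String[] args) {
    // Roughly the effect of the hypothetical declarationOrder():
    // a comparator driven explicitly by ordinal, i.e. by the order
    // in which the constants are declared.
    Comparator<JDBCType> byDeclaration =
        Comparator.comparingInt(JDBCType::ordinal);
    JDBCType[] all = JDBCType.values();
    System.out.println(
        byDeclaration.compare(all[0], all[1]) < 0);  // true
  }
}
```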

It's too late for that now. Let's not double down on the original mistake by making Enum{Set,Map} navigable.

Conclusion

EnumSet and EnumMap are cool collections, and you should use them!

They're already great, but they can become even better with changes to their private implementation details. I propose some ideas here. If you want to find out what happens in the JDK, the changes (if there are any) should be noted in JDK-8170826.

API-level changes are warranted as well. New factory methods and collectors would make it easier to obtain instances of Enum{Set,Map}, and immutable Enum{Set,Map} could be better-supported. I propose some ideas here; any actual changes should be noted in JDK-8145048.

November 16, 2016

Framework Benchmarks Round 13

Round 13 of the ongoing Web Framework Benchmarks project is here! The project now features 230 framework implementations (of our JSON serialization test) and includes new entrants on platforms as diverse as Kotlin and Qt. Yes, that Qt. We also congratulate the ASP.NET team for the most dramatic performance improvement we've ever seen, making ASP.NET Core a top performer.

View Round 13 resultsThe large filters panel on our results web site is a testament to the ever-broadening spectrum of options for web developers. What a great time to be building web apps! A great diversity of frameworks means there are likely many options that provide high performance while meeting your language and productivity requirements.

Good fortunes

As the previous round—Round 12—was wrapping up, we were unfortunately rushed as the project’s physical hardware environment was being decommissioned. But good fortune was just around the corner, thanks to the lucky number 13!

New hardware and cloud environments

For Round 13, we have all new test environments, for both physical hardware and the virtualized public cloud.

Microsoft has provided the project with Azure credits, so starting with Round 13, the cloud environment is on Azure D3v2 instances. Previous rounds’ cloud tests were run on AWS.

Meanwhile, ServerCentral has provided the project a trio of physical servers in one of their development lab environments with 10 gigabit Ethernet. Starting with Round 13, the physical hardware environment is composed of a Dell R910 application server (4x 10-Core E7-4850 CPUs) and a Dell R420 database server (2x 4-Core E5-2406 CPUs).

We’d like to extend huge thanks to ServerCentral and Microsoft for generously supporting the project!

We recognize that as a result of these changes, Round 13 is not easy to directly compare to Round 12. Although changing the test environments was not intentional, it was necessary. We believe the results are still as valuable as ever. An upside of this environment diversity is visibility into the ways various frameworks and platforms work with the myriad variables of cores, clock speed, and virtualization technologies. For example, our new physical application server has twice as many HT cores as the previous environment, but the CPUs are older, so there is an interesting balance of higher concurrency but potentially lower throughput. In aggregate, the Round 13 results on physical hardware are generally lower due to the older CPUs, all else being equal.

Many fixes to long-broken tests

Along with the addition of new frameworks, Round 13 also marks a sizeable decrease in the number of existing framework tests that have failed to execute properly in previous rounds. This is largely the result of a considerable community effort over the past few months to identify and fix dozens of frameworks, some of which we haven’t been able to successfully test since 2014.

Continuous benchmarking

Round 13 is the first round conducted with what we’re calling Continuous Benchmarking. Continuous Benchmarking is the notion of setting up the test environment to automatically reset to a clean state, pull the latest from the source repository, prepare the environment, execute the test suite, deliver results, and repeat.

There are many benefits of Continuous Benchmarking. For example:

  • At any given time, we can grab the most recent results and mark them as a preview or final for an official Round. This should allow us to accelerate the delivery of Rounds.
  • With some additional work, we will be able to capture and share results as they are made available. This should give participants in the project much quicker insight into how their performance tuning efforts are playing out in our test environment. Think of it as continuous integration but for benchmark results. Our long-term goal is to provide a results viewer that plots performance results over time.
  • Any changes that break the test environment as a whole or a specific framework’s test implementation should be visible much earlier. Prior to Continuous Benchmarking, breaking changes were often not detected until a preview run.

Microsoft’s ASP.NET Core

We consider ourselves very fortunate that our project has received the attention that it has from the web framework community. It has become a source of great pride for our team. Among every reaction and piece of feedback we’ve received, our very favorite kind is when a framework maintainer recognizes a performance deficiency highlighted by this project and then works to improve that performance. We love this because we think of it as a small way of improving performance of the whole web, and we are passionate about performance.

Round 13 is especially notable for us because we are honored that Microsoft has made it a priority to improve ASP.NET’s performance in these benchmarks, and in so doing, improve the performance of all applications built on ASP.NET.

Thanks to Microsoft’s herculean performance tuning effort, ASP.NET—in the new cross-platform friendly form of ASP.NET Core—is now a top performer in our Plaintext test, making it among the fastest platforms at the fundamentals of web request routing. The degree of improvement is absolutely astonishing, going from 2,120 requests per second on Mono in Round 11 to 1,822,366 requests per second on ASP.NET Core in Round 13. That’s an approximately 85,900% improvement, and that doesn’t even account for Round 11’s hardware being faster than our new hardware. That is not a typo, it's 859 times faster! We believe this to be the most significant performance improvement that this project has ever seen.

By delivering cross-platform performance alongside their development toolset, Microsoft has made C# and ASP.NET one of the most interesting web development platforms available. We have a brief message to those developers who have avoided Microsoft’s web stack thinking it’s “slow” or that it’s for Windows only: ASP.NET Core is now wicked sick fast at the fundamentals and is improving in our other tests. Oh, and of course we’re running it on Linux. You may be thinking about the Microsoft of 10 years ago.

The best part, in our opinion, is that Microsoft is making performance a long-term priority. There is room to improve on our other more complex tests such as JSON serialization and Fortunes (which exercises database connectivity, data structures, encoding of unsafe text, and templating). Microsoft is taking on those challenges and will continue to improve the performance of its platform.

Our Plaintext test has historically been a playground for the ultra-fast Netty platform and several lesser-known/exotic platforms. (To be clear, there is nothing wrong with being exotic! We love them too!) Microsoft’s tuning work has brought a mainstream platform into the frontrunners. That achievement stands on its own. We congratulate the Microsoft .NET team for a massive performance improvement and for making ASP.NET Core a mainstream option that has the performance characteristics of an acutely-tuned fringe platform. It’s like an F1 car that anyone can drive. We should all be so lucky.