EnumSet and EnumMap

February 14, 2017

Alan Laser

This article discusses java.util.EnumSet and java.util.EnumMap from Java’s standard libraries.

What are they?
When should they be used?
Could the implementations be improved?
Could the APIs be improved?
Conclusion

What are they?

EnumSet
and
EnumMap are compact, efficient implementations of the Set and Map interfaces. They have the constraint that their elements/keys come from a single enum type.

Like HashSet and HashMap, they are modifiable.

In contrast to HashSet, EnumSet:

Consumes less memory, usually.
Is faster at all the things a Set can do, usually.
Iterates over elements in a predictable order (the declaration order of the element type’s enum constants).
Rejects null elements.

In contrast to HashMap, EnumMap:

Consumes less memory, usually.
Is faster at all the things a Map can do, usually.
Iterates over entries in a predictable order (the declaration order of the key type’s enum constants).
Rejects null keys.

If you’re wondering how this is possible, I encourage you to look at the source code:

EnumSetA bit vector of the ordinals of the elements in the Set. This is an abstract superclass of RegularEnumSet and JumboEnumSet.
RegularEnumSetAn EnumSet whose bit vector is a single primitive long, which is enough to handle all enum types having 64 or fewer constants.
JumboEnumSetAn EnumSet whose bit vector is a long[] array, which is allocated however many slots are necessary for the given enum type. Two slots are allocated for 128 or fewer constants, three slots for 192 or fewer constants, etc.
EnumMapA flat array of the Map‘s values indexed by the ordinals of their keys.

EnumSet and EnumMap cheat! They use privileged code like this:


**
* Returns all of the values comprising E.
* The result is uncloned, cached, and shared by all callers.
*/
private static <E extends Enum<E>> E[] getUniverse(Class<E> elementType) {
return SharedSecrets.getJavaLangAccess()
                    .getEnumConstantsShared(elementType);
}

If you want all the Month constants, you might call Month.values(), giving you a Month[] array. There is a single backing array instance of those Month constants living in memory somewhere (a private field in the Class object for Month), but it wouldn’t be safe to pass that array directly to every caller of values(). Imagine if someone modified that array! Instead, values() creates a fresh clone of the array for each caller.

EnumSet and EnumMap get to skip that cloning step. They have direct access to the backing array.

Effectively, no third-party versions of these classes can be as efficient. Third-party libraries that provide enum-specialized collections tend to delegate to EnumSet and EnumMap. It’s not that the library authors are lazy or incapable; delegating is the correct choice for them.

When should they be used?

Historically, Enum{Set,Map} were recommended as a matter of safety, taking better advantage of Java’s type system than the alternatives.

Prefer enum types and Enum{Set,Map} over int flags.

Effective Java goes into detail about this use case for Enum{Set,Map} and enum types in general. If you write a lot of Java code, then you should read that book and follow its advice.

Before enum types existed, people would declare flags as int constants. Sometimes the flags would be powers of two and combined into sets using bitwise arithmetic:


static final int OVERLAY_STREETS  = 1 << 0;
static final int OVERLAY_ELECTRIC = 1 << 1;
static final int OVERLAY_PLUMBING = 1 << 2;
static final int OVERLAY_TERRAIN  = 1 << 3;

void drawCityMap(int overlays) { ... }

drawCityMap(OVERLAY_STREETS | OVERLAY_PLUMBING);

Other times the flags would start at zero and count up by one, and they would be used as array indexes:


static final int MONSTER_SLIME    = 0;
static final int MONSTER_GHOST    = 1;
static final int MONSTER_SKELETON = 2;
static final int MONSTER_GOLEM    = 3;

int[] kills = getMonstersSlain();

if (kills[MONSTER_SLIME] >= 10) { ... }

These approaches got the job done for many people, but they were somewhat error-prone and difficult to maintain.

When enum types were introduced to the language, Enum{Set,Map} came with them. Together they were meant to provide better tooling for problems previously solved with int flags. We would say, “Don’t use int flags, use enum constants. Don’t use bitwise arithmetic for sets of flags, use EnumSet. Don’t use arrays for mappings of flags, use EnumMap.” This was not because the enum-based solutions were faster than int flags — they were probably slower — but because the enum-based solutions were easier to understand and implement correctly.

Fast forward to today, I don’t see many people using int flags anymore (though there are notable exceptions). We’ve had enum types in the language for more than a decade. We’re all using enum types here and there, we’re all using the collections framework. At this point, while Effective Java‘s advice regarding Enum{Set,Map} is still valid, I think most people will never have a chance to put it into practice.

Today, we’re using enum types in the right places, but we’re forgetting about the collection types that came with them.

Prefer Enum{Set,Map} over Hash{Set,Map} as a performance optimization.

Prefer EnumSet over HashSet when the elements come from a single enum type.
Prefer EnumMap over HashMap when the keys come from a single enum type.

Should you refactor all of your existing code to use Enum{Set,Map} instead of Hash{Set,Map}? No.

Your code that uses Hash{Set,Map} isn’t wrong. Migrating to Enum{Set,Map} might make it faster. That’s it.

If you’ve ever used primitive collection libraries like fastutil or Trove, then it may help to think of Enum{Set,Map} like those primitive collections. The difference is that Enum{Set,Map} are specialized for enum types, not primitive types, and you can use them without depending on any third-party libraries.

Enum{Set,Map} don’t have identical semantics to Hash{Set,Map}, so please don’t make blind, blanket replacements in your existing code.

Instead, try to remember these classes for next time. If you can make your code more efficient for free, then why not go ahead and do that, right?

If you use IntelliJ IDEA, you can have it remind you to use Enum{Set,Map} with inspections:

Analyze – Run inspection by name – “Set replaceable with EnumSet” or “Map replaceable with EnumMap”

…or…

File – Settings – Editor – Inspections – Java – Performance issues – “Set replaceable with EnumSet” or “Map replaceable with EnumMap”

SonarQube can also remind you to use Enum{Set,Map}:

S1641: “Sets with elements that are enum values should be replaced with EnumSet”
S1640: “Maps with keys that are enum values should be replaced with EnumMap”

For immutable versions of Enum{Set,Map}, see the following methods from Guava:

If you don’t want to use Guava, then wrap the modifiable Enum{Set,Map} instances in Collections.unmodifiableSet(set) or Collections.unmodifiableMap(map) and throw away the direct references to the modifiable collections.

The resulting collections may be less efficient when it comes to operations like containsAll and equals than their counterparts in Guava, which may in turn be less efficient than the raw modifiable collections themselves.

Could the implementations be improved?

Since they can’t be replaced by third-party libraries, Enum{Set,Map} had better be as good as possible! They’re good already, but they could be better.

Enum{Set,Map} have missed out on potential upgrades since Java 8. New methods were added in Java 8 to Set and Map (or higher-level interfaces like Collection and Iterable). While the default implementations of those methods are correct, we could do better with overrides in Enum{Set,Map}.

This issue is tracked as JDK-8170826.

Specifically, these methods should be overridden:

{Regular,Jumbo}EnumSet.forEach(action)
{Regular,Jumbo}EnumSet.iterator().forEachRemaining(action)
{Regular,Jumbo}EnumSet.spliterator()
EnumMap.forEach(action)
EnumMap.{keySet,values,entrySet}().forEach(action)
EnumMap.{keySet,values,entrySet}().iterator().forEachRemaining(action)
EnumMap.{keySet,values,entrySet}().spliterator()

I put sample implementations on GitHub in case you’re curious what these overrides might look like. They’re all pretty straightforward.

Rather than walk through each implementation in detail, I’ll share some high-level observations about them.

The optimized forEach and forEachRemaining methods are roughly 50% better than the defaults (in terms of operations per second).
EnumMap.forEach(action) benefits the most, becoming twice as fast as the default implementation.
The iterable.forEach(action) method is popular. Optimizing it tends to affect a large audience, which increases the likelihood that the optimization (even if small) is worthwhile. (I’d claim that iterable.forEach(action) is too popular, and I’d suggest that the traditional enhanced for loop should be preferred over forEach except when the argument to forEach can be written as a method reference. That’s a topic for another discussion, though.)
The iterator.forEachRemaining(action) method is more important than it seems. Few people use it directly, but many people use it indirectly through streams. The default spliterator() delegates to the iterator(), and the default stream() delegates to the spliterator(). In the end, stream traversal may delegate to iterator().forEachRemaining(...). Given the popularity of streams, optimizing this method is a good idea!
The iterable.spliterator() method is critical when it comes to stream performance, but writing a custom Spliterator from scratch is a non-trivial task. I recommend this approach:
- Check whether the characteristics of the default spliterator are correct for your collection (often times the defaults are too conservative — for example, EnumSet‘s spliterator is currently missing the ORDERED, SORTED, and NONNULL characteristics). If they’re not correct, then provide a trivial override of the spliterator that uses Spliterators.spliterator(collection, characteristics) to define the correct characteristics.
- Don’t go further than that until you’ve read through the implementation of that spliterator, and you understand how it works, and you’re confident that you can do better. In particular, your tryAdvance(action) and trySplit() should both be better. Write a benchmark afterwards to confirm your assumptions.
The map.forEach(action) method is extremely popular and is almost always worth overriding. This is especially true for maps like EnumMap that create their Entry objects on demand.
It’s usually possible to share code across the forEach and forEachRemaining methods. If you override one, you’re already most of the way there to overriding the others.
I don’t think it’s worthwhile to override collection.removeIf(filter) in any of these classes. For RegularEnumSet, where it seemed most likely to be worthwhile, I couldn’t come up with a faster implementation than the default.
Enum{Set,Map} could provide faster hashCode() implementations than the ones they currently inherit from AbstractSet and AbstractMap, but I don’t think that would be worthwhile. In general, I don’t think optimizing the hashCode() of collections is worthwhile unless it can somehow become a constant-time (O(1)) operation, and even then it is questionable. Collection hash codes aren’t used very often.

Could the APIs be improved?

The implementation-level changes I’ve described are purely beneficial. There is no downside other than a moderate increase in lines of code, and the new lines of code aren’t all that complicated. (Even if they were complicated, this is java.util! Bring on the micro-optimizations.)

Since the existing code is already so good, though, changes of this nature have limited impact. Cutting one third or one half of the execution time from an operation that’s already measured in nanoseconds is a good thing but not game-changing. I suspect that those changes will cause exactly zero users of the JDK to write their applications differently.

The more tantalizing, meaningful, and dangerous changes are the realm of the APIs.

I think that Enum{Set,Map} are chronically underused. They have a bit of a PR problem. Some developers don’t know these classes exist. Other developers know about these classes but don’t bother to reach for them when the time comes. It’s just not a priority for them. That’s totally understandable, but… There’s avoiding premature optimization and then there’s throwing away performance for no reason — performance nihilism? Maybe we can win their hearts with API-level changes.

No one should have to go out of their way to use Enum{Set,Map}. Ideally it should be easier than using Hash{Set,Map}. The EnumSet.allOf(elementType) method is a great example. If you want a Set containing all the enum constants of some type, then EnumSet.allOf(elementType) is the best solution and the easiest solution.

The high-level JDK-8145048 tracks a couple of ideas for improvements in this area. In the following sections, I expand on these ideas and discuss other API-level changes.

Add immutable `Enum{Set,Map}` (maybe?)

In a recent conversation on Twitter about JEP 301: Enhanced Enums, Joshua Bloch and Brian Goetz referred to theoretical immutable Enum{Set,Map} types in the JDK.

Not sure I see the point. ImmutableEnumSet (a library change) seems far more valuable and easier to implement. Am I missing something? https://t.co/mLL3cb8etc

— Joshua Bloch (@joshbloch) December 7, 2016

a) not either/or; b) not solving same prob; c) gen enum easier than you think; d) we’d gladly take contrib of IES!

— Brian Goetz (@BrianGoetz) December 7, 2016

Still open to IE{S,M} contributions!

— Brian Goetz (@BrianGoetz) December 7, 2016

Joshua Bloch also discussed the possibility of an immutable EnumSet in Effective Java:

“The one real disadvantage of EnumSet is that it is not, as of release 1.6, possible to create an immutable EnumSet, but this will likely be remedied in an upcoming release. In the meantime, you can wrap an EnumSet with Collections.unmodifiableSet, but conciseness and performance will suffer.”

When he said “performance will suffer”, he was probably referring to the fact that certain bulk operations of EnumSet won’t execute as quickly when inside a wrapper collection (tracked as JDK-5039214). Consider RegularEnumSet.equals(object):


public boolean equals(Object o) {
    if (!(o instanceof RegularEnumSet))
        return super.equals(o);

    RegularEnumSet<?> es = (RegularEnumSet<?>)o;
    if (es.elementType != elementType)
        return elements == 0 && es.elements == 0;

    return es.elements == elements;
}

It’s optimized for the case that the argument is another instance of RegularEnumSet. In that case the equality check boils down to a comparison of two primitive long values. Now that’s fast!

If the argument to equals(object) was not a RegularEnumSet but instead a Collections.unmodifiableSet wrapper, that code would fall back to its slow path.

Guava’s approach is similar to the Collections.unmodifiableSet one, although Guava does a bit better in terms of unwrapping the underlying Enum{Set,Map} and delegating to the super-fast optimized paths.

If your application deals exclusively with Guava’s immutable Enum{Set,Map} wrappers, you should get the full benefit of those optimized paths from the JDK. If you mix and match Guava’s collections with the JDK’s though, the results won’t be quite as good. (RegularEnumSet doesn’t know how to unwrap Guava’s ImmutableEnumSet, so a comparison in that direction would invoke the slow path.)

If immutable Enum{Set,Map} had full support in the JDK, however, it would not have those same limitations. RegularEnumSet and friends can be changed.

What should be done in the JDK?

I spent a long time and tested a lot of code trying to come up with an answer to this. Sadly the end result is:

I don’t know.

Personally, I’m content to use Guava for this. I’ll share some observations I made along the way.

Immutable Enum{Set,Map} won’t be faster than mutable Enum{Set,Map}.

The current versions of Enum{Set,Map} are really, really good. They’ll be even better once they override the defaults from Java 8.

Sometimes, having to support mutability comes with a tax on efficiency. I don’t think this is the case with Enum{Set,Map}. At best, immutable versions of these classes will be exactly as efficient as the mutable ones.

The more likely outcome is that immutable versions will come with a small penalty to performance by expanding the Enum{Set,Map} ecosystem.

Take RegularEnumSet.equals(object) for example. Each time we create a new type of EnumSet, are we going to change that code to add a new instanceof check for our new type? If we add the check, we make that code worse at handling everything except our new type. If we don’t add the check, we…. still make that code worse! It’s less effective than it used to be; more EnumSet instances trigger the slow path.

Classes like Enum{Set,Map} have a userbase that is more sensitive to changes in performance than average users. If adding a new type causes some call site to become megamorphic, we might have thrown their carefully-crafted assumptions regarding performance out the window.

If we decide to add immutable Enum{Set,Map}, we should do so for reasons unrelated to performance.

As an exception to the rule, an immutable EnumSet containing all constants of a single enum type would be really fast.

RegularEnumSet sets such a high bar for efficiency. There is almost no wiggle room in Set operations like contains(element) for anyone else to be faster. Here’s the source code for RegularEnumSet.contains(element):


public boolean contains(Object e) {
    if (e == null)
        return false;
    Class<?> eClass = e.getClass();
    if (eClass != elementType && eClass.getSuperclass() != elementType)
        return false;

    return (elements & (1L << ((Enum<?>)e).ordinal())) != 0;
}

If you can’t do contains(element) faster than that, you’ve already lost. Your EnumSet is probably worthless.

There is a worthy contender, which I’ll call FullEnumSet. It is an EnumSet that (always) contains every constant of a single enum type. Here is one way to write that class:



import java.util.function.Consumer;
import java.util.function.Predicate;

class FullEnumSet<E extends Enum&lt;E>> extends EnumSet<E> {

  // TODO: Add a static factory method somewhere.
  FullEnumSet(Class<E> elementType, Enum<?>[] universe) {
    super(elementType, universe);
  }

  @Override
  @SuppressWarnings("unchecked")
  public Iterator<E> iterator() {
    // TODO: Avoid calling Arrays.asList.
    //       The iterator class can be shared and used directly.
    return Arrays.asList((E[]) universe).iterator();
  }

  @Override
  public Spliterator<E> spliterator() {
    return Spliterators.spliterator(
        universe,
        Spliterator.ORDERED |
            Spliterator.SORTED |
            Spliterator.IMMUTABLE |
            Spliterator.NONNULL |
            Spliterator.DISTINCT);
  }

  @Override
  public int size() {
    return universe.length;
  }

  @Override
  public boolean contains(Object e) {
    if (e == null)
      return false;

    Class<?> eClass = e.getClass();
    return eClass == elementType || eClass.getSuperclass() == elementType;
  }

  @Override
  public boolean containsAll(Collection<?> c) {
    if (!(c instanceof EnumSet))
      return super.containsAll(c);

    EnumSet<?> es = (EnumSet<?>) c;
    return es.elementType == elementType || es.isEmpty();
  }

  @Override
  @SuppressWarnings("unchecked")
  public void forEach(Consumer<? super E> action) {
    int i = 0, n = universe.length;
    if (i >= n) {
      Objects.requireNonNull(action);
      return;
    }
    do action.accept((E) universe[i]);
    while (++i < n);
  }

  @Override void addAll()               {throw uoe();}
  @Override void addRange(E from, E to) {throw uoe();}
  @Override void complement()           {throw uoe();}

  @Override public boolean add(E e)                          {throw uoe();}
  @Override public boolean addAll(Collection<? extends E> c) {throw uoe();}
  @Override public void    clear()                           {throw uoe();}
  @Override public boolean remove(Object e)                  {throw uoe();}
  @Override public boolean removeAll(Collection<?> c)        {throw uoe();}
  @Override public boolean removeIf(Predicate<? super E> f)  {throw uoe();}
  @Override public boolean retainAll(Collection<?> c)        {throw uoe();}

  private static UnsupportedOperationException uoe() {
    return new UnsupportedOperationException();
  }

  // TODO: Figure out serialization.
  //       Serialization should preserve these qualities:
  //         - Immutable
  //         - Full
  //         - Singleton?
  //       Maybe it's a bad idea to extend EnumSet?
  private static final long serialVersionUID = 0;
}

FullEnumSet has many desirable properties. Of note:

contains(element) only needs to check the type of the argument to know whether it’s a member of the set.
containsAll(collection) is extremely fast when the argument is an EnumSet (of any kind); it boils down to comparing the element types of the two sets. It follows that equals(object) is just as fast in that case, since equals delegates the hard work to containsAll.
Since all the elements are contained in one flat array with no empty spaces, conditions are ideal for iterating and for splitting (splitting efficiency is important in the context of parallel streams).
It beats RegularEnumSet in all important metrics:
- Query speed (contains(element), etc.)
- Iteration speed
- Space consumed

Asking for the full set of enum constants of some type is a very common operation. See: every user of values(), elementType.getEnumConstants(), and EnumSet.allOf(elementType). I bet the vast majority of those users do not modify (their copy of) that set of constants. A class that is specifically tailored to that use case has a good chance of being worthwhile.

Since it’s immutable, the FullEnumSet of each enum type could be a lazy-initialized singleton.

Should immutable Enum{Set,Map} reuse existing code, or should they be rewritten from scratch?

As I said earlier, the immutable versions of these classes aren’t going to be any faster. If they’re built from scratch, that code is going to look near-identical to the existing code. There would be a painful amount of copy and pasting, and I would not envy the people responsible for maintaining that code in the future.

Suppose we want to reuse the existing code. I see two general approaches:

Do what Guava did, basically. Create unmodifiable wrappers around modifiable Enum{Set,Map}. Both the wrappers and the modifiable collections should be able to unwrap intelligently to take advantage of the existing optimizations for particular Enum{Set,Map} types (as in RegularEnumSet.equals(object)).
Extend the modifiable Enum{Set,Map} classes with new classes that override modifier methods to throw UnsupportedOperationException. Optimizations that sniff for particular Enum{Set,Map} types (as in RegularEnumSet.equals(object)) remain exactly as effective as before without changes.

Of those two, I prefer the Guava-like approach. Extending the existing classes raises some difficult questions about the public API, particularly with respect to serialization.

What’s the public API for immutable Enum{Set,Map}? What’s the immutable version of EnumSet.of(e1, e2, e3)?

Here’s where I gave up.

Should we add public java.util.ImmutableEnum{Set,Map} classes?
If not, where do we put the factory methods, and what do we name them? EnumSet.immutableOf(e1, e2, e3)? EnumSet.immutableAllOf(Month.class)? Yuck! (Clever synonyms like “having” and “universeOf” might be even worse.)
Are the new classes instances of Enum{Set,Map} or do they exist in an unrelated class hierarchy?
If the new classes do extend Enum{Set,Map}, how is serialization affected? Do we add an “isImmutable” bit to the current serialized forms? Can that be done without breaking backwards compatibility?

Good luck to whoever has to produce the final answers to those questions.

That’s enough about this topic. Let’s move on.

Add factory methods

JDK-8145048 mentions the possibility of adding factory methods in Enum{Set,Map} to align them with Java 9’s Set and Map factories. EnumSet already has a varargs EnumSet.of(...) factory method, but EnumMap has nothing like that.

It would be nice to be able to declare EnumMap instances like this, for some reasonable number of key-value pairs:


Map<DayOfWeek, String> dayNames =
    EnumMap.of(
        DayOfWeek.MONDAY,    "lunes",
        DayOfWeek.TUESDAY,   "martes",
        DayOfWeek.WEDNESDAY, "miércoles",
        DayOfWeek.THURSDAY,  "jueves",
        DayOfWeek.FRIDAY,    "viernes",
        DayOfWeek.SATURDAY,  "sábado",
        DayOfWeek.SUNDAY,    "domingo");

Users could use EnumMap‘s copy constructor in conjunction with Java 9’s Map factory methods to achieve the same result less efficiently…


Map<DayOfWeek, String> dayNames =
    new EnumMap<>(
        Map.of(
            DayOfWeek.MONDAY,    "lunes",
            DayOfWeek.TUESDAY,   "martes",
            DayOfWeek.WEDNESDAY, "miércoles",
            DayOfWeek.THURSDAY,  "jueves",
            DayOfWeek.FRIDAY,    "viernes",
            DayOfWeek.SATURDAY,  "sábado",
            DayOfWeek.SUNDAY,    "domingo"));

…but the more we give up efficiency like that, the less EnumMap makes sense in the first place. A reasonable person might start to question why they should bother with EnumMap at all — just get rid of the new EnumMap<>(...) wrapper and use Map.of(...) directly.

Speaking of that EnumMap(Map) copy constructor, the fact that it may throw IllegalArgumentException when provided an empty Map leads people to use this pattern instead:


Map<DayOfWeek, String> copy = new EnumMap<>(DayOfWeek.class);
copy.putAll(otherMap);

We could give them a shortcut:


Map<DayOfWeek, String> copy = new EnumMap<>(DayOfWeek.class, otherMap);

Similarly, to avoid an IllegalArgumentException from EnumSet.copyOf(collection), I see code like this:


Set<Month> copy = EnumSet.noneOf(Month.class);
copy.addAll(otherCollection);

We could give them a shortcut too:


Set<Month> copy = EnumSet.copyOf(Month.class, otherCollection);

Existing code may define mappings from enum constants to values as standalone functions. Maybe the users of that code would like to view those (function-based) mappings as Map objects.

To that end, we could give people the means to generate an EnumMap from a Function:


Locale locale = Locale.forLanguageTag("es-MX");

Map<DayOfWeek, String> dayNames =
    EnumMap.map(DayOfWeek.class,
                day -> day.getDisplayName(TextStyle.FULL, locale));

// We could interpret the function returning null to mean that the
// key is not present.  That would allow this method to support
// more than the "every constant is a key" use case while dropping
// support for the "there may be present null values" use case,
// which is probably a good trade.

We could provide a similar factory method for EnumSet, accepting a Predicate instead of a Function:


Set<Month> shortMonths =
    EnumSet.filter(Month.class,
                   month -> month.minLength() < 31);

This functionality could be achieved less efficiently and more verbosely with streams. Again, the more we give up efficiency like that, the less sense it makes to use Enum{Set,Map} in the first place. I acknowledge that there is a cost to making API-level changes like the ones I’m discussing, but I feel that we are solidly in the “too little API-level support for Enum{Set,Map}” part of the spectrum and not even close to approaching the opposite “API bloat” end.

I don’t mean to belittle streams. There should also be more support for Enum{Set,Map} in the stream API.

Add collectors

Code written for Java 8+ will often produce collections using streams and collectors rather than invoking collection constructors or factory methods directly. I don’t think it would be outlandish to estimate that one third of collections are produced by collectors. Some of these collections will be (or could be) Enum{Set,Map}, and more could be done to serve that use case.

Collectors with these signatures should exist somewhere in the JDK:


public static <T extends Enum<T>>
Collector<T, ?, EnumSet<T>> toEnumSet(
    Class<T> elementType)

public static <T, K extends Enum<K>, U>
Collector<T, ?, EnumMap<K, U>> toEnumMap(
    Class<K> keyType,
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper)

public static <T, K extends Enum<K>, U>
Collector<T, ?, EnumMap<K, U>> toEnumMap(
    Class<K> keyType,
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper,
    BinaryOperator<U>; mergeFunction)

Similar collectors can be obtained from the existing collector factories in the Collectors class (specifically toCollection(collectionSupplier) and toMap(keyMapper, valueMapper, mergeFunction, mapSupplier)) or by using Collector.of(...), but that requires a little more effort on the users’ part, adding a little bit of extra friction to using Enum{Set,Map} that we don’t need.

I referenced these collectors from Guava earlier in this article:

They do not require the Class object argument, making them easier to use than the collectors that I proposed. The reason the Guava collectors can do this is that they produce ImmutableSet and ImmutableMap, not EnumSet and EnumMap. One cannot create an Enum{Set,Map} instance without having the Class object for that enum type. In order to have a collector that reliably produces Enum{Set,Map} (even when the stream contains zero input elements to grab the Class object from), the Class object must be provided up front.

We could provide similar collectors in the JDK that would produce immutable Set and Map instances. For streams with no elements, the collectors would produce Collections.emptySet() or Collections.emptyMap(). For streams with at least one element, the collectors would produce an Enum{Set,Map} instance wrapped by Collections.unmodifiable{Set,Map}.

The signatures would look like this:


public static <T extends Enum<T>>
Collector<T, ?, Set<T>> toImmutableEnumSet()

public static <T, K extends Enum<K>, U>
Collector<T, ?, Map<K, U>> toImmutableEnumMap(
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper)

public static <T, K extends Enum<K>, U>
Collector<T, ?, Map<K, U>> toImmutableEnumMap(
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper,
    BinaryOperator<U>gt; mergeFunction)

I’m not sure that those collectors are worthwhile. I might never recommend them over their counterparts in Guava.

The StreamEx library also provides a couple of interesting enum-specialized collectors:

They’re interesting because they are potentially short-circuiting. With MoreCollectors.toEnumSet(elementType), when the collector can determine that it has encountered all of the elements of that enum type (which is easy — the set of already-collected elements can be compared to EnumSet.allOf(elementType)), it stops collecting. These collectors may be well-suited for streams having a huge number of elements (or having elements that are expensive to compute) mapping to a relatively small set of enum constants.

I don’t know how feasible it is to port these StreamEx collectors to the JDK. As I understand it, the concept of short-circuiting collectors is not supported by the JDK. Adding support may necessitate other changes to the stream and collector APIs.

Be navigable? (No)

Over the years, many people have suggested that Enum{Set,Map} should implement the NavigableSet and NavigableMap interfaces. Every enum type is Comparable, so it’s technically possible. Why not?

I think the Navigable{Set,Map} interfaces are a poor fit for Enum{Set,Map}.

Those interfaces are huge! Implementing Navigable{Set,Map} would bloat the size of Enum{Set,Map} by 2-4x (in terms of lines of code). It would distract them from their core focus and strengths. Supporting the navigable API would most likely come with a non-zero penalty to runtime performance.

Have you ever looked closely at the specified behavior of methods like subSet and subMap, specifically when they might throw IllegalArgumentException? Those contracts impose a great deal of complexity for what seems like undesirable behavior. Enum{Set,Map} could take a stance on those methods similar to Guava’s ImmutableSortedSet and ImmutableSortedMap: acknowledge the contract of the interface but do something else that is more reasonable instead…

I say forget about it. If you want navigable collections, use TreeSet and TreeMap (or their thread-safe cousins, ConcurrentSkipListSet and ConcurrentSkipListMap). The cross-section of people who need the navigable API and the efficiency of enum-specialized collections must be very small.

There are few cases where the Comparable nature of enum types comes into play at all. In practice, I expect that the ordering of most enum constants is arbitrary (with respect to intended behavior).

I’ll go further than that; I think that making all enum types Comparable in the first place was a mistake.

Which ordering of Collector.Characteristics is “natural”, [CONCURRENT,UNORDERED] or [UNORDERED,CONCURRENT]?
Which is the “greater” Thread.State, WAITING or TIMED_WAITING?
FileVisitOption.FOLLOW_LINKS is “comparable” — to what? (There is no other FileVisitOption.)

How many instances of RoundingMode are in the “range” from FLOOR to CEILING?


import java.math.RoundingMode;
import java.util.EnumSet;
import java.util.Set;

class RangeTest {
  public static void main(String[] args) {
    Set<RoundingMode> range =
        EnumSet.range(RoundingMode.FLOOR,
                      RoundingMode.CEILING);
    System.out.println(range.size());
  }
}

// java.lang.IllegalArgumentException: FLOOR > CEILING

There are other enum types where questions like that actually make sense, and those should be Comparable.

Is Month.JANUARY “before” Month.FEBRUARY? Yes.
Is TimeUnit.HOURS “larger” than TimeUnit.MINUTES? Yes.

Implementing Comparable or not should have been a choice for authors of individual enum types. To serve people who really did want to sort enum constants by declaration order for whatever reason, we could have automatically provided a static Comparator from each enum type:


Comparator<JDBCType> c = JDBCType.declarationOrder();

It’s too late for that now. Let’s not double down on the original mistake by making Enum{Set,Map} navigable.

Conclusion

EnumSet
and
EnumMap are cool collections, and you should use them!

They’re already great, but they can become even better with changes to their private implementation details. I propose some ideas here. If you want to find out what happens in the JDK, the changes (if there are any) should be noted in JDK-8170826.

API-level changes are warranted as well. New factory methods and collectors would make it easier to obtain instances of Enum{Set,Map}, and immutable Enum{Set,Map} could be better-supported. I propose some ideas here, but if there are any actual changes made then they should be noted in JDK-8145048.

Efficient multiple-stream concatenation in Java

October 19, 2016

Alan Laser

I want to combine the elements of multiple Stream instances into a single Stream. What’s the best way to do this?

This article compares a few different solutions.

Stream.concat(a, b)

The JDK provides Stream.concat(a, b) for concatenating two streams.

void exampleConcatTwo() {
  Stream<String> a = Stream.of("one", "two");
  Stream<String> b = Stream.of("three", "four");
  Stream<String> out = Stream.concat(a, b);
  out.forEach(System.out::println);
  // Output:
  // one
  // two
  // three
  // four
}

What if we have more than two streams?

We could use Stream.concat(a, b) multiple times. With three streams we could write Stream.concat(Stream.concat(a, b), c).

To me that approach is depressing at three streams, and it rapidly gets worse as we add more streams.

Reduce

Alternatively, we can use reduce to perform the multiple incantations of Stream.concat(a, b) for us. The code adapts elegantly to handle any number of input streams.

void exampleReduce() {
  Stream<String> a = Stream.of("one", "two");
  Stream<String> b = Stream.of("three", "four");
  Stream<String> c = Stream.of("five", "six");
  Stream<String> out = Stream.of(a, b, c)
      .reduce(Stream::concat)
      .orElseGet(Stream::empty);
  out.forEach(System.out::println);
  // Output:
  // one
  // two
  // three
  // four
  // five
  // six
}

Be careful using this pattern! Note the warning in the documentation of Stream.concat(a, b):

Use caution when constructing streams from repeated concatenation. Accessing an element of a deeply concatenated stream can result in deep call chains, or even StackOverflowError.

It takes quite a few input streams to trigger this problem, but it is trivial to demonstrate:

void exampleStackOverflow() {
  List<Stream<String>> inputs = new AbstractList<Stream<String>>() {
    @Override
    public Stream<String> get(int index) {
      return Stream.of("one", "two");
    }

    @Override
    public int size() {
      return 1_000_000; // try changing this number
    }
  };
  Stream<String> out = inputs.stream()
      .reduce(Stream::concat)
      .orElseGet(Stream::empty);
  long count = out.count(); // probably throws
  System.out.println("count: " + count); // probably never reached
}

On my workstation, this method throws StackOverflowError after several seconds of churning.

What’s going on here?

We can think of the calls to Stream.concat(a, b) as forming a binary tree. At the root is the concatenation of all the input streams. At the leaves are the individual input streams. Let’s look at the trees for up to five input streams as formed by our reduce operation.

Two streams:
concat(a,b)ab
Three streams:
concat(concat(a,b),c)concat(a,b)cab
Four streams:
concat(concat(concat(a,b),c),d)concat(concat(a,b),c)dconcat(a,b)cab
Five streams:
concat(concat(concat(concat(a,b),c),d),e)concat(concat(econcat(a,b),c),d)concat(concat(a,b),c)dconcat(a,b)cab

The trees are perfectly unbalanced! Each additional input stream adds one layer of depth to the tree and one layer of indirection to reach all the other streams. This can have a noticeable negative impact on performance. With enough layers of indirection we’ll see a StackOverflowError.

Balance

If we’re worried that we’ll concatenate a large number of streams and run into the aforementioned problems, we can balance the tree. This is as if we’re optimizing a O(n) algorithm into a O(logn) one. We won’t totally eliminate the possibility of StackOverflowError, and there may be other approaches that perform even better, but this should be quite an improvement over the previous solution.

void exampleBalance() {
  Stream<String> a = Stream.of("one", "two");
  Stream<String> b = Stream.of("three", "four");
  Stream<String> c = Stream.of("five", "six");
  Stream<String> out = concat(a, b, c);
  out.forEach(System.out::println);
  // Output:
  // one
  // two
  // three
  // four
  // five
  // six
}

@SafeVarargs
static <T> Stream<T> concat(Stream<T>... in) {
  return concat(in, 0, in.length);
}

static <T> Stream<T> concat(Stream<T>[] in, int low, int high) {
  switch (high - low) {
    case 0: return Stream.empty();
    case 1: return in[low];
    default:
      int mid = (low + high) >>> 1;
      Stream<T> left = concat(in, low, mid);
      Stream<T> right = concat(in, mid, high);
      return Stream.concat(left, right);
  }
}

Flatmap

There is another way to concatenate streams that is built into the JDK, and it does not involve Stream.concat(a, b) at all. It is flatMap.

void exampleFlatMap() {
  Stream<String> a = Stream.of("one", "two");
  Stream<String> b = Stream.of("three", "four");
  Stream<String> c = Stream.of("five", "six");
  Stream<String> out = Stream.of(a, b, c).flatMap(s -> s);
  out.forEach(System.out::println);
  // Output:
  // one
  // two
  // three
  // four
  // five
  // six
}

This generally outperforms the solutions based on Stream.concat(a, b) when each input stream contains fewer than 32 elements. As we increase the element count past 32, flatMap performs comparatively worse and worse as the element count rises.

flatMap avoids the StackOverflowError issue but it comes with its own set of quirks. For example, it interacts poorly with infinite streams. Calling findAny on the concatenated stream may cause the program to enter an infinite loop, whereas the other solutions would terminate almost immediately.

void exampleInfiniteLoop() {
  Stream<String> a = Stream.generate(() -> "one");
  Stream<String> b = Stream.generate(() -> "two");
  Stream<String> c = Stream.generate(() -> "three");
  Stream<String> out = Stream.of(a, b, c).flatMap(s -> s);
  Optional<String> any = combined.findAny(); // infinite loop
  System.out.println(any); // never reached
}

(The infinite loop is an implementation detail. This could be fixed in the JDK without changing the contract of flatMap.)

Also, flatMap forces its input streams into sequential mode even if they were originally parallel. The outermost concatenated stream can still be made parallel, and we will be able to process elements from distinct input streams in parallel, but the elements of each individual input stream must all be processed sequentially.

Analysis

Let me share a few trends that I’ve noticed when dealing with streams and stream concatenation in general, having written a fair amount of code in Java 8 by now.

There have been maybe one dozen cases where I’ve needed to concatenate streams. That’s not all that many, so no matter how good the solution is, it’s not going to have much of an impact for me.
In all but one of those one dozen cases, I needed to concatenate exactly two streams, so Stream.concat(a, b) was sufficient.
In the remaining case, I needed to concatenate exactly three streams. I was not even close to the point where StackOverflowError would become an issue. Stream.concat(Stream.concat(a, b), c) would have worked just fine, although I went with flatMap because I felt that it was easier to read.
I have never needed to concatenate streams in performance-critical sections of code.
I use infinite streams very rarely. When I do use them, it is obvious in context that they are infinite. And so concatenating infinite streams together and then asking a question like findAny on the result is just not something that I would be tempted to do. That particular issue with flatMap seems like one that I’ll never come across.
I use parallel streams very rarely. I think I’ve only used them twice in production code. It is almost never the case that going parallel improves performance, and even when it might improve performance, it is unlikely that processing them in the singleton ForkJoinPool.commonPool() is how I will want to manage that work. The issue with flatMap forcing the input streams to be sequential seems very unlikely to be a real problem for me.
Let’s suppose that I do want to concatenate parallel streams and have them processed in parallel. If I have eight input streams on an eight core machine, and each stream has roughly the same number of elements, the fact that flatMap forces the individual streams to be sequential will not degrade performance for me at all. All eight cores will be fully utilized, each core processing one of the eight input streams. If I have seven input streams on that same machine, I will see only slightly degraded performance. With six, slightly more degraded, and so on.

What’s the takeaway from all this? Here is my advice:

For two input streams, use:
Stream.concat(a, b)

For more than two input streams, use:
Stream.of(a, b, c, ...).flatMap(s -> s)

That solution is good enough…

Overboard

…but what if we’re not satisfied with “good enough”? What if we want a solution that’s really fast no matter the size and shape of the input and doesn’t have any of the quirks of the other solutions?

It is a bit much to inline in a blog article, so take a look at StreamConcatenation.java for the source code.

This implementation is similar to Stream.concat(a, b) in that it uses a custom Spliterator, except this implementation handles any number of input streams.

It performs quite well. It does not outperform every other solution in every scenario (flatMap is generally better for very small input streams), but it never performs much worse and it scales nicely with the number and size of the input streams.

Benchmark

I wrote a JMH benchmark to compare the four solutions discussed in this article. The benchmark uses each solution to concatenate a variable number of input streams with a variable number of elements per stream, then iterates over the elements of the concatenated stream. Here is the raw JMH output from my workstation and a prettier visualization of the benchmark results.

Do you questions about Java? Or do you have an idea for a website or app? Either way, we can help!

Email Twitter

Everything about Java 8

March 27, 2013

Alan Laser

The following post is a comprehensive summary of the developer-facing changes coming in Java 8. As of March 18, 2014, Java 8 is now generally available.

I used preview builds of IntelliJ for my IDE. It had the best support for the Java 8 language features at the time I went looking. You can find those builds here: IntelliJIDEA EAP.

Interface improvements

Interfaces can now define static methods. For instance, a naturalOrder method was added to java.util.Comparator:

public static <T extends Comparable<? super T>>
Comparator<T> naturalOrder() {
    return (Comparator<T>)
        Comparators.NaturalOrderComparator.INSTANCE;
}

A common scenario in Java libraries is, for some interface Foo, there would be a companion utility class Foos with static methods for generating or working with Foo instances. Now that static methods can exist on interfaces, in many cases the Foos utility class can go away (or be made package-private), with its public methods going on the interface instead.

Additionally, more importantly, interfaces can now define default methods. For instance, a forEach method was added to java.lang.Iterable:

public default void forEach(Consumer<? super T> action) {
    Objects.requireNonNull(action);
    for (T t : this) {
        action.accept(t);
    }
}

In the past it was essentially impossible for Java libraries to add methods to interfaces. Adding a method to an interface would mean breaking all existing code that implements the interface. Now, as long as a sensible default implementation of a method can be provided, library maintainers can add methods to these interfaces.

In Java 8, a large number of default methods have been added to core JDK interfaces. I’ll discuss many of them later.

Why can’t default methods override equals, hashCode, and toString?

An interface cannot provide a default implementation for any of the methods of the Object class. In particular, this means one cannot provide a default implementation for equals, hashCode, or toString from within an interface.

This seems odd at first, given that some interfaces actually define their equals behavior in documentation. The List interface is an example. So, why not allow this?

Brian Goetz gave four reasons in a lengthy response on the Project Lambda mailing list. I’ll only describe one here, because that one was enough to convince me:

It would become more difficult to reason about when a default method is invoked. Right now it’s simple: if a class implements a method, that always wins over a default implementation. Since all instances of interfaces are Objects, all instances of interfaces have non-default implementations of equals/hashCode/toString already. Therefore, a default version of these on an interface is always useless, and it may as well not compile.

For further reading, see this explanation written by Brian Goetz: response to “Allow default methods to override Object’s methods”

Functional interfaces

A core concept introduced in Java 8 is that of a “functional interface”. An interface is a functional interface if it defines exactly one abstract method. For instance, java.lang.Runnable is a functional interface because it only defines one abstract method:

public abstract void run();

Note that the “abstract” modifier is implied because the method lacks a body. It is not necessary to specify the “abstract” modifier, as this code does, in order to qualify as a functional interface.

Default methods are not abstract, so a functional interface can define as many default methods as it likes.

A new annotation, @FunctionalInterface, has been introduced. It can be placed on an interface to declare the intention of it being a functional interface. It will cause the interface to refuse to compile unless you’ve managed to make it a functional interface. It’s sort of like @Override in this way; it declares intention and doesn’t allow you to use it incorrectly.

Lambdas

An extremely valuable property of functional interfaces is that they can be instantiated using lambdas. Here are a few examples of lambdas:

Comma-separated list of inputs with specified types on the left, a block with a return on the right:

(int x, int y) -> { return x + y; }

Comma-separated list of inputs with inferred types on the left, a return value on the right:

(x, y) -> x + y

Single parameter with inferred type on the left, a return value on the right:

x -> x * x

No inputs on left (official name: “burger arrow”), return value on the right:

() -> x

Single parameter with inferred type on the left, a block with no return (void return) on the right:

x -> { System.out.println(x); }

Static method reference:

String::valueOf

Non-static method reference:

Object::toString

Capturing method reference:

x::toString

Constructor reference:

ArrayList::new

You can think of method reference forms as shorthand for the other lambda forms.

Method reference		Equivalent lambda expression
`String::valueOf`		`x -> String.valueOf(x)`
`Object::toString`		`x -> x.toString()`
`x::toString`		`() -> x.toString()`
`ArrayList::new`		`() -> new ArrayList<>()`

Of course, methods in Java can be overloaded. Classes can have multiple methods with the same name but different parameters. The same goes for its constructors. ArrayList::new could refer to any of its three constructors. The method it resolves to depends on which functional interface it’s being used for.

A lambda is compatible with a given functional interface when their “shapes” match. By “shapes”, I’m referring to the types of the inputs, outputs, and declared checked exceptions.

To give a couple of concrete, valid examples:

Comparator<String> c = (a, b) -> Integer.compare(a.length(),
                                                 b.length());

A Comparator<String>’s compare method takes two strings as input, and returns an int. That’s consistent with the lambda on the right, so this assignment is valid.

Runnable r = () -> { System.out.println("Running!"); }

A Runnable’s run method takes no arguments and does not have a return value. That’s consistent with the lambda on the right, so this assignment is valid.

The checked exceptions (if present) in the abstract method’s signature matter too. The lambda can only throw a checked exception if the functional interface declares that exception in its signature.

Capturing versus non-capturing lambdas

Lambdas are said to be “capturing” if they access a non-static variable or object that was defined outside of the lambda body. For example, this lambda captures the variable x:

int x = 5;
return y -> x + y;

In order for this lambda declaration to be valid, the variables it captures must be “effectively final”. So, either they must be marked with the final modifier, or they must not be modified after they’re assigned.

Whether a lambda is capturing or not has implications for performance. A non-capturing lambda is generally going to be more efficient than a capturing one. Although this is not defined in any specifications (as far as I know), and you shouldn’t count on it for a program’s correctness, a non-capturing lambda only needs to be evaluated once. From then on, it will return an identical instance. Capturing lambdas need to be evaluated every time they’re encountered, and currently that performs much like instantiating a new instance of an anonymous class.

What lambdas don’t do

There are a few features that lambdas don’t provide, which you should keep in mind. They were considered for Java 8 but were not included, for simplicity and due to time constraints.

Non-final variable capture – If a variable is assigned a new value, it can’t be used within a lambda. The “final” keyword is not required, but the variable must be “effectively final” (discussed earlier). This code does not compile:

int count = 0;
List<String> strings = Arrays.asList("a", "b", "c");
strings.forEach(s -> {
    count++; // error: can't modify the value of count
});

Exception transparency – If a checked exception may be thrown from inside a lambda, the functional interface must also declare that checked exception can be thrown. The exception is not propogated to the containing method. This code does not compile:

void appendAll(Iterable<String> values, Appendable out)
        throws IOException { // doesn't help with the error
    values.forEach(s -> {
        out.append(s); // error: can't throw IOException here
                       // Consumer.accept(T) doesn't allow it
    });
}

There are ways to work around this, where you can define your own functional interface that extends Consumer and sneaks the IOException through as a RuntimeException. I tried this out in code and found it to be too confusing to be worthwhile.

Control flow (break, early return) – In the forEach examples above, a traditional continue is possible by placing a “return;” statement within the lambda. However, there is no way to break out of the loop or return a value as the result of the containing method from within the lambda. For example:

final String secret = "foo";
boolean containsSecret(Iterable<String> values) {
    values.forEach(s -> {
        if (secret.equals(s)) {
            ??? // want to end the loop and return true, but can't
        }
    });
}

For further reading about these issues, see this explanation written by Brian Goetz: response to “Checked exceptions within Block<T>

Why abstract classes can’t be instantiated using a lambda

An abstract class, even if it declares only one abstract method, cannot be instantiated with a lambda.

Two examples of classes with one abstract method are Ordering and CacheLoader from the Guava library. Wouldn’t it be nice to be able to declare instances of them using lambdas like this?

Ordering<String> order = (a, b) -> ...;
CacheLoader<String, String> loader = (key) -> ...;

The most common argument against this was that it would add to the difficulty of reading a lambda. Instantiating an abstract class in this way could lead to execution of hidden code: that in the constructor of the abstract class.

Another reason is that it throws out possible optimizations for lambdas. In the future, it may be the case that lambdas are not evaluated into object instances. Letting users declare abstract classes with lambdas would prevent optimizations like this.

Besides, there’s an easy workaround. Actually, the two example classes from Guava already demonstrate this workaround. Add factory methods to convert from a lambda to an instance:

Ordering<String> order = Ordering.from((a, b) -> ...);
CacheLoader<String, String> loader =
    CacheLoader.from((key) -> ...);

For further reading, see this explanation written by Brian Goetz: response to “Allow lambdas to implement abstract classes”

java.util.function

Package summary: java.util.function

As demonstrated earlier with Comparator and Runnable, interfaces already defined in the JDK that happen to be functional interfaces are compatible with lambdas. The same goes for any functional interfaces defined in your own code or in third party libraries.

But there are certain forms of functional interfaces that are widely, commonly useful, which did not exist previously in the JDK. A large number of these interfaces have been added to the new java.util.function package. Here are a few:

Function<T, R> – take a T as input, return an R as ouput
Predicate<T> – take a T as input, return a boolean as output
Consumer<T> – take a T as input, perform some action and don’t return anything
Supplier<T> – with nothing as input, return a T
BinaryOperator<T> – take two T’s as input, return one T as output, useful for “reduce” operations

Primitive specializations for most of these exist as well. They’re provided in int, long, and double forms. For instance:

IntConsumer – take an int as input, perform some action and don’t return anything

These exist for performance reasons, to avoid boxing and unboxing when the inputs or outputs are primitives.

java.util.stream

Package summary: java.util.stream

The new java.util.stream package provides utilities “to support functional-style operations on streams of values” (quoting the javadoc). Probably the most common way to obtain a stream will be from a collection:

Stream<T> stream = collection.stream();

A stream is something like an iterator. The values “flow past” (analogy to a stream of water) and then they’re gone. A stream can only be traversed once, then it’s used up. Streams may also be infinite.

Streams can be sequential or parallel. They start off as one and may be switched to the other using stream.sequential() or stream.parallel(). The actions of a sequential stream occur in serial fashion on one thread. The actions of a parallel stream may be happening all at once on multiple threads.

So, what do you do with a stream? Here is the example given in the package javadocs:

int sumOfWeights = blocks.stream().filter(b -> b.getColor() == RED)
                                  .mapToInt(b -> b.getWeight())
                                  .sum();

Note: The above code makes use of a primitive stream, and a sum() method is only available on primitive streams. There will be more detail on primitive streams shortly.

A stream provides a fluent API for transforming values and performing some action on the results. Stream operations are either “intermediate” or “terminal”.

Intermediate – An intermediate operation keeps the stream open and allows further operations to follow. The filter and map methods in the example above are intermediate operations. The return type of these methods is Stream; they return the current stream to allow chaining of more operations.
Terminal – A terminal operation must be the final operation invoked on a stream. Once a terminal operation is invoked, the stream is “consumed” and is no longer usable. The sum method in the example above is a terminal operation.

Usually, dealing with a stream will involve these steps:

Obtain a stream from some source.
Perform one or more intermediate operations.
Perform one terminal operation.

It’s likely that you’ll want to perform all those steps within one method. That way, you know the properties of the source and the stream and can ensure that it’s used properly. You probably don’t want to accept arbitrary Stream<T> instances as input to your method because they may have properties you’re ill-equipped to deal with, such as being parallel or infinite.

There are a couple more general properties of stream operations to consider:

Stateful – A stateful operation imposes some new property on the stream, such as uniqueness of elements, or a maximum number of elements, or ensuring that the elements are consumed in sorted fashion. These are typically more expensive than stateless intermediate operations.
Short-circuiting – A short-circuiting operation potentially allows processing of a stream to stop early without examining all the elements. This is an especially desirable property when dealing with infinite streams; if none of the operations being invoked on a stream are short-circuiting, then the code may never terminate.

Here are short, general descriptions for each Stream method. See the javadocs for more thorough explanations. Links are provided below for each overloaded form of the operation.

Intermediate operations:

filter [1] – Exclude all elements that don’t match a Predicate.
map [1] [2] [3] [4] – Perform a one-to-one transformation of elements using a Function.
flatMap [1] [2] [3] [4] – Transform each element into zero or more elements by way of another Stream.
peek [1] – Perform some action on each element as it is encountered. Primarily useful for debugging.
distinct [1] – Exclude all duplicate elements according to their .equals behavior. This is a stateful operation.
sorted [1] [2] – Ensure that stream elements in subsequent operations are encountered according to the order imposed by a Comparator. This is a stateful operation.
limit [1] – Ensure that subsequent operations only see up to a maximum number of elements. This is a stateful, short-circuiting operation.
skip [1] – Ensure that subsequent operations do not see the first n elements. This is a stateful operation.

Terminal operations:

forEach [1] – Perform some action for each element in the stream.
toArray [1] [2] – Dump the elements in the stream to an array.
reduce [1] [2] [3] – Combine the stream elements into one using a BinaryOperator.
collect [1] [2] – Dump the elements in the stream into some container, such as a Collection or Map.
min [1] – Find the minimum element of the stream according to a Comparator.
max [1] – Find the maximum element of the stream according to a Comparator.
count [1] – Find the number of elements in the stream.
anyMatch [1] – Find out whether at least one of the elements in the stream matches a Predicate. This is a short-circuiting operation.
allMatch [1] – Find out whether every element in the stream matches a Predicate. This is a short-circuiting operation.
noneMatch [1] – Find out whether zero elements in the stream match a Predicate. This is a short-circuiting operation.
findFirst [1] – Find the first element in the stream. This is a short-circuiting operation.
findAny [1] – Find any element in the stream, which may be cheaper than findFirst for some streams. This is a short-circuiting operation.

As noted in the javadocs, intermediate operations are lazy. Only a terminal operation will start the processing of stream elements. At that point, no matter how many intermediate operations were included, the elements are then consumed in (usually, but not quite always) a single pass. (Stateful operations such as sorted() and distinct() may require a second pass over the elements.)

Streams try their best to do as little work as possible. There are micro-optimizations such as eliding a sorted() operation when it can determine the elements are already in order. In operations that include limit(x) or substream(x,y), a stream can sometimes avoid performing intermediate map operations on the elements it knows aren’t necessary to determine the result. I’m not going to be able to do the implementation justice here; it’s clever in lots of small but significant ways, and it’s still improving.

Returning to the concept of parallel streams, it’s important to note that parallelism is not free. It’s not free from a performance standpoint, and you can’t simply swap out a sequential stream for a parallel one and expect the results to be identical without further thought. There are properties to consider about your stream, its operations, and the destination for its data before you can (or should) parallelize a stream. For instance: Does encounter order matter to me? Are my functions stateless? Is my stream large enough and are my operations complex enough to make parallelism worthwhile?

There are primitive-specialized versions of Stream for ints, longs, and doubles:

One can convert back and forth between an object stream and a primitive stream using the primitive-specialized map and flatMap functions, among others. To give a few contrived examples:

List<String> strings = Arrays.asList("a", "b", "c");
strings.stream()                    // Stream<String>
       .mapToInt(String::length)    // IntStream
       .longs()                     // LongStream
       .mapToDouble(x -> x / 10.0)  // DoubleStream
       .boxed()                     // Stream<Double>
       .mapToLong(x -> 1L)          // LongStream
       .mapToObj(x -> "")           // Stream<String>
       ...

The primitive streams also provide methods for obtaining basic numeric statistics about the stream as a data structure. You can find the count, sum, min, max, and mean of the elements all from one terminal operation.

There are not primitive versions for the rest of the primitive types because it would have required an unacceptable amount of bloat in the JDK. IntStream, LongStream, and DoubleStream were deemed useful enough to include, and streams of other numeric primitives can represented using these three via widening primitive conversion.

One of the most confusing, intricate, and useful terminal stream operations is collect. It introduces a new interface called Collector. This interface is somewhat difficult to understand, but fortunately there is a Collectors utility class for generating all sorts of useful Collectors. For example:

List<String> strings = values.stream()
                             .filter(...)
                             .map(...)
                             .collect(Collectors.toList());

If you want to put your stream elements into a Collection, Map, or String, then Collectors probably has what you need. It’s definitely worthwhile to browse through the javadoc of that class.

Generic type inference improvements

Summary of proposal: JEP 101: Generalized Target-Type Inference

This was an effort to improve the ability of the compiler to determine generic types where it was previously unable to. There were many cases in previous versions of Java where the compiler could not figure out the generic types for a method in the context of nested or chained method invocations, even when it seemed “obvious” to the programmer. Those situations required the programmer to explicitly specify a “type witness”. It’s a feature of generics that surprisingly few Java programmers know about (I’m saying this based on personal interactions and reading StackOverflow questions). It looks like this:

// In Java 7:
foo(Utility.<Type>bar());
Utility.<Type>foo().bar();

Without the type witnesses, the compiler might fill in <Object> as the generic type, and the code would fail to compile if a more specific type was required instead.

Java 8 improves this situation tremendously. In many more cases, it can figure out a more specific generic type based on the context.

// In Java 8:
foo(Utility.bar());
Utility.foo().bar();

This one is still a work in progress, so I’m not sure how many of the examples listed in the proposal will actually be included for Java 8. Hopefully it’s all of them.

java.time

Package summary: java.time

The new date/time API in Java 8 is contained in the java.time package. If you’re familiar with Joda Time, it will be really easy to pick up. Actually, I think it’s so well-designed that even people who have never heard of Joda Time should find it easy to pick up.

Almost everything in the API is immutable, including the value types and the formatters. No more worrying about exposing Date fields or dealing with thread-local date formatters.

The intermingling with the legacy date/time API is minimal. It was a clean break:

The new API prefers enums over integer constants for things like months and days of the week.

So, what’s in it? The package-level javadocs do an excellent job of explaining the additional types. I’ll give a brief rundown of some noteworthy parts.

Extremely useful value types:

LocalDateTime, LocalDate, LocalTime
Year, Month, YearMonth, MonthDay, DayOfWeek (ok, maybe Year is a little less useful than the others)

Less useful value types:

Instant – similar to java.util.Date
ZonedDateTime, ZoneId – for when time zones are important
OffsetDateTime, OffsetTime, ZoneOffset – for dealing with offsets from UTC
Duration, Period – although if you’re trying to find the amount of time between two dates, you might be looking for ChronoUnit instead (see below)

Other useful types:

DateTimeFormatter – for converting datetime objects to strings
ChronoUnit – for figuring out the amount of time bewteen two points, e.g. ChronoUnit.DAYS.between(t1, t2)
TemporalAdjuster – e.g. date.with(TemporalAdjuster.firstDayOfMonth())

The new value types are, for the most part, supported by JDBC. There are minor exceptions, such as ZonedDateTime which has no counterpart in SQL.

Collections API additions

The fact that interfaces can define default methods allowed the JDK authors to make a large number of additions to the collection API interfaces. Default implementations for these are provided on all the core interfaces, and more efficient or well-behaved overridden implementations were added to all the concrete classes, where applicable.

Here’s a list of the new methods:

Also, Iterator.remove() now has a default, throwing implementation, which makes it slightly easier to define unmodifiable iterators.

Collection.stream() and Collection.parallelStream() are the main gateways into the stream API. There are other ways to generate streams, but those are going to be the most common by far.

The addition of List.sort(Comparator) is fantastic. Previously, the way to sort an ArrayList was this:

Collections.sort(list, comparator);

That code, which was your only option in Java 7, was frustratingly inefficient. It would dump the list into an array, sort the array, then use a ListIterator to insert the array contents into the list in new positions.

The default implementation of List.sort(Comparator) still does this, but concrete implementing classes are free to optimize. For instance, ArrayList.sort invokes Arrays.sort on the ArrayList’s internal array. CopyOnWriteArrayList does the same.

Performance isn’t the only potential gain from these new methods. They can have more desirable semantics, too. For instance, sorting a Collections.synchronizedList() is an atomic operation using list.sort. You can iterate over all its elements as an atomic operation using list.forEach. Previously this was not possible.

Map.computeIfAbsent makes working with multimap-like structures easier:

// Index strings by length:
Map<Integer, List<String>> map = new HashMap<>();
for (String s : strings) {
    map.computeIfAbsent(s.length(),
                        key -> new ArrayList<String>())
       .add(s);
}

// Although in this case the stream API may be a better choice:
Map<Integer, List<String>> map = strings.stream()
    .collect(Collectors.groupingBy(String::length));

Concurrency API additions

ForkJoinPool.commonPool()
ConcurrentHashMap(v8)
The following come with forms for parallel, sequentially, Object, int, long, and double. There are too many of these to link, so see the ConcurrentHashMap javadocs for more information.
- ConcurrentHashMap.reduce…
- ConcurrentHashMap.search…
- ConcurrentHashMap.forEach…
ConcurrentHashMap.newKeySet()
ConcurrentHashMap.newKeySet(int)
CompletableFuture
CompletionStage
StampedLock
LongAdder
LongAccumulator
DoubleAdder
DoubleAccumulator
CountedCompleter
Executors.newWorkStealingPool()
Executors.newWorkStealingPool(int)
The following come with forms for AtomicReference, AtomicInteger, AtomicLong, and the atomic array versions for each.

ForkJoinPool.commonPool() is the structure that handles all parallel stream operations. It is intended as an easy, good way to obtain a ForkJoinPool/ExecutorService/Executor when you need one.

ConcurrentHashMap<K, V> was completely rewritten. Internally it looks nothing like the version that was in Java 7. Externally it’s mostly the same, except it has a large number of bulk operation methods: many forms of reduce, search, and forEach.

ConcurrentHashMap.newKeySet() provides a concurrent java.util.Set implementation. It is essentially another way of writing Collections.newSetFromMap(new ConcurrentHashMap<T, Boolean>()).

StampedLock is a new lock implementation that can probably replace ReentrantReadWriteLock in most cases. It performs better than RRWL when used as a plain read-write lock. Is also provides an API for “optimistic reads”, where you obtain a weak, cheap version of a read lock, do the read operation, then check afterwards if your lock was invalidated by a write. There’s more detail about this class and its performance in a set of slides put together by Heinz Kabutz (starting about half-way through the set of slides): “Phaser and StampedLock Presentation”

CompletableFuture<T> is a nice implementation of the Future interface that provides a ton of methods for performing (and chaining together) asynchronous tasks. It relies on functional interfaces heavily; lambdas are a big reason this class was worth adding. If you are currently using Guava’s Future utilities, such as Futures, ListenableFuture, and SettableFuture, you may want to check out CompletableFuture as a potential replacement.

IO/NIO API additions

BufferedReader.lines()
Files.list(Path)
Files.walk(Path, int, FileVisitOption…)
Files.walk(Path, FileVisitOption…)
Files.find(Path, int, BiPredicate, FileVisitOption…)
Files.lines(Path, Charset)
DirectoryStream.stream()
UncheckedIOException – thrown when an IO error occurs, but since IOException isn’t allowed to be in the signature of Iterator/Stream, it has to be smuggled through with an unchecked exception.

Most of these additions give you ways to obtain java.util.stream.Stream from files and InputStreams. They’re a bit different from the streams you obtain from regular collections though. For one, they may throw UncheckedIOException. Also, they are instances of streams where using the stream.close() method is necessary. Streams implement AutoCloseable and can therefore be used in try-with-resources statements.
Streams also have an onClose(Runnable) intermediate operation that I didn’t list in the earlier section about streams. It allows you to attach handlers to a stream that execute when it is closed. Here is an example:

// Print the lines in a file, then "done"
try (Stream lines = Files.lines(path, UTF_8)) {
    lines.onClose(() -> System.out.println("done"))
	     .forEach(System.out::println);
}

Reflection and annotation changes

Annotations are allowed in more places, e.g. List<@Nullable String>. The biggest impact of this is likely to be for static analysis tools such as Sonar and FindBugs.

This JSR 308 website does a better job of explaining the motivation for these changes than I could possibly do: “Type Annotations (JSR 308) and the Checker Framework”

Nashorn JavaScript Engine

Summary of proposal: JEP 174: Nashorn JavaScript Engine

I did not experiment with Nashorn so I know very little beyond what’s described in the proposal above. Short version: It’s the successor to Rhino. Rhino is old and a little bit slow, and the developers decided they’d be better off starting from scratch.

Other miscellaneous additions to java.lang, java.util, and elsewhere

ThreadLocal.withInitial(Supplier)
String.join(CharSequence, Charsequence…)
String.join(CharSequence, Iterable)
Collections.unmodifiableNavigableSet(NavigableSet)
Collections.unmodifiableNavigableMap(NavigableMap)
Collections.synchronizedNavigableSet(NavigableSet)
Collections.synchronizedNavigableMap(NavigableMap)
Collections.checkedQueue(Queue, Class)
Collections.checkedNavigableSet(NavigableSet, Class)
Collections.checkedNavigableMap(NavigableMap, Class, Class)
Collections.emptySortedSet()
Collections.emptyNavigableSet()
Collections.emptySortedMap()
Collections.emptyNavigableMap()
SplittableRandom
Optional
OptionalInt
OptionalLong
OptionalDouble
Base64
StringJoiner
Spliterator
Spliterators
Comparator.naturalOrder()
Comparator.reverseOrder()
Comparator.nullsFirst(Comparator)
Comparator.nullsLast(Comparator)
Comparator.comparing(Function, Comparator)
Comparator.comparing(Function)
Comparator.comparingInt(ToIntFunction)
Comparator.comparingLong(ToLongFunction)
Comparator.comparingDouble(ToDoubleFunction)
Comparator.reversed()
Comparator.thenComparing(Comparator)
Comparator.thenComparing(Function, Comparator)
Comparator.thenComparing(Function)
Comparator.thenComparingInt(ToIntFunction)
Comparator.thenComparingLong(ToLongFunction)
Comparator.thenComparingDouble(ToDoubleFunction)
The following come in forms for array = T[], int[], long[], double[]. There are too many of these to link, so see the Arrays javadocs for more information.
- Arrays.spliterator(array)
- Arrays.spliterator(array, int, int)
- Arrays.stream(array)
- Arrays.stream(array, int, int);
- Arrays.setAll(array, IntFunction)
- Arrays.parallelSetAll(array, IntFunction)
- Arrays.parallelPrefix(array, BinaryOperator)
- Arrays.parallelPrefix(array, int, int, BinaryOperator)
Math.addExact(int, int)
Math.addExact(long, long)
Math.subtractExact(int, int)
Math.subtractExact(long, long)
Math.multiplyExact(int, int)
Math.multiplyExact(long, long)
Math.negateExact(int)
Math.negateExact(long)
Math.decrementExact(int)
Math.decrementExact(long)
Math.incrementExact(int)
Math.incrementExact(long)
Math.floorDiv(int, int)
Math.floorDiv(long, long)
Math.floorMod(int, int)
Math.floorMod(long, long)
Math.toIntExact(long)
Math.nextDown(float)
Math.nextDown(double)
Integer.min(int, int)
Integer.max(int, int)
Integer.sum(int, int)
Long.min(long, long)
Long.max(long, long)
Long.sum(long, long)
Double.min(double, double)
Double.max(double, double)
Double.sum(double, double)
Primitive.hashCode(primitive) (one new static method on each of the primitive wrapper types)
Boolean.logicalAnd(boolean, boolean)
Boolean.logicalOr(boolean, boolean)
Boolean.logicalXor(boolean, boolean)
Integer.toUnsignedLong(int)
Integer.toUnsignedString(int)
Integer.toUnsignedString(int, int)
Integer.parseUnsignedInt(String)
Integer.parseUnsignedInt(String, int)
Integer.compareUnsigned(int, int)
Long.toUnsignedString(long, int)
Long.toUnsignedString(long)
Long.parseUnsignedLong(String, int)
Long.parseUnsignedLong(String)
Long.compareUnsigned(long, long)
Long.divideUnsigned(long, long)
Long.remainderUnsigned(long, long)
BigInteger.longValueExact()
BigInteger.intValueExact()
BigInteger.shortValueExact()
BigInteger.byteValueExact()
Objects.isNull(Object) – useful as a predicate, e.g. stream.anyMatch(Objects::isNull)
Objects.nonNull(Object) – useful as a predicate, e.g. stream.filter(Objects::nonNull)
Objects.requireNonNull(T, Supplier)
Random.ints()
Random.longs()
Random.doubles()
SecureRandom.getStrongSecureRandom()
BitSet.stream()
IntSummaryStatistics
LongSummaryStatistics
DoubleSummaryStatistics
Pattern.splitAsStream(CharSequence)
Pattern.asPredicate()
Process.waitFor(long, TimeUnit)
Process.destroyForcibly()
Process.isAlive()
ZipFile.stream()
Adler32.update(ByteBuffer)
CRC32.update(ByteBuffer)
Miscellaneous additions to Logger
Miscellaneous additions to Locale
Miscellaneous additions to ResultSet

There is too much there to talk about, but I’ll pick out a few noteworthy items.

ThreadLocal.withInitial(Supplier<T>) makes declaring thread-local variables with initial values much nicer. Previously you would supply an initial value like this:

ThreadLocal<List<String>> strings =
    new ThreadLocal<List<String>>() {
        @Override
        protected List<String> initialValue() {
             return new ArrayList<>();
        }
    };

Now it’s like this:

ThreadLocal<List<String>> strings =
    ThreadLocal.withInital(ArrayList::new);

Optional<T> appears in the stream API as the return value for methods like min/max, findFirst/Any, and some forms of reduce. It’s used because there might not be any elements in the stream, and it provides a fluent API for handling the “some result” versus “no result” cases. You can provide a default value, throw an exception, or execute some action only if the result exists.

It’s very, very similar to Guava’s Optional class. It’s nothing at all like Option in Scala, nor is it trying to be, and the name similarity there is purely coincidental.

Aside: it’s interesting that Java 8’s Optional and Guava’s Optional ended up being so similar, despite the absurd amount of debate that occurred over its addition to both libraries.

“FYI…. Optional was the cause of possibly the single greatest conflagration on the internal Java libraries discussion lists ever.”

Kevin Bourrillion in response to “Some new Guava classes targeted for release 10”

“On a purely practical note, the discussions surrounding Optional have exceeded its design budget by several orders of magnitude.”

Brian Goetz in response to “Optional require(s) NonNull”

StringJoiner and String.join(...) are long, long overdue. They are so long overdue that the vast majority of Java developers likely have already written or have found utilities for joining strings, but it is nice for the JDK to finally provide this itself. Everyone has encountered situations where joining strings is required, and it is a Good Thing™ that we can now express that through a standard API that every Java developer (eventually) will know.

Comparator provides some very nice new methods for doing chained comparisons and field-based comparisons. For example:

people.sort(
    Comparator.comparing(Person::getLastName)
        .thenComparing(Person::getFirstName)
        .thenComparing(
            Person::getEmailAddress,
            Comparator.nullsLast(CASE_INSENSITIVE_ORDER)));

These additions provide good, readable shorthand for complex sorts. Many of the use cases served by Guava’s ComparisonChain and Ordering utility classes are now served by these JDK additions. And for what it’s worth, I think the JDK verions read better than the functionally-equivalent versions expressed in Guava-ese.

More?

There are lots of various small bug fixes and performance improvements that were not covered in this post. But they are appreciated too!

This post was intended to cover every single language-level and API-level change coming in Java 8. If any were missed, it was an error that should be corrected. Please let me know if you discover an omission.

Tag: java

EnumSet and EnumMap

What are they?

When should they be used?

Could the implementations be improved?

Could the APIs be improved?

Add immutable Enum{Set,Map} (maybe?)

Add factory methods

Add collectors

Be navigable? (No)

Conclusion

Efficient multiple-stream concatenation in Java

Stream.concat(a, b)

Reduce

Balance

Flatmap

Analysis

Overboard

Benchmark

Everything about Java 8

Interface improvements

Why can’t default methods override equals, hashCode, and toString?

Functional interfaces

Lambdas

Capturing versus non-capturing lambdas

What lambdas don’t do

Why abstract classes can’t be instantiated using a lambda

java.util.function

java.util.stream

Generic type inference improvements

java.time

Collections API additions

Concurrency API additions

IO/NIO API additions

Reflection and annotation changes

Nashorn JavaScript Engine

Other miscellaneous additions to java.lang, java.util, and elsewhere

More?

Add immutable `Enum{Set,Map}` (maybe?)