Blog

July 5, 2016

Mangling JSON numbers

If we have a long (64-bit integer) that we serialize into JSON, we might be in trouble if JavaScript consumes that JSON. JavaScript has the equivalent of double (64-bit floating point) for its numbers, and double cannot represent the same set of numbers as long. If we are not careful, our long is mangled in transit.

Consider 2^53 + 1. We can store that number in a long but not a double. Above 2^53, double does not have the bits required to represent every integer, creating gaps between the integers it can represent. 2^53 + 1 is the first integer to fall into one of these gaps. We can store 2^53 or 2^53 + 2 in a double, but 2^53 + 1 does not fit.
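To make the gap concrete, here is a small Java check (a hypothetical snippet, not from any particular library): converting 2^53 + 1 to a double silently rounds it back down to 2^53.

```java
public class GapDemo {
  public static void main(String[] args) {
    long edge = 1L << 53;         // 9007199254740992: exactly representable
    long inGap = edge + 1;        // 9007199254740993: falls in a gap

    double d = inGap;             // implicit long-to-double conversion rounds

    System.out.println(inGap);    // 9007199254740993
    System.out.println((long) d); // 9007199254740992 -- the "+ 1" is lost
  }
}
```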

If we store 2^53 + 1 in a long and that number is meant to be precise, then we should avoid encoding it as a JSON number and sending it to a JavaScript client. The instant that client invokes JSON.parse they are doomed — they see a different number.

The JSON format does not mandate a particular number precision, but the application code on either side usually does. See also: Re: [Json] Limitations on number size?

This problem only occurs with very large numbers. Perhaps all the numbers we use are safe. Are we actually mangling our numbers? Probably not...

...but will we know? Will anything blow up, or will our application be silently, subtly wrong?

I suspect that when this problem does occur, it goes undetected for longer than it should. In the remainder of this article, we examine potential improvements to our handling of long.

Failing fast

We can change the way we serialize long into JSON.

When we encounter a long, we can require that the number fits into a double without losing information. If no information would be lost, we serialize the long as usual and move on. If information would be lost, we throw an exception and cause serialization to fail. We detonate immediately at the source of the error rather than letting it propagate around, doing who knows what.

Here is a utility method that can be used for this purpose:

public static void verifyLongFitsInDouble(long x) {
  double result = x;
  // Casting back to long detects rounding.  Long.MAX_VALUE needs its own
  // check: the cast saturates, so (long) (double) Long.MAX_VALUE equals
  // Long.MAX_VALUE even though the double value is actually 2^63.
  if (x != (long) result || x == Long.MAX_VALUE) {
    throw new IllegalArgumentException("Overflow: " + x);
  }
}

This approach appeals to me because it is unobtrusive. The check can be made in one central location, no changes to our view classes or client-side code are required, and it only throws exceptions in the specific cases where our default behavior is wrong.
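To illustrate, here is a hypothetical caller exercising the check (the class name is ours; the method body is the one above):

```java
public class FailFastDemo {
  static void verifyLongFitsInDouble(long x) {
    double result = x;
    if (x != (long) result || x == Long.MAX_VALUE) {
      throw new IllegalArgumentException("Overflow: " + x);
    }
  }

  public static void main(String[] args) {
    verifyLongFitsInDouble(1L << 62);         // fine: exactly representable
    verifyLongFitsInDouble((1L << 53) + 2);   // fine: fits in a double
    try {
      verifyLongFitsInDouble((1L << 53) + 1); // falls in a gap
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());     // Overflow: 9007199254740993
    }
  }
}
```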

A number that should be safe

Consider the number 2^62, which spelled out in base ten is 4611686018427387904. This number fits in both a long and a double. It passes our verifyLongFitsInDouble check. Theoretically we can send it from a Java server to a JavaScript client via JSON and both sides see exactly the same number.

To convince ourselves that this number is safe, we examine various representations of this number in Java and JavaScript:

// In Java
long x = 1L << 62;
System.out.println(Long.toString(x));    // 4611686018427387904
System.out.println(Double.toString(x));  // 4.6116860184273879E18
// 100000000000000000000000000000000000000000000000000000000000000
System.out.println(Long.toString(x, 2)); 
  
// In JavaScript
var x = Math.pow(2, 62);
console.log(x.toString());               // 4611686018427388000
console.log(x.toExponential());          // 4.611686018427388e+18
console.log(x.toFixed());                // 4611686018427387904
// 100000000000000000000000000000000000000000000000000000000000000
console.log(x.toString(2));

The output of x.toString() in JavaScript is suspicious. Do we really have the right number? We do, but we print it lazily.

x.toString() is similar in spirit to x.toExponential() and Double.toString(double) from Java. These algorithms essentially print significant digits, from most significant to least, until the output is unambiguously closer to this floating point number than any other floating point number. (And that is true here. The next lowest floating point number is 2^62 - 512, the next highest is 2^62 + 1024, and 4611686018427388000 is closer to 2^62 than either of those two nearby numbers.) See also: ES6 specification for ToString(Number)

x.toFixed() and the base two string give us more confidence that we have the correct number.
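Java has an analogue of x.toFixed() for this kind of spot check: the BigDecimal(double) constructor preserves the exact value stored in the double, while Double.toString prints only the shortest digits that round-trip. (A hypothetical snippet, not from the original article.)

```java
import java.math.BigDecimal;

public class ExactDoubleDemo {
  public static void main(String[] args) {
    double d = 1L << 62;
    // Shortest digits that unambiguously identify the double:
    System.out.println(Double.toString(d));  // 4.6116860184273879E18
    // Exact value the double actually stores:
    System.out.println(new BigDecimal(d));   // 4611686018427387904
  }
}
```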

Verifying our assumptions with code

If 2^62 really is a safe number, we should be able to send it from the server to the client and back again. To verify that this number survives a round trip, we create an HTTP server with two kinds of endpoints:

  • GET endpoints that serialize a Java object into a JSON string like {"x":number}, where the number is a known constant (2^62). The number and the JSON string are printed to stdout. The response is that JSON string.
  • POST endpoints that deserialize a client-provided JSON string like {"x":number} into a Java object. The number and JSON string are printed to stdout. We hope that the number printed here is the same as the known constant (2^62) used in our GET endpoints.

Any server-side web framework or HTTP server will do. We happen to use JAX-RS in our example code.

Behavior may differ between JSON (de)serialization libraries, so we test two: Gson and Jackson.

In total the server provides four endpoints, each named after the JSON serialization library used by that endpoint:

GET   /gson
POST  /gson
GET   /jackson
POST  /jackson

In the JavaScript client, we:

  • Loop through each library-specific pair of GET/POST endpoints.
  • Make a request to the GET endpoint.
  • Use JSON.parse to deserialize the response text (a JSON string) into a JavaScript object.
  • Use JSON.stringify to serialize that JavaScript object back into a JSON string.
  • Print each of the following to the console:
    • the incoming JSON string
    • the number contained in the JavaScript object, using x.toString()
    • the number contained in the JavaScript object, using x.toFixed()
    • the outgoing JSON string
  • Make a request to the POST endpoint, providing the (re)serialized JSON string as the request body.

Here is the server-side Java code:

package test;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.gson.Gson;

import javax.ws.rs.Consumes;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import java.io.IOException;

@Path("/")
public final class JsonResource {

  public static final class Payload {
    public long x;
  }

  private static final long EXPECTED_NUMBER = 1L << 62;

  @GET
  @Path("gson")
  @Produces("application/json")
  public String getGson() {
    Payload object = new Payload();
    object.x = EXPECTED_NUMBER;
    String json = new Gson().toJson(object);
    System.out.println("GET   /gson     outgoing number:  "
        + object.x);
    System.out.println("GET   /gson     outgoing JSON:    " 
        + json);
    return json;
  }

  @POST
  @Path("gson")
  @Consumes("application/json")
  public void postGson(String json) {
    Payload object = new Gson().fromJson(json, Payload.class);
    System.out.println("POST  /gson     incoming JSON:    " 
        + json);
    System.out.println("POST  /gson     incoming number:  "
        + object.x);
  }

  @GET
  @Path("jackson")
  @Produces("application/json")
  public String getJackson() throws IOException {
    Payload object = new Payload();
    object.x = EXPECTED_NUMBER;
    String json = new ObjectMapper().writeValueAsString(object);
    System.out.println("GET   /jackson  outgoing number:  "
        + object.x);
    System.out.println("GET   /jackson  outgoing JSON:    "
        + json);
    return json;
  }

  @POST
  @Path("jackson")
  @Consumes("application/json")
  public void postJackson(String json) throws IOException {
    Payload object = new ObjectMapper().readValue(json, Payload.class);
    System.out.println("POST  /jackson  incoming JSON:    "
        + json);
    System.out.println("POST  /jackson  incoming number:  "
        + object.x);
  }
}

Here is the client-side JavaScript code:

[ "/gson", "/jackson" ].forEach(function(endpoint) {
  function handleResponse() {
    var incomingJson = this.responseText;
    var object = JSON.parse(incomingJson);
    var outgoingJson = JSON.stringify(object);
    console.log(endpoint + " incoming JSON: " + incomingJson);
    console.log(endpoint + " number toString: " + object.x);
    console.log(endpoint + " number toFixed: " + object.x.toFixed());
    console.log(endpoint + " outgoing JSON: " + outgoingJson);
    var post = new XMLHttpRequest();
    post.open("POST", endpoint);
    post.setRequestHeader("Content-Type", "application/json");
    post.send(outgoingJson);
  };
  var get = new XMLHttpRequest();
  get.addEventListener("load", handleResponse);
  get.open("GET", endpoint);
  get.send();
});

The results are disappointing

Here is the server-side output:

GET   /gson     outgoing number:  4611686018427387904
GET   /gson     outgoing JSON:    {"x":4611686018427387904}
POST  /gson     incoming JSON:    {"x":4611686018427388000}
POST  /gson     incoming number:  4611686018427388000
GET   /jackson  outgoing number:  4611686018427387904
GET   /jackson  outgoing JSON:    {"x":4611686018427387904}
POST  /jackson  incoming JSON:    {"x":4611686018427388000}
POST  /jackson  incoming number:  4611686018427388000

Here is the client-side output:

/gson incoming JSON: {"x":4611686018427387904}
/gson number toString: 4611686018427388000
/gson number toFixed: 4611686018427387904
/gson outgoing JSON: {"x":4611686018427388000}
/jackson incoming JSON: {"x":4611686018427387904}
/jackson number toString: 4611686018427388000
/jackson number toFixed: 4611686018427387904
/jackson outgoing JSON: {"x":4611686018427388000}

Both of our POST endpoints print the wrong number. Yuck!

We do send the correct number to JavaScript, which we can verify by looking at the output of x.toFixed() in the console. Something bad happens between when we print x.toFixed() and when we print the number out on the server.

Why is our code wrong?

Maybe there is a particular line of our own code where we can point our finger and say, "Aha! You are wrong!" Maybe it is an issue with our architecture.

There are many ways we could choose to address this problem (or not), and what follows is certainly not an exhaustive list.

“We call JSON.parse then JSON.stringify. We should echo back the original JSON string.”

This avoids the problem but is nothing like a real application. The test code is standing in for an application that gets the payload object from the server, uses it as an object throughout, and then later (perhaps) makes a request back to the server containing some or all of the data from that object.

In practice, most applications will not even see the JSON.parse call. The call will be hidden. The front-end framework will do it, $.getJSON will do it, etc.

“We use JSON.stringify. We should write an alternative to JSON.stringify that produces an exact representation of our number.”

JSON.stringify delegates to x.toString(). If we never use JSON.stringify, and instead we use something like x.toFixed() to print numbers like this, we can avoid this problem.

This is probably infeasible in practice.

If we need to produce JSON from JavaScript, of course we expect that JSON.stringify will be involved. As with JSON.parse, most calls happen at a distance in a library rather than our own application code.

Besides, if we really plan to avoid x.toString(), we must do so everywhere. This is hopeless.

Suppose we commit to avoiding x.toString() and we have user objects that each have a numeric id field. We can no longer write Mustache or Handlebars templates like this:

<div id="user{{id}}">    {{! functionally wrong }}
  <p>ID: {{id}}</p>      {{! visually wrong }}
  <p>Name: {{name}}</p>
</div>

We can no longer write functions like this:

function updateEmailAddress(user, newEmail) {
  // Oops, we failed for user #2^62!
  var url = "/user/" + user.id + "/email";

  // Tries to update the wrong user (and fails, hopefully)
  $.post(url, { email: newEmail });
}

It is extremely unlikely that we will remember to avoid x.toString() everywhere. It is much more likely that we will forget and end up with incorrect behavior all over the place.

“We treat the number as a long literal in the POST handlers. We should treat the number as a double literal.”

If we parse the number as a double and cast it to a long, we produce the correct result in all test cases.
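For example (a sketch using the mangled string from our test output; the parsing hook itself depends on the JSON library):

```java
public class ParseAsDoubleDemo {
  public static void main(String[] args) {
    // The JavaScript client re-serialized 2^62 as this nearby decimal string.
    String fromClient = "4611686018427388000";

    // Parsed as a long literal, we keep the mangled number exactly.
    long asLong = Long.parseLong(fromClient);

    // Parsed as a double, the string snaps to the nearest representable
    // double -- which is 2^62 -- and the cast recovers the original number.
    long viaDouble = (long) Double.parseDouble(fromClient);

    System.out.println(asLong);     // 4611686018427388000
    System.out.println(viaDouble);  // 4611686018427387904
  }
}
```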

Such a cast should be guarded with a check similar to our verifyLongFitsInDouble(long) code from earlier. Here is a utility method that can be used for this purpose:

public static void verifyDoubleFitsInLong(double x) {
  long result = (long) x;
  // Double.compare also rejects NaN, which would cast to 0.  The cast
  // saturates at Long.MAX_VALUE, so without the second check 2^63 would
  // slip through the comparison.
  if (Double.compare(x, result) != 0 || result == Long.MAX_VALUE) {
    throw new IllegalArgumentException("Overflow: " + x);
  }
}

What if the client really does mean to send us precisely the integer 4611686018427388000? If we parse it as a double then cast it to a long, we mangle the intended number!

Here it is worth considering who we actually talk to as we design our APIs. If we only talk to JavaScript clients, then we only receive numbers that fit in double, because that is all our clients have. Oftentimes these APIs are internal, and the API authors are the same people as the client-code authors. In cases like that, it is reasonable to make assumptions about who is calling us, even if technically some other caller could use our API, because we make no claim to support other callers.

If our API is designed to be public and usable by any client, we should document our behavior with respect to number precision. verifyLongFitsInDouble(long) and verifyDoubleFitsInLong(double) are tricky to communicate, so we may prefer a simpler rule...

“We permit some values of long outside of the range -2^53 < x < 2^53. We should reject values outside of that range even when they fit in double.”

In other words, perform a bounds check on every long number that we (de)serialize. If the absolute value of that number is less than 2^53, we (de)serialize that number as usual; otherwise, we throw an exception.

JavaScript clients may find this range familiar, with built-in constants to express its bounds: Number.MIN_SAFE_INTEGER and Number.MAX_SAFE_INTEGER.
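A sketch of this stricter bounds check in Java (the method and constant names are ours):

```java
public class SafeIntegerBounds {
  // 2^53 - 1, the same value as JavaScript's Number.MAX_SAFE_INTEGER.
  static final long MAX_SAFE_INTEGER = (1L << 53) - 1;

  static void verifySafeInteger(long x) {
    if (x < -MAX_SAFE_INTEGER || x > MAX_SAFE_INTEGER) {
      throw new IllegalArgumentException("Unsafe integer: " + x);
    }
  }

  public static void main(String[] args) {
    verifySafeInteger(MAX_SAFE_INTEGER);  // fine
    verifySafeInteger(-MAX_SAFE_INTEGER); // fine
    try {
      verifySafeInteger(1L << 62);        // rejected despite fitting in double
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // Unsafe integer: 4611686018427387904
    }
  }
}
```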

This approach is less permissive than our verifyLongFitsInDouble(long) and verifyDoubleFitsInLong(double) utility methods from earlier. Those methods permit every number in this range and then some. Those methods permit numbers whose adjacent values are invalid, meaning the range of valid inputs is not contiguous.

Advantages of the less permissive approach include:

  • It is easier to express in documentation. verifyLongFitsInDouble(long) and verifyDoubleFitsInLong(double) would permit 2^55 + 8 but not 2^55 + 4. Understanding the reason for that is more difficult than understanding that neither of those numbers is permitted with the |x| < 2^53 approach.
  • If we are actually serializing numbers like 2^55 + 8, it is likely that we are trying to serialize nearby numbers that cannot be stored in double. Permitting the extra numbers may only mask the underlying problem: this data should not be serialized into JSON numbers.

“We encode a long as a JSON number. We should encode it as a JSON string.”

Encoding the number as a string avoids this problem.

Twitter provides string representations of its numeric ids for this reason.

This is easy to accomplish on the server. JSON serialization libraries provide a way to adopt this convention without changing the field types of our Java classes. Our Payload class keeps using long for its field, but any time the server serializes that field into JSON, it surrounds the numeric literal with quotation marks.
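With Gson, for instance, this is a one-line builder setting; Jackson has comparable mechanisms. (A configuration sketch, not taken from the article's test code.)

```java
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.LongSerializationPolicy;

public final class StringLongConfig {
  public static final class Payload {
    public long x;
  }

  public static void main(String[] args) {
    Gson gson = new GsonBuilder()
        .setLongSerializationPolicy(LongSerializationPolicy.STRING)
        .create();

    Payload object = new Payload();
    object.x = 1L << 62;

    // The field stays a long in Java; only the JSON encoding changes.
    System.out.println(gson.toJson(object)); // {"x":"4611686018427387904"}
  }
}
```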

How viable is this approach for the client? If the number is only being used as an identifier—passed between functions as-is, compared using the === operator, used as a key in maps—then treating it as a string makes a lot of sense. If we are lucky, the client-side code is identical between the string-using and number-using versions.

If the number is used in arithmetic or passed to libraries that expect numbers, then this solution becomes less practical.

“We use JSON as the serialization format. We should use some other serialization format.”

The JSON format is not to blame for our problems, but it allows us to be sloppy.

When we use JSON we lose information about our numbers. We do not lose the values of the numbers, but we do lose the types, which tell us the precision.

A different serialization format such as Protobuf might have forced us to clarify how precise our numbers are.

“There is no problem.”

We could declare that there is no problem. Our code breaks when provided with obscenely large numbers as input, but we simply do not use numbers that large and we never will. And even though our numbers are never this large, we still want to use long in the Java code because that is convenient for us. Other Java libraries produce or consume long numbers, and we want to use those libraries without casting.

I suspect this is the solution that most people choose (conscious of that choice or not), and it is often not a bad solution. We really do not encounter this problem most of the time. There are other problems we could spend our time solving.

Numbers smaller in magnitude than 2^53 do not trigger this problem. Where are our long numbers coming from, and how likely are they to fall outside that range?

Auto-incrementing primary keys in a SQL database
Will we insert more than 9,007,199,254,740,992 rows into one table? Knowing nothing at all about our theoretical application, I will venture a guess: "No."
Epoch millisecond timestamps
2^53 milliseconds has us covered for ±300,000 years, roughly. Are we dealing with dates outside of that range? If we are, perhaps epoch milliseconds are a poor choice for units and we should solve that problem with our units first.
Randomly-generated, unbounded long numbers
The majority of these do not fit in double. If we send these to JavaScript via JSON numbers, we will have a bad time. Are we actually doing that?
User-provided, unbounded long numbers
Most of these numbers should not trigger problems, but some will. The solution may be to add bounds checking on input, filtering out misbehaving numbers before they are used.

No matter what solution (or non-solution) we choose, we should make our choice deliberately. Being oblivious is not the answer.

February 25, 2016

Framework Benchmarks Round 12

Round 12 of the ongoing Web Framework Benchmarks project is now available!

A race against the clock

Recently, we were notified that the physical hardware environment we have used for Rounds 9 through 12 will be decommissioned imminently. This news made Round 12 unusual: rather than wait until we can equip and configure a new environment, we decided to conclude Round 12 while the current environment remained available.

As a result, no previews of Round 12 were made available to the participants in the project. Pull requests that we would normally expect to see after a preview cycle will need to be processed for Round 12. So bear in mind that participants were not able to sanity-check the Round 12 results and submit fixes.

Furthermore, due to the modestly rushed nature (at least on our side) of Round 12, we elected not to capture Amazon EC2 results for this round. The only data available for Round 12 is from the Peak dual Xeon E5 servers.

View the full results of Round 12.

We are now working to find and set up a new hardware environment for Rounds 13 and beyond.

Notable changes to Clojure tests

@yogthos noticed (in issue #1894) that the Compojure and http-kit test implementations were using def (value bound at compile time) instead of defn (value bound at runtime) for the JSON, single-query, and fortunes tests. While the impact on the JSON test was likely minimal, this had a significant impact on the single-query and fortunes tests because those implementations were not actually running a query for every request as expected. This change was made unintentionally in the Compojure test by TechEmpower staff and then later copied to http-kit to keep the implementations in sync. We have corrected this error in Round 12.

Other notable changes

  1. The plain PHP, Slim, and Laravel tests have been upgraded to PHP 7. For example, Slim's performance in the JSON test and Laravel's performance in the Fortunes test both approximately doubled versus Round 11 with PHP 5.
  2. All JVM-hosted tests have been upgraded to Java 8.
  3. Several new frameworks were added.

Thanks!

As always, we thank all contributors to the project, especially in light of the rush to get Round 12 concluded!

We love this!

If you've not been watching the ASP.NET team's community standups, you have missed some surprisingly transparent, interesting, and oftentimes funny updates from a major web application framework development team. Every time I watch one, I marvel: this is Microsoft? Clearly we're seeing a new Microsoft.

If you haven't been watching, you have also missed how much emphasis the ASP.NET team is putting on performance.

Recently, they reached 1.15 million plaintext requests per second from ASP.NET Core within their test environment. As Ben Adams writes in his article detailing the achievement, that is 23 times better than prior to the start of optimization work within the framework!

Big congratulations are in order. Not only for the specific achievement, or for the continued performance tuning to come, but also for what it represents: a concerted effort to make the platform provide application developers as much performance headroom as possible.

As we discussed in our previous entry, a high performance platform/framework gives application developers the freedom to build more quickly by deferring performance tuning within the application's domain. We'll have more to say on that in the future.

For the time being, we wanted to join the celebration of the ASP.NET team and toot our own horn a bit. We're proud of this from our own point of view because the Framework Benchmarks project inspired Microsoft's team to focus on performance. They could have dismissed it as unimportant, but instead they saw the value and attacked performance with conviction.

We started the Framework Benchmarks project to collect data about performance. As the project has matured, we've realized it has a new, perhaps even more important reason for being: encouraging both application developers and framework creators to think more about performance. We have been absolutely floored by how many GitHub-hosted projects aim to join or improve their positioning within the project. Or even win outright, though just aiming for the high-performance tier is a reasonable goal for us mere mortals.

We deeply feel that competition of this sort is a good thing. It helps motivate performance improvement across the web framework ecosystem and that in turn improves performance of thousands of web applications. It makes us tremendously happy to see so many people striving to build the best performance into their frameworks and platforms.

So congratulations to the ASP.NET team for giving performance such attention. And to everyone else doing the same in their respective frameworks. And thank you to everyone who has and continues to participate in the benchmarks project. May all your requests be fulfilled in mere milliseconds!