Mangling JSON numbers

July 5, 2016

Alan Laser

If we have a long (64-bit integer) that we serialize into JSON, we might be in trouble if JavaScript consumes that JSON. JavaScript has the equivalent of double (64-bit floating point) for its numbers, and double cannot represent the same set of numbers as long. If we are not careful, our long is mangled in transit.

Consider 2⁵³ + 1. We can store that number in a long but not a double. Above 2⁵³, double does not have the bits required to represent every integer, creating gaps between the integers it can represent. 2⁵³ + 1 is the first integer to fall in one of these gaps. We can store 2⁵³ or 2⁵³ + 2 in a double, but 2⁵³ + 1 does not fit.
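
The gap can be observed directly in Java by converting a long to a double and back. This is a minimal sketch; the class name `DoubleGapDemo` and the helper `roundTrip` are made up for illustration.

```java
final class DoubleGapDemo {

  // Converts a long to a double and back again, which is effectively what
  // happens to an integer that passes through a JavaScript number.
  static long roundTrip(long x) {
    return (long) (double) x;
  }

  public static void main(String[] args) {
    long base = 1L << 53; // 2^53
    for (long x = base; x <= base + 2; x++) {
      long y = roundTrip(x);
      System.out.println(x + " -> " + y + (x == y ? "" : "  (mangled)"));
    }
    // Prints:
    // 9007199254740992 -> 9007199254740992
    // 9007199254740993 -> 9007199254740992  (mangled)
    // 9007199254740994 -> 9007199254740994
  }
}
```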

If we store 2⁵³ + 1 in a long and that number is meant to be precise, then we should avoid encoding it as a JSON number and sending it to a JavaScript client. The instant that client invokes JSON.parse they are doomed — they see a different number.

The JSON format does not mandate a particular number precision, but the application code on either side usually does. See also: Re: [Json] Limitations on number size?

This problem only occurs with very large numbers. Perhaps all the numbers we use are safe. Are we actually mangling our numbers? Probably not…

…but will we know? Will anything blow up, or will our application be silently, subtly wrong?

I suspect that when this problem does occur, it goes undetected for longer than it should. In the remainder of this article, we examine potential improvements to our handling of long.

Failing fast

We can change the way we serialize long into JSON.

When we encounter a long, we can require that the number fits into a double without losing information. If no information would be lost, we serialize the long as usual and move on. If information would be lost, we throw an exception and cause serialization to fail. We detonate immediately at the source of the error rather than letting it propagate around, doing who knows what.

Here is a utility method that can be used for this purpose:


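public static void verifyLongFitsInDouble(long x) {
  // Widen to double, then narrow back; any difference means bits were lost.
  double result = x;
  // Long.MAX_VALUE (2^63 - 1) rounds up to 2^63 as a double, and casting that
  // back saturates at Long.MAX_VALUE, so it needs an explicit check.
  if (x != (long) result || x == Long.MAX_VALUE) {
    throw new IllegalArgumentException("Overflow: " + x);
  }
}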

This approach appeals to me because it is unobtrusive. The check can be made in one central location, no changes to our view classes or client-side code are required, and it only throws exceptions in the specific cases where our default behavior is wrong.
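
To get a feel for which values such a check accepts and rejects, the sketch below wraps the same comparison in a boolean helper. The names `FitsInDoubleDemo` and `fitsInDouble` are hypothetical and exist only for this illustration.

```java
final class FitsInDoubleDemo {

  // Same comparison as verifyLongFitsInDouble, but returning a boolean
  // instead of throwing. The name is hypothetical.
  static boolean fitsInDouble(long x) {
    double result = x;
    // Long.MAX_VALUE rounds up to 2^63 as a double, and casting 2^63 back to
    // long saturates at Long.MAX_VALUE, so equality alone would accept it.
    return x == (long) result && x != Long.MAX_VALUE;
  }

  public static void main(String[] args) {
    System.out.println(fitsInDouble(1L << 53));        // true
    System.out.println(fitsInDouble((1L << 53) + 1));  // false
    System.out.println(fitsInDouble(Long.MAX_VALUE));  // false
    System.out.println(fitsInDouble(Long.MIN_VALUE));  // true, -2^63 is a power of two
  }
}
```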

A number that should be safe

Consider the number 2⁶², which spelled out in base ten is 4611686018427387904. This number fits in both a long and a double. It passes our verifyLongFitsInDouble check. Theoretically we can send it from a Java server to a JavaScript client via JSON and both sides see exactly the same number.

To convince ourselves that this number is safe, we examine various representations of this number in Java and JavaScript:


// In Java
long x = 1L << 62;
System.out.println(Long.toString(x));    // 4611686018427387904
System.out.println(Double.toString(x));  // 4.6116860184273879E18
// 100000000000000000000000000000000000000000000000000000000000000
System.out.println(Long.toString(x, 2));

  
// In JavaScript
var x = Math.pow(2, 62);
console.log(x.toString());               // 4611686018427388000
console.log(x.toExponential());          // 4.611686018427388e+18
console.log(x.toFixed());                // 4611686018427387904
// 100000000000000000000000000000000000000000000000000000000000000
console.log(x.toString(2));

The output of x.toString() in JavaScript is suspicious. Do we really have the right number? We do, but we print it lazily.

x.toString() is similar in spirit to x.toExponential() and Double.toString(double) from Java. These algorithms essentially print significant digits, from most significant to least, until the output is unambiguously closer to this floating point number than any other floating point number. (And that is true here. The next lowest floating point number is 2⁶² - 512, the next highest is 2⁶² + 1024, and 4611686018427388000 is closer to 2⁶² than either of those two nearby numbers.) See also: ES6 specification for ToString(Number)
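
The claims in the previous paragraph can be checked from Java using only the standard library. This is a sketch for illustration; the class and helper names are assumptions, not part of the original test code.

```java
final class ShortestDigitsDemo {

  // Distance from x down to the next lower double.
  static long gapBelow(double x) {
    return (long) (x - Math.nextDown(x));
  }

  // Distance from x up to the next higher double.
  static long gapAbove(double x) {
    return (long) (Math.nextUp(x) - x);
  }

  // Exact decimal expansion of the value held by the double.
  static String exactDigits(double x) {
    return new java.math.BigDecimal(x).toPlainString();
  }

  public static void main(String[] args) {
    double x = (double) (1L << 62); // 2^62, exactly representable

    // The short string printed by JavaScript parses back to the same double.
    System.out.println(Double.parseDouble("4611686018427388000") == x); // true

    System.out.println(gapBelow(x));    // 512
    System.out.println(gapAbove(x));    // 1024
    System.out.println(exactDigits(x)); // 4611686018427387904
  }
}
```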

x.toFixed() and the base two string give us more confidence that we have the correct number.

Verifying our assumptions with code

If 2⁶² really is a safe number, we should be able to send it from the server to the client and back again. To verify that this number survives a round trip, we create an HTTP server with two kinds of endpoints:

GET endpoints that serialize a Java object into a JSON string like {"x":number}, where the number is a known constant (2⁶²). The number and the JSON string are printed to stdout. The response is that JSON string.
POST endpoints that deserialize a client-provided JSON string like {"x":number} into a Java object. The number and JSON string are printed to stdout. We hope that the number printed here is the same as the known constant (2⁶²) used in our GET endpoints.

Any server-side web framework or HTTP server will do. We happen to use JAX-RS in our example code.

Behavior may differ between JSON (de)serialization libraries, so we test two:

Gson
Jackson

In total the server provides four endpoints, each named after the JSON serialization library used by that endpoint:


GET   /gson
POST  /gson
GET   /jackson
POST  /jackson

In the JavaScript client, we:

- Loop through each library-specific pair of GET/POST endpoints.
- Make a request to the GET endpoint.
- Use JSON.parse to deserialize the response text (a JSON string) into a JavaScript object.
- Use JSON.stringify to serialize that JavaScript object back into a JSON string.
- Print each of the following to the console:
  - the incoming JSON string
  - the number contained in the JavaScript object, using x.toString()
  - the number contained in the JavaScript object, using x.toFixed()
  - the outgoing JSON string
- Make a request to the POST endpoint, providing the (re)serialized JSON string as the request body.

Here is the server-side Java code:


package test;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.gson.Gson;

import javax.ws.rs.Consumes;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import java.io.IOException;

@Path("/")
public final class JsonResource {

  public static final class Payload {
    public long x;
  }

  private static final long EXPECTED_NUMBER = 1L << 62;

  @GET
  @Path("gson")
  @Produces("application/json")
  public String getGson() {
    Payload object = new Payload();
    object.x = EXPECTED_NUMBER;
    String json = new Gson().toJson(object);
    System.out.println("GET   /gson     outgoing number:  "
        + object.x);
    System.out.println("GET   /gson     outgoing JSON:    " 
        + json);
    return json;
  }

  @POST
  @Path("gson")
  @Consumes("application/json")
  public void postGson(String json) {
    Payload object = new Gson().fromJson(json, Payload.class);
    System.out.println("POST  /gson     incoming JSON:    " 
        + json);
    System.out.println("POST  /gson     incoming number:  "
        + object.x);
  }

  @GET
  @Path("jackson")
  @Produces("application/json")
  public String getJackson() throws IOException {
    Payload object = new Payload();
    object.x = EXPECTED_NUMBER;
    String json = new ObjectMapper().writeValueAsString(object);
    System.out.println("GET   /jackson  outgoing number:  "
        + object.x);
    System.out.println("GET   /jackson  outgoing JSON:    "
        + json);
    return json;
  }

  @POST
  @Path("jackson")
  @Consumes("application/json")
  public void postJackson(String json) throws IOException {
    Payload object = new ObjectMapper().readValue(json, Payload.class);
    System.out.println("POST  /jackson  incoming JSON:    "
        + json);
    System.out.println("POST  /jackson  incoming number:  "
        + object.x);
  }
}

Here is the client-side JavaScript code:


[ "/gson", "/jackson" ].forEach(function(endpoint) {
  function handleResponse() {
    var incomingJson = this.responseText;
    var object = JSON.parse(incomingJson);
    var outgoingJson = JSON.stringify(object);
    console.log(endpoint + " incoming JSON: " + incomingJson);
    console.log(endpoint + " number toString: " + object.x);
    console.log(endpoint + " number toFixed: " + object.x.toFixed());
    console.log(endpoint + " outgoing JSON: " + outgoingJson);
    var post = new XMLHttpRequest();
    post.open("POST", endpoint);
    post.setRequestHeader("Content-Type", "application/json");
    post.send(outgoingJson);
  }
  var get = new XMLHttpRequest();
  get.addEventListener("load", handleResponse);
  get.open("GET", endpoint);
  get.send();
});

The results are disappointing

Here is the server-side output:


GET   /gson     outgoing number:  4611686018427387904
GET   /gson     outgoing JSON:    {"x":4611686018427387904}
POST  /gson     incoming JSON:    {"x":4611686018427388000}
POST  /gson     incoming number:  4611686018427388000
GET   /jackson  outgoing number:  4611686018427387904
GET   /jackson  outgoing JSON:    {"x":4611686018427387904}
POST  /jackson  incoming JSON:    {"x":4611686018427388000}
POST  /jackson  incoming number:  4611686018427388000

Here is the client-side output:


/gson incoming JSON: {"x":4611686018427387904}
/gson number toString: 4611686018427388000
/gson number toFixed: 4611686018427387904
/gson outgoing JSON: {"x":4611686018427388000}
/jackson incoming JSON: {"x":4611686018427387904}
/jackson number toString: 4611686018427388000
/jackson number toFixed: 4611686018427387904
/jackson outgoing JSON: {"x":4611686018427388000}

Both of our POST endpoints print the wrong number. Yuck!

We do send the correct number to JavaScript, which we can verify by looking at the output of x.toFixed() in the console. Something bad happens between when we print x.toFixed() and when we print the number out on the server.
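
The mismatch can be reproduced without HTTP or a JSON library. In the sketch below, `Long.parseLong` stands in for how the POST handlers appear to read the field, digit for digit as a long. That is an assumption based on the output above, not a description of Gson's or Jackson's internals, and the names are hypothetical.

```java
final class MangledRoundTripDemo {

  // Stand-in for the POST handlers: the digits are read exactly, as a long.
  // (Assumption: this models the behavior seen in the server output.)
  static long readAsLong(String jsonNumber) {
    return Long.parseLong(jsonNumber);
  }

  public static void main(String[] args) {
    long sent = 1L << 62;
    String echoed = "4611686018427388000"; // what JSON.stringify produced
    long received = readAsLong(echoed);

    System.out.println(sent);             // 4611686018427387904
    System.out.println(received);         // 4611686018427388000
    System.out.println(received - sent);  // 96
  }
}
```

The echoed digits are a perfectly valid long, so nothing fails. The server simply ends up with a number that is off by 96.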

Why is our code wrong?

Maybe there is a particular line of our own code where we can point our finger and say, “Aha! You are wrong!” Maybe it is an issue with our architecture.

There are many ways we could choose to address this problem (or not), and what follows is certainly not an exhaustive list.

“We call `JSON.parse` then `JSON.stringify`. We should echo back the original JSON string.”

This avoids the problem but is nothing like a real application. The test code stands in for an application that gets the payload object from the server, uses it as an object throughout, and perhaps later makes a request back to the server containing some or all of the data from that object.

In practice, most applications will not even see the JSON.parse call. The call will be hidden. The front-end framework will do it, $.getJSON will do it, etc.

“We use `JSON.stringify`. We should write an alternative to `JSON.stringify` that produces an exact representation of our number.”

JSON.stringify delegates to x.toString(). If we never use JSON.stringify, and instead we use something like x.toFixed() to print numbers like this, we can avoid this problem.

This is probably infeasible in practice.

If we need to produce JSON from JavaScript, of course we expect that JSON.stringify will be involved. As with JSON.parse, most calls happen at a distance in a library rather than our own application code.

Besides, if we really plan to avoid x.toString(), we must do so everywhere. This is hopeless.

Suppose we commit to avoiding x.toString() and we have user objects that each have a numeric id field. We can no longer write Mustache or Handlebars templates like this:


<div id="user{{id}}">    {{! functionally wrong }}
  <p>ID: {{id}}</p>      {{! visually wrong }}
  <p>Name: {{name}}</p>
</div>

We can no longer write functions like this:


function updateEmailAddress(user, newEmail) {
  // Oops, we failed for user #2^62!
  var url = "/user/" + user.id + "/email";

  // Tries to update the wrong user (and fails, hopefully)
  $.post(url, { email: newEmail });
}

It is extremely unlikely that we will remember to avoid x.toString() everywhere. It is much more likely that we will forget and end up with incorrect behavior all over the place.

“We treat the number as a `long` literal in the POST handlers. We should treat the number as a `double` literal.”

If we parse the number as a double and cast it to a long, we produce the correct result in all test cases.

Such a cast should be guarded with a check similar to our verifyLongFitsInDouble(long) code from earlier. Here is a utility method that can be used for this purpose:


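public static void verifyDoubleFitsInLong(double x) {
  // Narrow to long; fractions are truncated, NaN becomes 0, and values
  // beyond the long range saturate at Long.MIN_VALUE or Long.MAX_VALUE.
  long result = (long) x;
  // Reject anything that did not survive the cast unchanged. 2^63 saturates
  // to Long.MAX_VALUE, which widens back to 2^63, so it needs its own check.
  if (Double.compare(x, result) != 0 || result == Long.MAX_VALUE) {
    throw new IllegalArgumentException("Overflow: " + x);
  }
}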
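
As a rough sketch of how the check and the cast might fit together, the hypothetical helper `parseJsonNumberAsLong` below parses the text of a JSON number as a double, verifies it, and only then casts. It uses `Double.parseDouble`, which accepts more than the JSON number grammar, so a real deserializer would need stricter parsing. The check itself is copied here so that the example is self-contained.

```java
final class DoubleToLongDemo {

  // Copied from the article so that the example compiles on its own.
  static void verifyDoubleFitsInLong(double x) {
    long result = (long) x;
    if (Double.compare(x, result) != 0 || result == Long.MAX_VALUE) {
      throw new IllegalArgumentException("Overflow: " + x);
    }
  }

  // Hypothetical helper: reads the number the way a JavaScript client would
  // (as a double), then converts it to a long only if that is lossless.
  static long parseJsonNumberAsLong(String jsonNumber) {
    double d = Double.parseDouble(jsonNumber);
    verifyDoubleFitsInLong(d);
    return (long) d;
  }

  public static void main(String[] args) {
    // The string echoed back by JavaScript maps to 2^62 again.
    System.out.println(parseJsonNumberAsLong("4611686018427388000")); // 4611686018427387904

    for (String bad : new String[] { "1.5", "1e19" }) {
      try {
        parseJsonNumberAsLong(bad);
      } catch (IllegalArgumentException e) {
        System.out.println("rejected " + bad);
      }
    }
  }
}
```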

What if the client really does mean to send us precisely the integer 4611686018427388000? If we parse it as a double then cast it to a long, we mangle the intended number!

Here it is worth considering who we actually talk to as we design our APIs. If we only talk to JavaScript clients, then we only receive numbers that fit in double because that is all our clients have. Often these APIs are internal and the API authors are the same as the client code authors. It is reasonable in cases like that to make assumptions about who is calling us, even if technically some other caller could use our API, because we make no claim to support other callers.

If our API is designed to be public and usable by any client, we should document our behavior with respect to number precision. verifyLongFitsInDouble(long) and verifyDoubleFitsInLong(double) are tricky to communicate, so we may prefer a simpler rule…

“We permit some values of `long` outside of the range `-2⁵³ < x < 2⁵³`. We should reject values outside of that range even when they fit in `double`.”

In other words, perform a bounds check on every long number that we (de)serialize. If the absolute value of that number is less than 2⁵³ then we (de)serialize that number as usual, otherwise we throw an exception.

JavaScript clients may find this range familiar, with built-in constants to express its bounds: Number.MIN_SAFE_INTEGER and Number.MAX_SAFE_INTEGER.

This approach is less permissive than our verifyLongFitsInDouble(long) and verifyDoubleFitsInLong(double) utility methods from earlier. Those methods permit every number in this range and then more. Those methods permit numbers whose adjacent values are invalid, meaning the range of valid inputs is not contiguous.

Advantages of the less permissive approach include:

It is easier to express in documentation. verifyLongFitsInDouble(long) and verifyDoubleFitsInLong(double) would permit 2⁵⁵ + 8 but not 2⁵⁵ + 4. Understanding the reason for that is more difficult than understanding that neither of those numbers is permitted with the |x| < 2⁵³ approach.
If we are actually serializing numbers like 2⁵⁵ + 8, it is likely that we are trying to serialize nearby numbers that cannot be stored in double. Permitting the extra numbers may only mask the underlying problem: this data should not be serialized into JSON numbers.
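
A minimal sketch of the bounds check, compared against the round-trip check for the two example numbers above. The class and method names are hypothetical.

```java
final class SafeIntegerDemo {

  private static final long LIMIT = 1L << 53; // 2^53

  // The |x| < 2^53 rule. Comparing against both bounds avoids Math.abs,
  // which returns a negative number for Long.MIN_VALUE.
  static boolean isSafeInteger(long x) {
    return x > -LIMIT && x < LIMIT;
  }

  // The more permissive rule: does the value survive long -> double -> long?
  static boolean survivesDouble(long x) {
    return x == (long) (double) x && x != Long.MAX_VALUE;
  }

  public static void main(String[] args) {
    long a = (1L << 55) + 8;
    long b = (1L << 55) + 4;
    System.out.println(survivesDouble(a) + " " + isSafeInteger(a)); // true false
    System.out.println(survivesDouble(b) + " " + isSafeInteger(b)); // false false
    System.out.println(isSafeInteger(LIMIT - 1));                   // true
    System.out.println(isSafeInteger(LIMIT));                       // false
  }
}
```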

“We encode a `long` as a JSON number. We should encode it as a JSON string.”

Encoding the number as a string avoids this problem.

Twitter provides string representations of its numeric ids for this reason.

This is easy to accomplish on the server. JSON serialization libraries provide a way to adopt this convention without changing the field types of our Java classes. Our Payload class keeps using long for its field, but any time the server serializes that field into JSON, it surrounds the numeric literal with quotation marks.
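
The idea can be sketched with hand-written serialization, using no JSON library at all. A real application would configure its library instead (Gson, for example, has a `LongSerializationPolicy.STRING` setting). The `toJson` and `readX` helpers here are hypothetical and only handle the exact shape shown.

```java
final class LongAsStringDemo {

  static final class Payload {
    long x;
  }

  // Hand-written for illustration: the long is wrapped in quotation marks.
  static String toJson(Payload p) {
    return "{\"x\":\"" + p.x + "\"}";
  }

  // Only understands the exact shape produced by toJson.
  static long readX(String json) {
    int start = json.indexOf(":\"") + 2;
    int end = json.lastIndexOf('"');
    return Long.parseLong(json.substring(start, end));
  }

  public static void main(String[] args) {
    Payload p = new Payload();
    p.x = (1L << 53) + 1; // not representable as a double

    String json = toJson(p);
    System.out.println(json); // {"x":"9007199254740993"}

    // A client that treats the value as an opaque string echoes the same
    // digits back, and the server recovers the exact long.
    System.out.println(readX(json) == p.x); // true
  }
}
```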

How viable is this approach for the client? If the number is only being used as an identifier—passed between functions as-is, compared using the === operator, used as a key in maps—then treating it as a string makes a lot of sense. If we are lucky, the client-side code is identical between the string-using and number-using versions.

If the number is used in arithmetic or passed to libraries that expect numbers, then this solution becomes less practical.

“We use JSON as the serialization format. We should use some other serialization format.”

The JSON format is not to blame for our problems, but it allows us to be sloppy.

When we use JSON we lose information about our numbers. We do not lose the values of the numbers, but we do lose the types, which tell us the precision.

A different serialization format such as Protobuf might have forced us to clarify how precise our numbers are.

“There is no problem.”

We could declare that there is no problem. Our code breaks when provided with obscenely large numbers as input, but we simply do not use numbers that large and we never will. And even though our numbers are never this large, we still want to use long in the Java code because that is convenient for us. Other Java libraries produce or consume long numbers, and we want to use those libraries without casting.

I suspect this is the solution that most people choose (conscious of that choice or not), and it is often not a bad solution. We really do not encounter this problem most of the time. There are other problems we could spend our time solving.

Numbers smaller in magnitude than 2⁵³ do not trigger this problem. Where are our long numbers coming from, and how likely are they to fall outside that range?

Auto-incrementing primary keys in a SQL database: Will we insert more than 9,007,199,254,740,992 rows into one table? Knowing nothing at all about our theoretical application, I will venture a guess: “No.”
Epoch millisecond timestamps: 2⁵³ milliseconds has us covered for roughly ±285,000 years. Are we dealing with dates outside of that range? If we are, perhaps epoch milliseconds are a poor choice for units and we should solve that problem with our units first.
Randomly-generated, unbounded long numbers: The majority of these do not fit in double. If we send these to JavaScript via JSON numbers, we will have a bad time. Are we actually doing that?
User-provided, unbounded long numbers: Most of these numbers should not trigger problems, but some will. The solution may be to add bounds checking on input, filtering out misbehaving numbers before they are used.
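
Two of the estimates above can be checked with a little arithmetic. The sketch below assumes 365-day years and treats "random long" as uniformly distributed over all 2⁶⁴ values; the names are made up for illustration.

```java
final class SafeRangeEstimates {

  static final long LIMIT = 1L << 53; // 2^53

  // Whole 365-day years covered by 2^53 milliseconds.
  static long yearsOfMillis() {
    long millisPerYear = 1000L * 60 * 60 * 24 * 365;
    return LIMIT / millisPerYear;
  }

  // Approximate fraction of all 2^64 long values with magnitude below 2^53:
  // about 2^54 values out of 2^64.
  static double safeFraction() {
    return Math.pow(2, 54) / Math.pow(2, 64);
  }

  public static void main(String[] args) {
    System.out.println(yearsOfMillis()); // 285616
    System.out.println(safeFraction() * 100 + " percent");
  }
}
```

Under those assumptions, only about a tenth of a percent of uniformly random long values fall inside the range where every integer is representable.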

No matter what solution (or non-solution) we choose, we should make our choice deliberately. Being oblivious is not the answer.
