Storage Worries

January 8, 2013

Alan Laser

In the 1980s, high-tech companies stored information about their customers on their sophisticated and high-cost computer equipment. Back then such practices were exceptional except at relatively large companies. Thirty years later, it’s so commonplace that there are numerous services to store customer data for you “in the cloud.”

A layperson would be excused to think that in 2012, questions about how to store data on computers have been worked out. The established options have indeed remained consistent for years, decades. The commonplace language for structuring, storing, finding, and fetching data-SQL-has been with us a very long time. So long, in fact, that layers of conventional thinking have been built up. More recently, a healthy desire to shake off that conventional thinking and re-imagine data storage has emerged.

Data storage is enjoying a deserved renaissance thanks to the research and hands-on work of a number of people who together are informally known as the NoSQL movement.

The good stuff

The objectives of each NoSQL implementation vary, but generally, the appeal to developers (like us) comes from these common advantages versus traditional SQL:

Thank goodness, there is no SQL language to deal with. The APIs are purpose-built for modern notions of structure, store, find, and retrieve. That usually means one fewer layer of translation between stored data and live in-memory data.
Clients and servers generally communicate over human-readable protocols like HTTP. This makes us happy because we know this protocol and we don’t know old database protocols like TDS.
As our server(s) reach their request-processing limits, we can theoretically add more with relatively little pain.
Data is automatically duplicated according to tunable rules so that we don’t panic (severely) when a server blows up.
Some implementations can distribute complex queries to multiple servers automatically.

These advantages need to be evaluated in context. For us, the context is building technology solutions that meet business objectives for our clients. From that perspective, they reduce into:

A fresh approach to integration with applications, which in some cases may decrease programmer effort. Decreased effort translates into lower implementation costs.
Less cumbersome resolution to future scale events.

Our clients rightfully don’t care how hip using a NoSQL server makes us feel. They don’t care precisely how NoSQL may reduce the pain of future scaling, but knowing that future scale is somewhat less painful does give them some comfort.

The not-so-good stuff

This context sheds some light on disadvantages that are often downplayed when building an application in-house. As mentioned earlier, internal teams are afforded the luxury (whether or not their bosses would necessarily agree) to adopt new technologies with less consideration of risk.

Higher than average risk-aversion and a horizon of hand-off to an internal team means we consider the following:

There are few standards (yet) with NoSQL.
Community popularity is slowly coalescing but volatility remains.
Finding the right team in the future may be difficult for our client.
Reality is that most clients don’t have a reasonable expectation of scale that suggest NoSQL.
Real-time performance characteristics may be an issue.

Let’s start with scale. The lure of smooth, no-pain scaling will appeal to anyone who has been through efforts to scale traditional databases. Talking about scale is important even early on in a project’s lifecycle. However, the usage level where scaling a traditional database server by “throwing better hardware at it” stops being practical is quite high. A single modern high-performance server can process a tremendous amount of user data, at least from the point of view of a start-up company.

Favoring ease of scalability in selection of a data storage platform makes sense if the scale plans are real. But if scale plans are imaginary, hopeful, ambitious, or wishful thinking, ease of scalability is not as important. Everyone wants to believe they will be the next Facebook. But more likely you’ll be the next site with ten thousand users working really hard to get to your first one hundred thousand.

In that 10,000 users to 100,000 users bracket, many applications’ entire data set fits in system memory. Performance is going to be reasonably quick (at least in terms of basic get and put operations) with any kind of database.

How about ease of development? Working with NoSQL databases can be more “fluent” because the native APIs of NoSQL discard database legacy and focus on basic verbs like put and get. As a result, developer efficiency may be slightly improved with a NoSQL platform.

Traditional relational databases are encumbered by an impedance mismatch between in-memory objects and relational data entities. This mismatch lead to object-relational mapping tools (ORMs) and subsequent debates between fans and critics of ORMs. However, lightweight ORMs reduce the most commonplace database operations into interfaces as fluent as those offered by NoSQL.

Ultimately, it’s essentially a wash in terms of comparing the level of effort. NoSQL may enjoy a small advantage in the form of developer happiness: most developers innately like working with new, cool things.

Where do we land?

Clearly, the decision is predicated on the specifics of the application. Systems with pre-existing large scale are generally well-fitted to NoSQL. Systems with anticipated but undefined analytical needs are generally better served by SQL. Often we select traditional SQL databases because we want clients to be well-positioned to deal with unknowns and traditional databases are not performance slouches.

In other words, with fairly vanilla application requirements and scale targets, a traditional SQL database avoids some degree of risk, and risk moderation is compelling even in small doses.

If and when the client wants to build an internal team, it will be easier to find developers with the necessary experience.

Traditional platforms are more stable. For example, although the risk of being abandoned by a popular NoSQL platform are low, there is virtually no risk of MySQL, Postgres, or Microsoft SQL Server being outright abandoned in the next decade. That sort of huge horizon is unnecessary, but still comforting.

Finally, scaling a traditional database may be slightly more difficult than a NoSQL option, but not sufficiently to be a factor unless scale-growth concerns are well justified.