Blog


January 8, 2013

Storage worries

In the 1980s, high-tech companies stored information about their customers on their sophisticated and high-cost computer equipment. Back then, such practices were rare outside relatively large companies. Thirty years later, it's so commonplace that there are numerous services to store customer data for you "in the cloud."

A layperson would be excused for thinking that, by 2012, questions about how to store data on computers had been worked out. The established options have indeed remained consistent for years, even decades. The commonplace language for structuring, storing, finding, and fetching data, SQL, has been with us a very long time. So long, in fact, that layers of conventional thinking have built up around it. More recently, a healthy desire to shake off that conventional thinking and re-imagine data storage has emerged.

Data storage is enjoying a deserved renaissance thanks to the research and hands-on work of a number of people who together are informally known as the NoSQL movement.

The good stuff

The objectives of each NoSQL implementation vary, but generally, the appeal to developers (like us) comes from these common advantages versus traditional SQL:

  • Thank goodness, there is no SQL language to deal with. The APIs are purpose-built for modern notions of structure, store, find, and retrieve. That usually means one fewer layer of translation between stored data and live in-memory data.
  • Clients and servers generally communicate over human-readable protocols like HTTP. This makes us happy because we know this protocol and we don't know old database protocols like TDS.
  • As our server(s) reach their request-processing limits, we can theoretically add more with relatively little pain.
  • Data is automatically duplicated according to tunable rules so that we don't panic (severely) when a server blows up.
  • Some implementations can distribute complex queries to multiple servers automatically.
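To make the first advantage concrete, here's a minimal sketch, with Python's built-in sqlite3 standing in for a SQL database and a plain dict standing in for a key-value store's put/get API. The names and data are hypothetical; the point is the extra layer of declaration and translation on the SQL side.

```python
import sqlite3

# SQL path: schema must be declared, data queried, and rows mapped back by hand.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Ada"))
row = db.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchone()
name_via_sql = row[0]  # translate the result tuple back into a usable value

# Key-value path: a dict models a NoSQL store's purpose-built put/get verbs.
store = {}
store["user:1"] = {"name": "Ada"}       # put
name_via_kv = store["user:1"]["name"]   # get

assert name_via_sql == name_via_kv == "Ada"
```

The dict is obviously a toy, but the shape of the API is the point: no query language, no schema, one fewer translation between stored data and live in-memory data.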

These advantages need to be evaluated in context. For us, the context is building technology solutions that meet business objectives for our clients. From that perspective, they reduce to:

  • A fresh approach to integration with applications, which in some cases may decrease programmer effort. Decreased effort translates into lower implementation costs.
  • Less cumbersome resolution to future scale events.

Our clients rightfully don't care how hip using a NoSQL server makes us feel. They don't care precisely how NoSQL may reduce the pain of future scaling, but knowing that future scale is somewhat less painful does give them some comfort.

The not-so-good stuff

This context sheds some light on disadvantages that are often downplayed when building an application in-house. Internal teams are afforded the luxury (whether or not their bosses would agree) of adopting new technologies with less consideration of risk.

Higher-than-average risk aversion and a horizon of hand-off to an internal team mean we consider the following:

  • There are few standards (yet) with NoSQL.
  • Community popularity is slowly coalescing but volatility remains.
  • Finding the right team in the future may be difficult for our client.
  • The reality is that most clients don't have a reasonable expectation of scale that suggests NoSQL.
  • Real-time performance characteristics may be an issue.

Let's start with scale. The lure of smooth, no-pain scaling will appeal to anyone who has been through efforts to scale traditional databases. Talking about scale is important even early on in a project's lifecycle. However, the usage level where scaling a traditional database server by "throwing better hardware at it" stops being practical is quite high. A single modern high-performance server can process a tremendous amount of user data, at least from the point of view of a start-up company.

Favoring ease of scalability in selection of a data storage platform makes sense if the scale plans are real. But if scale plans are imaginary, hopeful, ambitious, or wishful thinking, ease of scalability is not as important. Everyone wants to believe they will be the next Facebook. But more likely you'll be the next site with ten thousand users working really hard to get to your first one hundred thousand.

In that 10,000 users to 100,000 users bracket, many applications' entire data set fits in system memory. Performance is going to be reasonably quick (at least in terms of basic get and put operations) with any kind of database.
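A napkin-math sketch of that claim, using an assumed (hypothetical) per-user data size, shows why a fits-in-memory data set is so common in this bracket:

```python
# Napkin math (hypothetical numbers): does the whole data set fit in RAM?
users = 100_000
bytes_per_user = 5 * 1024  # assume ~5 KB of profile and activity data per user
total_gb = users * bytes_per_user / (1024 ** 3)
print(round(total_gb, 2))  # ~0.48 GB: comfortably inside one server's memory
```

Even at ten times that per-user estimate, the data set fits in the RAM of a single unremarkable server.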

How about ease of development? Working with NoSQL databases can be more "fluent" because the native APIs of NoSQL discard database legacy and focus on basic verbs like put and get. As a result, developer efficiency may be slightly improved with a NoSQL platform.

Traditional relational databases are encumbered by an impedance mismatch between in-memory objects and relational data entities. This mismatch led to object-relational mapping tools (ORMs) and to ongoing debates between their fans and critics. However, lightweight ORMs reduce the most commonplace database operations to interfaces as fluent as those offered by NoSQL.
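As an illustration of both the mismatch and how little it takes to paper over it, here's a hypothetical hand-rolled "lightweight ORM" in Python using the built-in sqlite3 module. The table, the User type, and get_user are all invented for the example:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:        # the in-memory shape the application wants to work with
    id: int
    name: str

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'Ada')")

# The impedance mismatch: queries hand back flat tuples, not objects,
# so something has to translate. A few lines make the translation fluent:
def get_user(user_id):
    row = db.execute("SELECT id, name FROM users WHERE id = ?",
                     (user_id,)).fetchone()
    return User(*row) if row else None

print(get_user(1))  # User(id=1, name='Ada') -- as fluent as a NoSQL get()
```

A real lightweight ORM generalizes exactly this pattern, which is why the day-to-day fluency gap between SQL and NoSQL is smaller than it first appears.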

Ultimately, comparing the level of effort is essentially a wash. NoSQL may enjoy a small advantage in the form of developer happiness: most developers innately like working with new, cool things.

Where do we land?

Clearly, the decision is predicated on the specifics of the application. Systems with pre-existing large scale are generally well-fitted to NoSQL. Systems with anticipated but undefined analytical needs are generally better served by SQL. Often we select traditional SQL databases because we want clients to be well-positioned to deal with unknowns and traditional databases are not performance slouches.

In other words, with fairly vanilla application requirements and scale targets, a traditional SQL database avoids some degree of risk, and risk moderation is compelling even in small doses.

If and when the client wants to build an internal team, it will be easier to find developers with the necessary experience.

Traditional platforms are more stable. Although the risk of a popular NoSQL platform being abandoned is low, there is virtually no risk of MySQL, Postgres, or Microsoft SQL Server being outright abandoned in the next decade. That sort of long horizon is unnecessary, but still comforting.

Finally, scaling a traditional database may be slightly more difficult than a NoSQL option, but not sufficiently to be a factor unless scale-growth concerns are well justified.

January 7, 2013

Pragmatism with Flavor

We build web and mobile applications, and we've been doing so for a little over fifteen years. To our delight, technology has evolved and improved in many ways over that time.

Technology evolution can be cyclical, with some branches looping back to the past with surprising vigor and without much self-awareness. Platforms eschew threading and then re-invent threading anew. Client-side code is passé and then the client becomes the preferred runtime environment.

As software developers, we're technophiles, so we enjoy these cycles and quite often find humor listening to the energy spent arguing on either side of issues. It's a fun and funny business to be in.

Through our fifteen years, we've been charged with helping clients get business done with technology. Focusing on the metrics that ultimately drive the business, often the bottom line, has provided a reality check, a counter-weight to the desire to consume all new technologies. Over time, we've imbibed that reality check serum, and now it's part of our instinctual technical point of view.

We love technology, but...

Sure we still get excited and animated about a new JavaScript library, a new CSS trick, a new device, or a new data store. We've learned to think about how this really fits with the business needs of our clients. Is putting this hot new technology to use on a client project for our technical interest or is it for their business interest? Does it help our clients sell more widgets or deliver with fewer headaches? Or do we want to play with this new toy simply because playing with new stuff is fun?

Not to downplay developer happiness. It's important to us. But a newly minted developer given access to Hacker News can develop ADD in a week's time, waste countless hours reading the opinions of professional and amateur technology opinion-makers, and nervously fidget about selecting the hippest framework and libraries to (briefly) avoid the scorn of hacker pop culture.

Anyone who has been around the web development block is going to tell you that they select technologies with some balance between coolness and pragmatism. The web development continuum has sound, safe, but stodgy on one side and shiny-obsession at the risk of unknown stability on the other. Few developers will find themselves at an extreme or precisely in the middle, and we're no exception.

Our business model nudges us slightly toward the sound, safe (stodgy?) side. We work with clients to get them off the ground with a platform to build their business. As the technical partner, we want to be real partners in the business, making sure the technology helps achieve the business objectives. Taking on additional risk by jumping on the latest and greatest technology is most often not the best route. Our clients want us to make it safe to innovate. They have enough worries.

Skewing us toward pragmatism is a combination of factors:

  • Our clients are not about technology; they are about running their business using technology. So even if that business doesn't exist as anything but a web site, we want them to avoid unnecessary technology risks.
  • Although we spend plenty of time researching and developing with new technologies based on desire and need, we don't want our clients to be guinea pigs (unless that's the nature of their business, of course).
  • We want our clients' business to be supported by technology that is fast, lightweight, and pain-free to administrate.

Easy stuff should be easy

We cut our teeth building eharmony.com back when a server with 128MB of memory was a monster. This gives us a point of view about servers and modern web application development that isn't terribly common these days: each server in your cluster should be able to process a significant amount of user load. On modern hardware, a trivial dynamic request should run as close to 0 milliseconds of server-side processing as possible; and it is unacceptable for a request to take 200+ milliseconds unless it's doing some heavy lifting.

After all, napkin math tells us that if requests take 200 milliseconds, only five requests per second would saturate a processor core. Those requests had better be worth it!
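That napkin math is worth writing out, because the relationship between per-request latency and per-core throughput is what makes the 200-millisecond threshold matter:

```python
# Napkin math: request latency caps per-core throughput.
ms_per_request = 200
requests_per_second_per_core = 1000 / ms_per_request
print(requests_per_second_per_core)  # 5.0 -- five requests saturate a core

# Drop the same request to 10 ms and the core handles 100 per second.
print(1000 / 10)  # 100.0
```

A 20x reduction in request time is a 20x increase in how far each core goes, before any scaling work at all.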

A site with a million users is much easier to manage today than it was in 2000. Today, a million users should not require dozens of servers unless the business is by its nature compute-heavy.

As with premature optimization, premature scaling is probably unnecessary and may be harmful. Most importantly, you don't necessarily know if you're scaling in the right direction. On the other hand, it's silly to dogmatically avoid the simple pattern that offers the easiest initial salve for scale: processing requests quickly.

Imagine your prototyping is over, and you're building a production system. You're in the thick of it and about to write a small chunk of server-side code. You estimate it will take 5 minutes to implement a clean but low-performance implementation; 7 minutes to implement clean and quickly-executing code; and 8 hours to do it optimally.

Which do you choose? We generally go for the 7-minute option. Perhaps it seems a no-brainer in this context, but you'd be surprised how many people choose the 5-minute option because it appears to save money. Yes, making the same decision several hundreds of times throughout a project adds up to measurable additional time. But our experience is that routinely selecting the "7-minute" option pays off in the long run.
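A hypothetical instance of that choice (the functions and data here are invented for illustration): both versions below are clean and correct, but the "5-minute" one does a linear scan per lookup, while the "7-minute" one spends one extra line to get constant-time lookups.

```python
def active_emails_5min(users, active_ids):
    # 5-minute version: clean, but `in` on a list is a linear scan,
    # so this degrades to O(n * m) as both collections grow.
    return [u["email"] for u in users if u["id"] in active_ids]

def active_emails_7min(users, active_ids):
    # 7-minute version: one extra line builds a set for O(1) lookups.
    active = set(active_ids)
    return [u["email"] for u in users if u["id"] in active]

users = [{"id": i, "email": f"u{i}@example.com"} for i in range(5)]
assert active_emails_5min(users, [1, 3]) == active_emails_7min(users, [1, 3])
```

Neither version is "optimal," and that's the point: the 7-minute habit costs almost nothing per decision, yet compounds into a system that stays fast as data grows.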

Caveat: with web application development, your foundational technology choices determine whether you're taking the "5-minute" approach to everything or the "7-minute" approach to everything. You can't mix and match.

With respect to software testing, most developers acknowledge that finding and resolving a defect before integration tests saves time and effort. It takes a bit more time for each developer, but it saves the effort of reporting, tracking, and dealing with bugs that leak beyond the developer's sieve. Curiously, the same sense doesn't necessarily apply to tuning and platform selection.

For production systems, we contend it's often better to select a platform that saves you money in the long run even if it means a few more dollars spent up-front.

Think of the money!

Money plays a big part in this. Allow me to be brutally clear: the more our clients' budgets go to Amazon or Rackspace, the less budget goes to continued development time. Development time is how we earn money. Besides, more development time means the client can be more agile with functionality and has a greater potential for success. So it's not an entirely selfish position.

Hosting is a necessary cost, but if you find yourself spinning up a second server when you have only a few hundred users, you might ask yourself why a CPU capable of processing billions of operations per second is brought to its knees by a few users submitting forms, placing orders, and communicating with one another.

Amazon is more than happy to accommodate you if you're not interested in asking this question. Jeff Bezos loves web application inefficiency.

Mixing technologies for a project can feel like concocting the right blend of chemicals. Some work well together and others clash. Project needs vary, of course, so technology chemistries vary in turn. Barring important indicators, our default assumptions steer us toward a mixture that aims to provide breathing room for scale together with a platform for reasonably quick evolution by the development team.

However, a severe budget limitation may tip toward a platform with ready-to-use open source components that can be leveraged as-is or with minimal tweaks. A need to on-board an in-house development team composed of recent college graduates (tempered with an understanding of performance and scale economics) pushes for Ruby/Rails, ensuring a talent pool from which to select.

Playing it safe

Building technology for non-technical clients pressures us to be mindful of that unknown future team. By comparison, an in-house development team starting a project may follow a more daring technology trajectory established by the technical director, especially if the director has trusted colleagues he or she can bring on board.

The clients we support often do not have an in-house team and must be prepared to eventually source that in-house team from the client's regional talent pool. This eventual hand-off to another team, in whole or in part, means that even if a client has a desire to dabble in new technologies, we still prefer to select those that have credible momentum and widespread usage. Vanilla node.js, MongoDB, Cassandra, and Apache Cordova for example. But we're not going to put a client on Meteor.js or MySQL's new NoSQL-like cluster. At least not yet. Similarly, we're not going to build key components on a platform for which it can be difficult or exceptionally expensive to source developers, such as Scala, unless the other project variables conspire accordingly.

Technology evolves and the number of people voicing their opinions keeps growing (welcome to this blog, by the way!) but getting work done requires shutting down the news feed and writing code. When working with other people's money to implement their ideas, we don't want to be technology early adopters. We'll satisfy those cravings after-hours.