Common Myths When Architecting Distributed Systems

December 5, 2010 by John Ellis

The great thing about meeting up with colleagues from previous jobs is that they are more than willing to keep you humble. Just recently, a few co-conspirators of mine and I grabbed a pint and discussed the merits of our application design choices, our architecture patterns and just how gloriously over-engineered things used to be. Looking back, I now realize I believed our systems had to be elaborate because the rest of the world appeared to be working on something complicated.

This train of thought falls in line with my previous post about managing cloud computing infrastructure – simpler remains better. In software engineering, just as in cloud computing infrastructure management, there are common misconceptions about developing distributed systems – and they have bitten me before.

Myth #1: XML is great for inter-process communication
XML is a great implementation-agnostic way of constructing documents and sharing them with unknown consumers. Think SOAP and REST: XML works well for this kind of service exposure, since we may never know who is actually consuming the service.

However, if you need to rapidly send messages between components then XML can absolutely kill message throughput. The effort of marshalling/unmarshalling documents not only eats CPU cycles, it also makes byte sizes of your messages much larger than they need to be. Switching from XML to a more streamlined (but still interoperable) format such as JSON can sometimes cut your bytes transferred by 50% while being faster for apps to parse.
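As a rough sketch of the size difference, the snippet below serializes the same small record both ways using only the Python standard library. The record and its field names are made up for illustration, and the exact savings will depend entirely on the shape of your own messages.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical message; the field names are illustrative only.
message = {"order_id": 1042, "customer": "acme", "status": "shipped", "items": 3}

def to_json_bytes(record):
    """Serialize the record as compact JSON (no extra whitespace)."""
    return json.dumps(record, separators=(",", ":")).encode("utf-8")

def to_xml_bytes(record, root_tag="message"):
    """Serialize the same record as a flat XML document, one element per field."""
    root = ET.Element(root_tag)
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    # With an explicit encoding, tostring() returns bytes including the XML declaration.
    return ET.tostring(root, encoding="utf-8")

json_bytes = to_json_bytes(message)
xml_bytes = to_xml_bytes(message)
print("JSON bytes:", len(json_bytes))
print("XML bytes: ", len(xml_bytes))
```

Every XML field pays for an opening and a closing tag, which is where most of the extra bytes come from on small, flat messages like this one.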

Myth #2: If you need to speed up your application, just add more machines
There is a big difference between speed and throughput. Speed (latency) is how fast a single operation can be executed, while throughput is how many operations you can perform in a given time.

If you have a stored procedure that takes 30 seconds, adding two servers won’t make it finish any sooner. You can add RAM to avoid disk access, add cores and multi-thread the process, or speed up your storage tier, but unless you break the stored procedure into several smaller procs you cannot spread the workload effectively across a farm.
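The arithmetic behind that can be sketched in a few lines. This is a deliberately simplified model that assumes each server runs one call at a time and that the procedure is not split up; the function names are mine, not from any library.

```python
def throughput_per_minute(latency_seconds, servers):
    """Calls completed per minute when each server runs one call at a time."""
    return servers * 60.0 / latency_seconds

def latency_after_scaling_out(latency_seconds, servers):
    """An unsplit call still runs on a single server, so its latency does not change."""
    return latency_seconds

# One server versus the original server plus two more.
for servers in (1, 3):
    print(
        servers, "server(s):",
        throughput_per_minute(30, servers), "calls/min,",
        latency_after_scaling_out(30, servers), "s per call",
    )
```

Going from one server to three triples the calls per minute in this model, but every individual caller still waits the full 30 seconds.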

A great use case for horizontal scalability is raw HTTP traffic. When what matters is the number of simultaneous worker threads available, a sea of HTTP servers behind a nice, robust load balancer that performs MAC rewriting can make a huge difference.

Myth #3: You have to have a NoSQL solution in order to scale
The blog High Scalability recently posted "What the heck are you actually using NoSQL for?" which had a very appropriate truism:

…we should choose the right tool for the job. Everyone says that. And who can disagree? The problem is this is not helpful advice without being able to answer more specific questions like: What jobs are the tools good at?

NoSQL databases do a great job of spreading storage or search tasks (or both) across a cloud infrastructure, but they aren’t a silver bullet. You don’t necessarily need to sacrifice ACID compliance to scale – the logical partitioning of databases can sometimes do the job quite nicely.
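One common form of logical partitioning is to route each record to one of several ordinary relational databases based on a key. The sketch below assumes a customer ID as the partition key and four hypothetical database names; it only shows the routing step, not connection handling or cross-partition queries.

```python
import hashlib

# Hypothetical shard names; in practice each would be a separate database instance.
SHARDS = ["customers_db_0", "customers_db_1", "customers_db_2", "customers_db_3"]

def shard_for(customer_id, shards=SHARDS):
    """Pick a shard by hashing the key, so a given customer always maps to the same database."""
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and wrap it onto the shard list.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

for customer_id in (1001, 1002, 1003):
    print(customer_id, "->", shard_for(customer_id))
```

Transactions that touch a single customer stay inside one database and keep their usual ACID guarantees. The trade-off of simple modulo routing like this is that changing the number of shards remaps most keys, so it suits cases where the partition count is fairly stable.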

Myth #4: Commodity pricing means pay for what your users consume
Several cloud computing companies have made pricing really tricky to calculate. When an IaaS provider accounts for cost by IOPS or bytes per second, it can be very difficult to predict what your bill will look like at the end of the month.

As much as we might want billing to be based on how many transactions are flowing through the system, anecdotal evidence suggests billing usually ends up allocation-based. For example, here is my hazy analysis of an on-demand application I deployed within a cloud hosting provider:
[Figure: Hazy Analytics]
First off: no, this was not drawn by a preschool class. Second: note that cost did not scale with usage. After scaling upwards we were reluctant to scale downwards since we couldn’t accurately forecast how many users would be flowing through. A more predictable, allocation-based billing model would have been welcome in a world of uncertainty.
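To make the "scale up but never down" effect concrete, here is a toy model. Every number in it (the monthly user peaks, the capacity per server and both prices) is hypothetical and chosen only so the arithmetic is easy to follow; none of it comes from the deployment described above.

```python
# Hypothetical inputs, in arbitrary cost units.
monthly_peak_users = [200, 800, 1500, 600, 400]
USERS_PER_SERVER = 500
COST_PER_SERVER = 500
COST_PER_USER = 1

def servers_needed(users):
    """Round up to whole servers."""
    return -(-users // USERS_PER_SERVER)

def allocation_costs(peaks):
    """Monthly cost when capacity is added as needed but never released."""
    allocated = 0
    costs = []
    for users in peaks:
        allocated = max(allocated, servers_needed(users))
        costs.append(allocated * COST_PER_SERVER)
    return costs

def usage_costs(peaks):
    """Monthly cost if the bill tracked users directly."""
    return [users * COST_PER_USER for users in peaks]

print("allocation:", allocation_costs(monthly_peak_users))
print("usage:     ", usage_costs(monthly_peak_users))
```

In this model the allocation-based bill climbs with the peak and then stays there, while the usage-based line falls back as the users leave.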

I would love to hear your myths or the debunking of my own. Feel free to leave a comment on the blog or tweet us up @BlueLock!

  • Jason

    Nice post, but the title is Top 5 Myths but you only list 4! Love the merman dude.

  • John Ellis

    Myth #5: I can count.