This is the first in a series of articles about the best ways to manage scale out workloads.
Scale. We hear a lot about managing scale being one of the key challenges in a world where numbers can grow very large very fast. Whether it’s huge amounts of data being collected for analysis, websites that need to be able to handle hundreds of thousands of concurrent visitors or online applications servicing mobile users, being able to manage scale is of paramount importance.
However, there seem to be some hangovers from the traditional enterprise architecture mindset, which still places the ability to scale vertically high on the list when selecting infrastructure. Issues such as how much memory an OS can address or how many CPUs can be used in a single instance are unimportant beyond a certain level. Likewise, the amount of data that single instances of MySQL or other databases can handle compared to industry giants such as Oracle or IBM DB2 is often irrelevant. Modern application infrastructures handle the load on these technologies just fine by scaling out. Scale out is analogous to eating an elephant.
Question: How do you eat an elephant? Answer: One bite at a time.
Scale out handles large scale by servicing it with very many small units. Units can be added or removed to service the load, and the applications are architected so that requirements for data consistency and high availability are handled through application design. Of course there will be certain types of application that do not suit this type of environment, but they are becoming few and far between. Most cloud implementations use small or medium sized machine images as the most cost effective way of servicing applications. There are many examples of how online applications such as Obama for America, Instagram, Quora and others scale their infrastructure in the cloud. In these web scale cloud implementations the factors that limit scale are the ability to automate, flexibility, and the ability to visualise the solution to the problem.
A system that relies on people to build and configure new resources is not going to be efficient at scaling. Automation is everything when it comes to dealing with scale out at scale, which is why configuration management solutions such as Puppet and Chef or service orchestration solutions like Juju are becoming extremely popular. Being able to automate the deployment, scale out and scale back of infrastructure based on real-time monitoring data is the difference between success and failure in modern cloud apps. Automation isn’t easy, and even when it is done well there can still be challenges in dealing with rapid spikes in demand, such as populating caches. But this is where skilled ops earn their money, not in performing routine tasks to add or remove capacity.
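To make the idea of scaling out and back from monitoring data concrete, here is a minimal sketch in Python. It is not how Puppet, Chef or Juju work internally. The function names, the headroom percentage and the unit limits are all assumptions chosen for illustration, and a real system would feed the decision into whatever provisioning tool it uses.

```python
def desired_units(load, capacity_per_unit, headroom_pct=20, min_units=1, max_units=100):
    """Return how many units are needed to serve `load` with spare headroom.

    All parameters are hypothetical: `load` and `capacity_per_unit` are in the
    same unit (for example requests per second) and are expected to be integers.
    """
    if capacity_per_unit <= 0:
        raise ValueError("capacity_per_unit must be positive")
    padded = load * (100 + headroom_pct)
    # Integer ceiling division, so a partially used unit still counts as one unit.
    needed = -(-padded // (100 * capacity_per_unit))
    # Clamp to the allowed range so a spike or a quiet period cannot
    # request an unbounded or zero-sized deployment.
    return max(min_units, min(max_units, needed))


def scaling_action(running_units, target_units):
    """Translate the gap between running and target units into an action."""
    delta = target_units - running_units
    if delta > 0:
        return ("add", delta)
    if delta < 0:
        return ("remove", -delta)
    return ("hold", 0)


if __name__ == "__main__":
    target = desired_units(load=1000, capacity_per_unit=100)
    print(target, scaling_action(running_units=10, target_units=target))
```

With 1000 requests per second, 100 requests per second per unit and 20% headroom, the sketch asks for 12 units, so a deployment running 10 units would add 2. Running the same calculation on every monitoring interval is what removes the person from the routine add and remove loop.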
Flexibility is one of the key attractions of cloud. It’s also one of the key components of scalability, so things that restrict flexibility can also restrict ability to scale efficiently. Software that runs on a per-core, per-socket or per-server license model needs careful management in cloud to avoid cost spiraling out of control when additional capacity is required. Likewise software that is run on an annual subscription basis can create a large amount of cost where value may only be delivered for a very short period of time. Add in license key management, subscription codes or activations, and the cost overheads can be considerable. This is why Open Source technologies free of deployment restrictions, such as Ubuntu, memcached or MySQL, have become extremely popular components of scale out architectures.
Ability to visualise the solution to the problem
Scale out architectures can potentially involve connecting very many different components. The relationships between all the components are important, as breaks in the chain can drastically reduce efficiency. In old-style scale up architectures it was easier (although not necessarily easy) to visualise and draw up how the solution would look. With scale out, where the number of nodes can run into thousands, mapping out the relationships and dependencies requires a vision that can be beyond the scope of a single person. Service orchestration tools such as Juju can help with this problem, but again this is where skilled devops and architects can add enormous value.
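One way to keep those relationships manageable is to record them as data rather than hold them in someone's head. The sketch below is only an illustration and does not reflect how Juju models services. The service names and the dependency map are hypothetical, and it assumes Python 3.9 or later for the standard library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical deployment: each service maps to the services it depends on.
DEPENDS_ON = {
    "load-balancer": {"webapp"},
    "webapp": {"mysql", "memcached"},
    "mysql": set(),
    "memcached": set(),
}


def deployment_order(depends_on):
    """Return an order in which every service comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def affected_by(failed, depends_on):
    """Return every service that directly or indirectly depends on `failed`."""
    affected = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for service, deps in depends_on.items():
            if current in deps and service not in affected:
                affected.add(service)
                frontier.append(service)
    return affected


if __name__ == "__main__":
    print(deployment_order(DEPENDS_ON))
    print(sorted(affected_by("mysql", DEPENDS_ON)))
```

The first function answers "what must be up before this component starts?" and the second answers "what breaks if this link in the chain fails?". With thousands of nodes the map would be generated from the orchestration tool rather than typed by hand, but the questions stay the same.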
Ubuntu is the scale out leader. We have become this by focusing on the scale areas that matter: automation, flexibility and tools to help visualise the solutions. In some cases size does matter, but it is far less important to scale out than you might think.