What is Scalability?

Let’s imagine we built an app that helps users land jobs. We woke up the following day to notice our app was an overnight success. Multiple publications are raving about our new software, and the app is even trending on Twitter. We couldn’t have asked for a more successful launch! However, we checked our email and noticed we received multiple reports of our app being incredibly slow to use. The app takes several minutes to load, and users are losing patience. After further investigation, we determine our app cannot handle the extreme increase in requests from all the new users signing up. This is the perfect time to consider the scalability of our software and how we can handle the uptick in usage.

Introduction to Scalability

Scalability is the ability of a system (e.g., an application or a database) to increase or decrease its performance and cost in response to demand. The process of making that adjustment is commonly referred to as “scaling”. Thinking about a system’s ability to scale is crucial because it can lead to lower maintenance costs, a better user experience, and a lower overall cost over the system’s lifetime. However, unlike our new app, not every piece of software is an overnight success. This raises the question: when is the “right” time to scale a system?

The “Right” Time to Scale

Deciding when to plan for scaling a system isn’t always straightforward. In our previous example, if we had known in advance that our new app would blow up in popularity overnight, our scalability planning might have started much sooner. However, most applications don’t become an overnight success, so not every system needs to be scaled immediately. As a general rule, when building any system, we want to avoid premature optimizations.

Premature optimization means trying to make software more efficient at a stage that is too early to justify the effort. Optimizing prematurely often wastes time on code that is likely to change later on. A famous quote from Donald Knuth (a notable computer scientist and author) sums up this notion:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

In essence, we want to use our resources to design and build our system with only a few initial optimizations (that critical 3% Donald Knuth mentions). Once the system is built, we can benchmark specific parts of it to find what needs to be optimized. Keep in mind that just because we can build a system to handle millions of requests doesn’t mean we should, especially if our system currently receives only hundreds.
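The benchmarking step described above can be sketched as follows. This is a minimal illustration, not a full profiling setup: the functions search_jobs and load_profile are hypothetical stand-ins for parts of our app, and the helper names benchmark and slowest_part are assumptions made for this example.

```python
import time


def benchmark(func, repeats=5):
    """Return the best wall-clock time (in seconds) over several runs of func."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Hypothetical stand-ins for two parts of a job-search app.
def search_jobs():
    return sorted(range(200_000), key=lambda n: -n)


def load_profile():
    return {"name": "example user", "skills": ["python", "sql"]}


def slowest_part(parts):
    """Benchmark each named part and return (name, seconds) for the slowest."""
    results = {name: benchmark(func) for name, func in parts.items()}
    name = max(results, key=results.get)
    return name, results[name]


if __name__ == "__main__":
    name, seconds = slowest_part(
        {"search_jobs": search_jobs, "load_profile": load_profile}
    )
    print(f"Slowest part: {name} ({seconds:.4f}s)")
```

Measuring first, and optimizing only the part that turns out to be slowest, is one way to avoid spending effort on code that was never the problem.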

Returning to the example of our app, we made the correct move by not optimizing for millions of users initially. However, now that our application is serving a huge number of requests, let’s examine strategies for scaling the software to deal with the influx.

Scaling Techniques

Before we dive into the techniques we can use, let’s consider what parts of a system we will be scaling. Whenever we want to scale a system, we usually refer to scaling a system resource (or multiple resources). A resource can be any physical or virtual component of a software system. Some examples of resources include memory, storage, or a database. Each of these resources can be part of a resource pool, a collection of resources ready to be used by the system. When a resource is used and is no longer needed, it is returned to the pool to be reused later. As a resource becomes a bottleneck (a point of congestion that reduces overall system performance), we can perform two types of scaling on resources: Vertical Scaling and Horizontal Scaling.
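To make the idea of a resource pool concrete, here is a small sketch of the acquire-and-return cycle described above. The class name ResourcePool and the "conn-1" style resource names are hypothetical, chosen only for illustration.

```python
from collections import deque


class ResourcePool:
    """A fixed collection of reusable resources (hypothetical example)."""

    def __init__(self, resources):
        self._available = deque(resources)

    def acquire(self):
        """Take a resource out of the pool, or return None if the pool is empty."""
        if not self._available:
            return None  # the pool is exhausted: a potential bottleneck
        return self._available.popleft()

    def release(self, resource):
        """Return a resource to the pool so it can be reused later."""
        self._available.append(resource)

    def available(self):
        """Number of resources currently ready to be used."""
        return len(self._available)


if __name__ == "__main__":
    pool = ResourcePool(["conn-1", "conn-2"])
    first = pool.acquire()
    second = pool.acquire()
    print(pool.acquire())    # None: every resource is in use
    pool.release(first)
    print(pool.available())  # 1
```

When acquire starts returning None regularly, the pool has become the point of congestion, which is the signal that the resource may need to be scaled.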

Vertical Scaling

Vertical scaling, also known as “scaling up”, increases the power of one particular resource in a resource pool. For example, if we are working on a financial trading platform, we might have a single server connected to a database with 10TB (terabytes) of storage. Financial data sets can be very large, and we are alerted that our database is running very low on storage. In this situation, since the database (our resource) is becoming a bottleneck for our system, it makes sense to scale it. We can scale vertically by upgrading the database’s storage to increase its capacity. Here is what our scaling solution would look like:

Figure: Vertical Scaling

Note that the increase in storage capacity does not change anything about our system architecture or code. This is an essential advantage of Vertical scaling. Here are some other common benefits of Vertical scaling:

Lower initial cost and setup: Since we start with one instance of a resource, the initial cost of the system architecture may be lower. The initial setup time may also be lower.
Lower maintenance and operation costs: Maintenance only needs to be performed on a single machine (or resource).

However, we do have to be wary of the disadvantages:

Increase in resource downtime: The resource may be unavailable while an upgrade is being applied, which can increase downtime.
Limited scaling: Every physical resource has an upper limit on how far it can be upgraded.
Increased costs: Typically, the more powerful the resource upgrade, the more expensive it is to implement.
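The trade-offs above can be sketched with a toy model of the database example. The 10TB starting capacity comes from the example; the 16TB ceiling, the Database class, and the scale_up method are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Database:
    """A single database instance (hypothetical model, capacities in TB)."""

    capacity_tb: int
    max_capacity_tb: int  # assumed hardware ceiling for this one machine

    def scale_up(self, extra_tb):
        """Add storage to this same instance, up to the hardware ceiling.

        Returns the number of terabytes actually added.
        """
        room = self.max_capacity_tb - self.capacity_tb
        added = min(extra_tb, room)
        self.capacity_tb += added
        return added


if __name__ == "__main__":
    db = Database(capacity_tb=10, max_capacity_tb=16)
    print(db.scale_up(4), db.capacity_tb)  # 4 14
    print(db.scale_up(4), db.capacity_tb)  # 2 16: the ceiling has been reached
```

Nothing else in the program changes when the capacity grows, which mirrors the "no architecture change" benefit, while the second call shows the limit that a single resource eventually hits.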

Horizontal Scaling

The second main type of scaling is Horizontal scaling, also known as “scaling out”. Horizontal scaling is the process of increasing (or decreasing) the number of instances of a particular resource in a resource pool. For example, let’s imagine that we run an on-demand transportation app. Our application runs three different services: taxi, water taxi, and food delivery. We notice our single server is responding to requests very slowly because it can’t handle all the requests for all three services. Since the single server (our resource) is the bottleneck for the performance of the system, it makes sense to scale it. We can scale the server horizontally by purchasing three more servers so that we better distribute requests and decrease the load on the existing server. Here is what it would look like:

Figure: Horizontal Scaling

Note how the load on each individual server decreased because we have more servers to share the requests. This is one of the main advantages of horizontally scaling a resource. Some other benefits include:

Reduced downtime: With more resource instances, an outage or maintenance window causes less downtime. If one instance goes offline, the rest will still be available.
Unlimited scaling: Since Horizontal scaling adds brand new instances, there is in theory no fixed upper limit on the number of resource instances that can be added to increase system scalability.

However, be wary of the disadvantages of Horizontal scaling:

Increase in complexity of resource management: Since there are multiple instances of a resource, there is added complexity in managing, operating, and maintaining them.
Increase in initial costs and setup: Scaling horizontally may initially produce higher costs, along with increased setup time for each new resource instance.
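As a rough illustration of how scaling out spreads load, the sketch below distributes requests across servers in round-robin order. Round-robin is just one possible distribution strategy, chosen here for simplicity; the function name distribute, the server names, and the figure of 1200 requests are all hypothetical.

```python
from itertools import cycle


def distribute(num_requests, servers):
    """Assign requests to servers in round-robin order; return requests per server."""
    if not servers:
        raise ValueError("at least one server is required")
    load = {server: 0 for server in servers}
    rotation = cycle(servers)
    for _ in range(num_requests):
        load[next(rotation)] += 1
    return load


if __name__ == "__main__":
    # A single server handles every request.
    print(distribute(1200, ["server-1"]))
    # Scaling out to four servers spreads the same total load.
    four = ["server-1", "server-2", "server-3", "server-4"]
    print(distribute(1200, four))
    # If one instance goes offline, the remaining three still serve requests.
    print(distribute(1200, four[:3]))
```

With one server, all 1200 requests land on it; with four, each handles 300; with one of the four offline, the remaining three handle 400 each. The total load never changes, only how it is shared, and the extra bookkeeping needed to share it is a small taste of the management complexity listed above.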
