Understanding Scalable APIs

Emil Rasmussen

CTO at Enterspeed

Empowering developers to deliver scalable APIs. That’s the Enterspeed mission. But what exactly are scalable APIs, and why should you care? In this blog post, we’ll define what we mean by scalable APIs, discuss their importance, and scratch the surface of how you can build scalable APIs.

What are scalable APIs?

Scalability is a central topic when you work on improving the performance of websites. It is the ability of a system to handle an increasing amount of load while still delivering fast response times. In our definition, a web API delivers a fast response time if it responds in 50 milliseconds or less.
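To make the 50 millisecond target concrete, here is a minimal sketch that checks a set of measured response times against it. The function names and the choice of the 95th percentile are our own assumptions for illustration; the definition above only states the 50 ms threshold.

```python
import math

FAST_RESPONSE_MS = 50.0  # threshold taken from the definition above


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def is_fast(samples_ms, pct=95):
    """True if the chosen percentile of response times is within the threshold.

    Using the 95th percentile is an assumption; pick whatever fits your API.
    """
    return percentile(samples_ms, pct) <= FAST_RESPONSE_MS
```

Running the same check at low and at high load shows whether response times stay within the target as traffic grows, which is what scalability means here.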

The compelling case for scalable APIs

Nobody enjoys using a slow website. Numerous articles and studies spanning over 25 years have shown that faster websites attract more users and generate better business outcomes. Today, many websites rely on APIs; therefore, having a scalable API is essential for delivering a fast and responsive user experience. A slow or unavailable API results in a poor user experience, which could drive customers to look for alternatives online.

Cost-effectiveness is another crucial aspect to consider. Efficient resource utilisation means you can handle more load without incurring excessive costs. On a related note, it's also worth considering the CO2 emissions associated with running web servers.

In summary, scalable APIs benefit the users, the budget, and even the environment.

How do you scale an API?

Scaling an API involves two main components: performance optimisation and operational aspects, including hardware considerations.

Performance optimisation

Performance optimisation is a complex topic, covering everything from database indexing to memory usage, code optimisations, and caching techniques. At Enterspeed, we have a saying: "Less is faster." This means that the fewer computations and dependencies in each individual request, the better your performance will be.
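One common way to do less work per request is to cache the result of an expensive lookup. The sketch below is a small in-memory cache with a time-to-live; the class name `TTLCache` and its methods are hypothetical and only illustrate the idea, they are not an Enterspeed API.

```python
import time


class TTLCache:
    """Tiny in-memory cache: each entry expires ttl_seconds after it is stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._entries = {}   # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() only on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value
```

With this in place, repeated requests for the same key within the time-to-live skip the database or third-party call entirely. The trade-off is that responses can be up to `ttl_seconds` out of date.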

However, how do you prioritise what to optimise? A famous saying attributed to computer science legend Donald Knuth states that "premature optimisation is the root of all evil."

You need to identify the bottlenecks in your system before diving into optimisation. But how do you know whether the hotspot is mostly the database, or whether it's a third-party service, such as a user review API, that's slowing you down? And what if your project never gets popular enough for it to matter? This brings us into the world of observability, and you now realise that you have to build and use the application before you can begin to do real performance optimisation. “But I can do load testing using the cloud,” you begin to think. And yes, that's yet another thing you need to add to your plan.
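A first step towards observability can be as simple as timing each dependency and comparing the totals. The following is a minimal sketch; the `Timings` class and the labels in the usage example are hypothetical, and a real setup would more likely use a dedicated tracing or metrics tool.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Timings:
    """Accumulates the time spent in labelled sections of a request."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so tests can use a fake clock
        self.totals = defaultdict(float)

    @contextmanager
    def measure(self, label):
        start = self._clock()
        try:
            yield
        finally:
            # Record the duration even if the wrapped call raises.
            self.totals[label] += self._clock() - start

    def slowest(self):
        """Label with the largest accumulated time (raises ValueError if empty)."""
        return max(self.totals, key=self.totals.get)
```

In a request handler you would wrap each call, for example `with timings.measure("database"): ...` and `with timings.measure("review_api"): ...`, and then look at `timings.slowest()` before deciding what to optimise.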

Performance optimisation is a lot of fun, and a lot of hard work.

Scaling up and down

The operational side of scaling is a two-way street. A scalable API should not only be able to scale up with increased usage but also scale down when demand decreases. Scaling down makes it especially interesting – and, frankly, very impractical if you are not utilising cloud native services.

There are two classical ways to scale up:

1) vertically, by adding more CPU and memory to one server or

2) horizontally, by adding more servers to share the load.

Vertical scaling requires the entire server to always be provisioned to handle the maximum load. When scaling horizontally, a load balancer is used to distribute the load between multiple servers, and in any non-cloud setup those servers would have to be either running or ready to be booted up.
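To illustrate what a load balancer does in horizontal scaling, here is a minimal round-robin sketch. Round-robin is only one of several strategies, and the class name and server names are made up for the example.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation so each gets an equal share of requests."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        """Return the server that should handle the next request."""
        return next(self._cycle)


balancer = RoundRobinBalancer(["api-1", "api-2", "api-3"])
targets = [balancer.next_server() for _ in range(5)]
# targets is now ["api-1", "api-2", "api-3", "api-1", "api-2"]
```

Real load balancers also take health checks and current load into account, which this sketch leaves out.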

Scaling in the cloud is generally the way to go, but the modern cloud technologies each come with their own set of trade-offs. For instance, an Azure App Service with automatic scaling can do both vertical and horizontal scaling but is somewhat slow going from 1 instance to 50 instances if the usage suddenly spikes (such as in a flash crowd moment). Serverless functions, on the other hand, are much more responsive to sudden spikes, but have a different runtime environment than, say, a normal .NET web API app. Kubernetes is a whole different beast that requires deep operational knowledge of setting up and running a k8s cluster.
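The core decision behind automatic horizontal scaling, in both directions, can be sketched as a simple calculation. This is an illustration under our own assumptions (a known per-instance capacity and fixed minimum and maximum instance counts), not how Azure App Service, serverless platforms, or Kubernetes actually decide.

```python
import math


def desired_instances(requests_per_second, capacity_per_instance,
                      min_instances=1, max_instances=50):
    """Number of instances needed for the current load, clamped to the allowed range.

    capacity_per_instance is the assumed number of requests per second that one
    instance can serve while still responding quickly.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))
```

For example, with an assumed capacity of 100 requests per second per instance, `desired_instances(950, 100)` gives 10, and once traffic drops to zero the result falls back to the minimum of 1. How quickly a platform can actually add or remove those instances is where the trade-offs above come in.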

In summary

In today's fast-paced digital landscape, scalability is not just an option – it's a requirement. Whether you're just starting to build an API or looking to optimise an existing one, it's vital to incorporate scalability into your design from the start.

If you want to know more about scaling on Enterspeed, you can jump to use case: Caching and scaling.