Response caching and usage quotas
1. Response caching and usage quotas
We will explore how to make our APIs lightning fast and reliably available under load using caching and quotas. In this video, we’ll learn to configure response-caching policies, apply product-level quotas and spike-arrest patterns, and measure key performance metrics like latency and cache hit ratios.

2. Setting up response caching
The purpose of response caching is to save the response from a particular endpoint so that repeat requests don’t have to travel to the backend every time. Imagine your most popular endpoints returning the same data for many requests. Caching that response means your backend doesn’t have to work as hard, and your users get faster replies. Round trips to the actual endpoints are more time-consuming and more computationally expensive than simply retrieving a prepared response from the cache. This is what APIM can do via caching policies.

3. Configuring response caching
In the portal or via policy XML, you’ll specify a cache duration, decide whether to vary by query string or headers, and choose where the cache lives: either directly on the API Management gateway (sometimes called the 'edge'), which is the entry point for all your API traffic, or in an external cache, such as one provided by the popular caching engine Redis. Once enabled, you’ll see repeated calls served instantly from cache, dramatically reducing round-trip times.
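As a minimal sketch, a response-caching policy pair might look like the following; the 300-second duration, the vary-by settings, and the 'prefer-external' caching type (which uses an external cache such as Redis when one is configured, and falls back to the built-in gateway cache otherwise) are illustrative values, not defaults.

    <policies>
        <inbound>
            <base />
            <!-- Check the cache before forwarding the request to the backend.
                 The cache key varies by the listed query parameter and header. -->
            <cache-lookup vary-by-developer="false"
                          vary-by-developer-groups="false"
                          caching-type="prefer-external">
                <vary-by-query-parameter>category</vary-by-query-parameter>
                <vary-by-header>Accept</vary-by-header>
            </cache-lookup>
        </inbound>
        <outbound>
            <base />
            <!-- On a cache miss, store the backend response for 5 minutes. -->
            <cache-store duration="300" />
        </outbound>
    </policies>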
4. Protecting against request surges

With caching in place, it’s time to protect our APIs from sudden surges. If too many clients try to access our APIs at once, the system can slow down. It can also force us to overpay for resource usage. In the worst-case scenario, such as a distributed denial-of-service attack, it can bring our system down.

5. How usage quotas protect APIs
Quotas enforce a fixed number of calls over a period, which makes them well suited to protecting against sudden spikes of requests. With quotas in place, it's harder to mount a distributed denial-of-service attack or end up paying an excessive amount of money for abnormal resource usage.
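As a sketch, a product-level policy could combine a quota with a rate limit for spike arrest; the call counts and renewal periods below are illustrative assumptions, not recommendations.

    <policies>
        <inbound>
            <base />
            <!-- Spike arrest: at most 100 calls per subscription per minute. -->
            <rate-limit calls="100" renewal-period="60" />
            <!-- Quota: at most 10,000 calls per subscription per day. -->
            <quota calls="10000" renewal-period="86400" />
        </inbound>
    </policies>

Here the rate limit smooths short bursts, while the quota caps total usage over the longer renewal period.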
6. Measuring resource usage

Of course, you can’t improve what you don’t measure. Application Insights integration can be enabled to track a wide variety of metrics, such as average latency and cache hit ratio, and to spot outliers. For example, if you see that a 'GET Products' call has a low cache hit ratio but high traffic, you might decide to increase its cache duration from 5 minutes to 30 minutes to improve performance and reduce backend load. Armed with these insights, you can fine-tune cache durations, adjust quota thresholds, and optimize your service tiers to strike the perfect balance of cost and performance.
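Acting on that 'GET Products' insight could be as simple as raising the duration in the operation's cache-store policy; the values below simply mirror the hypothetical 5-to-30-minute example above.

    <outbound>
        <base />
        <!-- Raised from duration="300" (5 minutes) to 1800 (30 minutes)
             after metrics showed a low cache hit ratio on a high-traffic call. -->
        <cache-store duration="1800" />
    </outbound>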
7. Let's practice!

Let's apply what we learned!