According to Cloudflare, APIs represent the fastest-growing segment of traffic on their network, weighing in at 53% in 2021. Akamai put the number even higher, at 83%, in an earlier report. Whether your APIs are exposed to external clients or restricted to internally developed applications, they're the smart way to build your infrastructure and services. But this increased emphasis on APIs means you also need to pay attention to API latency.
Your customers expect your API to perform well. That means quick responses. What's the difference between API response time and API latency? How do you measure API latency? What aspects of API performance are under your control? Let's take a closer look at API latency and answer these questions.
What is API Latency?
API latency is how long it takes your infrastructure to respond to an API request. In other words, it’s the period of time between when the request arrives at your server and when the client receives the first byte of the response.
API latency is a valuable measure, but it’s not the whole picture, especially when considering what your customers see.
Consider this imaginary API request:
In this situation, the time it takes a message to travel between the client and server is a consistent 25 milliseconds. It takes the server three milliseconds to query its database and discover that it doesn’t have the item the client requested. So, it took the client 53 milliseconds to get a response.
Clients don't know how long the first leg of the trip took, and they don't care. They're concerned with how long it takes to get a response, even though the time it takes a request to reach your server isn't entirely under your control.
So, while we’re going to talk about the “dictionary definition” of API latency, we’re going to keep the customer experience in mind, too.
Network latency is the time it takes data to travel from one point to another. So in this diagram, the 25ms is the total time it takes a message to travel across all of the links between the client and server. The round-trip time (RTT) is 50ms. As an API service provider, you don't have much control over this number. You can ensure that your service is well-connected and use services like Cloudflare to store as much information as you can close to your clients, but your options are limited.
Response time is how long it takes for your infrastructure to respond to a request. As we’ll discuss below, you have more control over this number than any other aspect of API Latency.
Client experience isn’t a technical term. It’s simply, as noted above, how long it takes the client to see a response to their request. It’s sometimes called round-trip request time, round-trip time (which is easy to confuse with the more common networking term), and even API latency.
Regardless of what we call it, it’s the sum of RTT and response time. To improve your customer experience, you need to address both times.
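The arithmetic from the example above can be sketched in a few lines. The numbers come straight from the scenario: 25 milliseconds each way on the network, plus 3 milliseconds of server-side response time.

```python
# A minimal sketch of the example above: client experience is the
# sum of network round-trip time and server response time.
ONE_WAY_MS = 25      # client -> server (and server -> client)
RESPONSE_MS = 3      # server-side database query and response

rtt_ms = 2 * ONE_WAY_MS              # 50 ms round trip
client_experience_ms = rtt_ms + RESPONSE_MS

print(client_experience_ms)  # -> 53
```

Shaving milliseconds off either term improves the total, but as the rest of this article covers, you have far more leverage over the response-time term than the RTT term.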
What Factors Contribute to API Latency?
Let’s take a look at what creates API latency.
Where your clients are located will always affect API latency. A client in New York City will fare better than a client in St. Louis if they’re querying the same server in Virginia. While this is one of the factors that you have little control over, you can take a few steps to manage network latency.
Placing API servers in strategic locations can help make sure that your clients are never too far away from you. Of course, this can be an expensive undertaking, and it's complicated if your API manages data that has to be shared across every server. But the best way to deal with distance is to reduce it.
Another option is to use a Content Delivery Network (CDN) to distribute static or relatively long-lived data as close to your clients as possible. Service providers like Cloudflare, Fastly, Amazon CloudFront, and Akamai can help close the distance between you and your clients.
Like distance, network congestion is another factor you have little control over. Depending on the nature of the congestion, you may not have any control at all. Fortunately, most congestion is transitory. Following the steps above for dealing with distance will often help, though.
A client can’t connect to a host without an address, so it needs to make at least one DNS request.
Here’s a DNS request that took 66 milliseconds to complete.
% dig api.google.com
; <<>> DiG 9.10.6 <<>> api.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15168
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;api.google.com. IN A
;; ANSWER SECTION:
api.google.com. 300 IN CNAME api.l.google.com.
api.l.google.com. 300 IN A 18.104.22.168
;; Query time: 66 msec
;; SERVER: 192.168.7.1#53(192.168.7.1)
;; WHEN: Thu Apr 07 17:02:57 EDT 2022
;; MSG SIZE rcvd: 119
You can't make your clients' DNS requests go faster, but you can limit how often they have to perform them with longer TTL settings and a proxy server. Although, as we'll see, proxies aren't free either.
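If you want a rough feel for what DNS resolution costs your clients, you can time a lookup yourself. This is a sketch using Python's standard library; the measured time depends heavily on resolver caching, so a second call for the same name is usually much faster than the first.

```python
# A rough sketch of timing a blocking DNS lookup. Results vary with
# resolver caching and network conditions; this only illustrates the
# kind of delay a client pays before it can even open a connection.
import socket
import time

def dns_lookup_ms(hostname: str) -> float:
    """Return the wall-clock time of one DNS resolution, in milliseconds."""
    start = time.perf_counter()
    socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return (time.perf_counter() - start) * 1000

print(f"lookup took {dns_lookup_ms('localhost'):.1f} ms")
```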
Response time is the factor in API latency that you have the most control over. You can work to lower it in your infrastructure, and you can easily measure and monitor it inside your own installations. The diagram above only accounts for a database request, but response time involves many moving parts. Fortunately, they're your moving parts.
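Measuring response time inside your own installation can be as simple as wrapping handlers with a timer. Here's a minimal, framework-agnostic sketch; the handler and its return value are made up for illustration, and a real deployment would ship these timings to a monitoring system rather than print them.

```python
# A sketch of measuring server-side response time with a decorator.
# get_item is a hypothetical handler standing in for real API code.
import time
from functools import wraps

def timed(handler):
    """Wrap a handler and report how long each call takes."""
    @wraps(handler)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return handler(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            # In production, export this to your metrics pipeline.
            print(f"{handler.__name__}: {elapsed_ms:.2f} ms")
    return wrapper

@timed
def get_item(item_id: int) -> dict:
    # Stand-in for a database query.
    return {"id": item_id, "found": False}

get_item(42)
```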
Let’s look at a few:
Firewalls, API Gateways, and Proxies — Most API services sit behind network infrastructure that protects them from attacks, orchestrates failover, and implements load balancing. As we discussed in the previous section, putting your infrastructure behind a single hostname has several benefits. For example, you may be using Azure API Management or AWS API Gateway. But, the request still has to pass through this layer of your network infrastructure, and there will be a cost in terms of time.
Server code — The server at the center of your API reads network packets, translates them to the application layer, acts upon the requests, marshals the responses, and writes them back to the client. These steps take time, and they’re not the only places where your infrastructure can add unnecessary latency, but they’re among the first places to look.
API infrastructure — What kind of infrastructure do you have sitting behind your servers? Again, our example depicts a single relational database server, but most APIs have a mix of databases, static storage, and other resources. Each one has a response time.
Application load — In addition to code and infrastructure, you also need to consider how your infrastructure reacts to high loads. Does it noticeably slow down? Will a busy day lead to cascading latency issues or failures?
API design — Your API’s design can affect performance. Are you caching frequently requested resources? If so, are you pushing them out to CDNs where possible? Are you sending data that clients don’t need and bloating message sizes? Are you forcing clients to make requests for supplemental data instead of sending it proactively? One way to lower latency is to send less data and reduce the number of requests.
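The last point above, sending less data, is easy to demonstrate. This sketch uses a made-up record with hypothetical field names; serializing only the fields clients actually need produces a measurably smaller payload, which means less time on the wire.

```python
# A sketch of how trimming unneeded fields shrinks a JSON payload.
# The record and field names are invented for illustration.
import json

record = {
    "id": 42,
    "name": "widget",
    "price": 9.99,
    "internal_audit_log": ["created", "updated", "updated"],
    "debug_notes": "not useful to API clients",
}

full = json.dumps(record)
# Keep only the fields clients actually use.
trimmed = json.dumps({k: record[k] for k in ("id", "name", "price")})

print(len(full), len(trimmed))  # the trimmed payload is smaller
```

The same idea applies to compression and to pagination: every byte you don't send is a byte the client doesn't wait for.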
API Latency and Security
How does latency affect your API’s security?
First of all, firewalls and proxy services add latency, but that doesn't mean you should leave them out. An API with five-millisecond round-trip response times isn't worth much after it's been hacked.
Second, we briefly touched on how your infrastructure responds to heavy loads. While you may not be able to build a system impervious to DDoS attacks, you can work to make life more difficult for your attackers. You can also use CDNs to help absorb attacks.
Considering API Latency
In this article, we looked at API latency and discussed the factors contributing to it. Latency is a complex phenomenon with many moving parts. You need to consider where your clients are, how they use your API, and how your API responds to their requests.
This post was written by Eric Goebelbecker. Eric has worked in the financial markets in New York City for 25 years, developing infrastructure for market data and financial information exchange (FIX) protocol networks. He loves to talk about what makes teams effective (or not so effective!).