Designing scalability, by being inefficient.
You ain't gonna need it
When I hear about dealing with things at scale, I often find people reach for
batching of requests, caching, and other similar ideas. I'd like to challenge
this thought process.
Let's say you have a Movies API, and users can get a list of movies, a single movie by ID, or many movies in a single request. What does this look like on the backend?
Well, you could handle the batch request with a batch query to the data store. Seems efficient, right? And I'd argue it is. But what if it's 100 records, 1,000 records, 10,000 records? You'd probably want to distribute the load, right? You could break that single request up into smaller batches and query in parallel. But what if, instead, you sent 1 HTTP request per record (with caching by ID)? Seems wasteful, right?
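To make that concrete, here's a rough sketch of what "one HTTP request per record" could look like from the API layer. It assumes a hypothetical data access endpoint at api/v1/movie/{id}; the MovieClient and Movie names are illustrative only, not part of any real system.

using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

public record Movie(int Id, string Title);

public class MovieClient
{
    private readonly HttpClient _http;

    public MovieClient(HttpClient http) => _http = http;

    // Fan a batch request out as one HTTP request per movie ID.
    // Each request is independently cacheable and independently routable.
    public async Task<Movie?[]> GetManyAsync(IEnumerable<int> ids)
    {
        var tasks = ids.Select(id =>
            _http.GetFromJsonAsync<Movie>($"api/v1/movie/{id}"));

        return await Task.WhenAll(tasks);
    }
}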
System outline
If you do that, however, you can get crafty. Let me give a small outline of
the single-object HTTP flow. Let's assume the data store API is written in C#.
- Front end receives request from load balancer
- Front end sends request to API
- API sends 1 HTTP request per ID, first going through a local cache to skip requests that don't need to be made.
- Data access service gets requests routed to it according to a minimal-disruption hash of the URI, such as api/v1/movie/{id}
- Data access service receives the request and, using a concurrent dictionary keyed by URI, calls (see the fuller sketch after this list):

  var result = await requestTasks.GetOrAdd(uri, u => ProcessRequestAsync(httpContext));
  requestTasks.TryRemove(uri, out _);

- Upon completion of the task, it returns the result.
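Here's a minimal sketch of that coalescing step, assuming a ConcurrentDictionary keyed by the request URI. ProcessRequestAsync stands in for whatever actually hits the data store (and any caching), and the Movie record from the earlier sketch is reused; the names are illustrative, not a real implementation.

using System.Collections.Concurrent;
using System.Threading.Tasks;

public class CoalescingMovieHandler
{
    // At most one in-flight lookup per URI; concurrent callers share it.
    private readonly ConcurrentDictionary<string, Task<Movie>> _requestTasks = new();

    public async Task<Movie> HandleAsync(string uri)
    {
        // If a lookup for this URI is already running, await that task;
        // otherwise start one and publish it for other callers to join.
        var task = _requestTasks.GetOrAdd(uri, u => ProcessRequestAsync(u));
        try
        {
            return await task;
        }
        finally
        {
            // Drop the finished (or failed) task so the next request
            // after completion starts a fresh lookup.
            _requestTasks.TryRemove(uri, out _);
        }
    }

    // Placeholder for the real data store lookup (and any local caching).
    private async Task<Movie> ProcessRequestAsync(string uri)
    {
        await Task.Delay(10); // simulate I/O
        return new Movie(42, "Example");
    }
}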
So, when you send a request, the same URI goes to the same backend. If the
URI is already in the dictionary, the existing task is returned, and when the
task completes, it's removed from the dictionary. What this effectively says
is "if I'm already looking up this data, just wait for that; if not, start
looking it up and wait". You can assume that somewhere within
ProcessRequestAsync caching is also used, or that it's done before this
level.
Benefits
What are the benefits of this approach? I'm going to leave out the drawbacks,
since I'm assuming you already see them.
- Scaling. You can add hosts to distribute the load more.
- Natural distributed cache
- Hot records share 1 lookup for many incoming requests, until that lookup is complete.
- Simplicity. The code focuses on a single record at all times.
Scaling
Adding more hosts spreads the load of requests, and thanks to a consistent
hash algorithm such as Ketama, request migration is minimal. A new host
impacts at most two other hosts; all of the others retain the requests they
were already serving. You can also use an unstable algorithm, such as
CRC (32/64) or SHA (2/256/512), but scaling will then redistribute just about
all requests. Even that shouldn't have a huge impact, due to the single query
shared by many requests until it completes.
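As a rough illustration of the routing, here is a toy consistent hash ring in C#. It uses MD5 the way Ketama does, but the virtual-node count, linear ring scan, and key layout are simplified assumptions, not the real Ketama algorithm.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public class HashRing
{
    // Sorted ring of (point, host); each host gets several virtual nodes.
    private readonly SortedDictionary<uint, string> _ring = new();

    public HashRing(IEnumerable<string> hosts, int virtualNodes = 100)
    {
        foreach (var host in hosts)
            for (var i = 0; i < virtualNodes; i++)
                _ring[Hash($"{host}#{i}")] = host;
    }

    public string GetHost(string key)
    {
        var point = Hash(key);
        // First ring point at or after the key's point, wrapping around.
        // A real implementation would binary-search instead of scanning.
        foreach (var entry in _ring)
            if (entry.Key >= point)
                return entry.Value;
        return _ring.First().Value;
    }

    private static uint Hash(string value)
    {
        var bytes = MD5.HashData(Encoding.UTF8.GetBytes(value));
        return BitConverter.ToUInt32(bytes, 0);
    }
}

// Usage: route each movie URI to a backend host.
// var ring = new HashRing(new[] { "host-a", "host-b", "host-c" });
// var host = ring.GetHost("api/v1/movie/42");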
Natural distributed cache as a side effect
Because the load is spread across hosts, and the same requests go to the same
hosts, you can use a simple in-memory cache. It's naturally distributed
across the hosts, and you can skip a more traditional distributed cache
(such as Redis, Valkey, KeyDB, memcached, etc.).
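As a sketch of what that local cache might look like inside ProcessRequestAsync, here's a version using ASP.NET Core's IMemoryCache, reusing the illustrative Movie record from earlier. IMovieStore, the cache key format, and the five-minute expiry are assumptions made up for the example.

using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

// Hypothetical data store client, defined only for this example.
public interface IMovieStore
{
    Task<Movie> GetMovieAsync(int id);
}

public class CachedMovieLookup
{
    private readonly IMemoryCache _cache;
    private readonly IMovieStore _store;

    public CachedMovieLookup(IMemoryCache cache, IMovieStore store)
    {
        _cache = cache;
        _store = store;
    }

    public async Task<Movie?> ProcessRequestAsync(int id)
    {
        // The same ID always hashes to this host, so a plain in-memory
        // cache behaves like one shard of a distributed cache.
        return await _cache.GetOrCreateAsync($"movie:{id}", entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
            return _store.GetMovieAsync(id);
        });
    }
}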
Preventing the cache storm/stampeding herd problem
Multiple concurrent requests for a single item result in one request to the
data store. In the event of a stampeding herd, or a cache storm, the data
store is protected from the load. Taking this further with the local cache, you end up with one request to the data store until that local cache entry expires.
Closing thoughts
I often see people prematurely optimizing, which we all know is the root of all evil. This system design works in a lot of cases, but not all. It can also be expanded upon; for example, in the movies API you could fetch your list of movies by page the same way, without local caching. Under load, each unique query to the data store would be executed only once.
That's all folks.