Designing scalability, by being inefficient.
You ain't gonna need it
When I hear about dealing with things at scale, I often find people reach for
batching of requests, caching, and other similar ideas. I'd like to challenge
this thought process.
Let's say you have a Movies API, and users can get a list of movies, a single movie by ID, or many movies in a single request. What does this look like on the backend?
Well, you could handle the batch request with a batch query to the data store. Seems efficient, right? And I'd argue it is. But what if it's 100 records, 1,000 records, 10,000 records? You'd probably want to distribute the load, right? You could break that single request up into smaller batches and query in parallel. But what if, instead, you sent 1 HTTP request per record (with caching by ID)? Seems wasteful, right?
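To make that concrete, here's a rough sketch of what "one HTTP request per record" could look like from the API layer. It assumes a hypothetical data access endpoint at api/v1/movie/{id}; the MovieClient and Movie names are illustrative only, not part of any real system.

using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

public record Movie(int Id, string Title);

public class MovieClient
{
    private readonly HttpClient _http;

    public MovieClient(HttpClient http) => _http = http;

    // Fan a batch request out as one HTTP request per movie ID.
    // Each request is independently cacheable and independently routable.
    public async Task<Movie?[]> GetManyAsync(IEnumerable<int> ids)
    {
        var tasks = ids.Select(id =>
            _http.GetFromJsonAsync<Movie>($"api/v1/movie/{id}"));

        return await Task.WhenAll(tasks);
    }
}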
System outline
If you do that, however, you can get crafty. Let me give a small outline of
the single-object HTTP flow. Let's assume the data store API is written in C#.
- Front end receives request from load balancer
- Front end sends request to API
- API sends 1 HTTP request per ID, first going through a local cache to skip requests that don't need to be made.
- Data access service gets requests routed to it according to a minimal-disruption hash of the URI, such as api/v1/movie/{id}
- Data access service receives the request and, using a concurrent dictionary keyed by URI, calls (see the fuller sketch after this list):

  var result = await requestTasks.GetOrAdd(uri, u => ProcessRequestAsync(httpContext));
  requestTasks.TryRemove(uri, out _);

- Upon completion of the task, it returns the result.
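Here's a minimal sketch of that coalescing step, assuming a ConcurrentDictionary keyed by the request URI. ProcessRequestAsync stands in for whatever actually hits the data store (and any caching), and the Movie record from the earlier sketch is reused; the names are illustrative, not a real implementation.

using System.Collections.Concurrent;
using System.Threading.Tasks;

public class CoalescingMovieHandler
{
    // At most one in-flight lookup per URI; concurrent callers share it.
    private readonly ConcurrentDictionary<string, Task<Movie>> _requestTasks = new();

    public async Task<Movie> HandleAsync(string uri)
    {
        // If a lookup for this URI is already running, await that task;
        // otherwise start one and publish it for other callers to join.
        var task = _requestTasks.GetOrAdd(uri, u => ProcessRequestAsync(u));
        try
        {
            return await task;
        }
        finally
        {
            // Drop the finished (or failed) task so the next request
            // after completion starts a fresh lookup.
            _requestTasks.TryRemove(uri, out _);
        }
    }

    // Placeholder for the real data store lookup (and any local caching).
    private async Task<Movie> ProcessRequestAsync(string uri)
    {
        await Task.Delay(10); // simulate I/O
        return new Movie(42, "Example");
    }
}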
So, when you send a request, the same URI goes to the same backend. If the
URI is already in the dictionary, the existing task is returned, and when the
task completes, it's removed from the dictionary. What this effectively says
is "if I'm already looking up this data, just wait for that; if not, start
looking it up and wait". You can assume that somewhere within
ProcessRequestAsync caching is also used, or that it's done before this
level.
Benefits
What are the benefits of this approach? I'm going to leave out the drawbacks,
since I'm assuming you already see them.
- Scaling. You can add hosts to distribute the load more.
- Natural distributed cache
- Hot records share 1 lookup for many incoming requests, until that lookup is complete.
- Simplicity. The code focuses on a single record at all times.
Scaling
Adding more hosts spreads the load of requests, and thanks to a consistent
hash algorithm such as Ketama, request migration is minimal. A new host
impacts at most two other hosts; all of the others retain the requests they
were already serving. You can also use an unstable algorithm, such as
CRC (32/64) or SHA (2/256/512), but scaling will then redistribute just about
all requests. Even that shouldn't have a huge impact, due to the single query
shared by many requests until it completes.
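As a rough illustration of the routing, here is a toy consistent hash ring in C#. It uses MD5 the way Ketama does, but the virtual-node count, linear ring scan, and key layout are simplified assumptions, not the real Ketama algorithm.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public class HashRing
{
    // Sorted ring of (point, host); each host gets several virtual nodes.
    private readonly SortedDictionary<uint, string> _ring = new();

    public HashRing(IEnumerable<string> hosts, int virtualNodes = 100)
    {
        foreach (var host in hosts)
            for (var i = 0; i < virtualNodes; i++)
                _ring[Hash($"{host}#{i}")] = host;
    }

    public string GetHost(string key)
    {
        var point = Hash(key);
        // First ring point at or after the key's point, wrapping around.
        // A real implementation would binary-search instead of scanning.
        foreach (var entry in _ring)
            if (entry.Key >= point)
                return entry.Value;
        return _ring.First().Value;
    }

    private static uint Hash(string value)
    {
        var bytes = MD5.HashData(Encoding.UTF8.GetBytes(value));
        return BitConverter.ToUInt32(bytes, 0);
    }
}

// Usage: route each movie URI to a backend host.
// var ring = new HashRing(new[] { "host-a", "host-b", "host-c" });
// var host = ring.GetHost("api/v1/movie/42");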
Natural distributed cache as a side effect
Because the load is spread across hosts, and the same requests go to the same
hosts, you can use a simple in-memory cache. It's naturally distributed
across the hosts, and you can skip a more traditional distributed cache
(such as Redis, Valkey, KeyDB, memcached, etc.).
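As a sketch of what that local cache might look like inside ProcessRequestAsync, here's a version using ASP.NET Core's IMemoryCache, reusing the illustrative Movie record from earlier. IMovieStore, the cache key format, and the five-minute expiry are assumptions made up for the example.

using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

// Hypothetical data store client, defined only for this example.
public interface IMovieStore
{
    Task<Movie> GetMovieAsync(int id);
}

public class CachedMovieLookup
{
    private readonly IMemoryCache _cache;
    private readonly IMovieStore _store;

    public CachedMovieLookup(IMemoryCache cache, IMovieStore store)
    {
        _cache = cache;
        _store = store;
    }

    public async Task<Movie?> ProcessRequestAsync(int id)
    {
        // The same ID always hashes to this host, so a plain in-memory
        // cache behaves like one shard of a distributed cache.
        return await _cache.GetOrCreateAsync($"movie:{id}", entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
            return _store.GetMovieAsync(id);
        });
    }
}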
Preventing the cache storm/stampeding herd problem
Multiple concurrent requests for a single item result in one request to the
data store. In the event of a stampeding herd, or a cache storm, the data
store is protected from the load. Taking this further with the local cache, you end up with one request to the data store until that local cache entry expires.
Closing thoughts
I often see people prematurely optimizing, which we all know is the root of all evil. This system design works in a lot of cases, but not all. It can also be expanded upon; for example, in the movies API you could fetch your list of movies by page the same way, without local caching. Under load, each unique query to the data store would be executed only once.
That's all folks.