I don’t want to suggest that the Twitter team doesn’t know what they are doing, because I am sure they are, and have been, very smart folks.
And yet, the system keeps falling down (it seemed to be down most of yesterday), and there really is no good reason I can imagine for this to keep happening. I would offer myself up to help them solve the problem, but I suspect they probably have folks smarter than me working on it.
But I say all this to say I am sorry if I am insulting your intelligence, Twitter engineering team. It is not my intent. But I am going to suggest a simple structure that would help to solve your problem. If you have already thought of it or are implementing something better, Godspeed.
The essence of the concept is in the following diagram:
1. Automated CPU provisioning
First, in our work at Kloudshare, we use Amazon Web Services (AWS). You do not have to use AWS, but what you really *do* need is the ability to provision resources more or less on demand, or to just have enough extra server capacity lying around when you need it. The key is automated software that detects certain types of loads and brings on more computing capacity when needed. So you need, ideally, to be able to provision the CPUs on demand in software.
The critical concept behind most scaling is denormalization. If you are a database guy/gal you already know what that means, but if you are not I will just explain it in the context of Twitter. In my suggested architecture, every user’s outbound tweets should be stored in a separate database from their inbound tweets. So when I look for all the stuff I have sent, it is in one place. When I look at all the things I have received, it is in another place.
In typical normalized databases you would want to store every message sent only once and not in two places as I suggest above. But while perfectly normalized databases make for “cleaner” databases, they cannot scale. Database purists often speak of the merits of normalization. And certainly some normalization is generally necessary. But in the new database world, a perfectly normalized database is an un-scalable database.
In the above diagram you see that we have separated users into clusters. So we have not just stored “sent tweets” in a different database from “received tweets”, we have broken each of these storage types into smaller groups of users. In our simple example, we store user data in clusters of 100 users. This is called sharding.
The purpose of this sharding is that when a user sends a tweet s/he will have no bottleneck in writing that data because only 100 other users are trying to write to that physical server. It can never be overloaded. That server then has the responsibility for writing the sent tweet to all the users that are subscribers to the given senders tweets as well as to the @ recipients (Twitter jargon for those of you not yet initiated).
The beauty of this is that each given database holding “received tweets” is not overburdened either. It only receives messages that are aimed at it. And it only is queried by the users who have their inbound tweets stored on that server.
What is great about this scheme is that it totally unburdens any given physical server and it is infinitely scalable. You could have a hundred or 10,000 such servers and performance scales 100% linearly with the addition of physical machines.
4. Shard Splitting
To be able to add CPUs smoothly, you have to be able to, do something I call shard splitting. In the example I gave above every database holds data for exactly 100 users. But in the real world each database should continue to grow until it starts to be over burdened. Then you “split the shard” into two databases. Each shard should know how to “split itself” when it becomes too busy. So if a database had gotten too slow holding the inbound tweets for 500 users, after the shard split there are two databases that each hold inbound tweets for 250 users each. This self splitting of shards allows for a very organic and automated type of scaling.
5. Distributed User Lookup
Finally, In order to know what machines have what users data there is a bit of overhead in storing a lookup table for each user that indicates where their inbound and outbound data servers are. This is necessary so that each server storing “sent tweets” knows what “received tweet” servers to send each tweet to. I suggest storing this table in memory, or on disk, across all needed servers.
This is a fairly light burden even if you are storing tables for tens of millions of users. I would suggest this table be synchronized across all the servers that need it using something like Terracotta, which is a specialized kind of “plug in” for the Java Virtual Machine that makes it so standard objects can be automatically kept in-memory and synced across multiple servers. In my estimation it is a very powerful tool in the scaling arsenal.