If anyone is interested in DB sharding, take a look here:
I recently started using Twitter and have become a big fan of the service. I've been appalled by the downtime the service has endured, but sympathetic because I assumed the growth in usage is so fast that much might be excused. Then I read this TechCrunch post on the Twitter usage numbers and sympathy turned to bafflement - because I'm intimately familiar with SMS GupShup, a startup in India that boasts usage numbers much, much higher than Twitter's, but has scaled without a glitch.
I'll let the numbers speak for themselves:
* Users: Twitter (1+ million), SMS GupShup (7 million)
* Messages per day: Twitter (3 million); SMS GupShup (10+ million)
Actually, these numbers don't even tell the whole story. India is a land of few PCs and many mobile phones. Thus, almost all GupShup messages are posted via mobile phones using SMS. And almost every GupShup message is posted simultaneously to the website and to the mobile phones of followers via SMS. That's why they have the SMS in the name of the service. Contrast with Twitter, where the majority of the posting and reading is done through the web. Twitter has said in the past that sending messages via the SMS gateway is one of their most expensive operations, so the fact that only a small fraction of their users use the SMS option makes their task a lot easier than GupShup's.
So I sat down with Beerud Sheth, co-founder of Webaroo, the company behind GupShup (the other founder Rakesh Mathur is my co-founder from a prior company, Junglee). I wanted to understand why GupShup scaled without a hitch while Twitter is having fits. Beerud tells me that GupShup runs on commodity Linux hardware and uses MySQL, the same as Twitter. But the big difference is in the architecture: right from day 1, they started with a three-tier architecture, with JBoss app servers sitting between the webservers and the database.
GupShup also uses an object architecture (called the "objectpool") which allows each task to be componentized and run separately - this helps immensely with reliability (can automatically handle machine failure) and scalability (can scale dynamically to handle increased load). The objectpool model allows each module to be run as multiple parallel instances - each of them doing a part of the work. They can be run on different machines, can be started/stopped independently, without affecting each other. So the "receiver", the "sender", and the "ad server" all run as multiple instances. As traffic scales, they can just add more hardware -- no re-architecting. If one machine fails, the instance is restarted on a different machine.
In read/write applications, the database is often the bottleneck. To avoid this problem, the GupShup database is sharded: the tables are broken into parts. For example, users A-F are in one instance, G-K in another, and so on. The shards are periodically rebalanced as the database grows. The JBoss middle-tier contains the logic that hides this detail from the webserver tier.
I'm not familiar with the details of Twitter's architecture, beyond knowing they use Ruby on Rails with MySQL. It appears that the biggest difference between Twitter and GupShup is 3-tier versus 2-tier. RoR is fantastic for turning out applications quickly, but the way Rails works, the out-of-the-box approach leads to a two-tier architecture (webserver talking directly to database). We all learned back in the 90's that this is an unscalable model, yet it is the model for most Rails applications. No amount of caching can help a 2-tier read/write application scale. The middle-tier enables the database to be sharded, and that's what gets you the scalability. I believe Twitter has recently started using message queues as a middle-tier to accomplish the same thing, but they haven't partitioned the database yet -- which is the key step here.[...]
Interesting article on building microblogging systems
Source (and continuation): http://anand.typepad.com/datawocky/2008 ... ntime.html
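For anyone who wants to see what the sharding described in the article might look like, here is a minimal sketch in Python. It is only an illustration and not GupShup's actual code (which, per the article, lives in a JBoss middle tier). The names `ShardRouter` and `UserService`, the shard names `db0`/`db1`/`db2`, and the letter boundaries beyond A-F and G-K are all assumptions made up for this example, and plain in-memory dicts stand in for the separate MySQL instances.

```python
import bisect


class ShardRouter:
    """Range-based sharding: picks a shard from the first letter of a username."""

    def __init__(self, upper_bounds, shard_names):
        # upper_bounds[i] is the last letter (inclusive) handled by shard_names[i].
        if len(upper_bounds) != len(shard_names):
            raise ValueError("need exactly one upper bound per shard")
        if list(upper_bounds) != sorted(upper_bounds):
            raise ValueError("upper bounds must be in ascending order")
        self._bounds = list(upper_bounds)
        self._shards = list(shard_names)

    @property
    def shard_names(self):
        return list(self._shards)

    def shard_for(self, username):
        if not username:
            raise ValueError("username must not be empty")
        key = username[0].upper()
        # Index of the first bound that is >= key.
        idx = bisect.bisect_left(self._bounds, key)
        # Keys past the last bound fall into the last shard.
        idx = min(idx, len(self._shards) - 1)
        return self._shards[idx]


class UserService:
    """Stand-in for the middle tier: callers never see which shard is used."""

    def __init__(self, router):
        self._router = router
        # Hypothetical storage: one dict per shard instead of one database each.
        self._stores = {name: {} for name in router.shard_names}

    def save_user(self, username, profile):
        shard = self._router.shard_for(username)
        self._stores[shard][username] = profile

    def load_user(self, username):
        shard = self._router.shard_for(username)
        return self._stores[shard].get(username)


# Example wiring: A-F on db0, G-K on db1, everything else on db2.
router = ShardRouter(["F", "K", "Z"], ["db0", "db1", "db2"])
service = UserService(router)
service.save_user("beerud", {"followers": 10})
```

The web tier would only ever call `save_user` and `load_user`, which is the point the article makes about the middle tier hiding the sharding. Rebalancing (moving rows when the boundaries change) is not covered in this sketch.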
- PaP
Re: Interesting article on building microblogging systems
http://dev.twitter.com/2008/05/twitteri ... cture.html

Just don't let me hear anything bad about RoR, because I love it dearly.
Twitter is, fundamentally, a messaging system. Twitter was not architected as a messaging system, however. For expediency's sake, Twitter was built with technologies and practices that are more appropriate to a content management system. Over the last year and a half we've tried to make our system behave like a messaging system as much as possible, but that's introduced a great deal of complexity and unpredictability. When we're in crisis mode, adding more instrumentation to help us navigate the web of interdependencies in our current architecture is often our primary recourse. This is, clearly, not optimal.

And even with RoR, which is 2-tier, there are solutions such as JRuby, so you can move to 3-tier, or the upcoming MagLev.
Or maybe it is to blame after all... what do I know, but the architecture is definitely wrong.
But I think the best tool is:
www.plurk.com
Re: Interesting article on building microblogging systems
PaP, don't lead the kids astray....
PaP wrote: http://dev.twitter.com/2008/05/twitteri ... cture.html

We'll see where this goes...