With over 15 billion page views a month, Tumblr has become an insanely popular blogging platform. Users may like Tumblr for its simplicity, its beauty, its strong focus on user experience, or its friendly and engaged community, but like it they do. Growing at over 30% a month has not been without challenges, some reliability problems among them.
The post is rather long (though it's worth reading in full), so let me recap it:
- Tumblr grows at over 30% a month
- 500 million page views a day, a peak rate of ~40k requests per second, ~3TB of new data to store a day, all running on 1000+ servers.
- Tumblr started as a fairly typical large LAMP application
- Changed to a JVM-centric approach for hiring and speed-of-development reasons
- Initially just 4 engineers, now 20
- Newer, non-relational data stores like HBase and Redis are being used, but the bulk of their data is currently stored in a heavily partitioned MySQL architecture.
- NY is a different environment. Lots of finance and advertising. Hiring is challenging because there’s not as much startup experience.
- Started with the philosophy that anyone could use any tool that they wanted, but as the team grew that didn’t work.
- Process is roughly Scrum-like. Lightweight.
- The hiring process should answer two questions about a candidate: is (s)he smart, and will (s)he get stuff done?
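
To put the numbers in the recap into perspective, here is a quick back-of-the-envelope sketch in Python. The input figures are the ones quoted above; everything else (the variable names, the 30-day month, the assumption that the 30% growth rate and the 3TB-a-day rate stay constant) is my own simplification, not something the original post states.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the recap above.
page_views_per_day = 500_000_000
new_data_tb_per_day = 3
monthly_growth = 0.30  # "over 30% a month", treated here as exactly 30%


def average_per_second(per_day: float) -> float:
    """Average rate per second for a quantity given per day."""
    return per_day / SECONDS_PER_DAY


def doubling_time_months(growth: float) -> float:
    """Months needed to double at a constant compound monthly growth rate."""
    return math.log(2) / math.log(1 + growth)


avg_views_per_second = average_per_second(page_views_per_day)
monthly_page_views = page_views_per_day * 30        # assumes a 30-day month
doubling_months = doubling_time_months(monthly_growth)
yearly_factor = (1 + monthly_growth) ** 12          # assumes growth stays constant
new_data_tb_per_year = new_data_tb_per_day * 365    # assumes a flat 3 TB/day

print(f"Average page views per second: {avg_views_per_second:,.0f}")
print(f"Page views per 30-day month:   {monthly_page_views:,}")
print(f"Doubling time in months:       {doubling_months:.1f}")
print(f"Growth factor over 12 months:  {yearly_factor:.1f}x")
print(f"New data per year (TB):        {new_data_tb_per_year:,}")
```

A few things fall out of this. 500 million page views a day is consistent with the 15 billion a month headline figure, and it averages to a bit under 6k page views per second. The ~40k peak is quoted in requests rather than page views, so the two are not directly comparable. At 30% a month, traffic doubles roughly every two and a half months, which goes some way towards explaining why scaling was a constant concern.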