Fighting for nines

Running a 24/7 web service is about fighting for nines. Number of nines is used by Operations people to measure what is the percentage of time that a site is available to its users. Two nines (99%) indicates three days, three nines (99.9%) nine hours, and four nines (99.99%) one hour of total downtime per year. A rule of thumb used by system architects is that each additional nine increases costs by ten fold. The reason for this is, that the less downtime time you can afford, the more effort and resources are required to anticipate all the things that might happen with the system. Most people associate high-availability with the redundant hardware. But in practice high-availability is mostly about a team of highly motivated, experienced, skilled and resourceful Operations people. At Zemanta we have a very complex system in place supporting our service. Even though we have a pair of extremely dedicated and capable system administrators, we were not yet able to achieve more than two nines, which makes us approximately on par with github (see below). But we fight hard every day to conquer that additional nine that separates boys from men!

About This Week's Availability

Image representing GitHub as depicted in Crunc...

It hasn't exactly been a good week for GitHub's nines. Just under an hour of total downtime over the past week comes to 99.46% uptime - far below our standards. As you can imagine, it's been a frustrating week for us too. We've been fighting a Distributed Denial of Service (DDoS) Attack since Saturday.

We need this to understand how you use our service - you can take it out if you like. Cheers, your Blogspire team.

via: github.com