Yesterday's frenzy of tracking the results of the US presidential election was yet another demonstration of the importance of timely and accurate counting. Knowing how many events have happened is the cornerstone of statistics, and statistics is the foundation of all modern science and engineering. Counting has made Google successful, and it has been gaining even more prominence recently with the increased use of social signals in the algorithms powering many web services. It is very difficult to implement counters in a scalable way on traditional storage systems, since they operate in a lock-read-update-write-release sequence, which prevents high-frequency counter updates from being performed in parallel. Many specialized solutions that enable fast parallel counting have been devised over the years, but only with the recent development of Cassandra counters did we get a general-purpose tool for highly available counters that can be incremented at high frequency.
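To make the contrast concrete, here is a minimal in-memory sketch in Python. It is only an illustration: `LockedCounter`, `ShardedCounter`, and `count_in_parallel` are hypothetical names invented for this example, and the sharded variant merely stands in for the general idea of letting each writer update its own partial count. It should not be read as a description of how Cassandra implements counters internally.

```python
import threading


class LockedCounter:
    """Counter following the lock-read-update-write-release sequence."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self, delta=1):
        with self._lock:            # lock
            current = self._value   # read
            current += delta        # update
            self._value = current   # write
        # the lock is released on leaving the `with` block

    def value(self):
        with self._lock:
            return self._value


class ShardedCounter:
    """Counter where each writer owns one shard; a read sums all shards."""

    def __init__(self, num_shards):
        self._shards = [0] * num_shards

    def increment(self, shard_id, delta=1):
        # Assumption: only one writer ever touches a given shard_id,
        # so writers never contend for a shared lock.
        self._shards[shard_id] += delta

    def value(self):
        return sum(self._shards)


def count_in_parallel(num_writers, increments_each):
    """Run the same workload against both counters and return both totals."""
    locked = LockedCounter()
    sharded = ShardedCounter(num_writers)

    def worker(writer_id):
        for _ in range(increments_each):
            locked.increment()            # all writers queue on one lock
            sharded.increment(writer_id)  # each writer updates its own shard

    threads = [
        threading.Thread(target=worker, args=(writer_id,))
        for writer_id in range(num_writers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    return locked.value(), sharded.value()


if __name__ == "__main__":
    print(count_in_parallel(num_writers=4, increments_each=1000))
```

Both counters arrive at the same total, but every increment of `LockedCounter` has to wait for the single lock, while `ShardedCounter` pays the cost of combining partial counts only when the value is read.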
At Zemanta we have only recently started to use Cassandra counters, so we don't have extensive experience with them yet, but our first impression is extremely positive. We are able to process ~100 increments per second effortlessly on a very modest hardware setup. Additionally, counting is very fast: an increment takes ~5 ms on average, and maximums rarely exceed 30 ms.