One of the technologies we have introduced to Zemanta's technology stack in the past year is RabbitMQ. I first encountered message queues some ten years ago, when I led a project to upgrade the toll collection software for the Slovenian highway administration. Toll booths are distributed by nature, and they must keep operating even when connectivity to the central server is down. By using message queues we were able to build a reliable system that is still in operation today (with some major changes, though).

The same rationale, allowing disparate systems to function independently while still communicating with each other, was also the main reason we felt the need to introduce message queues to our system. Zemanta's backend does a lot of data processing in the background; most of it is asynchronous and consists of a series of steps. For example, to provide thumbnails for related articles, we first extract a set of potential image URLs from the article, then we download the images to identify the most suitable thumbnail, and once we find one, we upload it to S3/CloudFront. By using two message queues we have decoupled this three-step process into three completely separate modules with a clear separation of responsibilities.

RabbitMQ is written in only 5000 lines of Erlang and it is reliable as hell (our instance of RabbitMQ has been up since February 5th). While we initially had some reservations about introducing Erlang-based software to our technology stack, we have found over the past year that RabbitMQ is such a low-maintenance system that our lack of Erlang programming knowledge is not a problem. The AMQP clients, on the other hand, have their fair share of problems, and we still observe occasional hiccups in their operation.
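To make the thumbnail example more concrete, here is a minimal sketch of three stages connected by two queues. It is not our production code: it uses Python's standard-library `queue.Queue` and threads as an in-process stand-in for RabbitMQ, and the names `extract_image_urls`, `pick_thumbnail`, and `run_pipeline`, the article dictionaries, and the "first .jpg or .png wins" rule are all assumptions made up for illustration. The `uploaded` dictionary merely stands in for S3/CloudFront.

```python
import queue
import threading

STOP = object()  # sentinel that tells a worker to shut down


def extract_image_urls(article):
    """Stage 1 (hypothetical): return candidate image URLs for an article."""
    return article["image_urls"]


def pick_thumbnail(urls):
    """Stage 2 (hypothetical): stands in for downloading and scoring images.

    Here the first URL ending in .jpg or .png is treated as the best one.
    """
    candidates = [u for u in urls if u.endswith((".jpg", ".png"))]
    return candidates[0] if candidates else None


def run_pipeline(articles):
    candidates_q = queue.Queue()  # queue 1: extractor -> selector
    uploads_q = queue.Queue()     # queue 2: selector -> uploader
    uploaded = {}                 # stand-in for S3/CloudFront

    def selector():
        while True:
            msg = candidates_q.get()
            if msg is STOP:
                uploads_q.put(STOP)  # pass the shutdown signal downstream
                return
            article_id, urls = msg
            best = pick_thumbnail(urls)
            if best is not None:
                uploads_q.put((article_id, best))

    def uploader():
        while True:
            msg = uploads_q.get()
            if msg is STOP:
                return
            article_id, url = msg
            uploaded[article_id] = url  # stage 3: pretend upload

    workers = [threading.Thread(target=selector),
               threading.Thread(target=uploader)]
    for worker in workers:
        worker.start()

    # Stage 1 runs in the calling thread and only talks to queue 1.
    for article in articles:
        candidates_q.put((article["id"], extract_image_urls(article)))
    candidates_q.put(STOP)

    for worker in workers:
        worker.join()
    return uploaded
```

Each stage only knows about the queue it reads from and the queue it writes to, which is the separation of responsibilities described above. With a real broker the stages would also run as separate processes, so one of them could be restarted or fall behind without stopping the others.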