SIGIR'12: Search is Disruption-Ready


Just 20 years ago, only librarians and scholars concerned themselves with the problem of identifying the right keywords to retrieve the information they were seeking. Nowadays one third of the world's population spends considerable effort identifying just the right keywords to navigate the information overload of today's web. By tricking people into using this bizarre user interface, Google has made its fortune, and search has become a 100-billion-dollar industry. Google and other search engines are built on ideas developed in the 1970s and only slightly refined since then. The field of Information Retrieval has great difficulty squeezing any additional performance out of keyword-based search, and maybe it is time to give up on this quest. In today's SIGIR keynote, Norbert Fuhr said that "the greatest improvements in information retrieval will come from better understanding of the user". I think that a better understanding of the user is achievable not only by infringing on the user's privacy (the path taken by Google in its Google+ push), but also by devising new human-computer interaction approaches that are better at capturing the user's information needs than keyword-based search.

I think that startups have a much higher chance of finding new interaction paradigms than either academia or the established web search industry, since startups are not limited by the need to iteratively refine existing approaches. I experienced the lack of useful information retrieval research myself while developing BlogSpire, our experimental service for helping bloggers find inspiration. This service works surprisingly well at delivering a daily personalized set of inspirational articles, but only after we have spent a non-trivial amount of effort explaining to the system the topics that each individual user is interested in. While the scientific community has produced hundreds of articles on algorithms for topic modeling, I'm still searching for an article that addresses the problem of capturing a user's topics of interest, a problem which has turned out to be surprisingly difficult to solve.