Save time on search engines: New algorithm “grows lists”

Fri, Jun 21, 2013 21:19 CET

Have you ever tried to create a list of all upcoming events in your local area? If you live near Boston, it would be a useful list considering how bad traffic can be when there is a public event like a street festival or fundraising walk.

The problem is that although such events are planned well in advance, there is no central online list of events in Boston. Rather, there are many different sources of event listings, such as Boston.com, Eventbrite and Yelp, each of which is incomplete.

As the amount of information on the Internet continues to grow, we cannot continue to rely on individual sites to compile information for us. Instead, we need a practical and fast way to do that automatically.

In studying this issue, my colleagues Ben Letham, Katherine Heller and I created an algorithm that takes advantage of the wisdom of the crowd to instantly collect and sort information from across the Web into a list of relevant items. It starts with a small seed of relevant examples chosen by the user and grows a long list of other relevant examples by drawing on what many experts have posted on the Internet.

Our algorithm allows the user to “grow a list” of almost anything, starting from whatever seed they want. It conducts many Internet searches and processes many sites to compile a list. It often searches for terms that weren’t even contained in the original seed. For example, if the original search terms were “Barack Obama” and “Scott Brown,” the algorithm also conducts a search for “Nancy Pelosi” and “John McCain” while constructing its list of political figures.

The algorithm takes advantage of the wisdom of the crowd by using whatever is posted on the Internet. There are a surprising number of people who know a lot about a subject and post lists of relevant items on a site. When we ran the algorithm with the seed “challah” and “knishes” to get a list of quintessential Jewish foods, we compiled information from thousands of sources like recipe sites, supermarkets, informational sites, menus, and blog entries with lists. A good number of these sites were compiled by people who are truly experts on the topic.
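The seed-and-grow idea can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in a simple co-occurrence count, where an item scores higher the more often it appears in a source list that also contains seed items. The function name `grow_list` and the contents of `toy_sources` are hypothetical and exist only for illustration; a real system would gather its source lists from Web searches.

```python
from collections import Counter


def grow_list(seed, source_lists, top_k=5):
    """Rank non-seed items by how strongly they co-occur with the seed.

    Illustrative scoring only (an assumption, not the published method):
    each source list adds to an item's score the number of seed items
    that the same list contains.
    """
    seed_set = {s.lower() for s in seed}
    scores = Counter()
    for items in source_lists:
        normalized = {item.lower() for item in items}
        overlap = len(normalized & seed_set)
        if overlap == 0:
            continue  # this source shares nothing with the seed; skip it
        for item in normalized - seed_set:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]


# Hypothetical toy data standing in for lists found on different sites.
toy_sources = [
    ["challah", "knishes", "noodle kugel", "potato latkes"],
    ["challah", "potato latkes"],
    ["knishes", "noodle kugel", "potato latkes"],
    ["chicken quesadillas", "baked chicken chimichangas"],
]

print(grow_list(["challah", "knishes"], toy_sources))
```

In this toy run, the last source never mentions a seed item, so its entries are never scored. Feeding the top results back in as additional seed items would mimic, very roughly, how the search can move on to terms that were not in the original seed.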

When we used the algorithm to create the list of Boston events, the results were overwhelmingly Boston events. Some of the largest public events were not found on the most popular event websites, but they were on our list.

This is the first practical solution to the problem of “growing a list.” The results from other tools don’t even come close to this algorithm’s results. Indeed, when we used Google Sets to create a list of Jewish foods, the resulting list included foods like “baked chicken chimichangas” and “chicken quesadillas,” which are not exactly quintessential Jewish foods. Our list, consisting of things like “noodle kugel” and “potato latkes,” was much better.

This new algorithm can locate and organize information in a way that no other tool can. Its results truly make growing a list possible and practical, whereas previous generations of this kind of tool were so poor that there was no point in trying to use them. This algorithm can help humans process the overload of information that characterizes our digital age in a way that just wasn’t possible before.

Cynthia Rudin is an assistant professor of statistics and coauthor of “Growing a List.”

