February 14, 2014
10:45 AM - 11:45 AM
Computer Studies Building, Room 209

Qiang Liu

In recent years, intelligent systems have become much more powerful by exploiting "big data", incorporating massive volumes of data to improve their predictions. However, many of these data require some human intervention: labeling, rating, or otherwise curating or annotating the raw values. Crowdsourcing approaches accomplish this by outsourcing these human judgment tasks over the Internet. The (usually anonymous) crowd members, however, vary widely in quality and are often unreliable or biased. This raises the computational challenge of how to properly aggregate the results of a diverse crowd and how to correct for bias by injecting a small amount of expert knowledge.

Probabilistic graphical models provide a powerful framework for aggregating multiple sources of information and reasoning over large numbers of variables. In this talk, I show how to approach the crowdsourcing problem using graphical model tools, which make it possible to leverage powerful inference algorithms such as belief propagation (BP) for crowd aggregation. When humans estimate continuous quantities such as event probabilities, point spreads, and economic indicators, their judgments are often systematically biased, and this bias can be corrected only with extra ground truth information (e.g., qualification tests or control questions). We study the problem of how many control questions to use: using more control questions evaluates the workers better but leaves fewer questions for the unknown targets, and vice versa. We present theoretical results for this problem under different scenarios, and provide a simple rule of thumb for practice.
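
As a rough illustration of the control-question tradeoff described above, the following Python sketch simulates a crowd in which each worker has a fixed question budget. It is not the speaker's method or analysis. The additive-bias model, the plug-in bias estimate, and every name and parameter value (`simulate_mse`, `budget`, `bias_sd`, and so on) are assumptions chosen only for illustration.

```python
import random


def simulate_mse(num_control, budget=20, num_workers=50, num_targets=50,
                 bias_sd=2.0, noise_sd=1.0, seed=0):
    """Mean squared error on target questions for one simulated crowd.

    Assumed (hypothetical) model: answer = truth + worker bias + noise.
    Each worker spends `num_control` of `budget` questions on control
    items with known truth and the rest on randomly chosen targets.
    """
    rng = random.Random(seed)
    truths = [rng.gauss(0.0, 1.0) for _ in range(num_targets)]
    corrected = [[] for _ in range(num_targets)]

    for _ in range(num_workers):
        bias = rng.gauss(0.0, bias_sd)

        # Control questions: the truth is known (taken to be 0 here),
        # so the worker's mean answer is an estimate of the bias.
        if num_control > 0:
            control = [bias + rng.gauss(0.0, noise_sd)
                       for _ in range(num_control)]
            bias_hat = sum(control) / num_control
        else:
            bias_hat = 0.0  # no control questions, so no correction

        # Remaining budget goes to distinct, randomly chosen targets.
        for t in rng.sample(range(num_targets), budget - num_control):
            answer = truths[t] + bias + rng.gauss(0.0, noise_sd)
            corrected[t].append(answer - bias_hat)

    # Average the bias-corrected answers per target; a target that
    # received no answers falls back to the prior mean of 0.
    sq_err = 0.0
    for truth, answers in zip(truths, corrected):
        estimate = sum(answers) / len(answers) if answers else 0.0
        sq_err += (estimate - truth) ** 2
    return sq_err / num_targets


if __name__ == "__main__":
    for c in (0, 5, 19):
        print(f"control questions per worker: {c:2d}  "
              f"MSE: {simulate_mse(c):.3f}")
```

Under these assumed settings, using no control questions leaves worker bias uncorrected, while spending nearly the whole budget on control questions leaves most targets with few or no answers. Neither extreme should be expected to do well, which is the tension the talk addresses.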

Qiang Liu is a Ph.D. candidate in the Bren School of Information and Computer Sciences at UC Irvine. His research focuses on machine learning and probabilistic graphical models, with applications to areas such as sensor networks, computational biology, and crowdsourcing. He received a Microsoft Research Fellowship in 2011 and a notable paper award at the 2011 AI and Statistics conference.