February 09, 2017
03:30 PM - 04:30 PM
Meliora 203

Thursday, February 9, 2017

3:30 PM – 4:30 PM

Meliora 203

Liang Huang

Oregon State University

Linear-Time Structure Prediction in Language and Biology

Abstract:  Why are computers so bad at processing human languages while so good at programming languages? What is the key difference between English and C++ that makes the former so much harder? In this talk I'll present a linear-time (approximate) dynamic programming algorithm for incremental parsing inspired by both human sentence processing (psycholinguistics) and compiler theory (LR parsing). This algorithm, being linear-time, is much faster than, but also as accurate as, the dominant O(n^3) algorithms. It overcomes the ambiguity explosion problem by local ambiguity packing similar to those found in psycholinguistics.

More interestingly, there is a striking connection between linguistics and biology: natural language parsing and RNA secondary structure prediction use the same very slow O(n^3) algorithms. While natural language sentences are rarely over 100 words, RNA sequences can be as long as 4,000 nucleotides; so there is a critical need for faster algorithms. We can therefore adapt the same linear-time dynamic programming idea to predict secondary structures for RNA sequences in linear-time, which results in orders of magnitude faster predictions without loss of accuracy.

Bio:  Liang Huang is currently an Assistant Professor of EECS at Oregon State University, and a part-time Research Scientist with IBM's Watson Group. Before that, he was Assistant Professor for three years at the City University of New York (CUNY). He graduated from Penn in 2008 and has worked as a Research Scientist at Google and a Research Assistant Professor at USC/ISI. Most of his work develops fast algorithms and provable theory to speedup large-scale natural language processing, structured machine learning, and computational structural biology. He has received a Best Paper Award at ACL 2008, a Best Paper Honorable Mention at EMNLP 2016, several best paper nominations (ACL 2007, EMNLP 2008, and ACL 2010), two Google Faculty Research Awards (2010 and 2013), a Yahoo! Faculty Research Award (2015), and a University Teaching Prize at Penn (2005). DARPA, NSF, Google, and Yahoo have supported his research. He also co-authored a best-selling textbook in China on algorithms for programming contests.

Host: Dan Gildea,