Jason Corso, Associate Professor, University of Michigan Talk
May 10, 2016
Cross-Modal Embeddings with Video, Text and Speech
Joint video-language modeling has been attracting increasing attention in recent years, signifying a return to early AI goals of cooperative cognitive systems. However, many approaches fail to leverage the complementarity across vision and language. For example, they may rely on a fixed visual model or fail to leverage the underlying compositional semantics inherent in language. In this talk, I will discuss a sequence of recent work in my group that directly and holistically models vision and language in ways that not only jointly model the visual and lingual signals but also exploit the compositionality in language to learn better representations. The first method I will discuss jointly embeds a deep video model and a compositional text model that sits on a dependency-tree structure. The joint embedding fine-tunes all three model components together under a unified cost function and affords three tasks: text generation, text retrieval and video retrieval. The second method I will discuss explicitly relates visual and speech signals in a bimodal sparse model. The bimodal model represents visual and speech signals in separate but linked dictionaries, facilitating a bidirectional generative capability. Furthermore, we impose a structure on the dictionaries that captures the compositionality of the underlying spoken language. Both approaches capture visual and lingual signals from the bottom up and demonstrate the potential of signal-level cross-modal embeddings for realizing next-generation cooperative cognitive systems.
Corso is an associate professor of Electrical Engineering and Computer Science at the University of Michigan. He received his PhD and MSE degrees from The Johns Hopkins University in 2005 and 2002, respectively, and his BS degree with honors from Loyola College in Maryland in 2000, all in Computer Science. He spent two years as a post-doctoral fellow at the University of California, Los Angeles. From 2007 to 2014 he was a member of the Computer Science and Engineering faculty at SUNY Buffalo. He is the recipient of a Google Faculty Research Award (2015), the Army Research Office Young Investigator Award (2010), the NSF CAREER Award (2009), the SUNY Buffalo Young Investigator Award (2011), and the Link Foundation Fellowship in Advanced Simulation and Training (2003), and he was a member of the 2009 DARPA Computer Science Study Group. Corso has authored more than one hundred peer-reviewed papers on topics of his research interest, including computer vision, robot perception, data science, and medical imaging. He is a member of the AAAI, ACM, and MAA and a senior member of the IEEE.
Host: Jiebo Luo