Computer Science Colloquium Series: Towards Human-like Understanding of Visual Content: Facilitating Search and Decoding Visual Media
December 11, 2017
Wegmans Hall 1400 (auditorium), River Campus
Adriana Kovashka, Assistant Professor of Computer Science, University of Pittsburgh
In the first part of this talk, I will describe our work on interactive image search. We introduced a new form of interaction for search, in which the user gives rich feedback to the system via semantic visual attributes (e.g., "metallic", "pointy", and "smiling"). The proposed WhittleSearch approach lets users narrow down the pool of relevant images by comparing individual properties of the results to those of the desired target. Building on this idea, we developed a system-guided version of the method that engages the user in a 20-questions-like game where the answers are visual comparisons. To ensure that the system interprets the user's attribute-based feedback as intended, we further show how to efficiently adapt a generic attribute model to align more closely with an individual user's perception. Our work transforms the interaction between an image search system and its user from keywords and clicks to precise, natural language-based communication. We demonstrate the impact of this new search modality for effective retrieval on databases ranging from consumer products to human faces. This is an important step toward making the output of vision systems more useful: users can both express their needs more precisely and better interpret the system's predictions.
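The comparative-filtering idea behind WhittleSearch can be illustrated with a minimal sketch. The image names, attribute scores, and the `whittle` helper below are hypothetical stand-ins; the actual system learns attribute predictors from data rather than using fixed scores.

```python
# Minimal sketch of WhittleSearch-style comparative filtering.
# Each image has precomputed scores for semantic visual attributes;
# the images and scores here are made up for illustration.
images = {
    "img_a": {"shiny": 0.9, "pointy": 0.2},
    "img_b": {"shiny": 0.4, "pointy": 0.8},
    "img_c": {"shiny": 0.7, "pointy": 0.5},
}

def whittle(candidates, feedback):
    """Keep only images consistent with every comparative constraint.

    feedback: list of (reference_image, attribute, direction), where
    direction "more" means the target shows more of the attribute than
    the reference image, and "less" means it shows less.
    """
    kept = dict(candidates)
    for ref, attr, direction in feedback:
        ref_score = candidates[ref][attr]
        if direction == "more":
            kept = {k: v for k, v in kept.items() if v[attr] > ref_score}
        else:
            kept = {k: v for k, v in kept.items() if v[attr] < ref_score}
    return kept

# "The target is shinier than img_b, but less pointy than img_b."
result = whittle(images, [("img_b", "shiny", "more"),
                          ("img_b", "pointy", "less")])
print(sorted(result))  # ['img_a', 'img_c']
```

Each round of user feedback intersects a new comparative constraint with the surviving candidate pool, which is what "whittles" the results toward the desired target.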
In the second part of my talk, I will discuss two recent projects that use computer vision to analyze images in the media, which often carry persuasive intents that lie beyond their physical content. As a first step toward understanding persuasion in visual media, we released a dataset of 64,832 image ads and a video dataset of 3,477 ads, with rich annotations about the subject, sentiment, and rhetoric of the ads. The key task we focus on is the ability of a computer vision system to answer questions about the actions the viewer is prompted to take and the reasoning the ad presents to persuade the viewer. To perform this task, we address two challenges: decoding the symbolic references that ads make (e.g., a dove symbolizes peace) and recognizing objects in the severely non-photorealistic portrayals that some ads use. In a second media understanding project, we develop a method that captures photographers' styles and predicts the authorship of artistic photographs. To explore whether current computer vision techniques can address photographer identification, we created a new dataset of over 180,000 images taken by 41 well-known photographers. We examine the effectiveness of a variety of features and convolutional neural networks for this task, and we use what our method has learned to generate new "pastiche" photographs in the style of an author.
Adriana Kovashka is an Assistant Professor in Computer Science at the University of Pittsburgh. She received her PhD in 2014 from The University of Texas at Austin. Her research interests primarily lie in computer vision, with some overlap in machine learning, information retrieval, natural language processing, and human computation. Her work is funded by two NSF grants and a Google Faculty Research Award. Her research has been published in the top computer vision conferences, such as Computer Vision and Pattern Recognition (CVPR) and the International Conference on Computer Vision (ICCV), as well as the annual conference of the Association for Computational Linguistics (ACL). She has served as Area Chair for CVPR 2018, Tutorial Chair for WACV 2018, and Doctoral Consortium Chair for CVPR 2015-2017.