CIS Seminar: “Language as a Scaffold for Grounded Intelligence”

March 13, 2019 at 11:00 AM - 12:00 PM
Details
Date: March 13, 2019
Time: 11:00 AM - 12:00 PM
Organizer
Computer and Information Science
Phone: 215-898-8560
Venue
Wu and Chen Auditorium (Room 101), Levine Hall, 3330 Walnut Street
Philadelphia, PA 19104
Abstract:
Natural language can be used to construct rich, compositional descriptions of the world, highlighting, for example, entities (nouns), events (verbs), and the interactions between them (simple sentences). In this talk, I show how compositional structure around verbs and nouns can be repurposed to build computer vision systems that scale to recognize hundreds of thousands of visual concepts in images. I introduce the task of situation recognition, where the goal is to map an image to a language-inspired structured representation of the main activity it depicts. The problem is challenging because it requires recognition systems to identify not only what entities are present, but also how they are participating within an event (e.g., not only that there are scissors, but that they are being used to cut). I also describe new deep learning models that better capture compositionality in situation recognition and leverage the close connection to language ‘to know what we don’t know’ and cheaply mine new training data. Although these methods work well, I show that they tend to amplify underlying societal biases in the training data (including overpredicting stereotypical activities based on gender), and I introduce a new dual decomposition method that significantly reduces this amplification without sacrificing classification accuracy. Finally, I propose new directions for expanding what visual recognition systems can see and ways to minimize the encoding of negative social biases in our learned models.