Fall 2025 GRASP on Robotics: Alan Yuille, Johns Hopkins University, “3D Vision Language Models and Interactive World Models”
Wu and Chen Auditorium (Room 101), Levine Hall
3330 Walnut Street, Philadelphia, PA, United States
This event will be in-person ONLY in Wu and Chen Auditorium.
ABSTRACT: Vision Language Models (VLMs) are extremely successful, but their performance degrades when they are asked questions involving spatial relations and 3D world knowledge. Inspired by Cognitive Science, we develop 3D VLMs which are 3D-aware and 3D-explicit to help us diagnose their failure modes. We […]

