BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Penn Engineering Events - ECPv6.16.3//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Penn Engineering Events
X-ORIGINAL-URL:https://seasevents.nmsdev7.com
X-WR-CALDESC:Events for Penn Engineering Events
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20240310T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20241103T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20250309T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20251102T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20260308T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20261101T060000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20251024T103000
DTEND;TZID=America/New_York:20251024T114500
DTSTAMP:20260601T213541
CREATED:20250908T150154Z
LAST-MODIFIED:20250908T150154Z
UID:20966-1761301800-1761306300@seasevents.nmsdev7.com
SUMMARY:Fall 2025 GRASP on Robotics: Alan Yuille\, Johns Hopkins University\, "3D Vision Language Models and Interactive World Models"
DESCRIPTION:This event will be in-person ONLY in Wu and Chen Auditorium.\nABSTRACT\nVision Language Models (VLMs) are extremely successful\, but their performance degrades on questions involving spatial relations and 3D world knowledge. Inspired by Cognitive Science\, we develop 3D VLMs which are 3D-aware and 3D-explicit to help us diagnose their failure modes. We present two approaches which involve developing datasets with 3D annotations for training the 3D VLMs. The first work was developed on realistic-synthetic datasets\, and the 3D VLM is built on a 3D Image Parser. This 3D VLM significantly outperforms conventional VLMs for questions involving 3D/6D (Xingrui Wang et al.\, CVPR 2025 highlight) and physical reasoning (Xingrui Wang et al.\, ICLR 2025). This work is extended to complex images\, taking VLMs as base models\, and evaluated on a 3D comprehensive reasoning benchmark (W. Ma et al.\, ICCV 2026). We develop a 3D-VLM which significantly outperforms conventional VLMs when asked questions requiring 3D knowledge (Wufei Ma et al.\, CVPR 2025 highlight). We further extend this approach to develop a 3D-VLM which performs even better and is also 3D-explicit (Wufei Ma et al.\, NeurIPS 2025). We discuss the bigger picture\, which involves the need for world models as illustrated by (J. Chen et al.\, ICLR 2025)\, analysis by synthesis (T. Zheng et al.\, NeurIPS 2025)\, and early detection of cancer using radiology reports (P. Bassi et al.\, MICCAI 2025).
URL:https://seasevents.nmsdev7.com/event/fall-2025-grasp-on-robotics-alan-yuille-johns-hopkins-university-3d-vision-language-models/
LOCATION:Wu and Chen Auditorium (Room 101)\, Levine Hall\, 3330 Walnut Street\, Philadelphia\, PA\, 19104\, United States
ORGANIZER;CN="General Robotics, Automation, Sensing and Perception (GRASP) Lab":MAILTO:grasplab@seas.upenn.edu
END:VEVENT
END:VCALENDAR