BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Penn Engineering Events - ECPv6.15.18//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Penn Engineering Events
X-ORIGINAL-URL:https://seasevents.nmsdev7.com
X-WR-CALDESC:Events for Penn Engineering Events
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20220313T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20221106T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20230312T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20231105T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20240310T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20241103T060000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20230929T103000
DTEND;TZID=America/New_York:20230929T114500
DTSTAMP:20260404T032735Z
CREATED:20230905T135045Z
LAST-MODIFIED:20230905T135045Z
UID:9612-1695983400-1695987900@seasevents.nmsdev7.com
SUMMARY:Fall 2023 GRASP on Robotics: Stefano Soatto\, AWS & UCLA\, "Toward Foundational Models of Physical Scenes: From Large Language Models to World Models and Back"
DESCRIPTION:This is a hybrid event with in-person attendance in Wu and Chen and virtual attendance on Zoom. \nABSTRACT\nNow that a significant fraction of human knowledge has been shared through the Internet\, scraped and squashed into the weights of Large Language Models (LLMs)\, do we still need embodiment and interaction with the physical world to build representations? Is there a dichotomy between LLMs and “large world models”? What is the role of visual perception in learning such models? Can perceptual agents trained by passive observation learn world models suitable for control? \nTo begin tackling these questions\, I will first address the issue of controllability of LLMs. LLMs are stochastic dynamical systems\, for which the notion of controllability is well established: The state (“of mind”) of an LLM can be trivially steered by a suitable choice of input given enough time and memory. However\, the space of interest for control of an LLM is not that of words\, but that of “meanings” expressible as sentences that a human could have spoken and would understand. Unfortunately\, unlike controllability\, the notions of meaning and understanding are not usually formalized in a way that is relatable to LLMs in use today. \nI will propose a simplistic definition of meaning that reflects the functional characteristics of a trained LLM. I will show that a well-trained LLM establishes a topology in the space of meanings\, represented by equivalence classes of trajectories of the underlying dynamical model (the LLM). Then\, I will describe both necessary and sufficient conditions for controllability in such a space of meanings. \nI will then highlight the relation between meanings induced by a trained LLM upon the set of sentences that could be uttered\, and “physical scenes” underlying sets of images that could be observed. In particular\, a physical scene can be defined uniquely and inferred as an abstract concept without the need for embodiment\, a view aligned with J. Koenderink’s characterization of images as “controlled hallucinations.” \nLastly\, I will show that popular models ostensibly used to represent the 3D scene (Neural Radiance Fields\, or NeRFs) can at most represent the images on which they are trained\, but not the underlying physical scene. However\, composing a NeRF with a Latent Diffusion Model or other inductively-trained generative model yields a viable representation of the physical scene. Such a model class\, which can be learned through passive observation\, is a first\, albeit rudimentary\, Foundational Model of physical scenes in the sense of being sufficient for any downstream inference task based on visual data.
URL:https://seasevents.nmsdev7.com/event/fall-2023-grasp-on-robotics-stefano-soatto-aws-ucla-toward-foundational-models-of-physical-scenes-from-large-language-models-to-world-models-and-back/
LOCATION:Wu and Chen Auditorium (Room 101)\, Levine Hall\, 3330 Walnut Street\, Philadelphia\, PA\, 19104\, United States
CATEGORIES:Seminar
ORGANIZER;CN="General Robotics, Automation, Sensing and Perception (GRASP) Lab":MAILTO:grasplab@seas.upenn.edu
END:VEVENT
END:VCALENDAR