BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Penn Engineering Events - ECPv6.15.18//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Penn Engineering Events
X-ORIGINAL-URL:https://seasevents.nmsdev7.com
X-WR-CALDESC:Events for Penn Engineering
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20200308T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20201101T020000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20210314T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20211107T020000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20220313T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20221106T020000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20210312T110000
DTEND;TZID=America/New_York:20210312T120000
DTSTAMP:20260407T021958Z
CREATED:20210215T192651Z
LAST-MODIFIED:20210215T192651Z
UID:4287-1615546800-1615550400@seasevents.nmsdev7.com
SUMMARY:ESE Seminar: "Demystifying (Deep) Reinforcement Learning: The Optimist\, The Pessimist\, and Their Provable Efficiency"
DESCRIPTION:Coupled with powerful function approximators such as deep neural networks\, reinforcement learning (RL) has achieved tremendous empirical success. However\, its theoretical understanding lags behind. In particular\, it remains unclear how to provably attain the optimal policy with a finite regret or sample complexity. In this talk\, we will present two sides of the same coin\, which demonstrate an intriguing duality between pessimism and optimism. \n– In the online setting\, we aim to learn the optimal policy by actively interacting with an environment. To strike a balance between exploration and exploitation\, we propose an optimistic least-squares value iteration algorithm\, which achieves a \\sqrt{T} regret in the presence of linear\, kernel\, and neural function approximators. \n– In the offline setting\, we aim to learn the optimal policy based on a dataset collected a priori. Due to the lack of active interaction with the environment\, we suffer from insufficient coverage of the dataset. To maximally exploit the dataset\, we propose a pessimistic least-squares value iteration algorithm\, which achieves a minimax-optimal sample complexity.
URL:https://seasevents.nmsdev7.com/event/ese-seminar-demystifying-deep-reinforcement-learning-the-optimist-the-pessimist-and-their-provable-efficiency/
LOCATION:Zoom – Email ESE for Link jbatter@seas.upenn.edu
CATEGORIES:Seminar,Faculty,Colloquium,Student
ORGANIZER;CN="Electrical and Systems Engineering":MAILTO:eseevents@seas.upenn.edu
END:VEVENT
END:VCALENDAR