BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Penn Engineering Events - ECPv6.15.18//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Penn Engineering Events
X-ORIGINAL-URL:https://seasevents.nmsdev7.com
X-WR-CALDESC:Events for Penn Engineering Events
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20230312T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20231105T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20240310T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20241103T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20250309T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20251102T060000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20240424T100000
DTEND;TZID=America/New_York:20240424T110000
DTSTAMP:20260403T153914Z
CREATED:20240415T203631Z
LAST-MODIFIED:20240415T203631Z
UID:11343-1713952800-1713956400@seasevents.nmsdev7.com
SUMMARY:PICS Colloquium: "Exploiting time-domain parallelism to accelerate neural network training and PDE constrained optimization"
DESCRIPTION:This talk will explore methods for accelerating numerical optimization constrained by transient problems using parallelism. Two types of transient problems will be considered. In the first case\, training algorithms for Neural ODEs will be discussed. Neural ODEs are a class of neural network architecture in which the depth of the neural network (the layers) is modeled as a continuous time domain. For the second case\, transient PDE-constrained optimization problems will be described. In either case\, simulation-based optimization requires repeated executions of the simulator’s forward and backward (adjoint) time integration schemes. Consequently\, the arrow of time creates a major sequential bottleneck in the optimization process. Moreover\, for performance these methods rely strongly on the available parallelism for the forward and adjoint solves. Thus\, when forward and adjoint solvers are already operating at the limit of strong scaling and hardware utilization\, the arrow-of-time bottleneck cannot be overcome by additional parallelization across the spatial grid or network layers. \nDeep neural networks are a powerful machine learning tool with the capacity to learn complex nonlinear relationships described by large data sets. Despite their success\, training these models remains a challenging and computationally intensive undertaking. We will present a layer-parallel training algorithm that exploits a multigrid scheme to accelerate both forward and backward propagation. Introducing a parallel decomposition between layers requires inexact propagation of the neural network. The multigrid method used in this approach stitches these subdomains together with sufficient accuracy to ensure rapid convergence. We demonstrate an order-of-magnitude wall-clock speedup over the serial approach\, opening a new avenue for parallelism that is complementary to existing approaches. We also discuss applying the layer-parallel methodology to recurrent neural networks and transformer architectures. \nThe second half of this talk focuses on PDE-constrained optimization formulations. Solving optimization problems with transient PDE constraints is computationally costly due to the number of nonlinear iterations and the cost of solving large-scale KKT matrices. These matrices scale with the size of the spatial discretization times the number of time steps. We propose a new two-level domain decomposition preconditioner to solve these linear systems when constrained by the heat equation. Our approach leverages the observation that the Schur complement is elliptic in time\, and is thus amenable to classical domain decomposition methods. Further\, the application of the preconditioner uses existing time integration routines to facilitate implementation and maximize software reuse. The performance of the preconditioner is examined in an empirical study demonstrating that the approach is scalable with respect to the number of time steps and subdomains.
URL:https://seasevents.nmsdev7.com/event/pics-colloquium-exploiting-time-domain-parallelism-to-accelerate-neural-network-training-and-pde-constrained-optimization/
LOCATION:PICS Conference Room 534 – A Wing\, 5th Floor\, 3401 Walnut Street\, Philadelphia\, PA\, 19104\, United States
CATEGORIES:Colloquium
ORGANIZER;CN="Penn Institute for Computational Science (PICS)":MAILTO:dkparks@seas.upenn.edu
END:VEVENT
END:VCALENDAR