ASSET Seminar: “Efficient Sharing of AI Infrastructures with Specialized Serverless Computing”

January 29, 2025 at 12:00 PM - 1:15 PM
  • Venue
    Amy Gutmann Hall, Room 414
    3333 Chestnut Street
    Philadelphia, PA 19104

    Abstract:

    The efficient sharing of AI infrastructures is becoming increasingly important in both public and private data centers. This demand is driven by two key factors: the proliferation of specialized AI models tailored for different users and applications, and the highly dynamic nature of requests, which are often on-demand. Dedicated GPU allocation in such scenarios results in prohibitively high costs and inefficient resource utilization.

    In this talk, I will introduce serverless computing as a promising paradigm for addressing these challenges by enabling efficient, on-demand sharing of AI infrastructures. I will highlight its use cases and discuss key barriers to broader adoption. Following this, I will present ServerlessLLM, a state-of-the-art system designed to tackle key challenges in serverless large language model (LLM) inference, particularly cold-start latency. Specifically, I will cover ServerlessLLM’s novel contributions, including its checkpoint format design, locality-aware scheduling, and inference request live migration. Finally, I will outline open challenges beyond efficiency, such as fairness, privacy, and sustainability, which are critical for the future of serverless AI systems.

    Zoom link (if unable to attend in person): https://upenn.zoom.us/j/95090162762