ASSET Seminar: “Efficient Sharing of AI Infrastructures with Specialized Serverless Computing”
January 29, 2025 at 12:00 PM - 1:15 PM
Details
Abstract:
The efficient sharing of AI infrastructures is becoming increasingly important in both public and private data centers. This demand is driven by two key factors: the proliferation of specialized AI models tailored to different users and applications, and the highly dynamic, often on-demand nature of requests. Dedicating GPUs to individual models in such scenarios results in prohibitively high costs and inefficient resource utilization.
In this talk, I will introduce serverless computing as a promising paradigm for addressing these challenges by enabling efficient, on-demand sharing of AI infrastructures. I will highlight its use cases and discuss key barriers to broader adoption. Following this, I will present ServerlessLLM, a state-of-the-art system designed to tackle key challenges in serverless large language model (LLM) inference, particularly cold-start latency. Specifically, I will cover ServerlessLLM’s novel contributions, including its checkpoint format design, locality-aware scheduling, and inference request live migration. Finally, I will outline open challenges beyond efficiency, such as fairness, privacy, and sustainability, which are critical for the future of serverless AI systems.
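To give a flavor of the locality-aware scheduling idea mentioned above, the sketch below shows a minimal, hypothetical scheduler that prefers a GPU server where the model checkpoint is already resident (avoiding a cold start) and otherwise falls back to the least-loaded server. The `Server` class, `schedule` function, and server names are illustrative assumptions, not ServerlessLLM's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    cached_models: set = field(default_factory=set)
    load: int = 0  # active inference requests on this server

def schedule(model: str, servers: list) -> Server:
    """Pick a server for an inference request with locality awareness:
    prefer servers that already hold the model checkpoint; among those
    (or among all servers if none do), pick the least-loaded one."""
    warm = [s for s in servers if model in s.cached_models]
    pool = warm if warm else servers
    best = min(pool, key=lambda s: s.load)
    best.load += 1
    best.cached_models.add(model)  # checkpoint becomes resident after loading
    return best

servers = [
    Server("gpu-0", cached_models={"llama-7b"}, load=3),
    Server("gpu-1", cached_models=set(), load=0),
]
# "llama-7b" is cached on gpu-0, so locality wins despite its higher load
print(schedule("llama-7b", servers).name)  # gpu-0
```

A real system would weigh checkpoint-loading time against queueing delay rather than applying a strict warm-first rule, and could live-migrate an in-flight request when a better placement appears; this sketch only captures the warm-preference heuristic.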
Zoom Link (if unable to attend in-person): https://upenn.zoom.us/j/95090162762