FOLDS seminar: Theory and practice of LLM quantization

October 2, 2025 at 12:00 PM - 1:00 PM
Details
Date: October 2, 2025
Time: 12:00 PM - 1:00 PM
Event Category: Seminar, Colloquium
  • Organizer
    IDEAS Center
    Venue
    Amy Gutmann Hall, Room 306, 3317 Chestnut Street
    Philadelphia
    PA 19104

    Zoom link: https://upenn.zoom.us/j/98220304722

    Modern LLMs process information by repeatedly applying a single basic primitive: matrix multiplication. Estimates show that about 60-84% of the energy consumed by LLMs goes into memory load/store operations. How can we reduce this power consumption? Tokens start as roughly 16-bit integers but are mapped to vectors of floats with lengths in the thousands, which suggests very low information density per dimension. Unsurprisingly, there has been much success in reducing the precision of both weights and activations without much loss in LLM performance. In this talk we will present an information-theoretic analysis of quantized representations and show how it led us to create NestQuant, a new SOTA algorithm for quantizing weights, the KV cache, and activations (ICML 2025).
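
    To make the "low information density" argument concrete, here is a minimal Python sketch. It is not NestQuant and is not taken from the talk: it uses plain symmetric round-to-nearest quantization as a generic baseline, and the embedding size (4096), the Gaussian stand-in for weights, and all function names are assumptions chosen for illustration.

```python
import math
import random


def bits_per_dimension(token_bits: float, dim: int) -> float:
    """Average number of token-identity bits per embedding dimension."""
    return token_bits / dim


def quantize_uniform(values, num_bits):
    """Symmetric round-to-nearest quantization (generic baseline, NOT NestQuant).

    Returns integer codes in [-levels, levels] and the scale used.
    """
    levels = 2 ** (num_bits - 1) - 1  # e.g. 7 for signed 4-bit codes
    max_abs = max(abs(v) for v in values)
    scale = max_abs / levels if max_abs > 0 else 1.0
    codes = [max(-levels, min(levels, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


if __name__ == "__main__":
    # Assumed numbers: a 16-bit token id spread over a 4096-dim embedding.
    print("bits per dimension:", bits_per_dimension(16, 4096))

    # Hypothetical stand-in for a weight vector: i.i.d. standard Gaussians.
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    for bits in (8, 4, 2):
        codes, scale = quantize_uniform(weights, bits)
        err = rms_error(weights, dequantize(codes, scale))
        print(f"{bits}-bit RMS error: {err:.4f}")
```

    Under these assumptions, each dimension carries only a tiny fraction of a bit of token identity, and the reconstruction error grows as the bit width shrinks. That trade-off between rate and distortion is the kind of quantity an information-theoretic analysis of quantization studies.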