FOLDS seminar: Theory and practice of LLM quantization
October 2, 2025 at 12:00 PM - 1:00 PM
Zoom link: https://upenn.zoom.us/j/98220304722
Modern LLMs process information by repeatedly applying a single basic primitive: matrix multiplication. Estimates suggest that roughly 60-84% of the energy consumed by LLMs goes into memory load/store operations. How can we reduce this power consumption? Tokens start out as roughly 16-bit integer IDs but are mapped to float vectors with thousands of dimensions, suggesting very low information density per dimension. It is thus unsurprising that there has been much success in reducing the precision of both weights and activations without much loss in LLM performance. In this talk we will present an information-theoretic analysis of quantized representations and show how it led us to create NestQuant, a new state-of-the-art algorithm for quantizing weights, the KV cache, and activations (ICML 2025).
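To make the idea of reducing precision concrete, here is a minimal sketch of round-to-nearest uniform quantization of a weight matrix with one scale per output row. This is a baseline illustration only, not the NestQuant algorithm discussed in the talk; the function names and the 4-bit setting are illustrative choices.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Round-to-nearest uniform quantization with one scale per row."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for signed 4-bit
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 1024)).astype(np.float32)   # toy weight matrix
q, s = quantize_rtn(w, bits=4)
w_hat = dequantize(q, s)
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative reconstruction error at 4 bits: {rel_err:.3f}")
```

Even this naive scheme stores each weight in 4 bits instead of 16 while keeping the relative reconstruction error modest, which is why more sophisticated quantizers can push precision much lower with little loss.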

