FOLDS seminar: Theory and practice of LLM quantization
Amy Gutmann Hall, Room 306
3317 Chestnut Street, Philadelphia, PA, United States
Zoom link: https://upenn.zoom.us/j/98220304722

Modern LLMs process information by repeatedly applying one basic primitive: matrix multiplication. Estimates show that about 60-84% of the energy consumed by LLMs goes into memory load/store operations. How can we reduce this power consumption? Tokens start as roughly 16-bit integers but get mapped to vectors of floats of length […]

