ESE Spring Seminar – “Architecting High Performance Silicon Systems for Accurate and Efficient On-Chip Deep Learning”
March 13, 2023, 11:00 AM - 12:00 PM
The unabated pursuit of omniscient and omnipotent AI is levying hefty latency, memory, and energy taxes at all computing scales. At the same time, the end of Dennard scaling is sunsetting the traditional performance gains attained through reductions in transistor feature size. Faced with these challenges, my research builds a heterogeneous set of solutions, co-optimized across the algorithm, memory subsystem, hardware architecture, and silicon stack, to generate breakthrough advances in arithmetic performance, compute density and flexibility, and energy efficiency for on-chip machine learning, and natural language processing (NLP) in particular.

I will start, on the algorithm front, by discussing award-winning work on a novel floating-point-based data type, AdaptivFloat, which enables resilient quantized AI computations and is particularly suitable for NLP networks with very large parameter distributions. Then, I will describe a 16nm chip prototype that adopts AdaptivFloat to accelerate noise-robust AI speech and machine-translation tasks, and whose fidelity to the front-end application is verified via a formal hardware/software compiler interface. Toward the goal of lowering the prohibitive energy cost of inferencing large language models on TinyML devices, I will describe a principled algorithm-hardware co-design solution, validated in a 12nm chip tapeout, that accelerates Transformer workloads by tailoring the accelerator's latency and energy expenditure to the complexity of the input query it processes. Finally, I will conclude with some of my current and future research efforts on further pushing the on-chip energy-efficiency frontiers by leveraging specialized, non-conventional dynamic memory structures for on-device training, recently prototyped in a 16nm tapeout.
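To give a flavor of the idea behind an adaptive floating-point format, the sketch below quantizes a tensor to a low-bit float whose exponent range is shifted per tensor to cover the tensor's maximum magnitude. This is an illustrative simplification under assumed parameters (bit widths, flush-to-zero handling, function name `adaptivfloat_quantize`), not the speaker's actual implementation:

```python
import numpy as np

def adaptivfloat_quantize(x, n_bits=8, exp_bits=4):
    """Illustrative per-tensor adaptive float quantizer (simplified sketch).

    The representable exponent range is shifted so that its top aligns
    with the largest magnitude in the tensor, then each value's mantissa
    is rounded to the remaining bits.
    """
    man_bits = n_bits - 1 - exp_bits              # 1 bit reserved for sign
    mag = np.abs(x.astype(np.float64))
    # Per-tensor exponent shift: cover the tensor's max magnitude
    exp_max = int(np.floor(np.log2(mag.max() + 1e-30)))
    exp_min = exp_max - (2 ** exp_bits - 1)       # bottom of shifted range
    # Clamp each value's exponent into the shifted range
    e = np.clip(np.floor(np.log2(mag + 1e-30)), exp_min, exp_max)
    scale = 2.0 ** (e - man_bits)
    # Round the mantissa to man_bits fractional bits, then saturate
    q = np.round(mag / scale) * scale
    max_val = 2.0 ** exp_max * (2.0 - 2.0 ** -man_bits)
    q = np.minimum(q, max_val)
    # Values below the smallest representable magnitude flush to zero
    q[mag < 2.0 ** exp_min] = 0.0
    return np.sign(x) * q
```

Values that are exactly representable in the shifted format pass through unchanged, e.g. `adaptivfloat_quantize(np.array([1.0, 0.5, -2.0]))` returns `[1.0, 0.5, -2.0]`; the payoff of the per-tensor shift is that tensors with widely differing dynamic ranges each get their few exponent values spent where their data actually lives.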

