ASSET Seminar: “Scaling Your Large Language Models on a Budget” (Atlas Wang, University of Texas at Austin)
January 17, 2024, 12:00 PM - 1:15 PM
ABSTRACT:
As the sizes of Large Language Models (LLMs) continue to grow exponentially, it becomes imperative to explore computing paradigms that address the dual challenge of scaling these models while respecting compute and data constraints. This presentation will cover several strategies for alleviating this dilemma: (1) avoiding training from scratch by using readily available pre-trained models to provide a better starting point for a new, larger model; (2) leveraging this idea of progressive initialization to improve compute and data efficiency during neural scaling; and (3) integrating hardness-aware data sampling and more memory-efficient optimizers (work in progress). The talk will conclude with a few (informal) thoughts and reflections.
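To make the idea of initializing a larger model from a smaller pre-trained one more concrete, here is a minimal, hypothetical sketch of one naive variant: copying a small pre-trained weight matrix into the corner of a larger, lightly perturbed one. The function name and the copy-and-pad scheme are illustrative assumptions only, not the method presented in the talk, which is more sophisticated.

```python
import numpy as np

def grow_linear_weight(w_small, d_out_new, d_in_new, noise_scale=0.01):
    """Initialize a larger weight matrix from a smaller pre-trained one.

    The small matrix is copied into the top-left block; the remaining
    entries receive small random values, so optimization of the larger
    model starts near the pre-trained solution rather than from scratch.
    (Illustrative sketch only, not the speaker's actual growth method.)
    """
    d_out_old, d_in_old = w_small.shape
    assert d_out_new >= d_out_old and d_in_new >= d_in_old
    w_large = noise_scale * np.random.randn(d_out_new, d_in_new)
    w_large[:d_out_old, :d_in_old] = w_small
    return w_large

# Example: grow a 4x4 "pre-trained" layer into an 8x8 layer of a larger model.
w_small = np.random.randn(4, 4)            # stand-in for pre-trained weights
w_large = grow_linear_weight(w_small, 8, 8)
print(w_large.shape)                        # (8, 8)
```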

