ASSET Seminar: “Rethinking Test-Time Thinking: From Token-Level Rewards to Robust Generative Agents”
September 10, 2025 at 12:00 PM - 1:15 PM
We present a unified perspective on test-time thinking as a lens for improving generative AI agents through finer-grained reward modeling, data-centric reasoning, and robust alignment. Beginning with GenARM, we introduce an inductive bias for denser, token-level reward modeling that guides generation during decoding, enabling token-level alignment without retraining. While GenARM targets reward design, ThinkLite-VL focuses on the data side of reasoning: it proposes a self-improvement framework that selects the most informative samples via MCTS-guided search, yielding stronger visual reasoning with fewer labels. Taking this a step further, MORSE-500 moves beyond selection to creation: it programmatically generates targeted, controllable multimodal data to systematically probe and stress-test models’ reasoning abilities. We then interrogate a central assumption in inference-time alignment: Does Thinking More Always Help? Our findings reveal that increased reasoning steps can degrade performance, not because the reasoning itself gets better or worse, but because output variance rises, which challenges the naive scaling paradigm. Finally, AegisLLM applies test-time thinking in the service of security, using an agentic, multi-perspective framework to defend against jailbreaks, prompt injections, and unlearning attacks, all at inference time. Together, these works chart a path toward generative agents that are not only more capable, but also more data-efficient, introspective, and robust in real-world deployment.
Seminar Recording: https://drive.google.com/file/d/13jOKuou0QzqkMo9QHEdoHA1nCIxOPsbm/view?usp=drive_link

