Advancing LLM Reasoning: Exploring Reinforcement Learning and Process Reward Models for Scalable Data and Test-Time Scaling
Recent advances in large language models (LLMs) are unlocking new capabilities in structured reasoning and abstract thought, bringing us closer to artificial general intelligence (AGI). Training these models to reason effectively remains challenging, however, because of their reliance on costly human-annotated data, which limits their ability to generalize. Researchers from Tsinghua University, Emory University, and HKUST ...