Controlling AI Agents Through Resampling: Strategies for Effective AI Alignment and Decision-Making
A new paper, “Ctrl-Z: Controlling AI Agents via Resampling,” presents the most extensive study to date of AI control techniques, which aim to prevent catastrophic failures caused by misaligned AIs. The research introduces BashBench, a dataset of complex multi-step tasks, and novel resampling protocols that significantly improve on existing control measures. The findings show that these techniques can ...