Understanding the pitfalls of reward hacking in AI systems and its implications.
Yuchen Zhu, Daniel Augusto de Souza, Zhengyan Shi
― 8 min read