A new method enhances AI training for safety and helpfulness.
Tong Mu, Alec Helyar, Johannes Heidecke
― 5 min read
Cutting edge science explained simply
Deliberative Alignment aims to make AI language models safer and more reliable.
Melody Y. Guan, Manas Joglekar, Eric Wallace
― 5 min read