Evaluating techniques for language models to responsibly refuse harmful queries.
Kinshuk Vasisht, Navreet Kaur, Danish Pruthi
― 5 min read