Evaluating techniques for language models to responsibly refuse harmful queries.
Kinshuk Vasisht, Navreet Kaur, Danish Pruthi
― 5 min read