Examining risks of many-shot jailbreaking in Italian language models.
Fabio Pernisi, Dirk Hovy, Paul Röttger
― 4 min read
Cutting-edge science explained simply
New dataset reveals challenges in detecting hate speech across languages and platforms.
Manuel Tonneau, Diyi Liu, Niyati Malhotra
― 7 min read