Paul Röttger

A new test suite evaluates safety behaviors in language models.

2025-10-13T08:22:00+00:00 ― 5 min read

Research challenges traditional methods of evaluating language model values and opinions.

2025-09-03T21:41:00+00:00 ― 6 min read

A new dataset aims to improve hate speech detection models for the German language.

2025-08-24T16:39:06+00:00 ― 5 min read

A review of datasets focused on enhancing LLM safety.

2025-08-21T08:04:18+00:00 ― 6 min read

Exploring the responsible use of generative AI technology in various fields.

2025-08-16T14:18:42+00:00 ― 7 min read

Examines cultural bias in hate speech datasets and its impact on detection systems.

2025-08-15T23:10:12+00:00 ― 7 min read

A study shows that larger models don’t guarantee more persuasive messages.

2025-07-26T08:29:54+00:00 ― 6 min read

Examining the risks of many-shot jailbreaking in Italian language models.

2025-06-30T11:01:54+00:00 ― 4 min read

New dataset reveals challenges in detecting hate speech across languages and platforms.

2025-05-11T02:40:00+00:00 ― 7 min read