Andy Zou

A new benchmark aims to measure and mitigate AI-related dangers.

2025-09-01T08:35:24+00:00 ― 5 min read

This article discusses issues and best practices for evaluating language models.

2025-08-08T10:07:42+00:00 ― 7 min read

Circuit breakers offer a new, effective method for preventing harmful AI outputs.

2025-08-01T13:32:42+00:00 ― 3 min read

A new method improves tamper resistance in open-weight language models.

2025-07-03T22:14:42+00:00 ― 7 min read