Wenkai Yang

New method improves identification of AI-generated text.

2025-10-14T11:21:30+00:00 ― 7 min read

This article examines the threat of backdoor attacks on language model agents.

2025-09-07T01:39:18+00:00 ― 5 min read

Research reveals that backdoor attacks pose significant security risks to chat models.

2025-08-23T12:52:12+00:00 ― 6 min read

Explores the challenges of using weaker AI models to supervise more advanced ones.

2025-07-27T15:26:24+00:00 ― 6 min read