Cutting-edge science explained simply
This study investigates how circuit analysis techniques apply to a large language model.
― 5 min read
Examining how an AI model represents and plays the game Othello.
― 6 min read
Activation patching traces which internal components causally drive language models' outputs and behaviors.
― 5 min read
The study investigates universal neurons in GPT-2 models and their roles.
― 4 min read
Researchers investigate how models adapt when components are removed.
― 6 min read
A closer look at causal attribution methods for large language models.
― 6 min read
Sparse autoencoders enhance the interpretability of AI systems and their decision-making processes.
― 18 min read
Learn how transcoders help clarify complex language models.
― 5 min read
This article examines how certain neurons affect uncertainty in language model predictions.
― 6 min read
This study uses sparse autoencoders to interpret attention layer outputs in transformers.
― 6 min read
JumpReLU SAEs improve reconstruction fidelity while keeping learned representations sparse and interpretable.
― 7 min read
Gemma Scope offers tools for better understanding language models and improving AI safety.
― 6 min read
New metrics improve understanding of sparse autoencoders in neural networks.
― 7 min read
BatchTopK sparse autoencoders select the strongest activations across a batch to improve reconstruction and interpretability.
― 5 min read