Cutting-edge science, explained simply
This study uses sparse autoencoders to interpret attention layer outputs in transformers.
― 6 min read
JumpReLU SAEs improve reconstruction fidelity while keeping representations simple and clear.
― 7 min read
Gemma Scope offers tools for better understanding language models and improving AI safety.
― 6 min read
A method to improve steering vector effectiveness in language models.
― 5 min read