Sparse Autoencoders for Monosemanticity

Interpretability

An exploration of Sparse Autoencoders as a tool for decomposing polysemantic neural network representations into interpretable, monosemantic features.

Mar 30, 2026
