#mechanistic-interpretability
2 posts tagged #mechanistic-interpretability.
-
What an MLP Knows, When It's a Kernel
The transformer MLP is illegible because its primitive does not carry a kernel. Give it one and the four objects that make attention legible follow for free — for the whole network.
-
Attention is Explainable Because it is a Kernel
Self-attention in transformers is a Nadaraya–Watson kernel smoother. That fact — and not "we visualize the matrix" — is why attention heads are readable while MLPs are not.