#ml
4 posts tagged #ml.
-
Attention Is Explainable Because It Is a Kernel
A reading of self-attention through kernel smoothing and reproducing kernel Hilbert spaces (RKHS).
-
Not All Infinities Are Equal
The singularity structure of cross-entropy explains hallucination, the modality gap, and why contrastive losses need such large batches.
-
Opposite Is Not Different
The cosine-similarity scale has three landmarks, not two. Maximum difference is orthogonality, not opposition — and the most influential contrastive losses spent years optimizing for the wrong target.
-
Activations Are Bad for Geometry
Pointwise activations factor into the layer's Jacobian as a diagonal modulation. The same modulation that buys selectivity destroys geometric structure on the data manifold.