Convergent Thinking

I adjust numbers until other numbers get smaller.

Recent

Tabula Rasa
Jun 15, 2026 1 min read
What reaches a child during formation is curated. What reaches a model is not.

The First Token
Apr 19, 2026 1 min read
Not all problems decompose left to right.

Steer Before You Shrink
Mar 25, 2026 1 min read
Training methods that steer optimization scale. Methods that restrict the network don't.

Bias Compounds, Variance Washes Out
Mar 12, 2026 1 min read
Round-to-nearest makes the same error every time. Stochastic rounding doesn't. Over long runs, that's everything.

Trajectory
Jan 19, 2026 2 min read
You won each decision and lost the trajectory.

AVnorm
Jan 10, 2026 2 min read
Per-head normalization on attention outputs fixes length generalization.

The Box
Jan 1, 2026 1 min read
The hardest box to escape is the one you cannot see.

Attention Normalizes the Wrong Norm
Dec 23, 2025 1 min read
Softmax constrains the L1 norm of the attention weights to 1, but it should constrain the L2 norm.

People are the new oil
Dec 13, 2025 2 min read
Compute used to be the bottleneck. Now it's people.