May 03, 2026
9 min read
#coding/ai #memory
Last week I ran Weft, my homegrown memory layer, through LongMemEval and it scored 69.0% overall, 72.1% task-averaged. I was pleased. I shouldn't have been. The number was a lie, and the system that produced it was destroying the test data and calling it a win.
April 09, 2026
14 min read
#coding/ai #benchmarking
Anthropic's Advisor Strategy promises near-Opus intelligence at near-Sonnet cost: a server-side tool that pairs a cheap executor model with an expensive advisor.
January 25, 2026
12 min read
I used AI for the better part of 2025 because I had to understand it for work.
January 25, 2026
1 min read
#cms #python #webdev
I was recently watching a video about vibecoding where they said that you really needed to use a model for a year before you got to the point where you could trust it.