My Memory System Cheated To Beat LongMemEval Until I Fixed It

Last week I ran Weft, my homegrown memory layer, through LongMemEval and it scored 69.0% overall, 72.1% task-averaged. I was pleased. I shouldn't have been. The number was a lie, and the system that produced it was destroying the test data and calling it a win.


Making a CMS

I recently watched a video about vibecoding in which they said you need to use a model for a year before you reach the point where you can trust it.
