Blog | _mediumroast

My Memory System Cheated To Beat LongMemEval Until I Fixed It

May 03, 2026 9 min read #coding/ai #memory

Last week I ran Weft, my homegrown memory layer, through LongMemEval and it scored 69.0% overall, 72.1% task-averaged. I was pleased. I shouldn't have been. The number was a lie, and the system that produced it was destroying the test data and calling it a win.

I Benchmarked Anthropic's Advisor Strategy on Task Decomposition. The Expensive Model Was the Worst.

April 09, 2026 14 min read #coding/ai #benchmarking

Anthropic's Advisor Strategy promises near-Opus intelligence at near-Sonnet cost: a server-side tool that pairs a cheap executor model with an expensive advisor.

Why I Started Using AI Seriously In 2026

January 25, 2026 12 min read

I used AI for the better part of 2025 because I had to understand it for work.

Making a CMS

January 25, 2026 1 min read #cms #python #webdev

I recently watched a video about vibecoding that claimed you need to use a model for a year before you reach the point where you can trust it.
