Last week I ran Weft, my homegrown memory layer, through LongMemEval and it scored 69.0% overall, 72.1% task-averaged. I was pleased. I shouldn't have been. The number was a lie, and the system that produced it was destroying the test data and calling it a win.
Anthropic's Advisor Strategy promises near-Opus intelligence at near-Sonnet cost. It is a server-side tool that pairs a cheap executor model with an expensive advisor.