Kevin Schaul

Visual journalist/hacker covering AI

Sep 29, 2025

Just ran some evals on Claude Sonnet 4.5. It’s better than Sonnet 4 on some but worse on a lot. LLM progress is so weird. You really gotta test this stuff on what you care about.