Worth a read: OpenAI released an eval for real work tasks across a bunch of industries. They didn’t release the individual results (lame), but you can replicate them from the prompts and files.
A usable nugget: If you’re outputting pdfs, xlsx or pptx, use Claude.
https://openai.com/index/gdpval/
