Must-read story on Common Crawl — the scraped internet data behind many LLMs. They tell publishers they are making progress on takedown requests, but … nope!
Glad we have journalists with tech chops like Alex Reisner who can test those claims.
The Nonprofit Doing the AI Industry’s Dirty Work