For OpenAI’s ChatGPT to work at its best, it needs to consume as much data and information as it can. As questions swirl around the legality of doing so and the proprietary rights attached to that information, several major media companies are blocking OpenAI’s web crawler from scanning their online platforms for content.
After The Guardian’s Ariel Bogle reported last week that CNN, The New York Times, and Reuters were blocking GPTBot, OpenAI’s web crawler, CNN found that many more media companies were doing the same thing, including Disney, Bloomberg, The Washington Post, The Atlantic, Axios, Insider, ABC News, ESPN, Condé Nast, Hearst, and Vox Media.
As for why these massive companies would feel the need to prevent AI companies like OpenAI from scraping their data, it comes down to how valuable that content is compared with most of what’s out there.
“Most of the internet is garbage,” an anonymous news executive told CNN’s Oliver Darcy. “Traditional media publishers, on the other hand, are fact-driven and offer quality content.”
While none of the media companies would go on the record about the blocking effort, one industry executive told CNN that news organizations clearly see data scraping as a threat to their ability to own their content.
“I see a heightened sense of urgency when it comes to addressing the use, and misuse, of our content,” Danielle Coffey, president and chief executive of the News Media Alliance, said. “One publisher told me it is an existential threat. Another publisher told me there isn’t a business model with certain uses of AI… there is a sense of urgency to address this.”
The problem for AI-based programs like ChatGPT is that they need a constant stream of fresh, valuable content in order to keep improving. Without it, their accuracy and dependability could degrade quickly. If you think AI-generated content is bad now, just wait.
For better or worse, AI isn’t going anywhere, so the logical solution is for media organizations and AI companies to come to the table and figure out how to coexist. Perhaps that will lead to financial incentives to share content and data. Perhaps it will require extensive legal agreements. Whatever the answer, this quiet blockade can only last so long.