The LLMs have caused enough damage. It’s time to take a stand. Here’s our proposal to help publishers fight back.
TL;DR
- Most LLMs provide almost no meaningful traffic or revenue to publishers. According to Similarweb, the News and Media industry has received less than one-third of one percent of traffic share from this marketing channel.
- A strong, multi-layered bot defense system is the only way to prevent the rampant wholesale theft of publisher content by crawlers.
- Publishers must develop an LLM value framework and use it to determine which bots are let in and which are kept out.
- Many bots don’t respect robots.txt. Conducting a bot accessibility audit will define the scope of the problem and point the way towards a solution.
- Google is moving towards an AI chatbot interface. Clicks from Google have already diminished with the introduction of AI Overviews. If their transformation from search engine to AI chatbot further collapses referrals, then publishers should consider blocking Googlebot as well.
- The infrastructure, platforms and standards are in place right now to help publishers manage bot traffic, manage access, and develop leverage to receive fair value for their original content. The LLMs have abused publishers for far too long. The time to take a stand against this abuse is now.
Enough is Enough
Generative AI is a direct threat to the open web, human expression, original journalism, and the news and media industry at large. The relationship between Google and the LLMs on the one hand and publishers on the other has been predatory and abusive for far too long. And when Google rolls out its latest changes to search - in which the ten blue links will be further demoted in favor of a conversational chatbot experience - the crisis may escalate to existential levels. The economic incentive for many publishers to continue operating could diminish, and if this occurs en masse, it will be an irredeemable loss to us all.
In this blog post, we stand aligned with publishers like People Inc. and The Atlantic, which have, among others, taken the lead in developing an aggressive bot-blocking strategy. Adopting the approach these publishers have taken will allow the wider publishing industry to:
- Develop a “block by default, allow by agreement” bot management strategy
- Prevent bad actors from gaining free and unfettered access to their content
- Create leverage to develop licensing deals with the AI Labs
- Gain technical expertise for managing the complex AI bot ecosystem
- Lay the groundwork for broader publisher collective action, via organizations like the Really Simple Licensing (RSL) Collective and News Media Alliance
Platforms like Cloudflare, Akamai, Fastly, HUMAN, DataDome and others have allied with the publishing industry, offering solutions to report on and enable bot-blocking. Still others like TollBit, Prorata, x402 and the IAB are developing licensing and monetization protocols and solutions. The market has identified a need, and an ecosystem is starting to emerge that publishers can use to protect and monetize their work.
As the web moves from a click-based to a scraper economy, using these tools to raise the drawbridge above their hard-earned content moat may be the only viable way to ensure publishers’ intellectual property is protected and valued.
Generative AI: The Gift that Keeps on Taking
According to Similarweb, in April 2026 generative AI was responsible for 0.27% of traffic share to the entire News and Media Industry:
Source: Similarweb, News and Media Industry Marketing Channels, US, April 2026
To contextualize the scale of the problem, there are currently 100M US monthly active users of all LLM chatbots:
Sources: ChatGPT US MAU — Insider Intelligence/eMarketer (2022, 2023, 2024 forecasts), industry estimates (mid-2025), Demandsage (May 2026). Market share — Statcounter North America (Jan 2026, ChatGPT 76.59%). Publisher traffic share — Similarweb US news & media category (April 2026).
Within 3½ years of ChatGPT's release, nearly ⅓ of the US population regularly uses a chatbot. And yet, despite this widespread adoption - which excludes the Google Search AI implementation - publishers have experienced very little benefit, as the chatbots have been designed to keep users within their own environment. Links and referrals to original sources are an afterthought.
This is why, compared to the two largest traffic sources (direct and organic search), the generative AI trendline looks like a flat line, barely distinguishable from the x-axis on the graph below:
Source: Similarweb, News and Media Industry Marketing Channels, Feb 2025 - April 2026
At this minuscule level of referrals, LLMs are security risks and resource hogs, acting as wholesale content thieves and providing negligible value to publishers in return.
Put a Bouncer at Your Door. Only let in VIPs.
When The Atlantic and People Inc. examined their bot traffic, they both came to the same conclusion: an open-door bot strategy was detrimental to their business. The Atlantic created a bot dashboard, assessed which bots provided value in the form of traffic and subscriptions, and blocked any that failed to meet those criteria. In an extreme case, this resulted in The Atlantic blocking a single bot that tried to access the site 564,000 times over a seven-day period. Jon Roberts, People Inc.'s Chief Innovation Officer, told Digiday that bot restrictions resulted in tens of millions of blocked crawl attempts per day throughout their network.
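To make the idea of a value-based bouncer concrete, here is a minimal Python sketch of a "block by default" decision rule. It is not how The Atlantic or People Inc. actually score bots: the `BotStats` fields, the `decide` function, and the threshold of 10 referrals per thousand crawls are all hypothetical placeholders, and the sample numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class BotStats:
    """Seven-day totals for one crawler (all field names are hypothetical)."""
    name: str
    crawl_requests: int        # requests the bot made to the site
    referral_visits: int       # human visits the bot's platform sent back
    subscriptions: int         # new subscriptions attributed to those visits
    has_license: bool = False  # True if a licensing agreement is in place


def referrals_per_thousand_crawls(stats: BotStats) -> float:
    """Human visits returned for every 1,000 pages the bot requested."""
    if stats.crawl_requests == 0:
        return 0.0
    return 1000 * stats.referral_visits / stats.crawl_requests


def decide(stats: BotStats, min_referrals_per_thousand: float = 10.0) -> str:
    """Block by default; allow only on an agreement or demonstrated value.

    The default threshold is an arbitrary placeholder, not a recommendation.
    """
    if stats.has_license:
        return "allow"
    if stats.subscriptions > 0:
        return "allow"
    if referrals_per_thousand_crawls(stats) >= min_referrals_per_thousand:
        return "allow"
    return "block"


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    sample = [
        BotStats("search-bot", crawl_requests=20_000, referral_visits=900, subscriptions=12),
        BotStats("scraper-bot", crawl_requests=500_000, referral_visits=0, subscriptions=0),
        BotStats("licensed-bot", crawl_requests=50_000, referral_visits=5,
                 subscriptions=0, has_license=True),
    ]
    for bot in sample:
        print(f"{bot.name}: {decide(bot)}")
```

The point of the sketch is the ordering of the checks: a bot starts out blocked and has to earn its way in, either through an agreement or through measurable traffic and subscriptions. The right metrics and thresholds will differ for every publisher.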
To be clear, blocking bots is not a trivial task. Declaring bot exclusions in the robots.txt file is just the first, and most basic, line of defense. Unfortunately, as reported by Cloudflare, AI platforms like Perplexity have not respected this long-established norm, and even resorted to escalating levels of subterfuge, such as impersonating Chrome browsers, utilizing multiple, undeclared IPs, and rotating ASNs. Cloudflare had to run a sophisticated honeypot operation to catch Perplexity in the act.
Incredibly, Perplexity isn’t even the worst actor. Many bots use a variety of different techniques to evade capture, making them increasingly difficult to detect. The ones that get through are often able to bypass paywalls, so publishers' most expensive, time-consuming, and valuable content is also frequently pillaged. The companies running these bots, such as Tavily, openly document their disregard for web norms: “The Tavily search crawler does not advertise a differentiated user agent because we must avoid discrimination from websites that allow only Google to crawl them.” Others, like Firecrawl and Exa, are not well known, and thus are not explicitly blocked. For example, only 4% of the 220 publishers we canvassed explicitly blocked Exabot:
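As a starting point for that first line of defense, here is a rough Python sketch that checks which crawler tokens a robots.txt file actually disallows, using the standard library's `urllib.robotparser`. The list of user agents and the `SAMPLE` file are illustrative assumptions, and `audit_robots_txt` is a hypothetical helper name, not part of any tool mentioned in this post.

```python
from urllib.robotparser import RobotFileParser

# Example crawler tokens to audit; extend with whichever bots matter to you.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Google-Extended"]


def audit_robots_txt(robots_txt: str, user_agents: list[str], path: str = "/") -> dict[str, bool]:
    """Return {user_agent: True if robots.txt disallows `path` for that agent}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {ua: not parser.can_fetch(ua, path) for ua in user_agents}


# A made-up robots.txt that blocks two bots and allows everyone else.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for agent, blocked in audit_robots_txt(SAMPLE, AI_USER_AGENTS).items():
        print(f"{agent}: {'blocked' if blocked else 'NOT blocked'}")
```

Keep in mind that this only reports what the file declares. It says nothing about whether a bot honors the declaration, which is why server logs and network-level enforcement still matter.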
Source: Define Media Group AI Crawler Adoption Report, June 2026
These services exist to provide content to companies such as - you guessed it - the LLMs, offering them backdoor access even when the LLM's own bot is blocked from crawling. A client of ours, using TollBit, determined that Exa grabbed a full markdown copy of a paywalled article within 0.19 seconds - faster than the JavaScript-based paywall could render and block access. So not only is it trivial for bots to steal gated content, they are directly profiting from it by charging the AI platforms for access to restricted material. Publishers, meanwhile, may not even know this is happening.
Et tu, Google?
Google Ad, 1999, during Google’s “Don’t Be Evil” Era
In early 2024, Google was still mostly a search engine. But the launch of ChatGPT in November 2022 had put the search giant on its heels. Google’s initial response to this threat was a hybrid approach, called AI Overviews (AIOs), which launched in May 2024. This generative summary was placed at the top of search results, and as with all AI-generated content, it synthesized information from different sources. Despite public statements to the contrary from Liz Reid, Google’s VP of Search, and Sundar Pichai, Google’s CEO, the AIOs left little need for users to click through.
We see the impact directly within our publisher cohort. Since the launch of AIOs, clicks are down 49%:
Yet despite this search traffic attrition from AIOs, Similarweb reports that Search Engines have a 23% traffic share, second only to Direct among all marketing channels:
Source: Similarweb, News and Media Industry Marketing Channels, US, April 2026
Google has a search engine monopoly, so the vast majority of the 23% Organic Search market share is coming from them.
It was clear from the beginning that Google’s AIO implementation was a half-measure, and that Google was hard at work developing a truly AI-first experience. At their Google I/O conference in May, Google announced that this agentic update will be released during the summer. Google Search is changing forever, integrating generative and agentic AI deeply into the user experience. Google is demoting the “ten blue links” which have been its hallmark, reducing the appearance of the fundamental unit that allowed the search engine and publisher relationship to be an equitable one.
If Google fully transforms from a search engine to an agentic-first chatbot, it seems inevitable that publisher traffic declines will accelerate, and with them the loss of the advertising revenue publishers heavily rely on. Search share is currently at 23% - but what if that value further erodes to 15% or 10% or 5%? There is a point at which the unit economics for allowing Googlebot to access publisher content may no longer make sense. If, or when, that happens, publishers may need to consider what was once unthinkable: placing Googlebot on the banned bot list. If Google’s AI search experience operates like the LLMs do, hoarding users rather than sharing them with the open web, then why should Googlebot be treated differently than any other parasitic bot? Regardless, Google’s intention is clear - they no longer want to be the doorway to the web and instead are positioning themselves to be the destination.
Imagine
Peter Fordham, Public domain, via Wikimedia Commons
Bot traffic has already exceeded human traffic on the web:
Source: Cloudflare Radar, May 29 - Jun 5, 2026
This occurred faster than Cloudflare’s CEO expected. Bot activity on the web is only going to grow, especially as agentic AI becomes mainstream. It’s not hard to imagine a scenario where 90% or more of all internet traffic comes from bots as agentic AI is integrated into and runs in the background of the most popular consumer platforms.
We are at an inflection point. Publishers have suffered immense losses. But there is still hope, because the single most valuable asset for the AI labs - the human-written word - is not and by definition can never be generated by a large language model. The companies behind these bots recognize that human expression is the lifeblood of their product, which is why they’re willing to acquire it by any means necessary.
The infrastructure and organizations are in place, right now, to help publishers regulate and protect their precious asset. We can imagine a world, in the not too distant future, where bot control is standardized across the publishing ecosystem, where access to content comes at a fair value cost to the AI labs, where there is a parallel web for humans and AI bots, and where the economic incentives for real human beings to produce real human content are preserved for the next generation, and beyond. But the window to manifest that world is closing. It’s up to publishers to take a stand and make it happen, now.