
Blocking training bots limits model data intake, but permitting assistant bots can drive traffic from AI‑powered search results, directly affecting visibility and revenue.
The AI crawling ecosystem is polarizing into two distinct camps. Training bots, which harvest large swaths of web content for model improvement, are encountering mounting resistance; the share of sites still permitting GPTBot has fallen to a single‑digit percentage, echoing broader publisher pushback documented by BuzzStream and Cloudflare. Meanwhile, assistant bots such as OpenAI’s OAI‑SearchBot, TikTok’s crawler, and Apple’s counterpart are gaining ground, fetching content only when a user query triggers a request. This functional split is reshaping how search visibility is earned in the era of generative AI.
For site owners, the data signals a strategic crossroads. Unrestricted training bots can generate billions of requests, inflating bandwidth costs and straining server resources; Vercel, for instance, reported 569 million monthly hits from GPTBot alone. Conversely, allowing assistant bots can place pages in emerging AI search panels, potentially capturing new audience segments without the heavy resource toll. Publishers are therefore blocking aggressive SEO crawlers while fine‑tuning robots.txt directives to welcome user‑centric crawlers that promise measurable referral traffic.
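A minimal robots.txt sketch of this split policy, assuming OpenAI’s documented user‑agent tokens (GPTBot for training crawls, OAI‑SearchBot for search fetches); which paths to open or close is a site‑specific decision:

```
# Opt out of bulk training crawls.
User-agent: GPTBot
Disallow: /

# Allow the search/assistant fetcher so pages can surface in AI search results.
User-agent: OAI-SearchBot
Allow: /
```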
Looking ahead, OpenAI advises operators to explicitly permit OAI‑SearchBot if they wish to appear in ChatGPT’s search results, while still restricting GPTBot. Implementing granular robots.txt rules, monitoring server logs, and leveraging CDN‑level blocks enable a balanced approach: protect infrastructure, control data contribution, and capitalize on AI‑driven discovery. As AI assistants become primary entry points for information, mastering this nuanced crawler management will be a competitive differentiator for digital businesses.
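For the log-monitoring piece, a short scan of the access log is often enough to see which crawlers are actually hitting a site. The sketch below is a hypothetical starting point, not a production tool: the default log path and the user‑agent substrings are assumptions to adjust for your own stack.

```python
#!/usr/bin/env python3
"""Tally requests from known AI crawlers in a web server access log."""
import sys
from collections import Counter

# User-Agent substrings for common AI crawlers; extend to match your logs.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "Bytespider", "Applebot"]

def tally(log_path: str) -> Counter:
    """Count log lines whose User-Agent mentions a tracked crawler."""
    counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            for bot in AI_BOTS:
                if bot in line:
                    counts[bot] += 1
                    break
    return counts

if __name__ == "__main__":
    # Log path is an assumption; pass your server's access log as an argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/nginx/access.log"
    for bot, hits in tally(path).most_common():
        print(f"{bot:15} {hits:>10}")
```

Pairing counts like these with referral analytics makes it possible to judge whether an assistant bot’s fetches are translating into actual visits before deciding what to allow.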