Briefing · 02/07/2026

Your website needs an AI traffic policy

Cloudflare's new AI bot controls split automated traffic into Search, Agent and Training. That turns crawler settings into a business policy decision, not a technical toggle.

Cloudflare has just made the AI crawler question less vague.

The old version of the decision was simple enough to fit on a switch: block AI bots or let them through.

That was useful for a while. It gave publishers and site owners a defensive control when model companies were crawling content at scale and returning very little visible value. But it was always too blunt for normal business use.

Search crawlers, AI answer engines, browser agents, chat fetchers, training crawlers and content scrapers do not all create the same risk. They also do not all create the same value.

Cloudflare’s 1 July 2026 update replaces a single AI-bot category with three behaviours: Search, Agent and Training.

That is the real Signal.

Website owners now need a traffic policy, not a panic button.

Answer-engine summary

What changed? Cloudflare now lets all customers manage AI crawler traffic by behaviour: Search, Agent and Training. From 15 September 2026, new domains onboarding to Cloudflare will block Training and Agent bots by default on pages that display ads, while Search remains allowed by default.

Why does it matter? AI traffic is no longer one thing. Some bots may help people find a business. Some may act for a user. Some may take content to train or fine-tune a model. Small businesses need separate rules for each category.

The Three Categories Matter

Cloudflare’s new taxonomy is practical because it asks what the automated traffic is doing.

Search covers crawlers that index content so people can find it later. The expected bargain is still some version of discovery, referral traffic or compensation.

Agent covers automated activity acting in real time for a person. That includes chat fetch bots and browser-use agents. The visitor may not be a human clicking directly, but the activity can still be connected to a user’s immediate task.

Training covers crawlers taking content to train or fine-tune a model. That is the category most likely to feel extractive if the site owner receives no traffic, attribution or payment in return.

Those distinctions sound obvious once named. The problem is that many organisations have not named them.

They still talk about “AI bots” as if the only choices are:

let everything crawl;
block everything;
ignore the whole problem until something breaks.

That will not be good enough for long.

The Default Is Changing In September

Cloudflare says that on 15 September 2026, new domains onboarding to Cloudflare will get updated defaults.

Training and Agent bots will be blocked by default on pages that display ads. Search will remain allowed by default.

Cloudflare’s logic is simple: if a page displays ads, the site owner probably intended a person to land there and see the page. Human attention is the business model. Bots that consume, summarise or act around the page may reduce that value.

The other important change is how multi-purpose crawlers are treated.

Some crawlers do more than one thing. A crawler might support search indexing and also collect data for training. Cloudflare says multi-purpose crawlers that combine Search and Training will be affected by rules that block Training. Its examples include Googlebot, Applebot and Bingbot.

That is the interesting bit for operators.

The web has lived for decades on a crawler bargain: let search engines index you, and they send people back. AI has disturbed that bargain because the crawler can now absorb the page, answer the user’s question somewhere else, and reduce the need for a visit.
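To make the September default and the multi-purpose rule concrete, here is a minimal Python sketch of the logic as this briefing describes it. It is not Cloudflare's API or its actual matching code. The Crawler type, the function names and ExampleSearchBot are hypothetical, and the sketch assumes that a crawler "affected by" a Training block is blocked outright whenever any of its purposes is blocked. It only models the ad-page default for domains onboarding on or after 15 September 2026.

```python
from dataclasses import dataclass

# Hypothetical model of the behaviour the briefing describes; this is NOT
# Cloudflare's API or its actual matching logic.
SEARCH, AGENT, TRAINING = "Search", "Agent", "Training"


@dataclass(frozen=True)
class Crawler:
    name: str
    purposes: frozenset[str]  # one or more of SEARCH, AGENT, TRAINING


def ad_page_default_blocks(page_shows_ads: bool) -> frozenset[str]:
    """Purposes blocked by default for a domain onboarded on or after 15 Sep 2026.

    Only the ad-page rule is modelled. Defaults for other pages are not
    described here, so this returns an empty set for them.
    """
    return frozenset({TRAINING, AGENT}) if page_shows_ads else frozenset()


def is_blocked(crawler: Crawler, blocked: frozenset[str]) -> bool:
    # Assumption: a crawler is blocked if ANY of its purposes is blocked.
    # That is how a Search+Training crawler gets caught by a Training block.
    return not crawler.purposes.isdisjoint(blocked)


googlebot = Crawler("Googlebot", frozenset({SEARCH, TRAINING}))  # multi-purpose, per the briefing
search_only = Crawler("ExampleSearchBot", frozenset({SEARCH}))   # hypothetical

for page_shows_ads in (True, False):
    blocked = ad_page_default_blocks(page_shows_ads)
    for crawler in (googlebot, search_only):
        print(f"ads={page_shows_ads} {crawler.name}: blocked={is_blocked(crawler, blocked)}")
```

Run as-is, it reports Googlebot as blocked on the ad page and the search-only crawler as allowed; on the page without ads, this particular rule blocks neither. That is the operator's dilemma in miniature: a Training block can also cost you a crawler you wanted for discovery.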

So the policy decision becomes more specific:

Do we want this crawler for discovery?
Do we accept this agent acting for a user?
Do we allow this content to train a model?
Does the answer change on commercial, ad-supported, member-only or sensitive pages?
What evidence would make us change the rule?

That is a management decision, not just a webmaster setting.

Why Small Businesses Should Care

Most small businesses will not read Cloudflare’s changelog. Fair enough. There are invoices to chase and staff to manage.

But this change matters because AI traffic sits right on the boundary between marketing, privacy, intellectual property and customer acquisition.

A local clinic, advisory firm, trade business, training provider or consultancy might want AI search systems to understand and recommend its services.

It probably does not want every page, guide, pricing model, proprietary checklist or member resource scraped into a training dataset without control.

It might want a user’s AI assistant to fetch opening hours, service details, appointment instructions or a public article.

It might not want an agent submitting forms, scraping gated material, hammering booking flows or triggering workflows without clear limits.

These are not abstract publisher problems. They are ordinary business questions:

Which public pages help us get found?
Which content gives away value?
Which pages depend on human attention?
Which automated visits should be allowed because they serve a real customer?
Which automated visits should be blocked because they only extract?

The answer will not be the same for every site.

That is why the useful phrase is AI traffic policy.

The Old Robots.txt Mental Model Is Too Thin

Robots.txt still matters, but it cannot be the whole policy layer.

It was built for a simpler web. It can tell a named user agent which URL paths it may fetch, and it relies on crawlers choosing to obey it. It does not carry the full business distinction between search discovery, user-directed agents, model training, monetisation, attribution, compensation and workflow abuse.
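Python's standard library can evaluate robots.txt rules, which shows how narrow the format is. In this sketch the file content, the /members/ path and ExampleSearchBot are hypothetical, and GPTBot is used only as an example of an AI crawler's user-agent token. Every rule is keyed on a user-agent name and a URL path; there is no field for purpose, storage, attribution or payment.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt. "GPTBot" is only an example token; check each
# vendor's documentation for the exact user-agent names it honours.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /members/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

checks = [
    ("GPTBot", "https://example.com/services"),
    ("ExampleSearchBot", "https://example.com/services"),        # hypothetical bot
    ("ExampleSearchBot", "https://example.com/members/guide"),
]
for agent, url in checks:
    # can_fetch() only considers the agent name and the URL path.
    print(agent, url, parser.can_fetch(agent, url))
```

The three checks return False, True and False. The answer depends only on who the bot says it is and which path it asks for, not on what it will do with the page afterwards.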

The AI crawler problem is not only “can this bot fetch the page?”

It is:

what is the bot’s purpose?
what does it store?
does the site owner get traffic, payment or attribution?
is the request connected to a user’s task?
can the bot trigger state-changing actions?
does the page contain content that should stay outside model training?
do logs show useful visitors or just extraction?

That is closer to access governance than old search-engine optimisation.
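One question on that list can be enforced mechanically: whether a bot can trigger state-changing actions. A minimal sketch, assuming the request has already been identified as automated traffic (the hard part, and the job of tools such as Cloudflare's bot controls), might allow only read-only HTTP methods and keep bots out of sensitive flows. The function name and path prefixes are hypothetical placeholders for your own site.

```python
# Hypothetical guard for requests already classified as automated traffic.
# Read-only methods: a subset of the methods RFC 9110 defines as "safe".
READ_ONLY_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})

# Hypothetical path prefixes for flows bots should stay out of entirely.
# Prefix matching is crude; adjust it to your real URL structure.
SENSITIVE_PREFIXES = ("/booking", "/checkout", "/portal", "/members")


def allow_bot_request(method: str, path: str) -> bool:
    """Return True if an automated request may proceed."""
    if method.upper() not in READ_ONLY_METHODS:
        # POST, PUT, PATCH, DELETE and similar can submit forms or start workflows.
        return False
    # Even read-only access is refused on sensitive flows.
    return not path.startswith(SENSITIVE_PREFIXES)


for method, path in [("GET", "/opening-hours"), ("POST", "/contact"), ("GET", "/checkout/step-1")]:
    print(method, path, "allow" if allow_bot_request(method, path) else "block")
```

Checking the method catches form submissions and other writes; the path list catches flows such as booking or checkout where even read-only requests may not be welcome.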

For a practical business, the first version does not need to be complicated. It can fit in a small policy table:

Traffic type	Default	Allowed examples	Blocked examples	Review owner
Search	Allow on public marketing pages	service pages, public articles, location pages	private, stale or duplicate content	marketing / owner
Agent	Allow only where user benefit is clear	opening hours, public FAQs, read-only content	forms, bookings, checkout, sensitive workflows	operations / IT
Training	Block unless there is a reason to permit	deliberately public educational material	proprietary guides, pricing logic, member resources	owner / legal / advisor

That table is not glamorous. Good.

Glamour is how businesses end up with a hundred AI tools and no operating rules.
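If the policy table needs to be enforced or audited, it can also live as data. This is a minimal Python sketch, assuming hypothetical page-kind labels such as "service_page" or "booking" that you would map to your own site. Each traffic type gets the table's default plus explicit exceptions, and the review owner sits next to the rule.

```python
from dataclasses import dataclass, field

ALLOW, BLOCK = "allow", "block"


@dataclass
class TrafficRule:
    """One row of the policy table: a default plus per-page-kind exceptions."""
    default: str
    review_owner: str
    exceptions: dict[str, str] = field(default_factory=dict)

    def decide(self, page_kind: str) -> str:
        return self.exceptions.get(page_kind, self.default)


# Hypothetical page-kind labels; map them to your own site's sections.
POLICY = {
    "Search": TrafficRule(
        default=ALLOW,  # allow on public marketing pages
        review_owner="marketing / owner",
        exceptions={"private": BLOCK, "stale": BLOCK, "duplicate": BLOCK},
    ),
    "Agent": TrafficRule(
        default=BLOCK,  # allow only where user benefit is clear
        review_owner="operations / IT",
        exceptions={"opening_hours": ALLOW, "public_faq": ALLOW, "read_only_content": ALLOW},
    ),
    "Training": TrafficRule(
        default=BLOCK,  # block unless there is a reason to permit
        review_owner="owner / legal / advisor",
        exceptions={"public_educational": ALLOW},
    ),
}


def decide(traffic_type: str, page_kind: str) -> str:
    rule = POLICY.get(traffic_type)
    if rule is None:
        return BLOCK  # unrecognised traffic type: fall back to block (a design choice)
    return rule.decide(page_kind)


for traffic_type, page_kind in [("Search", "service_page"), ("Agent", "booking"), ("Training", "public_educational")]:
    print(traffic_type, page_kind, decide(traffic_type, page_kind), POLICY[traffic_type].review_owner)
```

Keeping the owner beside each rule means the record of who chose the policy travels with it. Falling back to block for unrecognised traffic types is a choice made in this sketch, not something Cloudflare prescribes.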

This Connects To A Bigger Web Shift

Cloudflare is not only adding nicer bot controls. It is positioning itself inside the business model fight for the AI web.

Its Content Independence Day work, Pay Per Crawl direction, AI search initiatives and new traffic options all point at the same problem: the old referral economy is breaking.

For years, publishing on the web meant accepting crawlers because search traffic was the reward.

AI changes the economics. A model or answer engine can use a page to satisfy the user without sending that user to the source. A browser agent can complete a task without exposing the usual page journey. A training crawler can convert a company’s content into model capability with no obvious return path.

That does not mean every AI crawler is bad.

It means crawler purpose now matters commercially.

The next mature website audit will not stop at performance, SEO, accessibility and analytics. It will include:

AI crawler settings;
content that should be findable by answer engines;
content that should be excluded from training;
agent-safe pages and workflows;
ad-supported or attention-dependent pages;
log review for bot behaviour;
a record of who chose the policy and why.

That is exactly the kind of boring governance layer most businesses are missing.
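For the log-review item, a first pass can be as simple as counting which category of bot requests which pages. This sketch assumes access logs in the Combined Log Format commonly produced by Apache and nginx; the user-agent tokens, the category mapping and the sample lines are all hypothetical. User-agent strings can be spoofed, so treat the result as a rough signal, not verified identity.

```python
import re
from collections import Counter

# Matches the request line and the final quoted user-agent field of a
# Combined Log Format line.
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"$'
)

# Hypothetical user-agent tokens mapped to the three categories.
CATEGORY_BY_TOKEN = {
    "ExampleSearchBot": "Search",
    "ExampleAgent": "Agent",
    "ExampleTrainBot": "Training",
}


def classify(user_agent: str) -> str:
    for token, category in CATEGORY_BY_TOKEN.items():
        if token.lower() in user_agent.lower():
            return category
    return "Other"  # humans, unknown bots, or anything unmatched


def summarise(lines):
    """Count requests per (category, path) pair."""
    hits = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            hits[(classify(match["ua"]), match["path"])] += 1
    return hits


SAMPLE = [  # hypothetical log lines
    '203.0.113.5 - - [02/Jul/2026:10:00:00 +0000] "GET /services HTTP/1.1" 200 5120 "-" "ExampleSearchBot/1.0"',
    '203.0.113.6 - - [02/Jul/2026:10:00:01 +0000] "GET /guides/pricing HTTP/1.1" 200 9001 "-" "ExampleTrainBot/2.1"',
    '203.0.113.6 - - [02/Jul/2026:10:00:02 +0000] "GET /guides/pricing HTTP/1.1" 200 9001 "-" "ExampleTrainBot/2.1"',
    '198.51.100.7 - - [02/Jul/2026:10:00:03 +0000] "GET /opening-hours HTTP/1.1" 200 800 "-" "Mozilla/5.0 ExampleAgent/0.3"',
]

for (category, path), count in summarise(SAMPLE).most_common():
    print(f"{category:<9} {path:<16} {count}")
```

On the sample, the hypothetical training crawler fetches the pricing guide twice while the search and agent bots make one request each. Pages that draw heavy Training traffic and little human traffic are candidates for a stricter rule.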

What To Do Now

For most small businesses, there is no reason to panic.

There is a reason to choose.

Start with five steps:

List the pages that should help customers find you.
List the content you would not want copied into someone else’s model without permission.
Decide whether user-directed agents should be allowed to read public content.
Block or restrict automated traffic on forms, booking flows, portals and private resources.
Review the actual Cloudflare settings before 15 September 2026 instead of inheriting defaults by accident.

If the business runs ads or depends on page views, this becomes more urgent.

If the site is mostly a brochure and blog, the immediate risk is lower. The strategic question still matters: do you want AI systems to understand and recommend the business, and under what conditions?

That question belongs next to SEO, privacy and content strategy.

The Signal

AI traffic is splitting into classes.

The useful distinction is no longer human versus bot. It is closer to purpose:

discovery traffic that can bring people back;
agent traffic acting for a user;
training traffic turning content into model capability.

Cloudflare has made that distinction visible in a mainstream control surface.

That means the crawler conversation has moved from technical hygiene to business policy.

Small businesses need the same shift in thinking. Blocking everything can hurt discoverability. Allowing everything can give away value. Ignoring the distinction leaves the policy to vendors, defaults and crawlers with mixed incentives.

The adult answer is not “block AI” or “embrace AI”.

The adult answer is: write down what each kind of AI traffic is allowed to do.

Your website needs an AI traffic policy.

