# Meet the AI Crawlers Reading Your Site

**Published:** 2026-06-17  
**Author:** Michael Janzen

---

A field guide to the AI crawlers and answer engines (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) now reading the web — what each one is and what content formats they fetch.

---


Not all bots are search engines anymore. A fast-growing share of crawler traffic now comes from **answer engines** — the systems behind ChatGPT, Claude, Perplexity, and Google's AI features — fetching your content to train models and answer questions in real time. Here's who they are and what they actually request.

## The answer engines

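- **GPTBot** (OpenAI): crawls public pages for content that may be used to train OpenAI's generative models.
- **OAI-SearchBot / ChatGPT-User** (OpenAI): OAI-SearchBot indexes pages for ChatGPT's search results; ChatGPT-User fetches a page live when a user's request leads ChatGPT to visit it.
- **ClaudeBot** (Anthropic): the training crawler; **Claude-User** fetches a page live when someone asks Claude about it.
- **PerplexityBot / Perplexity-User** (Perplexity): PerplexityBot indexes pages for Perplexity's answer engine; Perplexity-User fetches pages live in response to user queries.
- **Google-Extended** (Google): not a separate crawler but a robots.txt token. Googlebot still does the fetching, and disallowing Google-Extended opts your content out of use for training Gemini models without removing it from Google Search.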
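One way to spot these bots in your own traffic is to match their tokens against each request's `User-Agent` header. The Python sketch below is a minimal case-insensitive substring check; `BotInfo`, `AI_BOTS`, and `classify_user_agent` are hypothetical names, and the purpose labels are rough paraphrases. Google-Extended is left out because Google documents it as a robots.txt token with no user-agent string of its own. User-agent strings are trivially spoofed, so a match tells you what a client claims to be, not what it is.

```python
from typing import NamedTuple, Optional


class BotInfo(NamedTuple):
    token: str     # substring expected in the User-Agent header
    operator: str  # company that runs the bot
    purpose: str   # rough label, paraphrasing the list above


# Hypothetical lookup table built from the tokens named in this post.
# Google-Extended is deliberately absent: it is a robots.txt token,
# not a user agent that shows up in request headers.
AI_BOTS = (
    BotInfo("GPTBot", "OpenAI", "training crawl"),
    BotInfo("OAI-SearchBot", "OpenAI", "search indexing"),
    BotInfo("ChatGPT-User", "OpenAI", "user-initiated fetch"),
    BotInfo("ClaudeBot", "Anthropic", "training crawl"),
    BotInfo("Claude-User", "Anthropic", "user-initiated fetch"),
    BotInfo("PerplexityBot", "Perplexity", "search indexing"),
    BotInfo("Perplexity-User", "Perplexity", "user-initiated fetch"),
)


def classify_user_agent(user_agent: str) -> Optional[BotInfo]:
    """Return the first AI bot whose token appears in the header, else None.

    Matching is a case-insensitive substring search on the claimed user
    agent only; this sketch does not verify the request's source IP.
    """
    ua = user_agent.lower()
    for bot in AI_BOTS:
        if bot.token.lower() in ua:
            return bot
    return None


if __name__ == "__main__":
    # A GPTBot-style user-agent string, used here only as sample input.
    sample = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
              "GPTBot/1.1; +https://openai.com/gptbot)")
    print(classify_user_agent(sample))
    # prints BotInfo(token='GPTBot', operator='OpenAI', purpose='training crawl')
```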

## What they fetch

Some bots fetch only your rendered HTML. Increasingly, well-behaved ones also use **machine-readable** versions where they exist: an `llms.txt` index, clean Markdown renderings of your posts, or JSON-LD structured data embedded in the page. Serving those formats makes your content cheaper to parse and can make accurate citation more likely.
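To make two of those formats concrete, here is a minimal Python sketch that renders an `llms.txt` index and a schema.org `BlogPosting` JSON-LD object from the same post metadata. The `Post` fields, the `example.com` URLs, and the choice to link each post's Markdown rendering (a `.md` URL) are assumptions for illustration. `llms.txt` is a community proposal (llmstxt.org) rather than a formal standard, so check its current description before relying on this exact layout.

```python
import json
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    url: str        # assumed: URL of a clean Markdown rendering of the post
    summary: str
    published: str  # ISO 8601 date, e.g. "2026-06-17"
    author: str


def build_llms_txt(site_name: str, site_summary: str, posts: list[Post]) -> str:
    """Render a minimal llms.txt index: an H1 title, a blockquote summary,
    then an H2 section with one Markdown link per post."""
    lines = [f"# {site_name}", "", f"> {site_summary}", "", "## Posts", ""]
    for post in posts:
        lines.append(f"- [{post.title}]({post.url}): {post.summary}")
    return "\n".join(lines) + "\n"


def build_json_ld(post: Post) -> str:
    """Render schema.org BlogPosting metadata, meant to be placed inside a
    <script type="application/ld+json"> tag in the post's HTML."""
    data = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": post.title,
        "description": post.summary,
        "datePublished": post.published,
        "author": {"@type": "Person", "name": post.author},
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    post = Post(
        title="Meet the AI Crawlers Reading Your Site",
        url="https://example.com/posts/meet-the-ai-crawlers.md",
        summary="A field guide to the answer-engine bots and what they fetch.",
        published="2026-06-17",
        author="Michael Janzen",
    )
    print(build_llms_txt("Example Blog", "Notes on AI crawlers.", [post]))
    print(build_json_ld(post))
```

Generating both from one `Post` record keeps the index and the embedded metadata from drifting apart as posts are edited.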

## How to see it

You can't manage what you can't measure. [AI Bot Log](/) aggregates anonymized bot-visit data from sites across the network into a live dashboard, so you can see which crawlers are active, what they request, and how the trends are moving — instead of guessing from raw server logs.
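For a rough local view of the same kind of data, a short script can tally AI-bot requests straight from an access log. This Python sketch assumes the common Apache/Nginx "combined" log format; `LOG_PATTERN`, `AI_BOT_TOKENS`, and `count_ai_bot_hits` are hypothetical names, the sample lines use documentation IP ranges and simplified user-agent strings, and nothing beyond the user-agent string is verified, so the counts reflect claimed rather than confirmed bot traffic.

```python
import re
from collections import Counter
from collections.abc import Iterable

# Assumed "combined" format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical token list taken from the bots named in this post.
AI_BOT_TOKENS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
                 "Claude-User", "PerplexityBot", "Perplexity-User")


def count_ai_bot_hits(lines: Iterable[str]) -> Counter[tuple[str, str]]:
    """Count (bot token, requested path) pairs for lines whose user agent names an AI bot."""
    hits: Counter[tuple[str, str]] = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        agent = match.group("agent").lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in agent:
                hits[(token, match.group("path"))] += 1
                break  # attribute each request to one bot only
    return hits


if __name__ == "__main__":
    sample_log = [
        '203.0.113.7 - - [17/Jun/2026:10:00:00 +0000] "GET /posts/a.md HTTP/1.1" '
        '200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
        '198.51.100.4 - - [17/Jun/2026:10:00:05 +0000] "GET /llms.txt HTTP/1.1" '
        '200 900 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
        '192.0.2.1 - - [17/Jun/2026:10:00:09 +0000] "GET / HTTP/1.1" '
        '200 4000 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/126.0"',
    ]
    for (bot, path), n in count_ai_bot_hits(sample_log).most_common():
        print(f"{bot:16} {path:20} {n}")
```

To run it against a real log, pass an open file object (for example `with open(path) as f: count_ai_bot_hits(f)`), where `path` points at your server's access log; lines that don't match the pattern are skipped rather than raising an error.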
