About
AI Bot Log tracks how AI crawlers and answer engines read the web. The data comes from sites running a lightweight reporter that adds structured data and machine-readable endpoints to their content, then logs which bots request them. This page explains that infrastructure — the Answer Engine Optimization (AEO) outputs the network is built on — and exactly how the data is captured.
Some outputs are served as separate URLs that bots can request independently. Others are embedded in the HTML of the page itself. The distinction matters for tracking, and for understanding the limits of what bot analytics can tell us.
Outputs served at their own URLs
Each of these is a distinct resource a bot can request. When a crawler fetches one, the reporter logs the bot name, the resource type, and the date. That per-resource granularity is what makes the network dashboard possible — it shows which bots are requesting which content formats.
llms.txt — /llms.txt
A plain-text index of the site: title, description, and a list of posts with summaries and links to their Markdown versions. Follows the llms.txt specification. AI crawlers use it as a table of contents to decide which pages to fetch in full.
How it's served: generated dynamically behind a rewrite rule; no static file is written to disk. Requests to /llms.txt are intercepted and the response is rendered in memory.
# AI Bot Log
> https://aibotlog.com
Answer Engine Optimization — FAQPage schema, entity graphs, llms.txt, and AI bot analytics.
## Posts
- [How AI Crawlers Read Your Content](https://aibotlog.com/how-ai-crawlers-work): What GPTBot, ClaudeBot, and PerplexityBot actually fetch, why structured data changes what they cite, and how to track it.
Markdown: https://aibotlog.com/how-ai-crawlers-work/?aibotlog_llm=1
- [What is llms.txt?](https://aibotlog.com/what-is-llms-txt): The emerging open standard that gives AI crawlers a structured index of your site's content.
Markdown: https://aibotlog.com/what-is-llms-txt/?aibotlog_llm=1
## Pages
- [About](https://aibotlog.com/about): The AI Bot Log network, what the reporter does, and how bot tracking works.
Markdown: https://aibotlog.com/about/?aibotlog_llm=1
Tracked as llms_txt; each bot request is counted separately from HTML page visits.
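As a rough illustration of the format shown above, an index like this can be rendered from a list of posts and pages in a few lines. This is a minimal sketch, not the reporter's actual code: the `Entry` type and the `render_llms_txt` function are hypothetical names, and the layout simply mirrors the sample output.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One post or page in the index (hypothetical structure)."""
    title: str
    url: str
    summary: str


def render_llms_txt(site_title: str, site_url: str, description: str,
                    posts: list[Entry], pages: list[Entry]) -> str:
    """Render an llms.txt-style index entirely in memory."""
    lines = [f"# {site_title}", f"> {site_url}", description]
    for heading, entries in (("Posts", posts), ("Pages", pages)):
        if not entries:
            continue  # skip empty sections
        lines.append(f"## {heading}")
        for entry in entries:
            lines.append(f"- [{entry.title}]({entry.url}): {entry.summary}")
            # Markdown endpoint: the same permalink plus the query parameter.
            lines.append(f"Markdown: {entry.url.rstrip('/')}/?aibotlog_llm=1")
    return "\n".join(lines) + "\n"
```

Because the function returns a string, nothing needs to be written to disk; the caller can send the result straight back as the response body.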
Post Markdown — /your-post/?aibotlog_llm=1
A structured Markdown rendering of a single post. Includes metadata (publish date, modified date, featured image), the AEO summary, entity list, Q&A pairs, keywords, and the full post content converted to Markdown. Gives AI crawlers a clean, parse-ready version of the content without HTML markup or theme chrome.
How it's served: by intercepting the standard post URL when the ?aibotlog_llm=1 query parameter is present. The same permalink that normally returns the HTML page returns a Markdown document instead — no extra URL or file required.
# Circuit Breakers in Practice
URL: https://example.com/circuit-breakers
Published: 2026-01-15T10:30:00Z
Modified: 2026-03-10T14:22:15Z
## Summary
Circuit breaker patterns prevent cascading failures by wrapping
remote calls in a state machine that trips open after repeated errors.
**Keywords:** circuit breaker, microservices, fault tolerance
## Entities
- Martin Fowler (Person) — Software author and ThoughtWorks chief scientist
- Hystrix (Technology) — Netflix's circuit breaker library
## Q&A
**Q: When should a circuit breaker trip open?**
After a configurable threshold of consecutive failures within
a rolling time window.
## Content
The full post body in Markdown...
Tracked as post_markdown.
Site Summary — /?aibotlog_llm=1
A Markdown overview of the site, served at the home URL with the aibotlog_llm=1 parameter. It lists the five most recent posts with summaries, and links to the full content indexes at /llms.txt and /llms-full.txt.
How it's served: the same query-parameter mechanism as Post Markdown, applied to the home URL. Returns a site-level Markdown overview rather than a single post.
Tracked as site_summary.
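Both query-parameter endpoints (Post Markdown and Site Summary) hinge on the same check: is aibotlog_llm=1 present, and is the path the home URL or a post? A minimal sketch of that decision follows. The function names are hypothetical, and for simplicity an ordinary page request is reported as plain html here.

```python
from urllib.parse import parse_qs, urlsplit


def wants_markdown(url: str) -> bool:
    """True when the request URL carries aibotlog_llm=1."""
    query = parse_qs(urlsplit(url).query)
    return query.get("aibotlog_llm") == ["1"]


def resource_for(url: str) -> str:
    """Map a request URL to 'site_summary', 'post_markdown', or 'html'."""
    if not wants_markdown(url):
        return "html"
    path = urlsplit(url).path.strip("/")
    # An empty path means the home URL; anything else is treated as a post.
    return "site_summary" if path == "" else "post_markdown"
```

Keying on a query parameter keeps the permalink unchanged, so the Markdown version needs no extra route or file.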
AEO JSON-LD — /aeo/your-post.jsonld
A standalone JSON-LD file containing the FAQPage schema, entity mentions, citations, and an associatedMedia link to the Markdown endpoint. Served only for posts that have AEO data. Gives bots direct access to the structured data without parsing the HTML page.
How it's served: a dynamic endpoint registered via a rewrite rule matching /aeo/*.jsonld. The file does not exist on disk; the request is intercepted and the JSON-LD is generated from the post's stored AEO metadata.
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "FAQPage",
      "mainEntity": [{
        "@type": "Question",
        "name": "When should a circuit breaker trip open?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "After a configurable threshold of consecutive failures..."
        }
      }]
    },
    {
      "@type": "Article",
      "headline": "Circuit Breakers in Practice",
      "mentions": [{
        "@type": "Person",
        "name": "Martin Fowler",
        "sameAs": "https://en.wikipedia.org/wiki/Martin_Fowler_(software_engineer)"
      }],
      "citation": [{
        "@type": "WebPage",
        "url": "https://martinfowler.com/bliki/CircuitBreaker.html",
        "name": "CircuitBreaker — Martin Fowler"
      }],
      "associatedMedia": {
        "@type": "MediaObject",
        "encodingFormat": "text/markdown",
        "contentUrl": "https://example.com/circuit-breakers/?aibotlog_llm=1"
      }
    }
  ]
}
Tracked as aeo_jsonld.
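From a consumer's point of view, a document like the one above can be read without touching the HTML page. The sketch below shows one way a client might pull out the Q&A pairs and the Markdown link. `extract_aeo` is a hypothetical helper, and it assumes the @graph layout shown in the sample.

```python
import json


def extract_aeo(jsonld_text: str) -> dict:
    """Pull Q&A pairs and the Markdown URL out of an AEO JSON-LD document."""
    graph = json.loads(jsonld_text).get("@graph", [])
    result = {"qa": [], "markdown_url": None}
    for node in graph:
        if node.get("@type") == "FAQPage":
            for question in node.get("mainEntity", []):
                answer = question.get("acceptedAnswer", {}).get("text", "")
                result["qa"].append((question.get("name", ""), answer))
        elif node.get("@type") == "Article":
            # associatedMedia points at the Markdown rendering of the post.
            media = node.get("associatedMedia", {})
            result["markdown_url"] = media.get("contentUrl")
    return result
```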
XML Sitemap — /sitemap.xml
A standard XML sitemap with one addition: each post entry includes an xhtml:link alternate pointing to its Markdown endpoint. Bots that understand alternate links can discover the structured version without a separate crawl of /llms.txt.
<url>
  <loc>https://aibotlog.com/how-ai-crawlers-work</loc>
  <lastmod>2026-03-10</lastmod>
  <changefreq>weekly</changefreq>
  <priority>0.8</priority>
  <xhtml:link rel="alternate" type="text/markdown"
              href="https://aibotlog.com/how-ai-crawlers-work/?aibotlog_llm=1"/>
</url>
Tracked as sitemap.
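The xhtml:link element only parses if the xhtml namespace is declared on the enclosing urlset. Here is a minimal sketch of building such a sitemap with Python's standard library. `build_sitemap` is a hypothetical name, and changefreq and priority are left out for brevity.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_sitemap(entries: list[tuple[str, str]]) -> str:
    """Build a sitemap where each (url, lastmod) entry has a Markdown alternate."""
    ET.register_namespace("", SITEMAP_NS)       # default namespace for <urlset>
    ET.register_namespace("xhtml", XHTML_NS)    # prefix used by <xhtml:link>
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        ET.SubElement(url, f"{{{XHTML_NS}}}link", {
            "rel": "alternate",
            "type": "text/markdown",
            "href": f"{loc.rstrip('/')}/?aibotlog_llm=1",
        })
    return ET.tostring(urlset, encoding="unicode")
```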
robots.txt additions — /robots.txt
The reporter appends a Sitemap directive and an LLMs-Txt directive to the generated robots.txt. The LLMs-Txt line signals to AI crawlers that a structured content index is available.
Sitemap: https://aibotlog.com/sitemap.xml
# AI content index
LLMs-Txt: https://aibotlog.com/llms.txt
Tracked as robots_txt.
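A client that wants to honour these lines has to look for them itself: Python's `urllib.robotparser`, for example, exposes Sitemap lines but has no notion of LLMs-Txt. A minimal sketch of a line-based reader follows. `find_directives` is a hypothetical helper, and it treats `#` as starting a comment anywhere on a line.

```python
def find_directives(robots_txt: str) -> dict[str, list[str]]:
    """Collect Sitemap and LLMs-Txt values from a robots.txt body."""
    found: dict[str, list[str]] = {"sitemap": [], "llms-txt": []}
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and padding
        key, sep, value = line.partition(":")  # split at the first colon only
        name = key.strip().lower()             # field names are case-insensitive
        if sep and name in found:
            found[name].append(value.strip())
    return found
```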
RSS+AEO Feed — /feed/
The standard RSS 2.0 feed, enriched with an xmlns:aeo namespace and per-item AEO elements: <aeo:summary>, <aeo:entity>, and <aeo:qa>. AI crawlers that consume RSS feeds receive the full AEO metadata (structured summaries, named entities, and Q&A pairs) alongside post content. Purely additive: existing feed elements including content:encoded are not modified.
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:aeo="https://aibotlog.com/ns/rss/1.0">
  <channel>
    <item>
      <title>Circuit Breakers in Practice</title>
      <link>https://example.com/circuit-breakers</link>
      <content:encoded><![CDATA[...full post HTML...]]></content:encoded>
      <aeo:summary>Circuit breaker patterns prevent cascading failures
        by wrapping remote calls in a state machine that trips open
        after repeated errors.</aeo:summary>
      <aeo:entity type="Person" sameAs="https://en.wikipedia.org/wiki/Martin_Fowler_(software_engineer)">
        Martin Fowler
      </aeo:entity>
      <aeo:qa>
        <aeo:question>When should a circuit breaker trip open?</aeo:question>
        <aeo:answer>After a configurable threshold of consecutive failures
          within a rolling time window.</aeo:answer>
      </aeo:qa>
    </item>
  </channel>
</rss>
Tracked as rss_aeo when AEO enrichment is enabled; rss_feed when disabled. Both are counted separately from HTML page visits.
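Because the AEO elements live in their own namespace, a feed reader that does not know them can ignore them, while one that does can pick them out by namespace URI. A minimal sketch of the second case, using the namespace from the sample feed, is below. `read_aeo_items` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

AEO = {"aeo": "https://aibotlog.com/ns/rss/1.0"}


def _clean(text) -> str:
    """Collapse line breaks and indentation inside element text."""
    return " ".join((text or "").split())


def read_aeo_items(feed_xml: str) -> list[dict]:
    """Return title, summary, entities, and Q&A pairs for each feed item."""
    items = []
    for item in ET.fromstring(feed_xml).iter("item"):
        entities = [
            {"name": _clean(e.text), "type": e.get("type"), "sameAs": e.get("sameAs")}
            for e in item.findall("aeo:entity", AEO)
        ]
        qa = [
            (_clean(q.findtext("aeo:question", "", AEO)),
             _clean(q.findtext("aeo:answer", "", AEO)))
            for q in item.findall("aeo:qa", AEO)
        ]
        items.append({
            "title": _clean(item.findtext("title", "")),
            "summary": _clean(item.findtext("aeo:summary", "", AEO)),
            "entities": entities,
            "qa": qa,
        })
    return items
```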
Outputs that live inside the page
These outputs are injected into the HTML <head> of each post. They are present when any bot (or person) loads the page. There is no separate URL to request; the data rides along with the HTML.
When a bot visits an HTML page, the reporter checks at that moment whether the post has AEO metadata stored. If it does, the visit is logged as aeo_post; if not, it is logged as html. Either way the visit is a single HTML page request, but the distinction matters: most sites have a mix of AEO-enriched and plain posts, so the split shows which bots are landing on enriched content and which are only reaching plain pages. Over time, patterns emerge at the network level. AI answer engines tend to show higher aeo_post ratios than SEO crawlers, which suggests they are finding and returning to structured content rather than treating all HTML equally.
What the tracking cannot tell you is which specific embedded element a bot looked at. A visit to an AEO-enriched post could involve the FAQPage schema, the entity mentions, the citation list, or just the post text. The reporter records that enriched content was present; it cannot record what the bot did with it.
FAQPage JSON-LD — embedded in <head>
Generated from the Q&A pairs stored in the post's AEO metadata. Each question becomes a Question node with an acceptedAnswer. This is the same schema format Google uses for FAQ rich results; AI crawlers also parse it to extract question-answer pairs as citable facts.
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [{
    "@type": "FAQPage",
    "@id": "https://example.com/circuit-breakers/#faqpage",
    "mainEntity": [{
      "@type": "Question",
      "name": "When should a circuit breaker trip open?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "After a configurable threshold of consecutive failures within a rolling time window."
      }
    }]
  }]
}
</script>
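The mapping from stored Q&A pairs to the script element above is mechanical. A minimal sketch follows. `faqpage_script` is a hypothetical name, and the `#faqpage` fragment follows the sample. The one subtlety worth showing is escaping `</` so that answer text cannot close the script element early.

```python
import json


def faqpage_script(post_url: str, qa_pairs: list[tuple[str, str]]) -> str:
    """Render Q&A pairs as a FAQPage JSON-LD script element."""
    node = {
        "@type": "FAQPage",
        "@id": f"{post_url.rstrip('/')}/#faqpage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }
    doc = {"@context": "https://schema.org", "@graph": [node]}
    # "\/" is a valid JSON escape for "/", so parsers still read the same text.
    body = json.dumps(doc, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```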
Entity mentions with sameAs
Each entity stored in the post's AEO metadata becomes a typed mentions entry in the Article JSON-LD node. The sameAs URL links to an authoritative reference (typically Wikipedia or an official site), giving AI systems a way to disambiguate the entity — for example "Martin Fowler the software author" versus any other Martin Fowler.
"mentions": [
  {
    "@type": "Person",
    "name": "Martin Fowler",
    "description": "Software author and ThoughtWorks chief scientist",
    "sameAs": "https://en.wikipedia.org/wiki/Martin_Fowler_(software_engineer)"
  },
  {
    "@type": "SoftwareApplication",
    "name": "Hystrix",
    "sameAs": "https://github.com/Netflix/Hystrix"
  }
]
Citation JSON-LD
External links in the post content are extracted and added as citation entries in the Article JSON-LD node. Each citation includes the URL and the link's anchor text — giving AI systems a structured list of the post's sources without parsing the HTML body.
"citation": [
  {
    "@type": "WebPage",
    "url": "https://martinfowler.com/bliki/CircuitBreaker.html",
    "name": "CircuitBreaker — Martin Fowler"
  },
  {
    "@type": "WebPage",
    "url": "https://netflix.github.io/Hystrix/",
    "name": "Hystrix Wiki"
  }
]
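Extracting citations of this kind amounts to walking the post's anchors and keeping the ones that point off-site. The sketch below does that with the standard-library HTML parser. The class and function names are hypothetical, and "external" is assumed to mean an absolute link whose host differs from the site's own.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CitationExtractor(HTMLParser):
    """Collect external links and their anchor text as citation entries."""

    def __init__(self, site_host: str):
        super().__init__()
        self.site_host = site_host
        self.citations: list[dict] = []
        self._href = None   # set while inside an external <a>
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            host = urlsplit(href).netloc
            # Relative links have no host; same-host links are internal.
            self._href = href if host and host != self.site_host else None
            self._text = []

    def handle_data(self, data):
        if self._href:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            name = "".join(self._text).strip()
            self.citations.append({"@type": "WebPage", "url": self._href, "name": name})
            self._href = None


def extract_citations(html: str, site_host: str) -> list[dict]:
    parser = CitationExtractor(site_host)
    parser.feed(html)
    parser.close()
    return parser.citations
```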
Meta tags
The reporter injects description, Open Graph, and Twitter Card meta tags derived from the post's AEO summary (falling back to the excerpt if no summary exists). These are suppressed when an active SEO plugin is detected and the conflict toggle is enabled.
<meta name="description" content="Circuit breaker patterns prevent cascading failures by wrapping remote calls in a state machine.">
<meta property="og:title" content="Circuit Breakers in Practice">
<meta property="og:description" content="Circuit breaker patterns prevent cascading failures...">
<meta property="og:type" content="article">
<meta name="twitter:card" content="summary_large_image">
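The fallback and suppression rules can be expressed compactly. This is a sketch under assumptions: `meta_tags` and its parameter names are hypothetical, and the two booleans stand in for however the reporter detects an SEO plugin and reads the conflict toggle.

```python
from html import escape


def meta_tags(title: str, summary: str, excerpt: str,
              seo_plugin_active: bool = False,
              conflict_toggle: bool = False) -> list[str]:
    """Build meta tags from the AEO summary, falling back to the excerpt."""
    if seo_plugin_active and conflict_toggle:
        return []  # leave the tags to the SEO plugin
    description = escape(summary or excerpt, quote=True)
    return [
        f'<meta name="description" content="{description}">',
        f'<meta property="og:title" content="{escape(title, quote=True)}">',
        f'<meta property="og:description" content="{description}">',
        '<meta property="og:type" content="article">',
        '<meta name="twitter:card" content="summary_large_image">',
    ]
```

Escaping with `quote=True` matters because summaries often contain quotation marks, which would otherwise end the content attribute early.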
How the data is captured and reported
On every incoming request, the reporter checks the user-agent string against a list of known bot signatures: AI answer engines (GPTBot, ClaudeBot, PerplexityBot, Google-Extended), training crawlers (CCBot, Bytespider, DeepSeekBot), search engines (Googlebot, Bingbot), and SEO tools (SemrushBot, AhrefsBot). Matching is a case-insensitive substring comparison. One caveat: Google documents Google-Extended as a robots.txt control token rather than a separate request user-agent, so that signature is unlikely to match real traffic.
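Case-insensitive substring matching is simple enough to show in full. The table below is an abbreviated, illustrative subset of the signatures named above; the category labels and the `match_bot` function are hypothetical.

```python
# signature (lowercase) -> (canonical bot name, category)
BOT_SIGNATURES = {
    "gptbot": ("GPTBot", "ai_answer_engine"),
    "claudebot": ("ClaudeBot", "ai_answer_engine"),
    "perplexitybot": ("PerplexityBot", "ai_answer_engine"),
    "ccbot": ("CCBot", "training_crawler"),
    "bytespider": ("Bytespider", "training_crawler"),
    "googlebot": ("Googlebot", "search_engine"),
    "bingbot": ("Bingbot", "search_engine"),
    "semrushbot": ("SemrushBot", "seo_tool"),
    "ahrefsbot": ("AhrefsBot", "seo_tool"),
}


def match_bot(user_agent: str):
    """Return (canonical name, category) for a known bot, else None."""
    ua = user_agent.lower()
    for signature, identity in BOT_SIGNATURES.items():
        if signature in ua:
            return identity
    return None
```

Substring matching is forgiving about version numbers and surrounding browser tokens, but it trusts whatever the client sends: a user-agent string is self-reported and can be spoofed.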
When a match is found, the reporter records three things: the canonical bot name, the resource type requested (one of html, llms_txt, post_markdown, site_summary, sitemap, robots_txt, aeo_jsonld, rss_feed, rss_aeo, or aeo_post), and the date. RSS visits are split: rss_aeo when AEO feed enrichment is enabled, rss_feed when it is not. Counts are stored locally in a daily summary table; no per-request log is kept.
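The storage model described here, counts keyed by day, bot, and resource type with no per-request rows, can be sketched with a counter. The class and helper names are hypothetical; the resource types are the ten listed above.

```python
from collections import Counter
from datetime import date

RESOURCE_TYPES = {
    "html", "llms_txt", "post_markdown", "site_summary", "sitemap",
    "robots_txt", "aeo_jsonld", "rss_feed", "rss_aeo", "aeo_post",
}


def html_resource_type(has_aeo_metadata: bool) -> str:
    """HTML visits split on whether the post has AEO metadata stored."""
    return "aeo_post" if has_aeo_metadata else "html"


def feed_resource_type(aeo_enrichment_enabled: bool) -> str:
    """RSS visits split on whether feed enrichment is enabled."""
    return "rss_aeo" if aeo_enrichment_enabled else "rss_feed"


class DailySummary:
    """Aggregate counts only; individual requests are never stored."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, bot: str, resource_type: str, day: date) -> None:
        if resource_type not in RESOURCE_TYPES:
            raise ValueError(f"unknown resource type: {resource_type}")
        self.counts[(day.isoformat(), bot, resource_type)] += 1

    def rows(self) -> list[tuple[str, str, str, int]]:
        """Return (date, bot, resource type, visits) rows in sorted order."""
        return sorted((d, b, r, n) for (d, b, r), n in self.counts.items())
```

Since only the counter is kept, there is nothing to reconstruct an individual request from: no timestamp finer than the day, no URL, no IP address.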
For content-bearing requests (HTML pages), the reporter also records content signals: word-count bucket, content freshness, fact density, and URL depth. These are the metrics displayed in the Crawl Intelligence section of the network dashboard.
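Two of these signals are easy to illustrate. In the sketch below the bucket boundaries are invented for the example (the actual thresholds are not given on this page), and both function names are hypothetical. Freshness and fact density are omitted because their definitions are not stated here.

```python
import bisect
from urllib.parse import urlsplit

# Hypothetical bucket edges; the reporter's real thresholds may differ.
BUCKET_EDGES = [300, 800, 1500, 3000]
BUCKET_LABELS = ["<300", "300-799", "800-1499", "1500-2999", "3000+"]


def word_count_bucket(text: str) -> str:
    """Bucket a post by whitespace-separated word count."""
    words = len(text.split())
    return BUCKET_LABELS[bisect.bisect_right(BUCKET_EDGES, words)]


def url_depth(url: str) -> int:
    """Number of non-empty path segments: '/' is 0, '/blog/post/' is 2."""
    return len([seg for seg in urlsplit(url).path.split("/") if seg])
```

Reporting a bucket rather than the exact count keeps the signal coarse enough that it says little about any single page.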
Network participation is opt-in. A site functions perfectly well without contributing. Enabling network intelligence sends a daily count summary: bot name, resource type, visit total, and content-signal distributions. No URLs, no content, no user data.
The site identifier sent with each payload is a one-way SHA-256 hash of the site URL and an instance ID. It cannot be reversed to recover the domain. It exists only to deduplicate contributions from the same site across days.
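A minimal sketch of such an identifier follows. How the URL and instance ID are normalised and joined is an assumption here (lowercasing, trailing-slash removal, and a `|` separator), and `site_identifier` is a hypothetical name.

```python
import hashlib


def site_identifier(site_url: str, instance_id: str) -> str:
    """Derive a stable, one-way identifier from the site URL and instance ID."""
    # Normalise so trivial URL variations map to the same identifier.
    normalised = site_url.strip().lower().rstrip("/")
    material = f"{normalised}|{instance_id}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

The same inputs always yield the same 64-character hex digest, which is what allows deduplication across days. The instance ID also matters for privacy: assuming it is random and never leaves the site, someone hashing a list of candidate domains cannot reproduce the identifier from the domain alone.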
IndexNow pings
When a post is published or updated, the reporter sends an IndexNow ping to notify search engines that the URL has changed. The ping includes a verification key, which is also stored as a static file at the site root so the receiving search engine can confirm the ping came from the site. Pings are throttled to one burst per 30 minutes to avoid rate-limiting. IndexNow is supported by Bing, Yandex, and other participating search engines. Google does not currently participate but receives a separate sitemap ping.
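The throttling can be pictured as a small queue that batches changed URLs and releases at most one payload per interval. The sketch below builds the JSON body in the shape the IndexNow protocol defines (host, key, keyLocation, urlList) but does not send it. The class name and the injectable clock are hypothetical, and the key file is assumed to be named after the key.

```python
import time
from typing import Callable, Optional
from urllib.parse import urlsplit


class IndexNowQueue:
    """Batch changed URLs and release at most one burst per interval."""

    def __init__(self, key: str, min_interval: float = 30 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self.key = key
        self.min_interval = min_interval
        self.clock = clock
        self.pending: list[str] = []
        self.last_sent: Optional[float] = None

    def add(self, url: str) -> None:
        if url not in self.pending:  # avoid duplicate entries in one burst
            self.pending.append(url)

    def flush(self) -> Optional[dict]:
        """Return a payload to POST, or None if empty or still throttled."""
        now = self.clock()
        throttled = (self.last_sent is not None
                     and now - self.last_sent < self.min_interval)
        if not self.pending or throttled:
            return None
        host = urlsplit(self.pending[0]).netloc
        payload = {
            "host": host,
            "key": self.key,
            "keyLocation": f"https://{host}/{self.key}.txt",
            "urlList": list(self.pending),
        }
        self.pending.clear()
        self.last_sent = now
        return payload
```

URLs added while the throttle is active are not lost; they stay queued and go out together in the next burst.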
Every metric comes from a real site
The intelligence dashboard isn't scraped, modelled, or estimated. Every number on it — bot visit counts, resource breakdowns, signal distributions — comes from a site owner who installed the reporter and chose to opt in to the AI Bot Log network.
When a site opts in, it sends one anonymized payload per day: aggregated bot counts and content-signal distributions for that site's 24-hour window. No URLs, no post content, no visitor data — just counts. The site's identity is a one-way hash, irreversible to a domain name. Individual sites are never identifiable in what the dashboard shows.
What that means in practice: the picture gets sharper the more people join. A network of ten sites can tell you which bots are active. A network of a hundred can tell you how crawl patterns differ by content type. A network of thousands can tell you whether a behavioural shift is real or noise. The people who opt in aren't just using the data — they're making it.
This is what "watching the watchers" looks like in practice. AI crawlers and search bots watch the web. Site owners who run the reporter watch them back — and by sharing what they see, anonymously and voluntarily, they build a public record that belongs to no single company.
Watching the Watchers
Quis custodiet ipsos custodes? Who will guard the guards themselves? — Juvenal, Satires VI, c. 2nd century AD
The phrase comes from Juvenal, a Roman poet writing satirical verse in the late 1st and early 2nd centuries AD. It appears in Satire VI, his longest and most acerbic work, in a passage about the futility of keeping a wife faithful: you cannot trust the guards you hire to watch over her, because the guards themselves need watching. The original meaning was less about political theory and more about the impossibility of reliable oversight at all.
The phrase outlasted its domestic context entirely. By the time it entered political philosophy, it had become a foundational challenge to any system of power: who holds the overseers accountable? It now appears in arguments about police oversight, intelligence agencies, judicial review, and, increasingly, the systems that monitor digital behaviour.
AI Bot Log borrows it literally. AI crawlers and search bots are themselves watchers; they index, train on, and cite the web. The network watches them back — logging which bots visit, which content they read, and how their behaviour shifts over time — and turns that data into a public record. We don't decide how AI systems should behave. We just watch the watchers, and publish what we see.
Built by Janzen Works
AI Bot Log is built by Janzen Works. The network intelligence dashboard is open to anyone. Feedback, bug reports, and data questions: michael@janzenworks.com.