{"title":"About","slug":"about","type":"page","excerpt":"How AI Bot Log works: the AEO endpoints it tracks (llms.txt, Post Markdown, JSON-LD, RSS+AEO), how bot visits are captured, and the open network behind the dashboard.","content":"AI Bot Log tracks how AI crawlers and answer engines read the web. The data comes from sites running a lightweight reporter that adds structured data and machine-readable endpoints to their content, then logs which bots request them. This page explains that infrastructure — the Answer Engine Optimization (AEO) outputs the network is built on — and exactly how the data is captured.\n\nSome outputs are served as separate URLs that bots can request independently. Others are embedded in the HTML of the page itself. The distinction matters for tracking, and for understanding the limits of what bot analytics can tell us.\n\n## Outputs served at their own URLs\n\nEach of these is a distinct resource a bot can request. When a crawler fetches one, the reporter logs the bot name, the resource type, and the date. That per-resource granularity is what makes the [network dashboard](/) possible — it shows which bots are requesting which content formats.\n\n### llms.txt — `/llms.txt`\n\nA plain-text index of the site: title, description, and a list of posts with summaries and links to their Markdown versions. Follows the [llms.txt specification](https://llmstxt.org). AI crawlers use it as a table of contents to decide which pages to fetch in full.\n\n**How it's served:** dynamically generated by a rewrite rule; no static file is written to disk. 
Requests to `/llms.txt` are intercepted and the response is rendered in-memory.\n\n```\n# AI Bot Log\n\n> https://aibotlog.com\n\nAnswer Engine Optimization — FAQPage schema, entity graphs, llms.txt, and AI bot analytics.\n\n## Posts\n\n- [How AI Crawlers Read Your Content](https://aibotlog.com/how-ai-crawlers-work): What GPTBot, ClaudeBot, and PerplexityBot actually fetch, why structured data changes what they cite, and how to track it.\n  Markdown: https://aibotlog.com/how-ai-crawlers-work/?aibotlog_llm=1\n\n- [What is llms.txt?](https://aibotlog.com/what-is-llms-txt): The emerging open standard that gives AI crawlers a structured index of your site's content.\n  Markdown: https://aibotlog.com/what-is-llms-txt/?aibotlog_llm=1\n\n## Pages\n\n- [About](https://aibotlog.com/about): The AI Bot Log network, what the reporter does, and how bot tracking works.\n  Markdown: https://aibotlog.com/about/?aibotlog_llm=1\n```\n\n*Tracked as `llms_txt`; each bot request is counted separately from HTML page visits.*\n\n### Post Markdown — `/your-post/?aibotlog_llm=1`\n\nA structured Markdown rendering of a single post. Includes metadata (publish date, modified date, featured image), the AEO summary, entity list, Q&A pairs, keywords, and the full post content converted to Markdown. Gives AI crawlers a clean, parse-ready version of the content without HTML markup or theme chrome.\n\n**How it's served:** by intercepting the standard post URL when the `?aibotlog_llm=1` query parameter is present. 
The same permalink that normally returns the HTML page returns a Markdown document instead — no extra URL or file required.\n\n```\n# Circuit Breakers in Practice\n\nURL: https://example.com/circuit-breakers\nPublished: 2026-01-15T10:30:00Z\nModified: 2026-03-10T14:22:15Z\n\n## Summary\n\nCircuit breaker patterns prevent cascading failures by wrapping\nremote calls in a state machine that trips open after repeated errors.\n\n**Keywords:** circuit breaker, microservices, fault tolerance\n\n## Entities\n\n- Martin Fowler (Person) — Software author and ThoughtWorks chief scientist\n- Hystrix (Technology) — Netflix's circuit breaker library\n\n## Q&A\n\n**Q: When should a circuit breaker trip open?**\nAfter a configurable threshold of consecutive failures within\na rolling time window.\n\n## Content\n\nThe full post body in Markdown...\n```\n\n*Tracked as `post_markdown`.*\n\n### Site Summary — `/?aibotlog_llm=1`\n\nA Markdown overview of the site served at the home URL with the `aibotlog_llm=1` parameter. Lists the five most recent posts with summaries and links to the full content index at `/llms.txt` and `/llms-full.txt`.\n\n**How it's served:** the same query-parameter mechanism as Post Markdown, applied to the home URL. Returns a site-level Markdown overview rather than a single post.\n\n*Tracked as `site_summary`.*\n\n### AEO JSON-LD — `/aeo/your-post.jsonld`\n\nA standalone JSON-LD file containing the FAQPage schema, entity mentions, citations, and an `associatedMedia` link to the Markdown endpoint. Served only for posts that have AEO data. Gives bots direct access to the structured data without parsing the HTML page.\n\n**How it's served:** a dynamic endpoint registered via a rewrite rule matching `/aeo/*.jsonld`. 
The file does not exist on disk; the request is intercepted and the JSON-LD is generated from the post's stored AEO metadata.\n\n```json\n{\n  \"@context\": \"https://schema.org\",\n  \"@graph\": [\n    {\n      \"@type\": \"FAQPage\",\n      \"mainEntity\": [{\n        \"@type\": \"Question\",\n        \"name\": \"When should a circuit breaker trip open?\",\n        \"acceptedAnswer\": {\n          \"@type\": \"Answer\",\n          \"text\": \"After a configurable threshold of consecutive failures...\"\n        }\n      }]\n    },\n    {\n      \"@type\": \"Article\",\n      \"headline\": \"Circuit Breakers in Practice\",\n      \"mentions\": [{\n        \"@type\": \"Person\",\n        \"name\": \"Martin Fowler\",\n        \"sameAs\": \"https://en.wikipedia.org/wiki/Martin_Fowler_(software_engineer)\"\n      }],\n      \"citation\": [{\n        \"@type\": \"WebPage\",\n        \"url\": \"https://martinfowler.com/bliki/CircuitBreaker.html\",\n        \"name\": \"CircuitBreaker — Martin Fowler\"\n      }],\n      \"associatedMedia\": {\n        \"@type\": \"MediaObject\",\n        \"encodingFormat\": \"text/markdown\",\n        \"contentUrl\": \"https://example.com/circuit-breakers/?aibotlog_llm=1\"\n      }\n    }\n  ]\n}\n```\n\n*Tracked as `aeo_jsonld`.*\n\n### XML Sitemap — `/sitemap.xml`\n\nA standard XML sitemap with one addition: each post entry includes an `xhtml:link` alternate pointing to its Markdown endpoint. 
Bots that understand alternate links can discover the structured version without a separate crawl of `/llms.txt`.\n\n```xml\n<url>\n  <loc>https://aibotlog.com/how-ai-crawlers-work</loc>\n  <lastmod>2026-03-10</lastmod>\n  <changefreq>weekly</changefreq>\n  <priority>0.8</priority>\n  <xhtml:link rel=\"alternate\" type=\"text/markdown\"\n    href=\"https://aibotlog.com/how-ai-crawlers-work/?aibotlog_llm=1\"/>\n</url>\n```\n\n*Tracked as `sitemap`.*\n\n### robots.txt additions — `/robots.txt`\n\nThe reporter appends a `Sitemap` directive and an `LLMs-Txt` directive to the generated `robots.txt`. The `LLMs-Txt` line signals to AI crawlers that a structured content index is available.\n\n```\nSitemap: https://aibotlog.com/sitemap.xml\n\n# AI content index\nLLMs-Txt: https://aibotlog.com/llms.txt\n```\n\n*Tracked as `robots_txt`.*\n\n### RSS+AEO Feed — `/feed/`\n\nThe standard RSS 2.0 feed, enriched with an `xmlns:aeo` namespace and per-item AEO elements: `<aeo:summary>`, `<aeo:entity>`, and `<aeo:qa>`. AI crawlers that consume RSS feeds receive the full AEO metadata (structured summaries, named entities, and Q&A pairs) alongside post content. 
The enrichment is purely additive: existing feed elements, including `content:encoded`, are not modified.\n\n```xml\n<rss version=\"2.0\"\n  xmlns:content=\"http://purl.org/rss/1.0/modules/content/\"\n  xmlns:aeo=\"https://aibotlog.com/ns/rss/1.0\">\n  <channel>\n    <item>\n      <title>Circuit Breakers in Practice</title>\n      <link>https://example.com/circuit-breakers</link>\n      <content:encoded><![CDATA[...full post HTML...]]></content:encoded>\n\n      <aeo:summary>Circuit breaker patterns prevent cascading failures\nby wrapping remote calls in a state machine that trips open\nafter repeated errors.</aeo:summary>\n\n      <aeo:entity type=\"Person\" sameAs=\"https://en.wikipedia.org/wiki/Martin_Fowler_(software_engineer)\">\n        Martin Fowler\n      </aeo:entity>\n\n      <aeo:qa>\n        <aeo:question>When should a circuit breaker trip open?</aeo:question>\n        <aeo:answer>After a configurable threshold of consecutive failures\nwithin a rolling time window.</aeo:answer>\n      </aeo:qa>\n    </item>\n  </channel>\n</rss>\n```\n\n*Tracked as `rss_aeo` when AEO enrichment is enabled; `rss_feed` when disabled. Both are counted separately from HTML page visits.*\n\n## Outputs that live inside the page\n\nThese outputs are injected into the HTML `<head>` of each post. They are present when any bot (or person) loads the page. There is no separate URL to request; the data rides along with the HTML.\n\nWhen a bot visits an HTML page, the reporter checks at that moment whether the post has AEO metadata stored. If it does, the visit is logged as `aeo_post`. If not, it is logged as `html`. The visit is still a single HTML page request either way, but the distinction matters: since most sites have a mix of AEO-enriched and plain posts, this split reveals which bots are landing on enriched content and which are only reaching plain pages. 
Over time, patterns emerge at the network level — AI answer engines tend to show higher `aeo_post` ratios than SEO crawlers, which suggests they are finding and returning to structured content rather than treating all HTML equally.\n\nWhat the tracking cannot tell you is which specific embedded element a bot looked at. A visit to an AEO-enriched post could involve the FAQPage schema, the entity mentions, the citation list, or just the post text. The reporter records that enriched content was present; it cannot record what the bot did with it.\n\n### FAQPage JSON-LD — embedded in `<head>`\n\nGenerated from the Q&A pairs stored in the post's AEO metadata. Each question becomes a `Question` node with an `acceptedAnswer`. This is the same schema format Google uses for FAQ rich results; AI crawlers also parse it to extract question-answer pairs as citable facts.\n\n```html\n<script type=\"application/ld+json\">\n{\n  \"@context\": \"https://schema.org\",\n  \"@graph\": [{\n    \"@type\": \"FAQPage\",\n    \"@id\": \"https://example.com/circuit-breakers/#faqpage\",\n    \"mainEntity\": [{\n      \"@type\": \"Question\",\n      \"name\": \"When should a circuit breaker trip open?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"After a configurable threshold of consecutive failures within a rolling time window.\"\n      }\n    }]\n  }]\n}\n</script>\n```\n\n### Entity mentions with `sameAs`\n\nEach entity stored in the post's AEO metadata becomes a typed `mentions` entry in the Article JSON-LD node. 
The `sameAs` URL links to an authoritative reference (typically Wikipedia or an official site), giving AI systems a way to disambiguate the entity — for example \"Martin Fowler the software author\" versus any other Martin Fowler.\n\n```json\n\"mentions\": [\n  {\n    \"@type\": \"Person\",\n    \"name\": \"Martin Fowler\",\n    \"description\": \"Software author and ThoughtWorks chief scientist\",\n    \"sameAs\": \"https://en.wikipedia.org/wiki/Martin_Fowler_(software_engineer)\"\n  },\n  {\n    \"@type\": \"SoftwareApplication\",\n    \"name\": \"Hystrix\",\n    \"sameAs\": \"https://github.com/Netflix/Hystrix\"\n  }\n]\n```\n\n### Citation JSON-LD\n\nExternal links in the post content are extracted and added as `citation` entries in the Article JSON-LD node. Each citation includes the URL and the link's anchor text — giving AI systems a structured list of the post's sources without parsing the HTML body.\n\n```json\n\"citation\": [\n  {\n    \"@type\": \"WebPage\",\n    \"url\": \"https://martinfowler.com/bliki/CircuitBreaker.html\",\n    \"name\": \"CircuitBreaker — Martin Fowler\"\n  },\n  {\n    \"@type\": \"WebPage\",\n    \"url\": \"https://netflix.github.io/Hystrix/\",\n    \"name\": \"Hystrix Wiki\"\n  }\n]\n```\n\n### Meta tags\n\nThe reporter injects `description`, Open Graph, and Twitter Card meta tags derived from the post's AEO summary (falling back to the excerpt if no summary exists). 
These are suppressed when an active SEO plugin is detected and the conflict toggle is enabled.\n\n```html\n<meta name=\"description\" content=\"Circuit breaker patterns prevent cascading failures by wrapping remote calls in a state machine.\">\n<meta property=\"og:title\" content=\"Circuit Breakers in Practice\">\n<meta property=\"og:description\" content=\"Circuit breaker patterns prevent cascading failures...\">\n<meta property=\"og:type\" content=\"article\">\n<meta name=\"twitter:card\" content=\"summary_large_image\">\n```\n\n## How the data is captured and reported\n\nOn every incoming request, the reporter checks the user-agent string against a list of known bot signatures: AI answer engines (GPTBot, ClaudeBot, PerplexityBot, Google-Extended), training crawlers (CCBot, Bytespider, DeepSeekBot), search engines (Googlebot, Bingbot), and SEO tools (SemrushBot, AhrefsBot). Matching is a case-insensitive substring comparison.\n\nWhen a match is found, the reporter records three things: the canonical bot name, the resource type requested (one of `html`, `llms_txt`, `post_markdown`, `site_summary`, `sitemap`, `robots_txt`, `aeo_jsonld`, `rss_feed`, `rss_aeo`, or `aeo_post`), and the date. RSS visits are split: `rss_aeo` when AEO feed enrichment is enabled, `rss_feed` when it is not. Counts are stored locally in a daily summary table; no per-request log is kept.\n\nFor content-bearing requests (HTML pages), the reporter also records content signals: word-count bucket, content freshness, fact density, and URL depth. These are the metrics displayed in the [Crawl Intelligence](/) section of the network dashboard.\n\n> **Network participation is opt-in.** A site functions perfectly well without contributing. Enabling network intelligence sends a daily count summary: bot name, resource type, visit total, and content-signal distributions. 
No URLs, no content, no user data.\n\nThe site identifier sent with each payload is a one-way SHA-256 hash of the site URL and an instance ID. It cannot be reversed to recover the domain. It exists only to deduplicate contributions from the same site across days.\n\n## IndexNow pings\n\nWhen a post is published or updated, the reporter sends an IndexNow ping to notify search engines that the URL has changed. The ping includes a verification key stored as a static file at the site root. Pings are throttled to one burst per 30 minutes to avoid rate-limiting. IndexNow is supported by Bing, Yandex, and other participating search engines. Google does not currently participate but receives a separate sitemap ping.\n\n## Every metric comes from a real site\n\nThe intelligence dashboard isn't scraped, modelled, or estimated. Every number on it — bot visit counts, resource breakdowns, signal distributions — comes from a site owner who installed the reporter and chose to opt in to the AI Bot Log network.\n\nWhen a site opts in, it sends one anonymized payload per day: aggregated bot counts and content-signal distributions for that site's 24-hour window. No URLs, no post content, no visitor data — just counts. The site's identity is a one-way hash that cannot be reversed to a domain name. Individual sites are never identifiable in what the dashboard shows.\n\nWhat that means in practice: the picture gets sharper the more people join. A network of ten sites can tell you which bots are active. A network of a hundred can tell you how crawl patterns differ by content type. A network of thousands can tell you whether a behavioural shift is real or noise. The people who opt in aren't just using the data — they're making it.\n\nThis is what \"watching the watchers\" looks like in practice. AI crawlers and search bots watch the web. 
Site owners who run the reporter watch them back — and by sharing what they see, anonymously and voluntarily, they build a public record that belongs to no single company.\n\n## Watching the Watchers\n\n> **Quis custodiet ipsos custodes?**\n> *Who will guard the guards themselves?*\n> — Juvenal, Satires VI, c. 2nd century AD\n\nThe phrase comes from Juvenal, a Roman poet writing satirical verse in the late 1st and early 2nd centuries AD. It appears in *Satire VI*, his longest and most acerbic work, in a passage about the futility of keeping a wife faithful: you cannot trust the guards you hire to watch over her, because the guards themselves need watching. The original meaning was less about political theory and more about the impossibility of reliable oversight at all.\n\nThe phrase outlasted its domestic context entirely. By the time it entered political philosophy, it had become a foundational challenge to any system of power: who holds the overseers accountable? It now appears in arguments about police oversight, intelligence agencies, judicial review, and, increasingly, the systems that monitor digital behaviour.\n\nAI Bot Log borrows it literally. AI crawlers and search bots are themselves watchers; they index, train on, and cite the web. The network watches them back — logging which bots visit, which content they read, and how their behaviour shifts over time — and turns that data into a public record. We don't decide how AI systems should behave. We just watch the watchers, and publish what we see.\n\n## Built by Janzen Works\n\nAI Bot Log is built by [Janzen Works](https://janzenworks.com). The network intelligence [dashboard](/) is open to anyone. 
Feedback, bug reports, and data questions: [michael@janzenworks.com](mailto:michael@janzenworks.com).\n","publishedAt":"2026-06-16T20:10:17.213Z","updatedAt":"2026-06-17T02:54:43.868Z","author":{"name":"Michael Janzen"},"categories":[],"tags":[],"featuredImageUrl":null,"aeo":{"summary":"AI Bot Log's About page: a technical guide to the Answer Engine Optimization endpoints the network tracks (llms.txt, Post Markdown, AEO JSON-LD, sitemap, robots.txt, RSS+AEO, embedded FAQPage/entity/citation JSON-LD), how anonymized bot-visit data is captured, and the opt-in network that powers the public dashboard."},"site":{"name":"AI Bot Log","url":"https://aibotlog.com"},"_links":{"canonical":"https://aibotlog.com/post/about","markdown":"https://aibotlog.com/post/about/llm.txt","json":"https://aibotlog.com/post/about/data.json"}}