How to Add an LLM-Powered Help Bot to Your Marketing Site

A support ticket lands at 2 a.m.: "Where's the integration doc for the thing you shipped last week?" Your help bot answers confidently, with the version of the doc from three releases ago, because nobody re-indexed after the last content update. The customer follows broken instructions, files a bug that isn't a bug, and your team burns an afternoon untangling it. That's the real failure mode of bolting an LLM help bot onto a marketing site: not that the model is dumb, but that it's answering from stale, unstructured, ungoverned content.

Most "add a help bot" tutorials stop at the easy 80%: scrape the site, embed it, wire up a chat widget. The hard 20% is everything that determines whether the bot is trustworthy in production: keeping the index fresh as content changes, preserving the structure of your docs through chunking, governing what the bot is allowed to say, and evaluating its answers before they reach a customer.

This guide treats the help bot as a content problem, not just a model problem. The protagonist isn't the LLM; it's the CMS feeding it. We'll walk through the architecture, where freshness and structure break, and why a CMS whose embeddings, retrieval, and governance are wired into the data model changes the calculus.

What a Help Bot Actually Needs From Your Content Layer

Strip away the chat UI and a help bot is a retrieval pipeline with a language model on the end. A user asks a question, the system finds the most relevant passages from your content, and the model composes an answer grounded in those passages. The quality ceiling of the whole system is set long before the model runs: it's set by what your content layer can hand over.

Three properties matter more than model choice. First, retrievability: can you find the right passage for a fuzzy, natural-language question, or only for exact keyword matches? That's the difference between lexical search and semantic search over embeddings. Second, structure preservation: when a 4,000-word setup guide gets chunked into retrieval-sized pieces, does each chunk still know it's a step in a numbered procedure under a specific heading, or does it become an orphaned blob of text? Third, freshness: when an editor fixes a wrong price or deprecates a feature, how long until the bot stops repeating the old answer?

Most teams discover these requirements in production, after the demo impressed everyone. The demo used a clean, frozen snapshot of ten docs. Production has 800 pages that change weekly, marketing copy mixed with technical reference, and an editor who just renamed a product. A help bot that can't track that change is worse than no bot: it launders stale information with the confidence of a chat interface.

The reframe for the rest of this guide: every one of those three properties is a CMS responsibility. If you treat the bot as an app sitting outside your content, you rebuild freshness, structure, and governance yourself in glue code. If the content layer owns them, the bot inherits them.

The Naive Architecture and Where It Breaks

The standard starting point looks deceptively complete. You run a crawler over your marketing and docs site, split the HTML into chunks, send each chunk to an embeddings API, and store the vectors in a standalone vector database. At query time you embed the user's question, pull the nearest chunks, stuff them into a prompt, and let the model answer. Frameworks like LangChain.js and LlamaIndex make this an afternoon of work, and a vector store like Pinecone makes it scale.
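The query-time half of that pipeline can be sketched in a few lines. This is a minimal illustration, not any framework's API: the EmbeddedChunk shape, the function names, and the prompt wording are all assumptions made for the example, and the vectors are presumed to come from whatever embeddings API you already use.

```typescript
// A chunk as it might sit in a standalone vector store (hypothetical shape).
interface EmbeddedChunk {
  id: string;
  text: string;
  vector: number[];
}

// Cosine similarity between two vectors; returns 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks closest to the query vector, best match first.
function nearestChunks(
  query: number[],
  chunks: EmbeddedChunk[],
  k: number,
): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// "Stuff" the retrieved passages into a prompt for the model.
function buildPrompt(question: string, passages: EmbeddedChunk[]): string {
  const context = passages.map((p, i) => `[${i + 1}] ${p.text}`).join("\n");
  return `Answer using only the passages below.\n\n${context}\n\nQuestion: ${question}`;
}
```

Nothing in this sketch knows where a chunk came from, how old it is, or whether anyone reviewed it, which is the gap the rest of this guide is about.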

The breakage is operational, and it shows up on a delay. The crawler captured rendered HTML, so navigation chrome, cookie banners, and footer boilerplate got embedded alongside real content, diluting retrieval. The chunker split on character count, so a numbered procedure got cut in half and a 'do not do X' warning got separated from the step it warned about. And critically, the index is a snapshot. The moment an editor publishes a change, your vectors are wrong, and nothing tells the pipeline to catch up. You're now running a cron job that re-crawls the whole site nightly, re-embedding thousands of unchanged pages to catch the three that moved.
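The chunking failure is easy to reproduce. The sketch below is a deliberately naive character-count chunker; the sample sentence and the chunk size of 30 are invented for illustration.

```typescript
// Naive chunker: cut every `size` characters, ignoring document structure.
function chunkByCharacters(text: string, size: number): string[] {
  if (size <= 0) throw new Error("size must be positive");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

// A step followed by the warning that qualifies it (made-up sample text).
const sampleProcedure =
  "3. Drop the old table. " +
  "Warning: do not do this before the backup finishes.";

const naivePieces = chunkByCharacters(sampleProcedure, 30);

// True only if some chunk still holds both the step and the end of its warning.
const stepKeptWithWarning = naivePieces.some(
  (piece) => piece.includes("Drop the old table") && piece.includes("backup finishes"),
);
```

Here stepKeptWithWarning comes out false: the step lands in one chunk and the substance of the warning in the others, so a retriever that returns only the first chunk hands the model the instruction without its caveat.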

Each gap is patchable, but you're now maintaining a parallel content system: a crawler, a cleaning step, a chunking strategy, an embeddings pipeline, a vector DB, and a sync job, none of which the people editing the content can see or reason about. When the bot gives a bad answer, no editor can fix it; they file a ticket against the AI team. That divorce between the people who own the content and the system that serves it is the deepest flaw in the naive architecture, and it's an architectural choice, not a bug.

Why Freshness Is the Hard Part (and an Index Problem)

Freshness is where pilots quietly fail. A bot that was 95% accurate at launch decays as the content moves underneath it, and the decay is invisible until a customer acts on a stale answer. The naive fix, re-crawl and re-embed everything on a schedule, trades correctness for cost and latency: you either re-index too rarely (and serve stale answers between runs) or too often (and pay to re-embed thousands of unchanged documents).

The better model is to tie embeddings to the content itself, so that when a document changes, its embedding is the thing that updates, not the whole corpus on a timer. This is what Sanity's Embeddings Index API and dataset embeddings are built around: embeddings are a property of your content in the Content Lake, so freshness is automatic rather than a job you schedule and babysit. There's no separate vector pipeline to keep in sync because the index and the source of truth are the same system.
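Whatever system does the work, the underlying idea is change detection per document rather than per corpus. A minimal sketch, assuming you hand-roll it: Sanity documents do carry an _id and a _rev that changes on every edit, but the ReindexPlan shape and the planReindex function are hypothetical names for this example, not part of any Sanity API.

```typescript
// Minimal view of a document. `_id` and `_rev` are real Sanity fields;
// everything else in this sketch is hypothetical.
interface DocRevision {
  _id: string;
  _rev: string;
}

interface ReindexPlan {
  toEmbed: string[]; // new or changed documents
  toDelete: string[]; // documents that no longer exist in the source
  unchanged: number; // documents skipped entirely
}

// Compare what the index last saw (id -> rev) with the current content.
function planReindex(
  indexed: Map<string, string>,
  current: DocRevision[],
): ReindexPlan {
  const toEmbed: string[] = [];
  const seen = new Set<string>();
  let unchanged = 0;

  for (const doc of current) {
    seen.add(doc._id);
    if (indexed.get(doc._id) === doc._rev) {
      unchanged += 1; // same revision: the stored embedding is still valid
    } else {
      toEmbed.push(doc._id); // new document or new revision
    }
  }

  const toDelete = Array.from(indexed.keys()).filter((id) => !seen.has(id));
  return { toEmbed, toDelete, unchanged };
}
```

With 800 pages and three edits, a plan like this re-embeds three documents instead of 800. When embeddings live with the content, this bookkeeping is the part you no longer write.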

The second half of freshness is propagation: the moment a doc is published, the retrieval layer should know. Sanity's Content Lake exposes real-time subscriptions, so an LLM workflow can be fed fresh content the instant it changes rather than on the next crawl. Pair that with Functions, serverless hooks that run on publish, and you can enrich, moderate, or re-embed a single document as part of the publish event itself. The practical payoff: the editor who fixes the wrong price at 4 p.m. has fixed the bot's answer at 4 p.m., with no AI-team ticket and no nightly job in between. Freshness stops being an operational burden and becomes a property of publishing.

Structure Survives the Pipeline: Portable Text and Chunking

Chunking is where most help bots silently lose accuracy. To fit retrieval windows, long documents get split, and if your content is HTML or Markdown flattened to a string, splitting destroys the very structure that makes the content meaningful. A table becomes a run-on sentence. A code sample loses its language and its surrounding caveat. A step in a procedure forgets which procedure it belongs to. The model then answers from a fragment that has lost its context, and it answers confidently.

This is a content-format problem, and it's why the format your CMS stores matters for AI long before any model runs. Sanity stores rich text as Portable Text, a structured representation where blocks, marks, and annotations are first-class data rather than presentational tags. Because structure is explicit, chunking can respect it: split on block and heading boundaries instead of arbitrary character counts, and carry each chunk's heading path and annotations along as metadata. A retrieved chunk can still say 'I am step 3 of the migration guide, under the Postgres heading, and I link to these two reference pages.'
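Structure-aware chunking over Portable Text can be sketched as follows. The block shape is a minimal subset of Portable Text (blocks with a style, an optional listItem, and child spans that carry text); the StructuredChunk shape and the chunkByHeadings function are assumptions for this example, and a production chunker would also handle marks, annotations, nested lists, custom block types, and size limits.

```typescript
// Minimal subset of the Portable Text block shape.
interface PortableTextSpan {
  _type: "span";
  text: string;
}

interface PortableTextBlock {
  _type: "block";
  style?: string; // "normal", "h1", "h2", ...
  listItem?: string; // "number" or "bullet" for list items
  children: PortableTextSpan[];
}

// Hypothetical chunk shape for this sketch.
interface StructuredChunk {
  headingPath: string[]; // e.g. ["Migration guide", "Postgres"]
  text: string;
}

const blockText = (block: PortableTextBlock): string =>
  block.children.map((span) => span.text).join("");

// Heading depth from a style such as "h2"; 0 for non-heading blocks.
function headingLevel(style: string | undefined): number {
  const match = /^h([1-6])$/.exec(style ?? "");
  return match ? Number(match[1]) : 0;
}

// Split on heading boundaries and carry the heading path with each chunk.
function chunkByHeadings(blocks: PortableTextBlock[]): StructuredChunk[] {
  const chunks: StructuredChunk[] = [];
  let path: { level: number; title: string }[] = [];
  let lines: string[] = [];
  let step = 0;

  const flush = (): void => {
    if (lines.length > 0) {
      chunks.push({ headingPath: path.map((h) => h.title), text: lines.join("\n") });
    }
    lines = [];
    step = 0;
  };

  for (const block of blocks) {
    const level = headingLevel(block.style);
    if (level > 0) {
      flush(); // close the previous section before the path changes
      path = path.filter((h) => h.level < level);
      path.push({ level, title: blockText(block) });
    } else if (block.listItem === "number") {
      step += 1; // keep the step number explicit in the chunk text
      lines.push(`${step}. ${blockText(block)}`);
    } else {
      step = 0;
      lines.push(blockText(block));
    }
  }
  flush();
  return chunks;
}
```

Because a numbered step stays in the same chunk as its siblings and keeps its number, the ordering survives retrieval instead of having to be guessed by the model.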

That preserved structure pays off twice. At retrieval time, the metadata sharpens relevance and lets you filter (only published docs, only this product line). At generation time, the model can cite the exact heading and reconstruct ordered steps faithfully because the ordering was never lost. Compared to embedding a wall of de-structured text, structure-aware chunking is the difference between a bot that quotes your docs and a bot that paraphrases a blurry memory of them. The CMS that keeps content structured is doing AI-readiness work whether or not you've shipped a bot yet.
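Both payoffs fit in a short sketch. The metadata fields here (status, productLine) and the function names are hypothetical; the point is only that a chunk which carries its heading path and status can be filtered before ranking and cited by location afterwards.

```typescript
// A retrieved chunk with metadata carried over from the source document.
// `status` and `productLine` are hypothetical field names for this sketch.
interface RetrievedChunk {
  headingPath: string[];
  text: string;
  status: "published" | "draft";
  productLine: string;
}

// Retrieval time: keep only published chunks for the requested product line.
function filterByMetadata(
  chunks: RetrievedChunk[],
  productLine: string,
): RetrievedChunk[] {
  return chunks.filter(
    (chunk) => chunk.status === "published" && chunk.productLine === productLine,
  );
}

// Generation time: label each passage with its heading path so the model
// can cite where an answer came from.
function formatCitedContext(chunks: RetrievedChunk[]): string {
  return chunks
    .map((chunk, i) => `[${i + 1}] ${chunk.headingPath.join(" > ")}\n${chunk.text}`)
    .join("\n\n");
}
```

A passage labelled "Migration guide > Postgres" gives the model something concrete to cite, and gives the editor who later investigates a bad answer a place to look.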

Governance: What Your Bot Is Allowed to Say

A help bot speaks in your brand's voice to your customers, which makes every answer a publishing decision. Yet the naive architecture has no governance surface at all: the bot answers from whatever the crawler happened to scrape, including draft pages, internal notes accidentally left public, and outdated promotions. Enterprises that wouldn't let an intern publish unreviewed copy will happily let an LLM improvise from an unvetted index, because the index feels like infrastructure rather than content.

Governance starts with controlling the source set. When retrieval runs over your CMS rather than a scrape, you can scope the bot to published, approved content and exclude drafts by construction. Sanity's Studio and Content Releases give editors the same staging, review, and scheduling controls for AI-facing content that they already use for the website, so a doc enters the bot's knowledge only after it's been reviewed, and a scheduled change goes live for the bot and the page at the same moment. Knowledge Bases extend this to non-CMS sources (PDFs, support databases, external sites), turning them into governed, agent-readable content instead of an ungoverned scrape.
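As a rough sketch of what "scoped by construction" means, the filter below decides which documents the bot may read at all. It leans on one real convention, that Sanity draft documents have an _id beginning with "drafts."; the goLiveAt field is a hypothetical stand-in for a scheduled go-live time and is not how Content Releases models scheduling internally.

```typescript
// `_id` is a real Sanity field, and drafts conventionally use a "drafts."
// prefix. `goLiveAt` is a hypothetical field used only for this sketch.
interface SourceDoc {
  _id: string;
  goLiveAt?: string; // ISO 8601 timestamp
}

// A document is visible to the bot only if it is not a draft and any
// scheduled go-live time has already passed.
function isVisibleToBot(doc: SourceDoc, now: Date): boolean {
  if (doc._id.startsWith("drafts.")) return false;
  if (doc.goLiveAt !== undefined && new Date(doc.goLiveAt).getTime() > now.getTime()) {
    return false;
  }
  return true;
}

// The bot's source set is the page's source set: same rule, same moment.
function botSourceSet(docs: SourceDoc[], now: Date): SourceDoc[] {
  return docs.filter((doc) => isVisibleToBot(doc, now));
}
```

The design choice worth noting is that the rule is evaluated against the same clock and the same documents as the website, so the bot cannot know something the published page does not yet say.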

The in-editor layer matters too. AI Assist puts LLM helpers inside the Studio so editors can fact-check a claim against the knowledge base, summarise a long reference into a bot-friendly answer, or flag content that shouldn't be customer-facing, before it ever reaches retrieval. The governing principle: the bot's mouth should be downstream of the same review that governs your published site. That's only cheap when AI is wired into the content workflow rather than bolted on beside it.

Building It on Sanity: From Schema to Shipped Bot

Concretely, here's the shape of a help bot built where the content layer does the heavy lifting. Your docs and marketing content already live in Sanity as structured documents with Portable Text bodies. You enable dataset embeddings so each document carries its own vector via the Embeddings Index API: no external crawler, no separate vector DB, no nightly re-embed job. Retrieval queries can then blend semantic similarity with structured filters in the query layer, so 'how do I migrate' returns the migration guide scoped to the right product and only published versions.

For the workflow plumbing, Functions run on publish to enrich or re-embed a changed document, and Content Lake real-time subscriptions notify the bot's backend the instant content moves, closing the freshness gap to seconds. For agent-style retrieval, Sanity Context grounds the model in your governed content so answers cite real, current passages instead of hallucinating; the deeper retrieval-architecture treatment of grounding agents lives at agent-context.org, but the relevant point here is that the CMS is the source the agent reads from.

The editor experience is the part the naive stack can't match. With the App SDK you can build the bot's review tooling as an in-Studio app, and with AI Assist editors curate answers without leaving the place they already work. When a customer reports a bad answer, an editor fixes the underlying doc, Content Releases governs the change, and the corrected answer propagates automatically. The bot stops being a separate system the AI team owns and becomes another consumer of content the whole team already governs, which is the entire point of an AI-native CMS rather than a CMS with a chatbot stapled to its side.

Help-bot building blocks: CMS-native vs. assembled stack

Feature	Sanity	Contentful
Embeddings on content	Native: dataset embeddings + Embeddings Index API, vectors live with the content in the Content Lake, no separate store to sync.	Via App Framework or partner apps; embeddings live in an external vector store you provision and keep in sync yourself.
Freshness after an edit	Automatic: embeddings tied to content, plus Functions on publish and Content Lake real-time subscriptions push changes to the bot in seconds.	Webhooks can trigger re-embedding, but you build and operate the sync job and decide what to re-index.
Structure through chunking	Portable Text keeps blocks, marks, and headings as data, so chunking can split on real boundaries and carry heading paths as metadata.	Rich Text is structured JSON; usable, but chunking strategy and metadata carry-over are left to your pipeline.
Governance of what the bot says	Studio + Content Releases stage, review, and schedule AI-facing content; retrieval scoped to published docs by construction.	Strong editorial workflow and roles in the CMS, but the bot index sits outside it unless you bridge the two.
In-editor AI for curation	AI Assist in-Studio: rewrite, summarise into bot answers, fact-check claims against Knowledge Bases before content reaches retrieval.	Quick Start / Studio AI offers in-editor generation; help-bot curation flows are not a packaged path.
External sources (PDFs, support DB)	Knowledge Bases turn PDFs, websites, and support databases into governed, agent-readable content alongside CMS docs.	Possible via custom ingestion into your vector store; not a managed CMS-side capability.
Agent grounding / retrieval	Sanity Context grounds agents in governed content so answers cite current passages; CMS is the source of truth the agent reads.	Grounding assembled from your vector store + chosen framework; CMS isn't the retrieval surface itself.