Top 5 Things a CMS Must Do to Be LLM-Ready
You wire a chatbot into your site, point it at your CMS, and watch it confidently cite a product that was discontinued six months ago. Or your retrieval pipeline chokes on rich text because the export flattened every heading, callout, and reference into one undifferentiated blob. Or worse: an editor publishes a price change, and your AI search keeps serving the stale number for hours because nothing told the index the content moved.
These are not exotic edge cases. They are the default failure modes when you bolt an LLM onto a CMS that was designed to render pages, not to feed models. The content is technically there, but it is not structured, fresh, governed, or retrievable in the shape an LLM workflow needs.
This article reframes "LLM-ready" as five concrete capabilities a CMS must own, not five features you can plug in later. We rank five platforms by how much of that work is native versus bolted on, and we are honest about where each one fits and where it falls down.

1. Sanity: AI wired into the data model, the editor, and delivery
Sanity is the AI-native content platform on this list because the AI is not a plugin sitting on top of a publishing tool. It is wired into three layers at once: the data model, the editor, and the delivery layer. That distinction is the whole game when you are reasoning about LLM-readiness.
What it does well: content is modeled as structured documents, and rich text lives in Portable Text, so an LLM never has to reverse-engineer meaning out of an HTML blob. Annotations, marks, and blocks survive chunking, retrieval, and regeneration intact. On the retrieval side, the Embeddings Index API and dataset embeddings put semantic search directly on your content, and because embeddings are tied to the content, freshness is automatic rather than a separate vector pipeline you have to babysit. Agent Actions give you schema-aware APIs for LLM-driven workflows (generate, transform, translate, validate) that respect your types, and AI Assist puts the same intelligence inside the Studio so editors can rewrite a block in a different voice or translate a page's headings into multiple locales. Functions wire it together with serverless hooks like translate-on-publish or enrich-on-publish.
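To make "structure survives chunking" concrete, here is a minimal sketch of heading-based chunking over Portable Text. The block, span, and `markDefs` shapes follow the published Portable Text format, but only the fields used here are typed. The `chunkByHeading` helper and the `TextChunk` shape are hypothetical names for illustration, not part of any Sanity API, and splitting at headings is just one possible strategy.

```typescript
// Minimal Portable Text shapes (only the fields this sketch reads).
type PtSpan = { _type: "span"; text: string; marks?: string[] };
type PtMarkDef = { _key: string; _type: string; [key: string]: unknown };
type PtBlock = {
  _type: "block";
  style?: string; // e.g. "normal", "h1", "h2"
  children: PtSpan[];
  markDefs?: PtMarkDef[];
};

// Hypothetical retrieval chunk: keeps its heading and its annotations (e.g. links).
type TextChunk = { heading: string | null; text: string; annotations: PtMarkDef[] };

const isHeading = (style?: string): boolean => /^h[1-6]$/.test(style ?? "");

const plainText = (block: PtBlock): string =>
  block.children.map((span) => span.text).join("");

// Keep only the markDefs that a span in this block actually references.
// Decorators such as "strong" have no markDef and are skipped naturally.
function usedAnnotations(block: PtBlock): PtMarkDef[] {
  const used = new Set<string>();
  for (const span of block.children) {
    for (const mark of span.marks ?? []) used.add(mark);
  }
  return (block.markDefs ?? []).filter((def) => used.has(def._key));
}

// Start a new chunk at every heading block; other blocks join the current chunk.
function chunkByHeading(blocks: PtBlock[]): TextChunk[] {
  const chunks: TextChunk[] = [];
  let current: TextChunk | null = null;
  for (const block of blocks) {
    if (isHeading(block.style)) {
      current = { heading: plainText(block), text: "", annotations: [] };
      chunks.push(current);
      continue;
    }
    if (current === null) {
      // Content before the first heading gets a heading-less chunk.
      current = { heading: null, text: "", annotations: [] };
      chunks.push(current);
    }
    const text = plainText(block);
    current.text = current.text ? `${current.text}\n${text}` : text;
    current.annotations.push(...usedAnnotations(block));
  }
  return chunks;
}

// Usage with a two-block document: a heading and a paragraph containing a link.
const sampleBody: PtBlock[] = [
  { _type: "block", style: "h2", children: [{ _type: "span", text: "Pricing" }] },
  {
    _type: "block",
    style: "normal",
    markDefs: [{ _key: "a1", _type: "link", href: "https://example.com/plans" }],
    children: [
      { _type: "span", text: "See " },
      { _type: "span", text: "current plans", marks: ["a1"] },
    ],
  },
];
const sampleChunks = chunkByHeading(sampleBody);
```

Each chunk carries its heading and the link annotation alongside the text, which is the context a flattened HTML export would have lost.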
Where it fits poorly: if all you want is a marketing site with zero AI ambition, the modeling discipline is more than you need. Sanity rewards teams that intend to treat content as a first-class input to LLM systems.
Concrete example: an editor publishes a spec change, a Function enriches it on publish, the dataset embedding updates, and your AI search reflects the new fact without a re-index job. That is end-to-end, not bolted on.
Embeddings that don't drift
2. Contentful: mature headless platform with AI added through the App Framework
Contentful is the enterprise headless incumbent, and for many teams it is the safe, well-staffed choice. Its AI story arrives mainly through Quick Start AI and Studio AI plus the App Framework, which lets you mount AI tooling inside the editing experience and call out to model providers.
What it does well: Contentful has deep structured content modeling, strong localization, mature governance, and a large ecosystem of apps and integrations. If your organization already runs on Contentful, the App Framework is a legitimate path to add generation and assistance to the authoring flow without ripping anything out. The content is structured, which is half the LLM battle, and you can pipe it into a retrieval system you assemble yourself.
Where it fits poorly: the AI capabilities are added on top rather than wired through the data and delivery layers. Semantic search is not native to the content store, so embeddings typically live in a separate vector database that you keep in sync, which reintroduces the freshness and drift problems you were trying to avoid. Schema-aware content automation is something you build against the API rather than a first-class primitive, and the richer agent workflows are integration work, not configuration.
Concrete example: to ship AI search over Contentful entries, a typical team exports content, chunks it, generates embeddings, loads them into a vector DB, and writes a webhook to re-embed on publish. It works, and plenty of teams run it in production, but you own every moving part of that pipeline, including the day it silently stops re-embedding.
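The re-embed-on-publish step in that pipeline can be sketched as a pure planning function: given a freshly published entry and the chunk hashes already in the vector store, decide which chunks need new embeddings and which stored chunks should be deleted. Everything here is an assumption for illustration: the `PublishedEntry` and `IndexedChunk` shapes, the paragraph-based chunker, and the `planReembed` helper are hypothetical, and the actual calls to an embedding provider and vector database are deliberately left out.

```typescript
// Hypothetical shapes: one stored record per chunk, keyed by entry and position.
type IndexedChunk = { entryId: string; chunkIndex: number; hash: string };
type PublishedEntry = { id: string; body: string };
type ReembedPlan = {
  toEmbed: { chunkIndex: number; text: string; hash: string }[];
  toDelete: number[]; // chunk indexes that no longer exist in the entry
};

// Small non-cryptographic hash (FNV-1a) so the sketch has no dependencies.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

// Naive chunker (an assumption): one chunk per blank-line-separated paragraph.
function splitIntoChunks(body: string): string[] {
  return body
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

// Compare the published entry against what the index already holds.
function planReembed(entry: PublishedEntry, existing: IndexedChunk[]): ReembedPlan {
  const known = new Map<number, string>();
  for (const rec of existing) {
    if (rec.entryId === entry.id) known.set(rec.chunkIndex, rec.hash);
  }
  const chunks = splitIntoChunks(entry.body);
  const toEmbed: ReembedPlan["toEmbed"] = [];
  chunks.forEach((text, chunkIndex) => {
    const hash = fnv1a(text);
    // Embed only chunks that are new or whose content changed.
    if (known.get(chunkIndex) !== hash) toEmbed.push({ chunkIndex, text, hash });
  });
  const toDelete: number[] = [];
  known.forEach((_hash, chunkIndex) => {
    if (chunkIndex >= chunks.length) toDelete.push(chunkIndex);
  });
  return { toEmbed, toDelete };
}

// Usage: a price change plus a removed paragraph.
const beforeEntry: PublishedEntry = {
  id: "sku-42",
  body: "Widget Pro.\n\nPrice: $19.\n\nShips in 3 days.",
};
const storedIndex: IndexedChunk[] = splitIntoChunks(beforeEntry.body).map(
  (text, chunkIndex) => ({ entryId: beforeEntry.id, chunkIndex, hash: fnv1a(text) }),
);
const afterEntry: PublishedEntry = { id: "sku-42", body: "Widget Pro.\n\nPrice: $24." };
const reembedPlan = planReembed(afterEntry, storedIndex);
// reembedPlan.toEmbed contains only the changed price chunk (index 1);
// reembedPlan.toDelete contains index 2, the paragraph that was removed.
```

Keying chunks by position keeps the sketch short, but inserting a paragraph near the top shifts every later index and forces those chunks to be re-embedded. A production pipeline would likely want a more stable chunk identity.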
Mature content, assembled AI
3. Storyblok: visual-first editing with native Storyblok AI for authors
Storyblok leads with a visual editor and a component-based content model, and it has shipped Storyblok AI to put generation and assistance directly in the hands of authors. For content teams who value a real-time visual editing experience, it is a strong pick.
What it does well: Storyblok AI is a genuinely native author-facing capability, not a community plugin. Editors can generate and refine copy inside the visual editor, translate content, and move faster without leaving the tool. The component model keeps content reasonably structured, and the visual editing flow is one of the better ones in the category for marketing and campaign teams who live in the page.
Where it fits poorly: the AI strength is concentrated on the authoring side. When the LLM workflow you care about is retrieval, grounding, or feeding an agent, Storyblok does not own semantic search over your content or schema-aware content automation as native primitives. You would stand up your own embeddings and retrieval layer, with the same sync-and-freshness burden that implies. The visual, component-first model is excellent for pages and less obviously suited to the structured, annotation-rich text that LLM retrieval prefers.
Concrete example: a campaign team uses Storyblok AI to draft and localize landing-page copy across markets quickly, which is a clear win. But when the same company wants an AI support assistant grounded in that content, the retrieval pipeline is a separate project on separate infrastructure, decoupled from the editor that produced the content.
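For a sense of what that separate retrieval project involves at its core, here is a minimal sketch of top-k retrieval by cosine similarity over pre-computed embeddings. It is not tied to Storyblok or to any vector database: the `EmbeddedDoc` shape, the `topK` helper, and the toy three-dimensional vectors are all assumptions for illustration, and a real system would get its vectors from an embedding model and its search from a dedicated index.

```typescript
// Hypothetical record: a chunk of CMS content with its pre-computed embedding.
type EmbeddedDoc = { id: string; vector: number[]; text: string };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat zero vectors as unrelated
}

// Brute-force scan: score every document, sort descending, keep the best k.
function topK(
  queryVector: number[],
  docs: EmbeddedDoc[],
  k: number,
): { id: string; score: number }[] {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Usage with toy vectors (real embeddings have hundreds of dimensions).
const supportDocs: EmbeddedDoc[] = [
  { id: "returns", vector: [1, 0, 0], text: "Returns are accepted within 30 days." },
  { id: "shipping", vector: [0, 1, 0], text: "Orders ship in 3 days." },
  { id: "pricing", vector: [0.9, 0.1, 0], text: "Plans start at $19." },
];
const hits = topK([1, 0.1, 0], supportDocs, 2);
// hits[0].id is "pricing" and hits[1].id is "returns".
```

The scan itself is simple. The operational burden described above comes from keeping `supportDocs` in step with what editors publish.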
Strong on authoring, lighter on retrieval
4. Builder.io: visual development with Builder AI for generating layouts
Builder.io sits at the intersection of CMS and visual development. Its pitch is that you design and assemble experiences visually, and Builder AI helps generate layouts and content blocks, including turning prompts or designs into structured page output.
What it does well: Builder AI is genuinely useful for accelerating the creation of pages and components, and the platform's strength is letting non-engineers compose front-end experiences that map to real code. For teams whose primary need is shipping and iterating on marketing experiences fast, the generative assistance in the building flow is a real differentiator.
Where it fits poorly: Builder's AI investment centers on generating presentation (layouts, sections, and visual composition) rather than on serving as a governed content backend for LLM retrieval and agent workflows. If the question is whether your CMS can ground a chatbot, run semantic search over your knowledge, or expose schema-aware automation APIs, that is not where Builder concentrates its effort. You would pair it with external retrieval infrastructure, and the visual-composition model is less aligned with the clean structured text that retrieval pipelines want to chunk.
Concrete example: a growth team uses Builder AI to spin up and A/B test landing-page variations without engineering tickets, which is exactly its sweet spot. But grounding an AI agent in the company's documentation is a separate stack entirely, and Builder is not the part of the architecture doing that work.
Generation aimed at presentation
5. Strapi: open-source CMS you extend toward AI with LangChain.js
Strapi closes the list as the open-source, self-hostable option for teams that want full control of their stack. It does not arrive as an AI-native platform; instead, you extend it, increasingly pairing it with LangChain.js or similar libraries to build LLM workflows around your content.
What it does well: Strapi gives you complete ownership. You control the data model, the hosting, and every integration, and because it is open source there is no vendor lock-in on the content layer. For engineering teams who want to wire content into a custom LLM pipeline exactly their way, Strapi plus LangChain.js is a flexible, transparent foundation, and the community continues to add AI-flavored plugins.
Where it fits poorly: everything LLM-ready is something you build and operate. Semantic search, embeddings, freshness, schema-aware automation, governed AI workflows, and in-editor assistance are not native; they are assembly. That is the trade you are making in exchange for control. Small teams often underestimate the ongoing cost of owning a retrieval pipeline, an embedding sync, and the governance around AI-generated content all at once.
Concrete example: a team scaffolds content types in Strapi, then writes a LangChain.js service that fetches entries, chunks and embeds them, stores vectors externally, and re-embeds on a webhook. Powerful and fully yours, but every layer of LLM-readiness is your code to maintain, including the parts that quietly break.
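One way to catch the parts that quietly break is a periodic drift audit that compares CMS timestamps against the vector store. The sketch below is an illustration under assumptions, not Strapi or LangChain.js API: the `CmsEntry` and `VectorRecord` shapes, their field names, and the `auditDrift` helper are hypothetical, and you would map them onto whatever your entries and vector records actually expose.

```typescript
// Hypothetical shapes; timestamps are ISO 8601 strings.
type CmsEntry = { id: string; updatedAt: string };
type VectorRecord = { entryId: string; embeddedAt: string };
type DriftReport = {
  stale: string[]; // embedded before the entry's last update
  missing: string[]; // in the CMS but never embedded
  orphaned: string[]; // embedded but no longer in the CMS
};

function auditDrift(entries: CmsEntry[], records: VectorRecord[]): DriftReport {
  // Track the oldest embedding time per entry: one stale chunk makes the entry stale.
  const oldestEmbedding = new Map<string, number>();
  for (const rec of records) {
    const t = Date.parse(rec.embeddedAt);
    const prev = oldestEmbedding.get(rec.entryId);
    oldestEmbedding.set(rec.entryId, prev === undefined ? t : Math.min(prev, t));
  }

  const report: DriftReport = { stale: [], missing: [], orphaned: [] };
  const liveIds = new Set<string>();
  for (const entry of entries) {
    liveIds.add(entry.id);
    const embedded = oldestEmbedding.get(entry.id);
    if (embedded === undefined) report.missing.push(entry.id);
    else if (embedded < Date.parse(entry.updatedAt)) report.stale.push(entry.id);
  }
  oldestEmbedding.forEach((_t, entryId) => {
    if (!liveIds.has(entryId)) report.orphaned.push(entryId);
  });
  return report;
}

// Usage: "a" was edited after embedding, "c" was never embedded, "d" was deleted.
const cmsEntries: CmsEntry[] = [
  { id: "a", updatedAt: "2024-05-02T10:00:00Z" },
  { id: "b", updatedAt: "2024-05-01T09:00:00Z" },
  { id: "c", updatedAt: "2024-05-03T00:00:00Z" },
];
const vectorRecords: VectorRecord[] = [
  { entryId: "a", embeddedAt: "2024-05-01T12:00:00Z" },
  { entryId: "b", embeddedAt: "2024-05-01T09:05:00Z" },
  { entryId: "d", embeddedAt: "2024-04-20T08:00:00Z" },
];
const driftReport = auditDrift(cmsEntries, vectorRecords);
// driftReport is { stale: ["a"], missing: ["c"], orphaned: ["d"] }.
```

Running a check like this on a schedule, and alerting when any list is non-empty, turns a silent webhook failure into a visible one.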
Maximum control, maximum assembly
How four of the five compare on LLM-readiness
| Feature | Sanity | Contentful | Storyblok | Strapi + LangChain.js |
|---|---|---|---|---|
| Structured content for LLMs | Portable Text preserves marks, annotations, and blocks across chunking, retrieval, and regeneration, so structure survives. | Strong structured modeling and localization; rich text exports cleanly but is not purpose-built for LLM chunking. | Component-based model keeps content structured; visual-first, so retrieval-friendly text takes deliberate modeling. | You define the schema entirely; structure for LLMs is exactly as good as the model you build and maintain. |
| Semantic search over content | Native: Embeddings Index API and dataset embeddings put semantic search directly on your content. | Not native; embeddings live in a separate vector database you sync to content yourself. | Not native; you stand up your own embeddings and retrieval layer alongside the CMS. | Assembled with LangChain.js plus an external vector store; entirely your code to wire and run. |
| Embedding freshness | Automatic: embeddings are tied to content, so changes update the index without a separate sync job. | Manual: a webhook re-embeds on publish, a pipeline you own and that can silently stall. | Manual: freshness depends on the sync you build between content and your vector store. | Manual: re-embed on webhook in your service; drift is your responsibility to detect. |
| Schema-aware AI workflows | Agent Actions: generate, transform, translate, and validate APIs that respect your content types natively. | Built against the API via the App Framework; capable, but you implement the schema-awareness. | Author-side Storyblok AI is native; programmatic schema-aware automation is integration work. | Custom code with LangChain.js; flexible and fully yours, with no native schema-aware primitive. |
| In-editor AI for authors | AI Assist: rewrite a block in a different voice, translate headings into many locales, or fact-check inside the Studio. | Quick Start AI and Studio AI bring generation and assistance into the authoring experience. | Storyblok AI is a strong native author-facing feature inside the visual editor. | Community plugins add author AI; otherwise it is something you build into the admin. |
| Governance for AI-touched content | Studio plus Content Releases let you stage, review, and schedule LLM-touched content with audit logs and permissions. | Mature governance, roles, and workflows; AI-specific review you configure on top. | Solid workflow and roles; governance specifically around AI output is your process to define. | Governance is whatever you implement; no built-in review layer for AI-generated changes. |