How to Use Functions to Run AI at Publish Time

Your editor publishes a product page in French, and forty minutes later support is fielding tickets because the German, Japanese, and Spanish versions never got translated, the page shipped without alt text, and a competitor's name slipped through unredacted. The work that was supposed to happen "on publish" happened in someone's head, on someone's calendar, in a backlog ticket that aged out. That gap between the publish button and the content actually being ready is where AI operations quietly fail at scale.

Sanity is the AI Content Operating System, an intelligent backend that treats publish time as a programmable event rather than a dead end. The lever for this is Functions: serverless content automation hooks that fire when content changes, so the LLM work you want (translation, enrichment, moderation, summarization) is triggered deterministically inside the editorial loop instead of as a manual afterthought.

This guide walks through how to run AI at publish time with Functions, where the governance boundaries belong, and how that compares to bolting an external pipeline onto a traditional CMS.

The failure mode: AI work that depends on a human remembering

Most teams already have AI in their content stack. The problem is where it lives. A writer opens ChatGPT in another tab, generates a meta description, pastes it back, and hopes they did it for every page. A localization manager exports a spreadsheet, runs it through a translation model, and re-imports it days later. A trust-and-safety reviewer eyeballs user-generated content when they have time. Each of these is real AI usage, and each one is a manual step glued to a human's attention. The moment volume rises, the glue fails. Pages publish without the enrichment, locales drift out of sync, and unmoderated content reaches production because the one person who checks was on vacation.

The deeper issue is that publishing in a legacy CMS stops at the publish event. The system's job, as it understands it, is to store the document and serve it. Anything that should happen because content changed (translating it, summarizing it, scoring it for policy violations) has to be wired up outside the CMS as a cron job, a webhook listener, or a queue worker that the content team neither owns nor can see. This is the silo problem: the editorial system and the automation system are different worlds, and the seam between them is where content quality leaks out.

Reframing publish time as an event you can attach logic to changes the economics. Instead of scaling the number of people who remember to do the AI step, you scale the output by making the step run itself. That is the shift from people-driven AI to pipeline-driven AI, and it is the difference between AI that demos well and AI that holds up at a million documents.

What Functions actually are, and why publish is the right trigger

Functions are serverless content automation hooks that run in response to content events inside Sanity. When a document is created, updated, or published, a Function can fire and do work: call an LLM, transform the document, write derived fields back, or hand off to another service. Because they run on Sanity's infrastructure rather than a server you maintain, there is no queue worker to babysit and no webhook endpoint that silently 404s after a deploy.

Publish is the right trigger for AI work for a specific reason: it is the moment content crosses from draft intent into something real users will see. Running enrichment on every keystroke is wasteful and noisy; running it nightly means stale or missing derivations during the window that matters most. Publish-time execution means the AI work is bound to the exact event that changes what the world sees, so a translate-on-publish Function guarantees the locales exist the instant the source goes live, and a moderate-on-publish Function guarantees nothing reaches production without being scored first.

The canonical patterns map cleanly to Functions. Translate-on-publish takes the source document, calls a model to render each target locale, and writes the localized documents or fields back into Content Lake. Enrich-on-publish generates the meta description, the summary, the alt text, or the embeddings the page was missing. Moderate-on-publish scores user-generated or AI-generated content against your policy and either flags it for review or blocks it. Because Functions are schema-aware, they operate on structured fields, not a blob of HTML, so the model's output lands in the right place and stays queryable with GROQ afterward. The work is connected to the content model, not stapled to its surface.
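
The translate-on-publish pattern can be sketched in plain TypeScript, independent of the actual Functions API. Everything named here is an assumption for illustration: the `PublishedDoc` and `LocalizedPatch` shapes, the `translate` callback, and the `__locale` ID convention are hypothetical. In a real Function the callback would wrap an asynchronous model call, and the patches would be written back to Content Lake.

```typescript
// Hypothetical shapes for illustration; not Sanity's event or document types.
interface PublishedDoc {
  _id: string;
  locale: string;
  title: string;
  summary: string;
}

interface LocalizedPatch {
  targetId: string;
  locale: string;
  fields: { title: string; summary: string };
}

// Stand-in for a model call. A real implementation would be asynchronous.
type Translate = (text: string, from: string, to: string) => string;

// Build one patch per target locale, field by field, so model output
// lands in named fields rather than in a single blob.
function planTranslations(
  doc: PublishedDoc,
  targetLocales: string[],
  translate: Translate,
): LocalizedPatch[] {
  return targetLocales
    .filter((locale) => locale !== doc.locale) // never translate into the source locale
    .map((locale) => ({
      targetId: `${doc._id}__${locale}`, // assumed ID convention, not a platform rule
      locale,
      fields: {
        title: translate(doc.title, doc.locale, locale),
        summary: translate(doc.summary, doc.locale, locale),
      },
    }));
}

// Usage with a fake translator that only tags the text with the target locale.
const fakeTranslate: Translate = (text, _from, to) => `[${to}] ${text}`;
const patches = planTranslations(
  { _id: "product-42", locale: "fr", title: "Bonjour", summary: "Résumé" },
  ["fr", "de", "ja", "es"],
  fakeTranslate,
);
```

Keeping the planning step pure, with the model call injected, makes the field mapping testable without a model in the loop.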

Designing a publish-time AI pipeline that does not betray your editors

A naive publish-time Function is a foot-gun: it lets an LLM write directly to production with no human in the loop, and the first hallucinated price or off-brand rewrite teaches everyone to distrust the automation. The design question is not whether to run AI at publish, but where the review boundary sits relative to the publish event.

There are three honest patterns. First, derive-then-publish: the Function produces low-risk derived content (embeddings, internal tags, search keywords) that never faces a customer, so full automation is safe. Second, generate-into-draft: the Function generates customer-facing content but writes it as a draft or into a Content Release rather than straight to production, so an editor stages, reviews, and ships it. This is where Studio and Content Releases earn their place: the AI does the labor, the human owns the decision, and the whole thing is reviewable and schedulable. Third, gate-on-policy: a moderation Function scores content and blocks or flags it before it can publish, inverting the relationship so AI is the guard rather than the author.

The mistake teams make is collapsing all three into one risky pattern: let the model write to prod and hope. The better mental model is that Functions handle the mechanical labor while the governance surface (drafts, Content Releases, Roles and Permissions) decides what is allowed to become live. AI Assist covers the in-editor case where a person wants help generating or rewriting a block on demand; Agent Actions and Functions cover the pipeline case where the work should happen because content changed, not because someone clicked. Choosing the right boundary per content type is the actual engineering work, and it is what separates a pipeline editors trust from one they route around.
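
One way to make the per-content-type boundary explicit is to encode it as a small, reviewable rule rather than leaving it implicit in each Function. The sketch below is an assumption about how a team might do that: the `FieldPolicy` flags and the `chooseBoundary` helper are hypothetical names, not part of any Sanity API.

```typescript
// The three review boundaries described above.
type ReviewBoundary =
  | "derive-then-publish"
  | "generate-into-draft"
  | "gate-on-policy";

// Hypothetical risk flags a team might record per derived field or content type.
interface FieldPolicy {
  customerFacing: boolean;  // will a customer ever read this output?
  untrustedSource: boolean; // is the input user-generated or AI-generated?
}

function chooseBoundary(policy: FieldPolicy): ReviewBoundary {
  // Untrusted input is scored before anything else is allowed to happen.
  if (policy.untrustedSource) return "gate-on-policy";
  // Customer-facing output is staged for a human decision.
  if (policy.customerFacing) return "generate-into-draft";
  // Internal-only derivations are safe to automate fully.
  return "derive-then-publish";
}

const boundaries: Record<string, ReviewBoundary> = {
  embeddings: chooseBoundary({ customerFacing: false, untrustedSource: false }),
  metaDescription: chooseBoundary({ customerFacing: true, untrustedSource: false }),
  userReview: chooseBoundary({ customerFacing: true, untrustedSource: true }),
};
```

The value of writing the rule down is that the decision becomes something an editor or reviewer can read and challenge.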

Keeping derived content fresh without a second system to maintain

Publish-time AI is not only about generating new text. A large share of AI operations is keeping derived artifacts in step with source content, and the classic version of this is embeddings for semantic search. The trap is the two-system architecture: your content lives in the CMS, your vectors live in a separate vector database, and a brittle sync job tries to keep them aligned. Every time a document changes, something has to re-embed it and upsert the vector, and when that job lags, your semantic search returns answers based on content that no longer exists.

The cleaner model is to keep the derivation tied to the content itself. With the Embeddings Index API and dataset embeddings, embeddings are a property of the content rather than a copy living in an external store, so freshness is automatic: when the content changes, the index reflects it without a separate pipeline to monitor. A publish-time Function can also feed downstream LLM workflows directly, and Content Lake real-time subscriptions mean other systems can react the moment content changes rather than polling on a schedule.

Structure is what makes this hold up. Portable Text keeps rich content as structured blocks, marks, and annotations rather than opaque HTML, which matters enormously for AI: when an LLM chunks, retrieves, or regenerates content, the structure survives instead of being flattened into a string that loses its headings and links. So a publish-time Function that re-embeds a Portable Text body is embedding something the model can actually reason over chunk by chunk. The result is one system of record where the AI-derived layer stays consistent with the source, instead of two systems and a sync job you pray about.
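
To illustrate why structure helps at chunking time, here is a minimal sketch that splits a Portable Text body into heading-scoped chunks before embedding. It assumes a simplified subset of the Portable Text block shape (text blocks with `style` and `children` spans only); the `Chunk` type and `chunkByHeading` helper are hypothetical, and a real body may also contain custom block types such as images that this sketch does not handle.

```typescript
// Simplified subset of the Portable Text block shape (assumption: text blocks only).
interface Span { _type: "span"; text: string; marks?: string[] }
interface Block { _type: "block"; style?: string; children: Span[] }

// Hypothetical chunk shape: body text plus the heading it sits under.
interface Chunk { heading: string | null; text: string }

// Join a block's spans into plain text; marks are ignored for embedding purposes.
const blockText = (block: Block): string =>
  block.children.map((span) => span.text).join("");

function chunkByHeading(blocks: Block[]): Chunk[] {
  const chunks: Chunk[] = [];
  let current: { heading: string | null; parts: string[] } = { heading: null, parts: [] };

  // Emit the section collected so far. Headings with no body text are dropped.
  const flush = (): void => {
    if (current.parts.length > 0) {
      chunks.push({ heading: current.heading, text: current.parts.join("\n") });
    }
  };

  for (const block of blocks) {
    if (/^h[1-6]$/.test(block.style ?? "normal")) {
      flush(); // a heading closes the previous section
      current = { heading: blockText(block), parts: [] };
    } else {
      current.parts.push(blockText(block));
    }
  }
  flush();
  return chunks;
}

const body: Block[] = [
  { _type: "block", style: "normal", children: [{ _type: "span", text: "Intro." }] },
  { _type: "block", style: "h2", children: [{ _type: "span", text: "Sizing" }] },
  {
    _type: "block",
    style: "normal",
    children: [
      { _type: "span", text: "Runs " },
      { _type: "span", text: "small", marks: ["strong"] },
      { _type: "span", text: "." },
    ],
  },
];
const chunks = chunkByHeading(body);
```

Each chunk carries its heading, so a retrieval step can show where a passage came from. That context is what gets lost when the body is flattened to a string first.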

Governance, compliance, and the audit trail AI operations demand

The fastest way to lose an enterprise's trust in publish-time AI is to make AI-touched content indistinguishable from human-authored content with no record of what happened. When a regulator, a legal team, or an internal reviewer asks why a page said what it said, "an LLM generated it at publish, we are not sure with what input" is not an acceptable answer. AI operations at scale require the same auditability as any other production system.

This is where a Content Operating System differs from a bolt-on. Because Functions run inside Sanity rather than as an external script, the work is observable in the platform: Audit logs capture changes, Roles and Permissions constrain who can configure automation and who can approve its output, and Content Releases give a staged, reviewable container for AI-generated batches before they go live. The governance is not a separate compliance layer you build; it is the same governance that already covers your editorial workflow, extended to cover the AI steps.

On the platform side, the controls enterprises ask for are present: SOC 2 Type II, GDPR compliance, regional hosting and data residency options, and a published sub-processor list so you know which services touch your content, which matters specifically because AI workflows route content through model providers. The discipline this enables is concrete. Mark documents with which Function and model version produced a derived field, keep the generation reviewable in a draft or Release rather than auto-publishing customer-facing claims, and you get AI operations that survive an audit instead of triggering one. The point is not that AI is risky and should be caged; it is that governed AI is the only kind that scales past a pilot.
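
The provenance discipline described above could look roughly like the following. This is a sketch under stated assumptions: the `Provenance` field names, the `stampDerived` and `approve` helpers, and the example Function and model names are all hypothetical, not a Sanity schema or API.

```typescript
// Hypothetical provenance record; field names are illustrative only.
interface Provenance {
  functionName: string;
  model: string;
  modelVersion: string;
  generatedAt: string;       // ISO 8601 timestamp
  reviewedBy: string | null; // null until a human approves the output
}

interface DerivedField<T> {
  value: T;
  provenance: Provenance;
}

// Attach provenance when a Function produces a derived value.
function stampDerived<T>(
  value: T,
  source: Pick<Provenance, "functionName" | "model" | "modelVersion">,
  now: Date = new Date(),
): DerivedField<T> {
  return {
    value,
    provenance: { ...source, generatedAt: now.toISOString(), reviewedBy: null },
  };
}

// Record the human decision without mutating the original record.
function approve<T>(field: DerivedField<T>, reviewer: string): DerivedField<T> {
  return { ...field, provenance: { ...field.provenance, reviewedBy: reviewer } };
}

const altText = stampDerived(
  "Red running shoe on a white background",
  { functionName: "enrich-on-publish", model: "example-model", modelVersion: "2024-06" },
  new Date("2024-06-01T12:00:00Z"),
);
const approved = approve(altText, "editor-17");
```

With a record like this stored next to each derived field, the question of which model and which Function wrote a given sentence can be answered by a query rather than by an investigation.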

From pattern to production: a publish-time rollout that sticks

The teams that succeed with publish-time AI do not flip on every Function at once. They start with the derive-then-publish category, the low-risk work where the output never faces a customer directly, because it builds confidence without putting the brand at stake. Enrich-on-publish for embeddings, internal tags, and search metadata is a good first Function: it is genuinely useful, it is safe to fully automate, and it demonstrates that publish time can carry logic reliably.

The second wave is the customer-facing generation work, deliberately routed through review: translate-on-publish that writes locales into a Content Release, summary and meta-description generation that lands in a draft for an editor to approve, and alt-text generation a human can correct. Here the Function does the volume work that no team could staff manually, while Studio keeps a person on the decision. This is the approach in practice: automate everything that can be automated, power the workflows that consume the output, and model the business so the structured fields the Functions write to actually mean something downstream.

The third wave is the gating work: moderate-on-publish for user-generated or AI-generated content, fact-check passes against a knowledge base before publish, policy scoring that blocks rather than generates. Sequenced this way, each wave earns the trust the next one needs. The reframe to carry out of this guide is simple: publish is not the end of the content lifecycle, it is a programmable event, and the CMS that treats it as one lets you scale output instead of scaling the number of people who remember to do the AI step. That is what it means for AI to be wired into the platform rather than glued onto it.
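
The gating wave reduces to a decision rule over policy scores. The sketch below assumes a moderation model that returns a score between 0 and 1 per policy category, with higher meaning more likely to violate. The `PolicyScore` shape, the threshold values, and the `gate` helper are hypothetical and would need tuning against real content.

```typescript
type GateDecision = "publish" | "flag-for-review" | "block";

// Hypothetical moderation output: one score per policy category, 0 to 1.
interface PolicyScore { category: string; score: number }

interface Thresholds {
  flag: number;  // at or above this, a human must look
  block: number; // at or above this, publishing is refused
}

// Decide on the worst category: one serious violation is enough to stop a publish.
function gate(scores: PolicyScore[], thresholds: Thresholds): GateDecision {
  const worst = scores.reduce((max, s) => Math.max(max, s.score), 0);
  if (worst >= thresholds.block) return "block";
  if (worst >= thresholds.flag) return "flag-for-review";
  return "publish";
}

// Assumed thresholds for illustration only.
const thresholds: Thresholds = { flag: 0.5, block: 0.9 };

const clean = gate([{ category: "spam", score: 0.1 }], thresholds);
const borderline = gate(
  [{ category: "hate", score: 0.1 }, { category: "spam", score: 0.55 }],
  thresholds,
);
const violating = gate([{ category: "competitor-mention", score: 0.95 }], thresholds);
```

The middle "flag-for-review" band reflects the same principle as the earlier waves: the model narrows the work, and a person makes the call where the score is ambiguous.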

Running AI at publish time: native pipeline versus bolt-on automation

Feature	Sanity	Contentful	Strapi + LangChain.js	Webflow
Publish-time AI trigger	Native: Functions fire on create, update, and publish events inside the platform, no external worker to host or monitor.	App Framework apps and webhooks can call out to your own services, so the automation runs on infrastructure you operate, not inside Contentful.	You wire LangChain.js to Strapi lifecycle hooks yourself; flexible, but the pipeline is custom code you build and maintain.	Webflow AI assists in-editor generation; event-driven publish-time pipelines rely on external automation via webhooks and Zapier-style tools.
Schema-aware output	Agent Actions and Functions operate on structured fields, so model output lands in typed fields that stay queryable with GROQ.	Output structure depends on how you map model responses back to fields in your own app code; not handled by the CMS.	Mapping LLM output to content-type fields is your responsibility in the integration layer; nothing schema-aware out of the box.	AI features target page and copy generation in the designer rather than writing into a strongly typed content model.
Embeddings freshness	Embeddings Index API and dataset embeddings tie vectors to content, so changes reflect without a separate sync job.	Typically paired with an external vector database; you own the re-embed and upsert pipeline keeping it in sync on publish.	LangChain.js with a vector store plus your own re-embed logic; freshness depends on the sync code you write and run.	No native embeddings or semantic-search layer; semantic features require an external search or vector service.
Structure preserved for LLMs	Portable Text keeps blocks, marks, and annotations intact through chunking and retrieval rather than flattening to HTML.	Rich Text is structured JSON, though preserving it cleanly through chunking and retrieval is left to your pipeline.	Content format depends on your fields; rich text often serialized to HTML or Markdown, which can lose structure on chunking.	Content is design-bound HTML, which tends to flatten structure when chunked for LLM retrieval.
Review boundary for AI output	Content Releases and drafts stage AI-generated content for human review before it reaches production.	Workflows and scheduled publishing exist; routing AI output through them requires building the integration yourself.	Draft and publish states exist; staging AI batches for review is custom workflow you implement.	Staging and review for automated content is limited; AI output generally lands in the editor directly.
Governance and audit of AI steps	Audit logs, Roles and Permissions, plus SOC 2 Type II, GDPR, data residency, and a published sub-processor list cover the AI work.	Enterprise governance and audit features exist for the CMS; auditing AI steps that run in your external services is on you.	Self-hosted, so governance and audit are whatever you build and operate around the integration.	Platform-level controls exist; auditing external AI automation depends on the tools you connect.