Embeddings & Semantic Search | 7 min read

How to Use the Embeddings Index API to Power AI Search

A shopper types "trail runners under $150, in stock at the Portland warehouse, men's size 11" into your AI search box, and the model returns nothing useful, or worse, confidently invents a product that does not exist. This is the most common failure mode in AI-powered search, and it is almost never a model problem. It is a retrieval problem. The query carried a structural component (a price ceiling, a stock status, a size) that pure vector similarity simply cannot resolve, because nearest-neighbor math does not respect a price filter or an inventory flag.

Most products marketed as "AI search" are pure embeddings: encode the content as vectors, encode the query as a vector, return the nearest neighbors. That works for "find me something like a trail runner," and it falls over the moment a real constraint enters the sentence. Sanity, the AI-native content platform and the Content Operating System for the AI era, treats search differently: structured predicates do the filtering that has to hold, then a score pipeline blends keyword and semantic ranking, all in one query over the content store.

This guide walks through how to power AI search the way it actually holds up in production: hybrid retrieval, embeddings tied to your content, and freshness you do not have to babysit.

Why pure embeddings break on real queries

Embeddings search is seductive because the happy path demos beautifully. Encode every product description as a vector, encode the user's phrase as a vector, return the documents whose vectors sit closest in high-dimensional space. Ask for "something cozy for autumn" or "a shoe like a Hoka" and the results feel like magic, because the vibe of the query and the vibe of the content are exactly what cosine similarity is good at comparing.

The trouble is that production queries are rarely pure vibe. They smuggle in constraints that have to hold: a price ceiling, a category, a version number, an "in stock" flag, a warehouse location. Vector similarity does not respect any of those. A $200 shoe and a $140 shoe can sit right next to each other in embedding space, so a filter like "under $150" gets silently ignored. The query comes back empty or wrong, and depending on how the prompt is written, the model either hallucinates a plausible-sounding answer or hedges into uselessness. As Sanity's field guide on agent retrieval puts it bluntly, "Most AI-powered search products are this and only this."
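A toy example makes the failure concrete. This is a minimal sketch with hypothetical three-dimensional vectors standing in for real embeddings (which have hundreds of dimensions); the product names, prices, and vector values are all invented for illustration. It shows nearest-neighbor ranking picking the $200 shoe for a query that asked for something under $150.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical catalog: the vectors encode "what the product is like",
# and the price is not part of the vector at all.
catalog = [
    {"title": "Ridge Trail Runner", "price": 200, "vec": [0.90, 0.10, 0.05]},
    {"title": "Canyon Trail Runner", "price": 140, "vec": [0.85, 0.20, 0.05]},
    {"title": "Wool Cardigan", "price": 90, "vec": [0.05, 0.10, 0.95]},
]

# Stand-in embedding for the query "trail runners under $150".
query_vec = [0.92, 0.08, 0.05]

# Pure vector search: rank by similarity and nothing else.
ranked = sorted(catalog, key=lambda p: cosine(query_vec, p["vec"]), reverse=True)
top = ranked[0]
print(top["title"], top["price"])
```

The top hit is the closest vector, and nothing in the ranking step ever looked at the price field.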

The instinct to fix this with a bigger model is misplaced. Retrieval is where most agents fail, including Sanity's own. A smarter reasoning model cannot recover a product that the retrieval step never returned. The fix lives one layer down, in how the search itself is composed, which is why the rest of this guide is about query construction rather than prompt engineering. Get retrieval right and a mid-tier model looks brilliant. Get it wrong and the best model on the market still invents inventory.

The opposite trap: pure structured query

If pure embeddings fail on constraints, the obvious overcorrection is to go fully structured. Write a predicate in GROQ, SQL, or GraphQL, and you get exactly what you asked for. No fuzziness, no surprises, every filter respected. For a query like "trail runners under $150, men's size 11, in stock," a structured query is precise and fast, and it will never hand you a $200 shoe by accident.

But structured query has a mirror-image weakness. It falls over the moment the user says "something like X," or "the cozy one," or anything else that lives in vibes rather than fields. You cannot write a WHERE clause for "feels premium." Worse, structured query assumes the user already knows the exact shape of what they want, which is precisely the assumption that breaks down when someone is exploring, browsing, or chatting with an agent. Conversational search is a discovery interface, and discovery is mostly imprecise by nature.

So the two naive approaches fail in complementary ways. Pure embeddings handle the vibe and drop the constraints. Pure structured query handles the constraints and drops the vibe. Real queries, the ones your users actually type, contain both at once: "trail runners under $150 like a Hoka" is half filter, half feeling. Any search architecture that can only do one of those will disappoint on a large fraction of real traffic. The lesson Sanity drew from its own production telemetry is the same one Anthropic's researchers reached independently: no single retrieval layer is enough. The answer is not to pick one. It is to blend them.

Hybrid retrieval, and why it wins

Hybrid retrieval is the discipline of running three layers together: keyword search (BM25) for literal matches, embeddings for semantic ranking, and structured predicates for the filters that have to hold. Each layer covers the others' blind spots. BM25 catches exact terms like a model name or SKU that an embedding might smear away. Embeddings catch the intent behind "cozy" or "like a Hoka." Predicates enforce the price ceiling and the stock flag that neither of the other two can be trusted with.
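The three layers can be sketched in a few lines. This is an illustration under loud assumptions: `keyword_score` is a crude term-overlap stand-in for BM25, not BM25 itself; the vectors are hypothetical three-dimensional placeholders for real embeddings; and the field names (`price`, `in_stock`) are invented. What it does show accurately is the order of operations: predicates filter first, then a blended score ranks what survives.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def keyword_score(query_text, title):
    """Stand-in for BM25: fraction of query terms that appear in the title."""
    terms = query_text.lower().split()
    words = set(title.lower().split())
    return sum(t in words for t in terms) / len(terms)

def hybrid_search(catalog, query_text, query_vec, max_price, title_boost=2.0, limit=10):
    # Layer 1: structured predicates. These must hold, so they run before ranking.
    survivors = [p for p in catalog if p["price"] < max_price and p["in_stock"]]
    # Layers 2 and 3: boosted keyword match plus semantic similarity.
    scored = [
        (title_boost * keyword_score(query_text, p["title"]) + cosine(query_vec, p["vec"]), p)
        for p in survivors
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:limit]]

catalog = [
    {"title": "Ridge Trail Runner", "price": 200, "in_stock": True, "vec": [0.90, 0.10, 0.05]},
    {"title": "Canyon Trail Runner", "price": 140, "in_stock": True, "vec": [0.85, 0.20, 0.05]},
    {"title": "Mesa Trail Runner", "price": 120, "in_stock": False, "vec": [0.88, 0.12, 0.05]},
    {"title": "Wool Cardigan", "price": 90, "in_stock": True, "vec": [0.05, 0.10, 0.95]},
]

results = hybrid_search(catalog, "trail runner", [0.92, 0.08, 0.05], max_price=150)
titles = [p["title"] for p in results]
```

The $200 shoe and the out-of-stock shoe never reach the ranking step, so no similarity score can bring them back.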

The gains are not theoretical. Anthropic's contextual retrieval research measured the layering directly: contextual embeddings cut top-20 retrieval failures by 35%, adding contextual BM25 took that to 49%, and adding reranking on top brought it to 67%. The shape of that result is the whole argument. No single technique was enough on its own; each one stacked a measurable improvement on the last. You do not have to read the research closely to absorb the lesson. You just have to notice that the best number came from combining, not choosing.
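To see what those percentages mean in absolute terms, here is a small worked calculation. It reads each figure as a reduction against the same starting point, which is how the paragraph above phrases it, and it assumes a purely hypothetical baseline of 1,000 failed retrievals; that baseline is not a number from the research.

```python
# Hypothetical baseline, chosen only to make the arithmetic easy to read.
baseline_failures = 1000

# Relative reductions in top-20 retrieval failures, as quoted in the text.
reductions = {
    "contextual embeddings": 0.35,
    "+ contextual BM25": 0.49,
    "+ reranking": 0.67,
}

# Failures left after each configuration, measured against the same baseline.
remaining = {
    name: round(baseline_failures * (1 - cut))
    for name, cut in reductions.items()
}

for name, left in remaining.items():
    print(f"{name}: {left} failures remain")
```

Under that assumption, the full stack leaves roughly a third of the original failures, about half of what contextual embeddings alone leave behind.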

Sanity's own production data tells a parallel story from a different angle. When you look at how agents actually call the Sanity Context MCP endpoint, structured retrieval dominates: the heavy majority of calls are GROQ queries and schema lookups, with semantic search a small slice on top. Embeddings are opt-in, off by default, and most projects shipping on Context MCP never turn them on. The team states the takeaway plainly: "We have embeddings" is not a retrieval strategy. Embeddings are one ingredient. Hybrid is the recipe, and the structured layer is doing more of the work than the marketing around vector search would suggest.

What hybrid search looks like in GROQ

Here is where the abstract gets concrete. Sanity expresses hybrid search natively in GROQ, using the text search operators documented in the Sanity docs, so you do not assemble it from three separate systems. The shape is: a predicate filters down to documents that satisfy the constraints that must hold, then a score() pipeline ranks what survives, then you order by the blended score and take the top slice.

The score pipeline is the interesting part. It blends a BM25 keyword match on the title, written as boost([title] match text::query($queryText), 2), with a semantic similarity score across the document via text::semanticSimilarity($queryText). The title match is weighted 2x with boost() because a hit in the title matters more than a hit buried in body copy. Then order(_score desc) [0...10] returns a small, ranked list that matches both the structural constraints and the vibe. One query, one place, three retrieval strategies cooperating.
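Assembled, the pieces described above might look like the query below. Treat it as a sketch rather than a reference implementation: the `score()`, `boost()`, `text::query()`, `text::semanticSimilarity()`, and `order(_score desc) [0...10]` fragments come from the text, while the document type (`product`), the field names (`price`, `inStock`, `sizes`), and the parameter names other than `$queryText` are hypothetical. The Python wrapper only pairs the query with its parameters and checks that every `$param` has a value; it makes no network call.

```python
import re

# GROQ hybrid query. Schema fields (price, inStock, sizes) are assumptions
# made for illustration; adapt them to your own content model.
HYBRID_QUERY = """
*[_type == "product" && price < $maxPrice && inStock == true && $size in sizes]
  | score(
      boost([title] match text::query($queryText), 2),
      text::semanticSimilarity($queryText)
    )
  | order(_score desc) [0...10]
  { _id, title, price, _score }
""".strip()

def build_hybrid_search(query_text, max_price, size):
    """Pair the query with its parameters and verify none are missing."""
    params = {"queryText": query_text, "maxPrice": max_price, "size": size}
    referenced = set(re.findall(r"\$(\w+)", HYBRID_QUERY))
    missing = referenced - set(params)
    if missing:
        raise ValueError(f"missing query parameters: {sorted(missing)}")
    return {"query": HYBRID_QUERY, "params": params}

request = build_hybrid_search("trail runners like a Hoka", max_price=150, size=11)
```

Keeping the constraints as parameters rather than interpolating them into the query string means the user's free text only ever reaches the ranking functions, never the filter logic.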

To be clear and fair, you do not need GROQ to do this. PostgreSQL can, with pgvector and full-text search. Elasticsearch can. Algolia is built for the structured-plus-relevance case. Pinecone plus a metadata filter layer can. What none of them lets you do is pure-vector your way out of the empty-result problem, and each of them asks you to stand up and maintain a separate search index alongside your content store. The difference with the Sanity approach is not that hybrid is possible. It is that hybrid lives inside the content layer, expressed in the same query language you already use to read content, so there is no second system to keep in sync.

Freshness: the line item nobody budgets for

The demo never shows you the hard part. The hard part is keeping the search index fresh after launch, and it is where most home-grown AI search projects quietly bleed engineering time. The moment your content changes, every layer of your retrieval stack has to change with it: the keyword index has to reindex, the embeddings have to re-embed, deleted documents have to actually disappear, and schema changes have to be backfilled across everything already stored.

When search lives in a separate vector database plus glue code, all of that is your problem. Building it yourself means incremental indexing, re-embedding on change, deletion handling, eventual-consistency reasoning, and backfill for schema changes. As the Sanity docs put it, that is "a real project and a class of bug all its own." Freshness stops being a one-time setup task and becomes a permanent line item on your roadmap, a standing tax on every team that touches the content model. Stale embeddings are insidious precisely because they fail silently: search keeps returning results, they are just subtly wrong, pointing at last quarter's catalog.
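For a sense of what the do-it-yourself version involves, here is a minimal sketch of just two of those chores: re-embedding on change and deletion handling. Everything in it is hypothetical. `fake_embed` is a placeholder for a real embedding call, the index is an in-memory dict, and eventual consistency and schema backfill are not covered at all.

```python
import hashlib

def fake_embed(text):
    """Placeholder for a real embedding model; returns a tiny deterministic vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]

def content_hash(text):
    """Fingerprint used to detect whether a document changed since the last sync."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def sync_index(index, documents):
    """Bring `index` in line with `documents` (id -> text); return counts of work done."""
    stats = {"embedded": 0, "deleted": 0, "unchanged": 0}
    # Deletion handling: drop entries whose source document no longer exists.
    for doc_id in list(index):
        if doc_id not in documents:
            del index[doc_id]
            stats["deleted"] += 1
    # Incremental indexing: re-embed only new or changed documents.
    for doc_id, text in documents.items():
        fingerprint = content_hash(text)
        entry = index.get(doc_id)
        if entry is not None and entry["hash"] == fingerprint:
            stats["unchanged"] += 1
            continue
        index[doc_id] = {"hash": fingerprint, "vec": fake_embed(text)}
        stats["embedded"] += 1
    return stats

index = {}
first = sync_index(index, {"a": "Canyon Trail Runner", "b": "Wool Cardigan", "c": "Rain Shell"})
second = sync_index(index, {"a": "Canyon Trail Runner", "b": "Merino Wool Cardigan"})
```

Even this toy has to run on every content change, and skipping a run is exactly how an index ends up pointing at last quarter's catalog.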

This is the part Content Lake handles on your behalf. Because embeddings are tied to the content rather than maintained as a separate copy, freshness is automatic; when the content changes, the index that powers search changes with it. This maps directly to two of Sanity's pillars: Automate everything, in that the freshness pipeline you would otherwise hand-build is simply absent, and Power anything, in that the same governed content store feeds your website, your apps, and your AI search from one source of truth. The work you do not have to do is the whole point.

Putting it into production with Sanity Context

Powering AI search well is less about a single API call and more about giving your retrieval layer durable, governed access to structured content. That is what Sanity Context provides. Context MCP is one surface of it, a hosted read-only endpoint that any agent loop can connect to, but the mental model to keep is that Sanity Context has an MCP, a knowledge base, and an ingest path. It is not only an MCP. That matters because real AI search systems pull from more than your live dataset; they pull from documentation, support content, and ingested sources too.

What does "good" look like in practice? When Sanity ran schema exploration against Sonos's product catalog, blending structured exploration with reasoning, it landed around 83% accuracy on a mix of difficulties, using Sonnet 4.5 with roughly 40 seconds of thinking per hard question. That number is honest, not a rounding-up to 99%, and it tells you something important: even with strong retrieval, hard mixed-constraint queries are genuinely hard, and the path to better numbers runs through better retrieval composition, not a better model alone.

For enterprises, the governance story is inseparable from the search story, because AI search is only as trustworthy as the content store behind it. Sanity's verifiable footprint here includes SOC 2 Type II, GDPR, regional hosting and data residency, and a published sub-processor list. As the intelligent backend for companies building AI content operations at scale, Sanity treats AI search not as a bolt-on plugin but as a first-class query over governed, structured, fresh content, which is the only version of AI search that survives contact with production traffic.

AI search architectures: where the embeddings and freshness work actually lives

| Feature | Sanity | Pinecone | Algolia | pgvector / Elasticsearch |
| --- | --- | --- | --- | --- |
| Hybrid query (keyword + vector + filters) | Native in one GROQ query: score(boost([title] match text::query($q), 2), text::semanticSimilarity($q)) blended with structured predicates that must hold. | Vector-native with a metadata filter layer for structural constraints; keyword and lexical relevance live in a separate system you wire in. | Built for the structured-plus-relevance case and strong at ranked keyword search with filters; vector similarity supported alongside it. | Hybrid lexical plus vector is achievable with full-text search and the vector extension, but you compose and tune the blend yourself. |
| Where the index lives relative to content | Inside the content layer: embeddings are tied to content in Content Lake, queried in the same language you already read content with. | Separate vector database you keep in sync with your content backend through your own ingestion pipeline. | Separate search index synced from your content store rather than search living inside the content layer. | A standing database or cluster you operate alongside the content store, with sync logic you own. |
| Index freshness on content change | Content Lake keeps the index fresh on your behalf; because embeddings are tied to content, re-embedding on change is automatic, not a roadmap item. | You maintain re-embedding, metadata sync, and deletion handling; freshness becomes a permanent line item on your roadmap. | You operate the sync pipeline that pushes content changes into the index and handles deletions and reindexing. | You own incremental indexing, re-embedding, deletion handling, eventual consistency, and backfill for schema changes. |
| Empty-result / hallucination protection | Structured predicates enforce constraints like price and in-stock before ranking, so vibe-only matches cannot bypass filters that must hold. | Metadata filters can enforce constraints, but the architecture leans vector-first, so filter discipline is on you to apply. | Strong filter support makes constraint enforcement straightforward within its query model. | Predicates available via SQL or query DSL; you assemble the guardrails that keep vector similarity from ignoring constraints. |
| Agent / MCP access to retrieval | Sanity Context exposes a hosted read-only Context MCP endpoint any agent loop can connect to, plus a knowledge base and ingest path. | Accessed as a vector API; agent integration and grounding logic are assembled in your application layer. | Accessed via search API and clients; agent-facing retrieval is wired up in your own stack. | Database access via drivers and query APIs; any agent retrieval surface is something you build. |
| Governance and compliance footprint | Search runs over the governed content store with SOC 2 Type II, GDPR, regional hosting and data residency, and a published sub-processor list. | Carries its own platform compliance posture; governing the content feeding it remains a separate concern in your backend. | Carries its own platform compliance posture; content governance lives in the system of record you sync from. | Governance and compliance depend on how you host and operate the database or cluster yourself. |