What Is an Agentic Semantic Layer?

An agentic semantic layer is a metrics layer built for AI agents. It defines business logic once and exposes it via MCP or API so agents query governed definitions, not raw SQL.

An agentic semantic layer is a metadata layer between AI agents and a data warehouse that defines business metrics, enforces access control, and exposes governed query interfaces. Instead of writing raw SQL, agents query metric definitions through protocols like MCP (Model Context Protocol) or REST APIs. Every agent gets the same answer because the business logic is defined once, not interpreted per query.

The "agentic" distinction matters. Traditional semantic layers were built for BI dashboards and human analysts. An agentic semantic layer is built for programmatic consumers: LLMs, AI agents, SDKs, and applications. The interface, security model, and deployment patterns are different.

The problem with AI analytics today

AI agents are becoming a primary interface to data. Executives ask Claude for quarterly numbers. Product managers ask Cursor for usage metrics. Customer success teams ask chatbots for account health scores. The agent is the new dashboard.

But most data infrastructure wasn't built for this. Two problems dominate.

Text-to-SQL breaks in production

The default approach: give an agent access to your warehouse and let it write SQL. It works in demos. In production, three failure modes show up every time.

Inconsistent answers. Ask Claude "What was Q1 revenue?" It writes a query summing amount from orders where created_at falls in Q1. Reasonable. Ask GPT the same question. It filters on status = 'completed' first, then sums. Different number. Also reasonable. Neither agent knows that your finance team excludes refunds and trial conversions. That logic lives in a Notion doc that got updated six months ago. The agents don't have access to it. They generate plausible SQL from column names and return plausible numbers that don't match anything your team reports.

This isn't a model quality problem. It's a context problem. The agent sees schema, not semantics. Better models produce better-looking SQL that's still wrong in the same ways.
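
The divergence can be reproduced on a toy table. The sketch below uses an in-memory SQLite database; the rows, the `type` column, and the Q1 date range are assumptions made up for illustration, not data from a real warehouse.

```python
import sqlite3

# Toy orders table; rows and column values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (amount REAL, status TEXT, type TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (100.0, "completed", "standard", "2025-01-15"),
        (50.0, "completed", "trial", "2025-02-01"),
        (80.0, "refunded", "standard", "2025-02-10"),
        (40.0, "pending", "standard", "2025-03-05"),
    ],
)

Q1 = "created_at >= '2025-01-01' AND created_at < '2025-04-01'"


def scalar(sql: str) -> float:
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


# Agent 1: sum every order in the quarter.
naive = scalar(f"SELECT SUM(amount) FROM orders WHERE {Q1}")

# Agent 2: keep only completed orders, then sum.
completed_only = scalar(
    f"SELECT SUM(amount) FROM orders WHERE {Q1} AND status = 'completed'"
)

# Finance definition: exclude refunds and trial conversions.
governed = scalar(
    "SELECT SUM(CASE WHEN status != 'refunded' AND type != 'trial' "
    f"THEN amount ELSE 0 END) FROM orders WHERE {Q1}"
)

print(naive, completed_only, governed)
```

On this toy data the three queries return 270, 150, and 140. All three are plausible readings of "Q1 revenue", and only the last matches the finance rule.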

No access control. You're building a B2B product. Customer A asks your agent a question. The agent writes SQL against the warehouse. Nothing in the text-to-SQL pipeline enforces that Customer A only sees Customer A's data. You can patch this: add tenant filters to prompts, write middleware that rewrites queries, build a validation layer. Each patch is a new surface for bugs. One missed filter is a data leak in production.

No audit trail. When your CFO asks "where did this number come from?", the answer is "an LLM generated some SQL." You can't trace which metric definition was used because there wasn't one. You can't reproduce the result because the prompt context has changed. You can't prove the number was correct because correctness wasn't defined.

Legacy semantic layers weren't built for agents

Traditional semantic layers solve the consistency problem for dashboards. Looker's LookML, Tableau's semantic model, and Power BI's DAX measures all define metrics once and serve them to BI consumers. That's genuinely useful.

But they were designed for a world where a human browses a catalog, selects metrics, and views a dashboard. That interaction model doesn't translate to AI agents.

Agents don't browse catalogs. They call tools at runtime. They need programmatic discovery (what metrics exist?), programmatic querying (give me revenue by region for Q1), and programmatic access control (this agent represents Tenant X and can only see Tenant X's data). They need this over a standardized protocol, not a proprietary query language embedded in a BI tool.

A BI-embedded semantic layer can't serve an MCP client. It can't issue publishable keys per tenant. It can't serve a React SDK, a TypeScript SDK, and a markdown dashboard from the same definitions. The interface is wrong.

That's what an agentic semantic layer changes.

How it differs from a traditional semantic layer

| | Traditional semantic layer | Agentic semantic layer |
|---|---|---|
| Primary consumer | BI dashboards, analysts | AI agents, LLMs, applications |
| Interface | SQL, proprietary query language (LookML, DAX) | MCP, REST API, SDK |
| Discovery | Human browses a catalog in a GUI | Agent calls explore_schema at runtime |
| Multi-tenancy | Often manual or absent | Per-query enforcement, publishable keys per tenant |
| Access control | Dashboard-level or role-based | Row-level, per-consumer, structural |
| Query volume | Dozens of dashboard refreshes per hour | Hundreds of agent queries per minute |
| Caching | Dashboard-level refresh | Pre-aggregation with automatic invalidation |
| Deployment | GUI-configured, click-ops | YAML in Git, CLI-driven, CI/CD |
| Schema management | GUI editor or proprietary file format | Version-controlled code, PR reviews |

The core shift: an agentic semantic layer treats programmatic access as the primary use case, not an afterthought. MCP support, multi-tenant keys, and row-level security aren't add-ons. They're the foundation.

Core capabilities

An agentic semantic layer needs five capabilities to work in production. Miss any one of them and you'll end up rebuilding it later.

1. Machine-readable metric definitions

Metrics defined in YAML or a similar declarative format that both humans and machines can read. Not trapped in a GUI. Not embedded in a proprietary tool.

cubes:
  - name: orders
    sql_table: public.orders
    measures:
      - name: total_revenue
        sql: "CASE WHEN status != 'refunded' AND type != 'trial' THEN amount ELSE 0 END"
        type: sum
        description: "Total revenue excluding refunds and trials"
      - name: order_count
        type: count
    dimensions:
      - name: status
        sql: status
        type: string
      - name: category
        sql: category
        type: string
      - name: created_at
        sql: created_at
        type: time

total_revenue is now a governed definition with a description the agent can read. Deploy this schema and any MCP-compatible agent (Claude, Cursor, or any client supporting the protocol) can discover and query these metrics at runtime.

The description field matters more than it looks. When an agent calls explore_schema, descriptions are what it uses to decide which metric answers the user's question. Good descriptions are the difference between an agent that picks the right metric and one that guesses.
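
Because descriptions carry that much weight, it can help to check for missing ones before a schema is deployed. The following is a minimal sketch of such a check, not a feature of any particular tool. The schema is mirrored as a Python dict to avoid a YAML dependency, and `missing_descriptions` is a hypothetical helper name.

```python
# The orders cube from the YAML above, mirrored as a plain dict (an assumption
# made so the sketch has no YAML parser dependency).
schema = {
    "cubes": [
        {
            "name": "orders",
            "measures": [
                {
                    "name": "total_revenue",
                    "type": "sum",
                    "description": "Total revenue excluding refunds and trials",
                },
                {"name": "order_count", "type": "count"},
            ],
            "dimensions": [
                {"name": "status", "type": "string"},
                {"name": "category", "type": "string"},
                {"name": "created_at", "type": "time"},
            ],
        }
    ]
}


def missing_descriptions(schema: dict) -> list[str]:
    """Return 'cube.field' for every measure or dimension without a description."""
    missing = []
    for cube in schema.get("cubes", []):
        for kind in ("measures", "dimensions"):
            for field in cube.get(kind, []):
                if not field.get("description", "").strip():
                    missing.append(f"{cube['name']}.{field['name']}")
    return missing


print(missing_descriptions(schema))
```

Run against the example schema, the check flags every field except total_revenue, which is the only one an agent could choose with confidence.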

2. Programmatic discovery and querying

Agents need a standardized way to find out what metrics exist and query them. This is what MCP (Model Context Protocol) provides.

The agent calls explore_schema to see available cubes, measures, and dimensions. It reads descriptions to understand what each metric represents. Then it calls query with the right measures, dimensions, and filters. It never writes SQL. The semantic layer handles query generation, execution, caching, and access control.

Agent: explore_schema → "orders cube has total_revenue, order_count, status, category, created_at"
Agent: query(measures: [total_revenue], dimensions: [status], filters: [{created_at: last 90 days}])
Semantic layer: generates SQL, executes against warehouse, returns structured result

This is fundamentally different from text-to-SQL. The agent isn't interpreting column names. It's selecting from governed definitions. The query isn't generated by the LLM. It's generated by the semantic layer's query engine, which knows how to write correct SQL for your specific warehouse dialect.
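
To make the division of labor concrete, here is a deliberately small sketch of a query compiler. It is an illustration under stated assumptions, not the engine any real semantic layer uses: `compile_query` and the dict layout are hypothetical, only `sum` and `count` measures are handled, and filters and time dimensions are left out.

```python
# Cube definition mirrored from the YAML above as a plain dict (an assumption).
CUBE = {
    "sql_table": "public.orders",
    "measures": {
        "total_revenue": {
            "type": "sum",
            "sql": "CASE WHEN status != 'refunded' AND type != 'trial' "
                   "THEN amount ELSE 0 END",
        },
        "order_count": {"type": "count"},
    },
    "dimensions": {"status": "status", "category": "category"},
}


def compile_query(cube: dict, measures: list[str], dimensions: list[str]) -> str:
    """Build SQL from governed names only; unknown names are rejected."""
    select = []
    for name in dimensions:
        if name not in cube["dimensions"]:
            raise KeyError(f"unknown dimension: {name}")
        select.append(f"{cube['dimensions'][name]} AS {name}")
    for name in measures:
        if name not in cube["measures"]:
            raise KeyError(f"unknown measure: {name}")
        spec = cube["measures"][name]
        if spec["type"] == "count":
            expr = "COUNT(*)"
        elif spec["type"] == "sum":
            expr = f"SUM({spec['sql']})"
        else:
            raise ValueError(f"unsupported measure type: {spec['type']}")
        select.append(f"{expr} AS {name}")
    if not select:
        raise ValueError("a query needs at least one measure or dimension")
    sql = f"SELECT {', '.join(select)} FROM {cube['sql_table']}"
    if dimensions:
        # Group by the dimension columns, which come first in the SELECT list.
        positions = ", ".join(str(i + 1) for i in range(len(dimensions)))
        sql += f" GROUP BY {positions}"
    return sql


print(compile_query(CUBE, ["total_revenue", "order_count"], ["status"]))
```

The caller only ever passes names. The refund and trial logic comes from the definition, so two agents asking for total_revenue cannot end up with different SQL.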

Five tools make this work:

  • explore_schema: Discover available cubes, measures, and dimensions with descriptions
  • query: Fetch aggregated data using governed metric definitions
  • sql_query: Run queries for edge cases that need custom SQL (still governed by access controls)
  • describe_field: Get detailed metadata about a specific measure or dimension
  • visualize: Render charts directly in the conversation from query results
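
On the wire, MCP tool calls are JSON-RPC 2.0 requests using the `tools/call` method, with the tool name and its arguments in `params`. The sketch below builds two such requests. The envelope follows the MCP specification; the argument shape passed to `query` (`measures`, `dimensions`) is an assumption based on the example above, not a documented schema.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request (JSON-RPC 2.0 envelope)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Step 1: discover what exists. Step 2: query governed definitions by name.
discover = tool_call("explore_schema", {})
fetch = tool_call(
    "query",
    # Assumed argument names; check the server's tool schema for the real ones.
    {"measures": ["total_revenue"], "dimensions": ["status"]},
)

print(json.dumps(fetch, indent=2))
```

In practice an MCP client library builds these envelopes for you. The point is that the request names a tool and a metric and contains no SQL.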

3. Structural multi-tenancy

If you're building a B2B product, every agent query needs to be scoped to a specific tenant. This can't be a prompt-level instruction ("only show data for customer X"). It needs to be structural: impossible to bypass regardless of what the agent does.

    security_context:
      - name: tenant_filter
        sql: "{SECURITY_CONTEXT.tenant_id} = customer_id"

Every query for a given tenant automatically includes this filter. The consumer can't skip it. The agent can't override it. Generate a publishable key per tenant, and that key carries the security context. The agent authenticates with the key and every query is automatically scoped.

This is how you ship AI-powered analytics to B2B customers without building a custom access control layer. The semantic layer handles isolation. You handle the product.
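
The difference between a prompt-level instruction and structural enforcement can be sketched in a few lines. Everything here is hypothetical (the key names, `KEYS`, `scoped_query`, and the filter allowlist). The idea is only that the tenant value comes from the key, never from the request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityContext:
    tenant_id: str


# Hypothetical key store: each publishable key maps to one tenant's context.
KEYS = {
    "pk_tenant_a": SecurityContext("tenant_a"),
    "pk_tenant_b": SecurityContext("tenant_b"),
}

# Columns a caller may filter on. customer_id is deliberately absent.
ALLOWED_FILTERS = {"status", "category"}


def scoped_query(key: str, filters: dict) -> tuple[str, list]:
    """Return parameterized SQL that is always scoped to the key's tenant."""
    ctx = KEYS[key]  # unknown key raises KeyError: no context, no query
    for column in filters:
        if column not in ALLOWED_FILTERS:
            raise ValueError(f"filter not allowed: {column}")
    # The tenant clause is added unconditionally, from the key's context.
    clauses = ["customer_id = ?"] + [f"{column} = ?" for column in filters]
    params = [ctx.tenant_id] + list(filters.values())
    sql = "SELECT SUM(amount) FROM orders WHERE " + " AND ".join(clauses)
    return sql, params


print(scoped_query("pk_tenant_a", {"status": "completed"}))
```

An agent that tries to pass its own customer_id filter gets an error instead of another tenant's rows, regardless of what its prompt says.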

4. Pre-aggregation for agent-scale query volume

AI agents make more queries than humans. A human refreshes a dashboard once. An agent might make 10 queries to answer one question: exploring the schema, trying different dimensions, following up on anomalies.

Pre-aggregation handles this. Define which metric combinations to pre-compute:

    pre_aggregations:
      - name: daily_revenue
        measures:
          - total_revenue
          - order_count
        dimensions:
          - status
          - category
        time_dimension: created_at
        granularity: day
        refresh_key:
          every: 1 hour

The semantic layer builds and maintains these rollups. Hot queries hit the cache (single-digit milliseconds). Cold queries fall through to the warehouse. Without this, agent workloads can overwhelm your warehouse, especially when multiple customers' agents are querying simultaneously.
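
Whether a query is "hot" comes down to whether an existing rollup covers it. A rough sketch of that matching rule, assuming additive measures such as sums and counts, is shown below. `can_serve` and the `SERVES` table are hypothetical names, and real engines apply more conditions than this.

```python
# The daily_revenue rollup from the YAML above, as a plain dict.
ROLLUP = {
    "name": "daily_revenue",
    "measures": {"total_revenue", "order_count"},
    "dimensions": {"status", "category"},
    "granularity": "day",
}

# Granularities a rollup can be re-aggregated into. Weeks do not nest inside
# months, so a weekly rollup cannot answer a monthly query.
SERVES = {
    "hour": {"hour", "day", "week", "month"},
    "day": {"day", "week", "month"},
    "week": {"week"},
    "month": {"month"},
}


def can_serve(rollup: dict, measures: list[str], dimensions: list[str],
              granularity: str) -> bool:
    """True if the query can be answered from the rollup alone."""
    return (
        set(measures) <= rollup["measures"]
        and set(dimensions) <= rollup["dimensions"]
        and granularity in SERVES[rollup["granularity"]]
    )


print(can_serve(ROLLUP, ["total_revenue"], ["status"], "month"))  # covered
print(can_serve(ROLLUP, ["total_revenue"], ["region"], "day"))    # falls through
```

A monthly revenue-by-status query is covered by the daily rollup. A query on a dimension the rollup does not include falls through to the warehouse.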

Pre-aggregation isn't just a performance optimization. At agent scale, it's a requirement.

5. Schema-as-code with version control

Metric definitions should live in Git. Changes should go through pull requests. Rollbacks should be git revert.

This isn't just good engineering practice. It's how you maintain trust in an agentic system. When someone asks "why did the revenue number change?", the answer is a Git commit, not "someone clicked something in the admin UI." When a definition is wrong, you revert it and every consumer immediately gets the corrected version. No agent retrained. No dashboard patched. One diff in your schema repo.

git diff HEAD~1 schema/orders.yml  # see what changed
git revert abc123                   # undo a bad metric definition
bon deploy                          # every consumer gets the fix
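
One way to connect a number back to the definition that produced it is to store a fingerprint of the definition with each result. This is an illustrative sketch, not a documented feature: `definition_fingerprint` and the `audit_record` layout are assumptions, and the commit id is the placeholder from the commands above.

```python
import hashlib
import json


def definition_fingerprint(measure: dict) -> str:
    """Short, stable hash of a measure definition (key order does not matter)."""
    canonical = json.dumps(measure, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Two versions of the same measure: before and after the finance rule was added.
before = {"name": "total_revenue", "type": "sum", "sql": "amount"}
after = {
    "name": "total_revenue",
    "type": "sum",
    "sql": "CASE WHEN status != 'refunded' AND type != 'trial' "
           "THEN amount ELSE 0 END",
}

# Hypothetical audit record stored next to a query result.
audit_record = {
    "measure": "total_revenue",
    "definition": definition_fingerprint(after),
    "schema_commit": "abc123",  # placeholder commit id
}

print(definition_fingerprint(before) != definition_fingerprint(after))
```

If two reports disagree, comparing fingerprints shows whether they were computed from the same version of the definition.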

Who needs an agentic semantic layer?

Different teams interact with the agentic semantic layer differently. The common thread: everyone benefits from one source of truth for metric definitions.

Data engineers and analytics engineers define the metrics. They write the YAML, review changes in PRs, and manage pre-aggregation strategies. The agentic semantic layer gives them a single place to define business logic instead of maintaining it across dashboards, notebooks, and custom API endpoints.

Engineering leads and product teams consume the metrics. They embed charts in their product with the React SDK, connect AI agents via MCP, and build features on the REST API. The semantic layer means they don't need the data team to build a custom endpoint for every new feature.

Data leaders govern the metrics. They ensure definitions are correct, access controls are appropriate, and audit trails are maintained. The semantic layer centralizes governance instead of distributing it across tools.

Customers (in B2B products) query their own data through embedded dashboards in your product, through AI agents connected via publishable keys, or through APIs. The semantic layer ensures they see only their data, with the same metric definitions your internal teams use.

How it fits the modern data stack

An agentic semantic layer doesn't replace your existing infrastructure. It sits on top of it.

[Data Sources] → [Ingestion (Fivetran, Airbyte)] → [Warehouse (Snowflake, BigQuery)]
                                                            ↓
                                                    [dbt transformations]
                                                            ↓
                                                  [Agentic Semantic Layer]
                                                    ↙    ↓    ↘
                                            [MCP Agents] [React SDK] [REST API]
                                            [Dashboards] [TypeScript SDK]

Your ingestion pipeline feeds raw data into the warehouse. dbt transforms it into clean tables. The agentic semantic layer defines business metrics on those tables and serves them to every consumer through the appropriate interface.

The semantic layer connects to your warehouse and generates the appropriate SQL dialect. Supported warehouses: Snowflake, BigQuery, Databricks, PostgreSQL (including Supabase, Neon, and RDS), Redshift, and DuckDB (including MotherDuck). Swap warehouses without changing your metric definitions or consumer integrations.

This is the key architectural property: the semantic layer decouples metric definitions from both the warehouse and the consumer. Change your warehouse and consumers don't notice. Add a new consumer and the warehouse doesn't change. The semantic layer is the stable interface between infrastructure and applications.
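
A small example of what dialect generation means in practice: the same "group by day" request renders differently per warehouse. The sketch below covers four dialects and assumes the column is a timestamp. `truncate_to_day` is a hypothetical helper, not part of any tool's API.

```python
# Day-level truncation per dialect. BigQuery's form assumes a TIMESTAMP column.
DAY_TRUNC = {
    "postgres": "date_trunc('day', {col})",
    "duckdb": "date_trunc('day', {col})",
    "snowflake": "DATE_TRUNC('DAY', {col})",
    "bigquery": "TIMESTAMP_TRUNC({col}, DAY)",
}


def truncate_to_day(column: str, dialect: str) -> str:
    """Render a day-granularity expression for the given warehouse dialect."""
    try:
        template = DAY_TRUNC[dialect]
    except KeyError:
        raise ValueError(f"unsupported dialect: {dialect}") from None
    return template.format(col=column)


for dialect in DAY_TRUNC:
    print(dialect, "->", truncate_to_day("created_at", dialect))
```

The metric definition only says `granularity: day`. The dialect-specific expression is the layer's job, which is why a warehouse swap does not reach consumers.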

Agentic semantic layer tools compared

| Tool | Agent support | Multi-tenancy | Pre-aggregation | Open source | Primary use case |
|---|---|---|---|---|---|
| Cube | REST API (no MCP) | Manual | Yes (battle-tested) | Yes | Headless BI, API-first analytics |
| dbt MetricFlow | None | None | None | Yes | Metric documentation in dbt |
| Looker (LookML) | None | Manual | Limited | No | Enterprise dashboard BI |
| AtScale | Partial | Enterprise | Virtual caching | No | Enterprise BI compatibility |
| ThoughtSpot | Proprietary ("Spotter") | Enterprise | Proprietary | No | AI-powered BI platform |
| Bonnard | MCP native | Built-in (publishable keys) | Yes (Cube engine) | Yes (Apache 2.0) | Agent-native, multi-surface analytics |

The right choice depends on whether AI agents are your primary consumer or a secondary integration. If agents are an afterthought, most semantic layers can be retrofitted with API access. If agents are the primary interface, you need a layer designed for programmatic consumers from the ground up.

Getting started

The fastest path from zero to a working agentic semantic layer:

Cloud:

npm install -g @bonnard/cli
bon init
bon deploy

Self-hosted:

npm install -g @bonnard/cli
npx @bonnard/cli init --self-hosted
docker compose up -d
bon deploy

bon init also generates agent configs (rules and skills for Claude Code, Cursor, and Codex) so your AI coding assistants understand your semantic layer from the first prompt. Run bon mcp to output MCP connection configs for your agents. Use bon diff to preview changes before deploying, bon schema to explore deployed measures and dimensions, or bon pull to download deployed models to your local project. Connect your agent by adding the MCP server URL to your client's config.

Cloud (OAuth 2.0 with PKCE, auto-discovery):

{
  "mcpServers": {
    "bonnard": {
      "type": "http",
      "url": "https://mcp.bonnard.dev/mcp"
    }
  }
}

Self-hosted (Bearer token):

{
  "mcpServers": {
    "bonnard": {
      "url": "https://bonnard.example.com/mcp",
      "headers": {
        "Authorization": "Bearer your-secret-token-here"
      }
    }
  }
}

The MCP server handles tool discovery, schema introspection, and query execution. Your agent sees available metrics the same way it sees any other MCP tool. Full walkthrough with working code: How to Connect an AI Agent to Your Data Warehouse.

For a broader look at semantic layers beyond the agentic use case, see What Is a Semantic Layer?.

Self-host free under Apache 2.0, or use Bonnard Cloud for managed infrastructure.

Frequently asked questions

What is a semantic layer in simple terms?

A semantic layer is a translation layer between raw data and the people or tools that query it. It defines what business terms like "revenue" or "active user" mean in terms of actual database columns and calculations. Instead of every consumer writing its own SQL, they all reference the same definition. See What Is a Semantic Layer? for the full guide.

What makes a semantic layer "agentic"?

Three things: programmatic discovery (agents find metrics via API, not a GUI catalog), programmatic querying (agents call tools instead of writing SQL), and structural multi-tenancy (access control enforced per query, not per dashboard). A traditional semantic layer might have an API, but an agentic one is designed for agents as the primary consumer.

How is an agentic semantic layer different from RAG?

RAG (Retrieval-Augmented Generation) feeds unstructured documents to an LLM for context. An agentic semantic layer provides structured metric definitions for data queries. RAG answers "What does our refund policy say?" A semantic layer answers "What was Q1 revenue?" They solve different problems and are often used together: RAG for knowledge, semantic layer for data.

How is an agentic semantic layer different from text-to-SQL?

Text-to-SQL lets an agent generate arbitrary SQL from natural language. The agent interprets column names and guesses business logic. An agentic semantic layer defines metrics once and lets agents query those definitions. The difference: text-to-SQL produces plausible answers. A semantic layer produces correct, governed, auditable answers. See text-to-SQL for more on why the raw approach breaks in production.

Do I need an agentic semantic layer if I use dbt?

dbt defines transformations: how raw data becomes clean tables. An agentic semantic layer defines metrics: what "revenue" means on top of those tables, and serves them to agents with caching and access control. They're complementary. dbt gets data into shape. The semantic layer defines the business logic on top and serves it. Most agentic semantic layers can import dbt models directly.

What is MCP and why does it matter?

MCP (Model Context Protocol) is a standard protocol for connecting AI agents to external tools and data sources. It defines how agents discover available tools, call them, and receive results. For an agentic semantic layer, MCP is the interface that lets any compatible agent (Claude, Cursor, Claude Code, and others) discover and query your metrics without custom integration code.

Can any AI model use an agentic semantic layer?

Yes. A semantic layer with MCP support works with any MCP-compatible client. For other agents, REST APIs and SDKs provide model-agnostic access. The semantic layer sits between the agent and the warehouse, not inside the model. It doesn't matter whether the agent runs Claude, GPT, Gemini, or an open-source model.

What is the performance impact?

With pre-aggregation, queries get faster, not slower. The semantic layer caches rollups so agents query pre-computed results instead of running full aggregations on every request. Hot queries resolve in single-digit milliseconds. Without pre-aggregation, agent-scale query volumes (hundreds of queries per minute across multiple tenants) will overwhelm most warehouses.

How does an agentic semantic layer handle hallucinations?

It removes the primary source of data hallucinations: ad-hoc SQL generation. For governed metrics, the agent never writes SQL. It selects from metric definitions and the semantic layer generates the SQL. The agent can still hallucinate its interpretation of the results (that's a model problem), but the underlying number is computed from a versioned definition and can be traced back to it.
