How to Connect an AI Agent to Your Data Warehouse

Define metrics in YAML, expose them via MCP, and let AI agents query governed data instead of raw SQL. Full tutorial with Bonnard in under 30 minutes.


Most teams connecting AI agents to their data warehouse start with text-to-SQL. The agent generates SQL from natural language, runs it against the warehouse, and returns results. It works until it doesn't: hallucinated JOINs, inconsistent aggregations, no access control, no audit trail.

There's a better approach. Define your business metrics in a semantic layer, expose them via MCP (Model Context Protocol), and let any AI agent query governed definitions instead of raw tables. This tutorial shows how to set it up with Bonnard in under 30 minutes.

Why does text-to-SQL break in production?

The agent sees column names but not business logic. It doesn't know that your company excludes refunds from revenue. It doesn't know that status = 'completed' means something different in orders than in subscriptions. It doesn't know that marketing and finance defined "active user" differently three years ago and never reconciled.

So the agent writes plausible SQL and returns plausible numbers. Ask the same question twice with different phrasing and you get different answers. Ask two different agents and you get two different numbers. Neither matches the number your finance team reports.
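This failure mode is easy to reproduce. Below is a minimal sketch using a toy SQLite table with the same amount and status columns used later in this tutorial; both queries are "plausible" answers to "what's our revenue?", yet they disagree:

```python
import sqlite3

# Toy orders table: two completed orders and one refund.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 100.0, "completed"), (2, 50.0, "completed"), (3, -40.0, "refunded")],
)

# Two "plausible" revenue queries an agent might generate for the
# same question, depending only on how the question was phrased:
naive = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
filtered = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE status = 'completed'"
).fetchone()[0]

print(naive)     # 110.0 — refund included
print(filtered)  # 150.0 — refund excluded
```

A semantic layer removes this ambiguity by fixing the calculation once, so the phrasing of the question can't change the filter.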

Consistency is only half the problem. There's no row-level security. No multi-tenancy. No audit trail showing which agent queried what, when, and for whom. In production, with real customers, that's a non-starter.

Text-to-SQL gives you speed. It doesn't give you trust.

What is the semantic layer approach?

Instead of letting agents write arbitrary SQL, define your metrics once in YAML: cubes, measures, dimensions, access rules. Then expose those definitions via MCP so agents query governed metrics, not raw tables.

The difference: every agent gets the same answer because the metric definition is fixed. total_revenue isn't a column the agent interprets. It's a pre-defined calculation with agreed-upon filters and aggregations. When your finance team updates the revenue definition to exclude trial conversions, that change propagates to every consumer instantly. No agent retrained. No dashboard patched. One diff in your schema repo.

This architecture also decouples the query interface from the warehouse dialect. Swap BigQuery for Snowflake and your agents don't notice. The semantic layer abstracts the SQL generation, so consumers stay stable while infrastructure evolves underneath.

| Approach | Governance | Consistency | Multi-tenant | Access Control | Setup Time |
| --- | --- | --- | --- | --- | --- |
| Direct SQL | None | Varies by query | Manual | Manual | Minutes |
| Text-to-SQL | None | Varies by prompt | Manual | Manual | Hours |
| Semantic Layer via MCP | Full | Guaranteed | Built-in | Row-level | Under 30 min |

The semantic layer is the control plane between your warehouse and every consumer, whether that's a human analyst, a React component, or an AI agent.

Step 1: Install and initialize

npm install -g @bonnard/cli
bon init --self-hosted
docker compose up -d

npm install -g @bonnard/cli installs the Bonnard CLI globally. bon init --self-hosted scaffolds a new project with Docker Compose configs, a sample schema, and default settings. docker compose up -d starts the Bonnard server locally. The server boots in under 10 seconds on most machines and exposes a REST API on port 4000 by default.

Bonnard supports BigQuery, Snowflake, Redshift, ClickHouse, DuckDB, and PostgreSQL out of the box. You configure your warehouse connection in bonnard.yml or pass credentials via environment variables.
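A warehouse connection in bonnard.yml might look like the sketch below. The exact key names are an assumption for illustration; the sample config scaffolded by bon init is the authoritative reference:

```yaml
# Hypothetical connection block — key names are illustrative
datasources:
  - name: warehouse
    type: postgres            # or bigquery, snowflake, redshift, clickhouse, duckdb
    host: ${DB_HOST}          # credentials injected via environment variables
    port: 5432
    database: analytics
    user: ${DB_USER}
    password: ${DB_PASSWORD}
```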

If you want to explore without connecting your own warehouse, run:

bon datasource add --demo

This provisions a sample PostgreSQL database with orders, customers, and products data so you can follow along with the rest of this tutorial. The demo dataset includes roughly 10,000 rows across three tables, enough to test aggregations, filters, and multi-dimensional queries.

Step 2: Define your metrics

Create a cube that maps to a table in your warehouse and defines the metrics your agents will query.

cubes:
  - name: orders
    sql_table: public.orders
    measures:
      - name: total_revenue
        sql: amount
        type: sum
      - name: count
        type: count
    dimensions:
      - name: status
        sql: status
        type: string
      - name: created_at
        sql: created_at
        type: time

Measures are the numbers you aggregate: sums, counts, averages. Dimensions are the columns you filter and group by: status, date, category. One definition. Every tool, dashboard, and AI agent that queries total_revenue gets the same number.
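Under the hood, a request for total_revenue grouped by status compiles to ordinary SQL against the cube's sql_table. The generated query is roughly equivalent to the following (exact output varies by warehouse dialect):

```sql
-- Roughly what the semantic layer generates for
-- measures: [total_revenue], dimensions: [status]
SELECT
  status,
  SUM(amount) AS total_revenue
FROM public.orders
GROUP BY status;
```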

You can also define pre_aggregations in the same file to cache expensive computations. For example, a daily rollup of total_revenue by status can cut query times from seconds to single-digit milliseconds. Bonnard rebuilds these rollups on a configurable schedule and invalidates stale caches automatically.
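A daily rollup for the cube above might be declared as follows; treat the field names (time_dimension, granularity) as an assumption about the syntax rather than the documented schema:

```yaml
cubes:
  - name: orders
    # ...measures and dimensions as defined above...
    pre_aggregations:
      - name: revenue_by_status_daily   # hypothetical rollup definition
        measures:
          - total_revenue
        dimensions:
          - status
        time_dimension: created_at
        granularity: day
```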

Schemas are version-controlled alongside your application code. Review metric changes in pull requests. Roll back a bad definition with git revert. Your data contracts get the same CI/CD workflow as your product.

To control what's exposed to specific consumers, define a view:

views:
  - name: order_metrics
    cubes:
      - join_path: orders
        includes:
          - total_revenue
          - count
          - status
          - created_at

Views act as a curated interface. Your agents see order_metrics with four fields instead of navigating the full schema.

Step 3: Deploy and connect your agent

Deploy your schema and start the MCP server:

bon deploy
bon mcp

bon mcp outputs connection configs for Claude Desktop, Cursor, Claude Code, and any MCP-compatible client. The output looks like this:

{
  "mcpServers": {
    "bonnard": {
      "command": "npx",
      "args": ["-y", "@bonnard/cli", "mcp"],
      "env": {
        "BONNARD_API_URL": "http://localhost:4000"
      }
    }
  }
}

Paste this into your MCP client's config file and restart. For Claude Desktop, that's ~/Library/Application Support/Claude/claude_desktop_config.json. For Cursor, it's .cursor/mcp.json in your project root. Claude Code reads from .mcp.json in your project directory.

The MCP server handles tool discovery, schema introspection, and query execution over stdio. Your agent sees available metrics the same way it sees any other MCP tool. No custom integration code required.

For customer-facing use cases, generate publishable keys per tenant with bon keys create. Each key scopes the connection to a specific tenant's data with row-level security enforced automatically. Keys can be rotated or revoked from the CLI or the REST API without redeploying your schema.
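In practice the key lifecycle might look like the session below. The --tenant flag and the subcommands beyond keys create are assumptions for illustration; check the CLI help for the actual interface:

```shell
# Hypothetical flags and subcommands — shown for illustration only
bon keys create --tenant acme-corp   # publishable key scoped to one tenant's rows
bon keys rotate <key-id>             # swap credentials without downtime
bon keys revoke <key-id>             # cut off a tenant immediately
```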

Step 4: Query from your agent

Once connected, your AI agent can discover and query your metrics using natural language. Behind the scenes, the agent calls MCP tools.

Ask: "What's our total revenue this quarter?"

The agent calls explore_schema to discover available metrics, then calls query with the right measures and time filters. The response comes back as structured data, not raw SQL results.

Ask: "Break down order count by status for the last 30 days."

Same flow. The agent uses query with count as the measure, status as the dimension, and a date filter on created_at.
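The tool call behind that second question might look like the payload below. The argument shape is an assumption based on the cube defined earlier; in practice the agent learns the real schema from MCP tool discovery rather than from documentation:

```json
{
  "tool": "query",
  "arguments": {
    "measures": ["orders.count"],
    "dimensions": ["orders.status"],
    "timeDimensions": [
      { "dimension": "orders.created_at", "dateRange": "last 30 days" }
    ]
  }
}
```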

Bonnard exposes four MCP tools:

  • explore_schema: Discover available cubes, measures, and dimensions
  • query: Fetch aggregated data using governed metric definitions
  • sql_query: Run queries for cases that need custom SQL (still governed by access controls)
  • describe_field: Get metadata about a specific measure or dimension

Every query runs through the semantic layer. The agent never touches raw tables. Read more about the architecture in our agentic analytics guide.

What do you get beyond MCP?

Bonnard isn't only an MCP server. The same semantic layer powers multiple surfaces:

  • React SDK: Drop BarChart, LineChart, and BigValue components into your product with @bonnard/react
  • Markdown dashboards: Author dashboards in markdown, deploy with bon deploy, per-tenant access built in
  • REST API and TypeScript SDK: Type-safe queries from @bonnard/sdk for custom integrations
  • Pre-aggregation caching: Define rollups in YAML for sub-second query performance at scale
  • RBAC and audit logging: Track every query by agent, app, or user with full access control

One schema definition. Every consumer gets governed, consistent data. This matters most at scale: teams using Bonnard report cutting metric discrepancies across tools to zero, because there is only one source of truth. Read more about embedded analytics use cases.

From raw SQL to governed metrics

You started with a warehouse and an AI agent that writes its own SQL. Now you have governed metrics defined in YAML, exposed via MCP, queryable from any AI tool your team or customers use. Setup took under 30 minutes.

The shift from raw SQL to governed metrics pays off immediately: consistent numbers across every consumer, row-level access control per tenant, and a full audit trail for every query. As your team adds more agents and surfaces, the semantic layer scales with you. No duplicate logic. No drift between dashboards and AI answers.

Self-host free under Apache 2.0 or use Bonnard Cloud for managed infrastructure. Check the pricing page for details on Cloud plans and self-hosted support tiers. The full source is on GitHub. Read the documentation to go deeper.

Frequently asked questions

Do I need to retrain my AI model?

No. Bonnard works with any MCP-compatible AI agent out of the box. Your agent discovers available metrics through the MCP protocol at runtime. There is no fine-tuning, prompt engineering, or model modification required.

What data warehouses does Bonnard support?

Bonnard connects to BigQuery, Snowflake, Redshift, ClickHouse, DuckDB, and PostgreSQL. You configure the connection in bonnard.yml or via environment variables. Adding a new warehouse takes a single CLI command.

How is this different from text-to-SQL?

Text-to-SQL lets an AI agent generate arbitrary SQL from natural language. It has no governance, no consistency guarantees, and no access control. Bonnard's semantic layer defines metrics once in YAML and exposes them as governed APIs. Every consumer gets the same answer because the calculation is fixed, not interpreted per query.

Is Bonnard open source?

Yes. Bonnard is licensed under Apache 2.0 and the full source is available on GitHub. You can self-host with no license fees. Bonnard Cloud is available for teams that want managed infrastructure.

How long does setup take?

Under 30 minutes for most teams. Install the CLI, connect your warehouse, define a few metrics in YAML, and run bon mcp. The tutorial above walks through each step with working code examples.

From zero to PoC in a week.

Define your metrics, plug in your agent, and ship governed AI analytics to every customer.