AI Data Analysis: Why Governed Metrics Beat Raw SQL Generation
AI data analysis tools generate plausible SQL but unreliable numbers. Here's why governed metrics through a semantic layer produce trustworthy results at scale.
AI data analysis tools are everywhere. Upload a spreadsheet to Julius AI and get a chart. Ask ChatGPT's Data Analyst to find trends. Connect Databricks AI/BI to your warehouse and let it generate queries. The pitch: anyone can analyze data without writing SQL.
The pitch works for one-off exploration. It breaks when you need trustworthy numbers at scale: consistent results across tools, governed access per customer, and an audit trail for every answer. That's where most AI analytics tools stop and where a semantic layer starts.
What is AI data analysis?
AI data analysis uses large language models and machine learning to automate parts of the data analysis workflow: generating queries, identifying patterns, creating visualizations, and summarizing findings in natural language. Instead of writing SQL by hand, you describe what you want and the AI produces it.
The tools fall into three categories:
Spreadsheet AI. Upload a CSV and ask questions. Julius AI, Formula Bot, ChatGPT Data Analyst. Good for ad-hoc exploration of small datasets. No governance, no multi-tenancy, no audit trail. Your data leaves your infrastructure.
BI tool AI. Built into an existing BI platform. Tableau's AI analytics, Power BI Copilot, Domo AI, ThoughtSpot's natural language search. Better than spreadsheet tools because they query your warehouse. Still limited by the BI tool's ecosystem and licensing model.
Text-to-SQL agents. An LLM generates SQL against your warehouse schema. DataGPT, custom GPT agents, Databricks AI/BI. Most flexible, most dangerous. The agent writes arbitrary SQL. No guardrails on what it queries or how it calculates metrics.
All three share the same fundamental problem: the AI interprets your data on every query. There's no shared definition of what "revenue" means. Different tools, different phrasings, different numbers.
The accuracy problem nobody talks about
AI data analysis demos are impressive. The tool generates a chart from a natural language question in seconds. But run the same question through two different tools and compare the numbers. They don't match.
Why? Because each tool independently interprets your schema. "Revenue" might be SUM(amount) in one tool and SUM(total_price) WHERE status = 'completed' in another. Both are plausible. Neither knows that your finance team excludes refunds and trial conversions.
This isn't a model quality issue. GPT-4, Claude, and Gemini all exhibit this behavior. The model does what it's designed to do: generate plausible SQL from schema context. But "plausible" and "correct" are different things.
The accuracy problem compounds at scale:
- One analyst, one question, one tool — accuracy is "good enough." You eyeball the result and know if it's in the right ballpark.
- Multiple analysts, same question, different tools — the numbers diverge. Debates about "which dashboard is right" start.
- Customers querying their own data — you can't tell customers "the number is approximately right." They expect precision. One wrong number and trust is gone.
- AI agents querying autonomously — no human in the loop to sanity-check. Wrong numbers propagate through workflows without anyone noticing.
Text-to-SQL gives you speed. It doesn't give you trust. For production analytics, you need both.
What makes AI analytics production-grade?
The gap between a demo and a production system is five things. Most AI data analysis tools have one or two. None of the spreadsheet or text-to-SQL tools have all five.
1. Governed metric definitions
Every metric has one definition. "Revenue" is SUM(amount) WHERE status != 'refunded' AND type != 'trial'. This definition is fixed, versioned in Git, and referenced by every consumer. The AI doesn't interpret what revenue means. It queries the governed definition.
cubes:
- name: orders
sql_table: public.orders
measures:
- name: total_revenue
sql: "CASE WHEN status != 'refunded' AND type != 'trial' THEN amount ELSE 0 END"
type: sum
description: "Total revenue excluding refunds and trials"
- name: order_count
type: count
- name: average_order_value
sql: amount
type: avg
dimensions:
- name: status
sql: status
type: string
- name: category
sql: category
type: string
- name: created_at
sql: created_at
type: time
This is a semantic layer: a metadata layer between your warehouse and every consumer. The AI agent calls explore_schema to discover available metrics, then query to fetch governed data. It never generates SQL from scratch.
2. Consistency across every surface
The same metric definition serves dashboards, React components in your product, REST APIs, AI agents, and markdown reports. Ask for "revenue" from any surface and you get the same number. Not because the AI happened to generate the same SQL, but because the definition is fixed.
This matters for AI specifically because AI tools generate different SQL on every call. Even the same tool, same question, different session can produce different queries. Governed definitions eliminate this variance.
3. Multi-tenant access control
For B2B products, Customer A's AI agent should never see Customer B's data. This can't be a prompt instruction ("only query data for customer A"). It needs to be structural: enforced on every query regardless of what the AI does.
security_context:
- name: tenant_filter
sql: "{SECURITY_CONTEXT.tenant_id} = customer_id"
Every query through a tenant's key automatically includes this filter. The AI agent can't bypass it. See RBAC for more on how this works.
4. Performance at AI scale
AI agents make more queries than humans. A human asks one question. An agent might make 10 queries to answer it: exploring the schema, trying different dimensions, following up on anomalies. Multiply by hundreds of customers' agents querying simultaneously.
Pre-aggregation caching pre-computes common rollups. Hot queries return in single-digit milliseconds. Without it, AI-scale query volumes overwhelm most warehouses.
5. Audit trail
When the number is wrong, you need to trace it. With a semantic layer, every result traces back to a versioned metric definition in Git. You can point to the exact commit that defined the calculation, see when it changed, and who approved the change. With raw text-to-SQL, the trail is "an LLM generated some SQL" with no reproducibility.
AI data analysis tools compared
| Tool | Approach | Governed metrics | Multi-tenant | Audit trail | Best for |
|---|---|---|---|---|---|
| Julius AI | Spreadsheet upload + LLM | No | No | No | Ad-hoc exploration of CSVs |
| ChatGPT Data Analyst | File upload + code interpreter | No | No | No | One-off analysis, prototyping |
| Databricks AI/BI | Text-to-SQL + Unity Catalog | Partial (table-level) | Partial | Partial | Teams already on Databricks |
| Power BI Copilot | Natural language in Power BI | DAX measures (locked to PBI) | Complex setup | Within PBI | Microsoft ecosystem |
| Tableau AI | Natural language in Tableau | Tableau semantic model | Enterprise | Within Tableau | Salesforce ecosystem |
| ThoughtSpot | Natural language BI | Proprietary semantic layer | Enterprise | Yes | Enterprise AI-powered BI |
| DataGPT | Conversational analytics agent | No | No | No | Autonomous data analysis |
| Domo AI | BI platform with AI features | Within Domo | Within Domo | Within Domo | End-to-end data platform |
| Semantic layer (Bonnard) | Governed metric serving | Yes (YAML, versioned) | Structural | Full (Git-backed) | B2B analytics, AI agents, embedded |
The tools in the top half are AI-first but governance-second. The ones in the bottom half (enterprise BI) have governance but lock it to their ecosystem. A standalone semantic layer provides governance without ecosystem lock-in.
When each approach makes sense
Use spreadsheet AI (Julius, ChatGPT) when:
- You're exploring a dataset for the first time
- Nobody else needs to reproduce the result
- The data isn't sensitive
- You want speed over accuracy
Use BI tool AI (Power BI Copilot, Tableau AI) when:
- You're already in that BI ecosystem
- Internal teams are the primary consumer
- You don't need to serve AI agents or embed in products
- Enterprise licensing fits your budget
Use a semantic layer when:
- Multiple consumers need the same metrics (dashboards + API + agents)
- Customers see the results (B2B embedded analytics)
- AI agents query data autonomously
- You need auditability (which definition produced this number?)
- Metrics must be consistent across every tool and surface
How a semantic layer works with AI agents
The semantic layer doesn't replace AI. It governs what AI can query.
An AI agent connects via MCP or REST API. It calls explore_schema to discover available metrics and their descriptions. It calls query to fetch governed data with measures, dimensions, and filters. The semantic layer generates the SQL (correctly, for your specific warehouse dialect), executes it, and returns structured results.
Five tools are available to agents:
explore_schema: Discover cubes, measures, dimensions, segmentsquery: Fetch aggregated data using governed definitionssql_query: Custom SQL for edge cases (still governed by access controls)describe_field: Get metadata about a specific measure or dimensionvisualize: Render charts directly in the conversation
The agent doesn't write SQL. It selects from governed definitions. The result is deterministic: same question, same metric definition, same answer. Always.
For a full tutorial on connecting AI agents to governed data: How to Connect an AI Agent to Your Data Warehouse.
The shift from AI-generated SQL to AI-queried metrics
The first generation of AI analytics gave agents raw warehouse access and hoped for the best. It worked in demos. It failed in production because the agent interprets business logic on every query.
The next generation separates the concerns. Data teams define governed metrics in a semantic layer. AI agents query those definitions instead of raw tables. The data team's expertise is encoded in the schema. The AI's role shifts from "guess what revenue means" to "pick the right governed metric and present the result."
This is a better division of labor. The data team defines what's true. The AI handles the interface. Trustworthy numbers with a natural language frontend.
Getting started
Cloud:
npm install -g @bonnard/cli
bon init
bon deploy
Self-hosted:
npm install -g @bonnard/cli
npx @bonnard/cli init --self-hosted
docker compose up -d
bon deploy
bon init generates agent configs for Claude Code, Cursor, and Codex so AI assistants understand your semantic layer from the first prompt. Define your metrics, run bon deploy, and bon mcp to configure AI agent connections.
Full tutorial: How to Connect an AI Agent to Your Data Warehouse. For background: What Is a Semantic Layer?.
Self-host free under Apache 2.0, or use Bonnard Cloud for managed infrastructure.
Frequently asked questions
What is AI data analysis?
AI data analysis uses large language models and machine learning to automate data querying, pattern detection, and insight generation. Instead of writing SQL manually, you ask questions in natural language and the AI produces queries, charts, and summaries. Tools range from spreadsheet analyzers (Julius AI) to enterprise BI platforms with AI features (Power BI Copilot, Tableau AI).
Is AI data analysis accurate?
For ad-hoc exploration, accuracy is "good enough." For production analytics where customers see the numbers, accuracy varies per query because the AI generates SQL from scratch each time. Two tools generating SQL for "revenue" may produce different numbers because they interpret the calculation differently. A semantic layer solves this by defining each metric once and letting AI query the definition instead of generating its own.
What is the best AI tool for data analysis?
It depends on the use case. For one-off CSV exploration: Julius AI or ChatGPT Data Analyst. For enterprise BI: Power BI Copilot or Tableau AI. For governed, multi-tenant analytics served to AI agents and B2B customers: a semantic layer approach. See the comparison table above.
How is AI analytics different from traditional BI?
Traditional BI requires analysts to write queries and build dashboards. AI analytics automates the query generation. The risk: AI-generated queries are inconsistent. Traditional BI with a semantic layer gives you consistent definitions. AI analytics with a semantic layer gives you both: consistent definitions with a natural language interface.
Can AI replace data analysts?
No. AI can generate queries and create visualizations faster than humans. It can't define what metrics should mean, decide which questions to ask, or judge whether an insight is actionable. Data analysts bring domain knowledge and judgment. AI handles the mechanical parts. A semantic layer encodes the analyst's expertise so AI can query it reliably.
What is the difference between AI analytics and text-to-SQL?
Text-to-SQL is a technique where an LLM generates SQL from natural language. AI analytics is broader: it includes text-to-SQL plus pattern detection, anomaly identification, forecasting, and visualization generation. Both share the same governance gap: the AI interprets your data on every query. A semantic layer addresses this for both.
What warehouses work with AI data analysis?
Most AI analytics tools work with common warehouses: Snowflake, BigQuery, Databricks, PostgreSQL (including Supabase, Neon, and RDS), Redshift, and DuckDB (including MotherDuck). The semantic layer generates the appropriate SQL dialect for each warehouse.
Ready to ship a customer-ready MCP?
Turn your semantic layer, dbt, or warehouse into a governed, per-customer MCP for your customers' agents.