What is Text-to-SQL? Definition, Limits, and Alternatives
Text-to-SQL converts natural language into SQL queries using an LLM. Learn why it fails in production and how semantic layers provide a governed alternative.
Text-to-SQL converts natural language questions into SQL queries using an LLM. It provides fast answers but no governance: the model guesses at JOINs, invents aggregations, and has no access control. For production analytics, a semantic layer constrains what the agent can query. The approach gained popularity with GPT-based coding assistants, but most implementations fail when deployed to real users at scale. The gap between a working demo and a production-grade analytics system is governance, consistency, and tenant isolation, none of which text-to-SQL addresses on its own.
How does text-to-SQL work?
The LLM receives your database schema (table names, column types, relationships) as context. When a user asks "what was revenue last quarter?", the model generates a SQL query: SELECT SUM(amount) FROM orders WHERE created_at >= '2025-10-01'. The system executes that query against the warehouse and returns the result.
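The flow above can be sketched in a few lines. The function name and prompt shape here are illustrative assumptions, not any specific product's API: the point is that the schema and question are concatenated into context, and whatever SQL comes back is run directly.

```python
# Minimal sketch of the text-to-SQL flow: schema + question in, raw SQL out.
# The prompt wording and schema format are illustrative assumptions.

def build_prompt(schema: str, question: str) -> str:
    """Assemble the context the LLM sees: the schema plus the user's question."""
    return (
        "You are a SQL generator. Given this schema:\n"
        f"{schema}\n"
        f"Write one SQL query answering: {question}\n"
    )

schema = "orders(id INT, amount DECIMAL, created_at DATE)"
prompt = build_prompt(schema, "what was revenue last quarter?")

# The LLM's reply (e.g. SELECT SUM(amount) FROM orders WHERE ...) is then
# executed against the warehouse as-is, with no validation step in between.
```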
Some implementations add few-shot examples or retrieval-augmented generation to improve accuracy. The model still writes raw SQL. It still has full access to whatever tables you expose in the schema prompt. More advanced setups use schema filtering to reduce the context window, only passing relevant tables. But even with these optimizations, the model is generating SQL from scratch on every query. There is no reuse of known-good metric definitions, and no guarantee that the same question produces the same SQL twice.
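Schema filtering is often as simple as a relevance heuristic over table and column names. The keyword-overlap approach below is a deliberately naive illustration, not how any particular system implements it:

```python
# Hedged sketch of schema filtering: pass the LLM only tables whose name or
# columns overlap with the question. The matching heuristic is illustrative.

def filter_schema(tables: dict[str, list[str]], question: str) -> dict[str, list[str]]:
    """Keep tables whose name or any column word appears in the question."""
    words = set(question.lower().replace("?", "").split())
    return {
        name: cols
        for name, cols in tables.items()
        if name in words or any(col in words for col in cols)
    }

tables = {
    "orders": ["amount", "created_at"],
    "employees": ["salary", "hired_at"],
}
relevant = filter_schema(tables, "total order amount last quarter?")
# Only "orders" survives; "employees" never enters the context window.
```

This shrinks the prompt, but note what it does not change: the model still writes the SQL from scratch against whatever tables pass the filter.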
Why is text-to-SQL unreliable for production?
Hallucinated aggregations. The model decides how to calculate "revenue." Maybe it sums amount. Maybe it sums total_price. Maybe it subtracts refunds. There is no authoritative definition. Different phrasings of the same question can produce different SQL and different numbers.
No metric governance. There is no central place that says what "revenue" means. Each query is a one-off interpretation. Two users asking the same question might get different answers depending on how the model generates the SQL.
No access control. Text-to-SQL gives the model access to raw tables. There is no row-level security, no tenant isolation, no role-based filtering. Every user can query every table the model can see.
No audit trail. Queries are generated on the fly with no record of what definition was used. When a number looks wrong, there is no way to trace it back to an authoritative metric.
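The hallucinated-aggregation problem is easy to demonstrate concretely. Both queries below are plausible LLM interpretations of "revenue" over the same table (the table and column names are illustrative), and they return different numbers:

```python
# Two plausible SQL interpretations of "revenue" over identical data.
# Table and column names are illustrative, not from any real schema.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (amount REAL, total_price REAL, refund REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(100.0, 110.0, 0.0), (50.0, 55.0, 55.0)],
)

# Interpretation 1: revenue = sum of amount
v1 = con.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
# Interpretation 2: revenue = total price minus refunds
v2 = con.execute("SELECT SUM(total_price) - SUM(refund) FROM orders").fetchone()[0]

print(v1, v2)  # 150.0 vs 110.0 -- same question, two different "revenues"
```

Neither query is wrong in isolation; the problem is that nothing in the system says which one is the authoritative definition.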
Text-to-SQL vs semantic layer queries
| | Text-to-SQL | Semantic layer |
|---|---|---|
| Metric definitions | None (LLM guesses) | Centralized in YAML |
| Consistency | Varies per query | Same result every time |
| Multi-tenancy | Not built in | Row-level security per tenant |
| Access control | None | RBAC with governance groups |
| Audit trail | No | Full query logging |
| AI agent support | Direct SQL generation | Governed tools via MCP |
How Bonnard handles natural language queries
Bonnard replaces raw text-to-SQL with governed metric queries. AI agents connect through MCP and call structured tools: explore_schema to discover available metrics, query to execute against defined measures and dimensions, describe_field to understand what a metric means. The agent picks the right tool based on the user's question, but it never constructs raw SQL.
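A structured tool call replaces the freeform SQL string. The tool names above come from the article; the payload shape below is an assumption for illustration, not Bonnard's actual wire format:

```python
# Sketch of a governed `query` tool invocation: the agent names a measure and
# dimensions, never writes SQL. The argument shape is an illustrative assumption.

def make_query_call(measure: str, dimensions: list[str], filters: dict) -> dict:
    """Build a structured query call against the semantic layer."""
    return {
        "tool": "query",
        "arguments": {
            "measures": [measure],      # defined centrally, not guessed
            "dimensions": dimensions,
            "filters": filters,
        },
    }

call = make_query_call(
    measure="orders.revenue",
    dimensions=["orders.created_month"],
    filters={"orders.created_at": {"after": "2025-10-01"}},
)
```

Because the agent can only reference fields that exist in the semantic layer, there is nothing for it to hallucinate: an unknown measure is a rejected call, not a silently wrong number.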
The agent queries the semantic layer, which enforces metric definitions, access control, and tenant isolation at query time. The result is the same number every time, regardless of how the user phrases the question. A user asking "what's our MRR?" and another asking "show me monthly recurring revenue" both resolve to the same governed metric with the same calculation. No ambiguity, no hallucinated aggregation logic.
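The consistency guarantee comes from resolving phrasings to one canonical metric before any SQL exists. The synonym map below is an illustrative stand-in for that resolution step, not the semantic layer's actual mechanism:

```python
# Illustration of phrase-to-metric resolution: different surface forms map to
# one governed metric id. The synonym table and metric id are hypothetical.

SYNONYMS = {
    "mrr": "subscriptions.mrr",
    "monthly recurring revenue": "subscriptions.mrr",
}

def resolve_metric(phrase: str) -> str:
    """Map a user phrasing to its single governed metric id."""
    return SYNONYMS[phrase.strip().lower()]

a = resolve_metric("MRR")
b = resolve_metric("monthly recurring revenue")
assert a == b  # both phrasings hit the same definition, same calculation
```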
Governed metrics for every surface.
Define your semantic layer in YAML. Query it from any AI agent, dashboard, or SDK. Ship analytics your customers can trust.