
How AI Generates SQL for Salesforce Marketing Cloud

A look under the hood at what it takes for AI to write SQL that actually runs inside Salesforce Marketing Cloud, not just SQL that compiles on a blank page.

If you have spent any time inside Salesforce Marketing Cloud, you have written or watched someone write a SQL Query Activity.

It is the backbone of almost every non-trivial audience: reactivation cohorts, suppression lists, behavioral journeys, anything that needs to join engagement data with profile attributes.

And it is the part of the platform that has resisted the easiest abstractions for the longest.

For years the trade-off was clear. Marketers who knew SQL could move fast and own their own audience logic. Everyone else opened a ticket and waited.

The native filtering tools (Data Filters, Filtered Data Extensions, and Audience Builder) covered a slice of the work but tapped out quickly once a request involved more than one or two joins or a moving date window.

Generative AI has changed the shape of that trade-off, but not in the way most people first assume.

The interesting part is not that AI can write SQL. Plenty of tools can do that on a blank page.

The interesting part is what it takes to make the SQL correct, safe, and actually executable inside SFMC's Automation Studio, against the right Data Views, with the right joins, the right field types, and within the platform's hard limits.

This post walks through what a production-grade AI-to-SQL pipeline for SFMC actually has to do, where the difficult parts hide, and how a system gets from a sentence like "customers who opened in the last 30 days but did not click" to a Query Activity you can schedule with confidence.

The naive version, and why it breaks

What goes wrong on the first try

The simplest version of this idea is what most people try first. Take a marketer's request, hand it to a language model with a prompt that says "write SQL for Salesforce Marketing Cloud," and run whatever comes back.

This works exactly long enough to be misleading.

The query that comes back is usually plausible, often readable, and almost always wrong in one or more ways that only show up once you try to execute it.

The model invents column names. It joins on SubscriberID when it should join on SubscriberKey. It writes SELECT * from _Sent and runs into the 30-minute Automation Studio timeout. It uses a SQL function that the SFMC query engine does not support.
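The gap between plausible and runnable is concrete. A minimal sketch of the difference, using the documented Data View columns:

```sql
-- Plausible model output: SELECT * plus a join on the wrong key.
--   SELECT * FROM _Sent s JOIN _Open o ON s.SubscriberID = o.SubscriberID

-- What the SFMC engine actually needs: SubscriberKey joins, explicit
-- fields, and a date predicate so the scan stays inside the timeout.
SELECT s.SubscriberKey, s.EventDate
FROM _Sent s
INNER JOIN _Open o
    ON  o.SubscriberKey = s.SubscriberKey
    AND o.JobID = s.JobID
WHERE s.EventDate > DATEADD(DAY, -30, GETDATE())
```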

Why generic SQL knowledge is not enough

These are not edge cases. They are the default behavior of a model trained on general SQL Server documentation that does not know which subset of T-SQL the SFMC engine actually accepts, which Data Views are scoped to which business unit, or what the schema of your custom Data Extensions looks like.

The first job of any serious AI-to-SQL system for SFMC is not generating SQL. It is constraining the model to the small, specific dialect and schema that SFMC will actually run.
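One concrete way to enforce that constraint is a denylist pass over the candidate SQL before it ever reaches the platform. A minimal sketch, with an illustrative (not exhaustive) set of constructs the SFMC query engine rejects:

```python
import re

# Constructs the SFMC query engine does not accept. This is an
# illustrative subset, not a complete list of platform restrictions.
FORBIDDEN = [
    r"\bDECLARE\s+@",            # variables
    r"\bCURSOR\b",               # cursors
    r"\bCREATE\s+PROCEDURE\b",   # stored procedures
    r"\bINTO\s+#",               # temporary tables
    r"\bEXEC\b",                 # procedure execution
]

def dialect_violations(sql: str) -> list:
    """Return the forbidden patterns found in a candidate query."""
    return [p for p in FORBIDDEN if re.search(p, sql, re.IGNORECASE)]
```

A query that comes back with an empty violation list still needs schema checks, but it at least speaks the right dialect.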

Grounding the model in the real schema

Before generation can happen, the system has to know what tables and fields actually exist in the account it is writing for. There are two halves to this.

The Salesforce-managed surface

The first half is the Salesforce-managed surface: Data Views like _Sent, _Open, _Click, _Bounce, _Subscribers, and _Job.

Their schema is documented and stable across accounts, so a good system can ship with this knowledge built in.

It knows that _Click has an IsUnique flag, that _Bounce carries BounceCategory and BounceType, that _Job is the right place to look up EmailName and DeliveredTime.
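Those documented fields are enough to generate queries like this one without any account-specific knowledge:

```sql
-- Unique clicks in the last 30 days, enriched with the send's name
-- from _Job. IsUnique = 1 keeps first-click rows rather than counting
-- every repeat click by the same subscriber.
SELECT c.SubscriberKey, c.EventDate, j.EmailName
FROM _Click c
INNER JOIN _Job j
    ON j.JobID = c.JobID
WHERE c.IsUnique = 1
  AND c.EventDate > DATEADD(DAY, -30, GETDATE())
```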

The account-specific surface

The second half is the account-specific surface: every custom Data Extension the team has created, with its fields and types. This is where the real complexity lives.

A model that does not know the difference between your Purchases DE and your OrderHistory DE, or that your customer key is called CustomerNumber rather than ContactID, will hallucinate joins that look reasonable but reference fields that do not exist.

The system has to introspect the account's Data Extensions and pass that schema into the model's context for every request. Without this grounding step, accuracy collapses on anything beyond a handful of generic Data View queries.

Translating natural language into a query plan

Plan first, SQL second

Once the model has the right schema in front of it, the next problem is the gap between how marketers describe audiences and how SQL expresses them.

Marketers describe audiences in terms of behavior over time: "customers who bought in the last 60 days but have not opened anything since."

SQL expresses audiences in terms of joins, date predicates, and exclusions. Bridging that gap is not a single translation step.

It works better as a two-stage process. First the model produces a query plan: which Data Views and Data Extensions are involved, what the join keys are, which filters apply, what gets excluded. Then it produces SQL that implements that plan.
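One illustrative shape for that intermediate plan, expressed as structured data a user can review before any SQL exists (field names here are hypothetical, not a fixed QAiry format):

```json
{
  "sources": ["_Open", "_Click"],
  "join_key": "SubscriberKey",
  "include": [{ "table": "_Open", "filter": "EventDate within last 30 days" }],
  "exclude": [{ "table": "_Click", "filter": "EventDate within last 30 days" }],
  "target": "Openers_No_Click_30d"
}
```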

Why splitting the work matters

Splitting the work matters because the failure modes at each stage are different.

Plan errors are usually about misunderstanding the request: the model picked the wrong Data View, missed an exclusion, used the wrong date window.

SQL errors at the second stage are usually mechanical: a misnamed column, a missing alias, a function the engine does not accept.

Showing the plan to the user before generating SQL also lets them correct misunderstandings cheaply, before any query is written or executed. It is much faster to fix "I meant the last 60 days, not the last 30" at the plan stage than after a Query Activity has already run against the wrong window.

Respecting the platform's hard limits

The 30-minute timeout

The SFMC query engine has rules that are not negotiable, and a generated query has to respect all of them or it will fail at execution time.

The 30-minute timeout is the most obvious. A query that joins three large Data Views without selective filters can chew through that limit and abort with the target Data Extension left untouched.

A reliable generator does not just write a query that is logically correct. It writes one that is selective enough to finish: explicit field lists instead of SELECT *, date predicates on the largest tables, intermediate Data Extensions when a single statement would do too much.
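Put together, the request from the opening of this post, "opened in the last 30 days but did not click", comes out as something like:

```sql
-- Explicit fields, date predicates on both Data Views, and a
-- LEFT JOIN anti-join for the exclusion.
SELECT o.SubscriberKey, MAX(o.EventDate) AS LastOpenDate
FROM _Open o
LEFT JOIN _Click c
    ON  c.SubscriberKey = o.SubscriberKey
    AND c.EventDate > DATEADD(DAY, -30, GETDATE())
WHERE o.EventDate > DATEADD(DAY, -30, GETDATE())
  AND c.SubscriberKey IS NULL
GROUP BY o.SubscriberKey
```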

The smaller rules that still bite

There are smaller rules too:

  • Field names that contain spaces need to be wrapped in square brackets.
  • Stored procedures, cursors, and temporary tables are not allowed.
  • GETDATE() returns Central Time rather than UTC, which matters every time a query expresses a window in local-market terms.
  • Query Activities only support Append, Update, or Overwrite as write modes, and the target Data Extension schema has to match the SELECT list exactly.
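Several of these rules show up in even a short query. In the sketch below, the PurchaseHistory Data Extension and its fields are hypothetical:

```sql
SELECT s.SubscriberKey,
       p.[First Name],                -- spaces require square brackets
       p.[Last Purchase Date]
FROM _Subscribers s
INNER JOIN PurchaseHistory p          -- hypothetical custom DE
    ON p.SubscriberKey = s.SubscriberKey
-- GETDATE() is Central Time on the SFMC query servers, not UTC.
WHERE p.[Last Purchase Date] > DATEADD(DAY, -60, GETDATE())
```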

The system has to encode these rules so the model does not have to rediscover them every time. The reward for getting this right is queries that pass validation and run inside the limit on the first try, not after three rounds of debugging in Automation Studio.

Validating before execution

Mechanical static checks

Even with grounded schema and platform rules baked in, the generator should not be the last check. A small static analysis pass before any query touches SFMC catches the residual errors that slip through.

The useful checks are mostly mechanical:

  • Does every referenced table exist in the account?
  • Does every referenced column exist on the table it is joined to?
  • Are the join keys the same type on both sides?
  • Does the SELECT list match the schema of the target Data Extension?
  • Are there date filters on the largest tables?
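The first two checks reduce to a lookup against the introspected schema. A minimal sketch, assuming the system has already parsed table and column references out of the generated SQL (the function name and input shapes are illustrative):

```python
def validate_query_refs(referenced: dict, schema: dict) -> list:
    """Check that every table.column reference exists in the account.

    referenced: {table: [columns]} extracted from the generated SQL
    schema:     {table: {column: sql_type}} introspected from the account
    Returns human-readable problems; an empty list means the pass is clean.
    """
    problems = []
    for table, columns in referenced.items():
        if table not in schema:
            problems.append(f"unknown table: {table}")
            continue
        for col in columns:
            if col not in schema[table]:
                problems.append(f"unknown column: {table}.{col}")
    return problems
```

The same schema dictionary also carries the type information needed for the join-key check.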

These are the kinds of issues that would otherwise surface as Query Activity errors hours later, in an automation log.

Audience size preview

A good system also previews the audience size before writing anything.

A query that returns zero rows, or three million, when the marketer expected a few thousand is almost always a sign that something is off in the logic.

Catching that gap in the chat, before the Data Extension is created, is much cheaper than catching it after a campaign has been queued.
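In practice the preview can reuse the audience query with the SELECT list collapsed to a count, written to a small one-row Data Extension. For a simple 30-day opener audience, the sketch looks like:

```sql
-- Same filters as the audience query, collapsed to a single count
-- so the marketer can sanity-check the size before anything is built.
SELECT COUNT(DISTINCT o.SubscriberKey) AS AudienceSize
FROM _Open o
WHERE o.EventDate > DATEADD(DAY, -30, GETDATE())
```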

From a working query to a production Data Extension

The deliverable is not a query

The last step is the one most demos skip. A query is not the deliverable.

The deliverable is a Data Extension, populated with the right rows, refreshed on a schedule, and ready to be the source for a Send or a Journey.

Naming, scheduling, and the audit trail

A production-ready system handles this whole tail. It creates the target Data Extension with the correct schema. It wires the SELECT statement into a Query Activity.

It attaches that activity to an Automation, and sets the schedule the marketer described in natural language ("refresh every Monday at 7am Central").

It names things consistently so the audience is discoverable later. And it leaves a trail (the original prompt, the plan, the generated SQL, the validation results) that a future teammate can read to understand what the audience is and how it was built.
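A sketch of what that handoff bundle can look like before it is pushed to the platform. All field names below are illustrative, not the SFMC API's own payload shape:

```python
def build_query_activity_spec(audience_name: str, sql: str, schedule: str) -> dict:
    """Bundle everything the SFMC handoff needs: consistent names, the
    query, the write mode, and the schedule described in natural language.
    Field names here are illustrative, not the platform's payload format.
    """
    return {
        "name": f"QRY_{audience_name}",
        "targetDataExtension": f"DE_{audience_name}",
        "queryText": sql,
        "updateType": "Overwrite",  # SFMC supports Append, Update, Overwrite
        "schedule": schedule,       # e.g. "every Monday at 7am Central"
    }
```

Keeping the whole spec in one structure is also what makes the audit trail cheap: the same object can be logged next to the original prompt, the plan, and the validation results.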

That handoff between the AI layer and the SFMC platform is where most of the practical value lives. Generating SQL is the visible part. Turning that SQL into something a CRM team can rely on, audit, and hand off is what makes the workflow actually replace the ticket-and-wait pattern.

What this changes day to day

The teams that adopt this kind of workflow do not stop writing SQL.

The advanced architecture queries, the bespoke optimizations, and the one-off data clean-ups still belong to the people who know the platform deeply.

What changes is the volume of routine audience work that no longer needs a ticket.

A campaign manager who can describe a reactivation cohort in a sentence and get back a validated Query Activity, an audience preview, and a Data Extension is no longer waiting in a queue.

The technical team gets time back for the work that genuinely benefits from their expertise, and the campaign team owns more of the loop end to end.

That redistribution of work is the real reason this matters. Faster audiences are nice. A change in who can build them is structural.

See QAiry in action

This is exactly the shape of pipeline we built QAiry around.

A marketer describes the audience in plain language, QAiry grounds the request in the account's real schema, drafts a plan, generates and validates the SQL, previews the count, and creates the Data Extension and the Query Activity once everything checks out.

You can see the whole loop on real account data in the qairy.com/product-demos walkthroughs, or try it directly on your own account at qairy.com/try-it-free.
