How to Join Purchase Data with Engagement Data in SFMC
Buyers who open your emails are worth more. Here is how to join purchase data with SFMC engagement data views in SQL, step by step.
How to Join Purchase Data with Engagement Data in SFMC
Most marketing teams in Salesforce Marketing Cloud sit on two valuable datasets that rarely talk to each other.
One describes what people bought. The other describes how they reacted to your emails.
Kept apart, each tells half a story. Joined together, they let you target the customers who actually open your messages and spend money.
The catch is that purchase data and engagement data usually live in different places inside SFMC, with different structures.
Connecting them means understanding where each dataset sits and writing a query that lines them up correctly.
This post walks through where the two datasets live, how to find a key that links them, how to write the join in SQL, and the mistakes that quietly corrupt the result.
Why purchase and engagement data live apart in SFMC
Two different homes for two different signals
Purchase data almost always arrives from an external system: an ecommerce platform, an order management tool, or a CRM sync.
It lands in SFMC as a data extension that you or an integration created.
Engagement data is generated by SFMC itself every time an email is sent, opened, or clicked.
That activity is captured automatically in system data views, which you do not create and cannot edit.
Why this split slows campaigns down
Because the two datasets are structured and owned differently, marketers often treat them as separate worlds.
A purchase report comes from one place, an engagement report from another, and nobody stitches them together.
The result is broad campaigns that ignore whether a buyer is even paying attention to your email program.
Joining the data is what turns customers who bought a product into customers who bought a product and still open our emails.
The building blocks: data extensions and system data views
Where purchase data usually sits
Purchase records typically live in a standard data extension you control.
A common shape includes a SubscriberKey, an OrderDate, a ProductID, and a Revenue or amount field.
Because you own this extension, you decide its fields, its primary key, and how often it refreshes.
That control matters later, because the join depends on having a reliable matching field.
What the engagement data views hold
SFMC exposes engagement history through system data views such as _Sent, _Open, _Click, _Bounce, and _Job.
These views are queryable in Automation Studio but do not appear in the normal data extension list.
The _Open view, for example, records a row for each open event with fields like SubscriberKey, EventDate, and JobID.
The _Sent and _Click views follow the same idea, each capturing one row per send or per click with a shared SubscriberKey and a JobID tying the event back to a specific email.
One practical detail to remember: these views hold a rolling window of roughly the last six months of activity, not your full history.
Finding a key that links the two sources
SubscriberKey as the common thread
A join is only as good as the field both sides share.
In most accounts, SubscriberKey is that field, because it identifies a contact consistently across sends and across data extensions.
If your purchase extension already stores each buyer's SubscriberKey, you are most of the way there.
The engagement views use the same key, so the two datasets can be matched person by person.
When your join key is not SubscriberKey
Sometimes purchase data lands with an email address or an external customer ID instead of the subscriber key.
In that case you have two options.
You can add a step that maps the external ID to SubscriberKey before joining, or you can join on email address if that field is reliable and unique on both sides.
The safer pattern is almost always to resolve everything down to SubscriberKey first.
Writing the join in SQL
A basic inner join
An inner join returns only the rows that match on both sides, which is exactly what you want when you need buyers who also engaged.
A starting query looks like this:
SELECT p.SubscriberKey, p.ProductID, o.EventDate FROM Purchases p INNER JOIN _Open o ON p.SubscriberKey = o.SubscriberKey
This pulls every buyer who has at least one recorded open event in the available window.
Aliasing the tables, here p for purchases and o for opens, keeps the query readable as it grows.
From here you can add filters, such as an OrderDate range or a minimum revenue threshold.
Choosing the right join type
The join type controls which records survive, so it deserves a deliberate choice rather than a default.
The three you will reach for most often are:
INNER JOINfor buyers who also engaged.LEFT JOINto keep every buyer and flag whether they engaged.- A
NOT EXISTSpattern to find buyers who never opened.
A left join is useful when you want to score or segment your whole buyer base by engagement, not just the active slice.
The exclusion pattern matters too, since buyers who stopped opening are often the ones worth a different message or a suppression rule.
Common pitfalls when joining purchase and engagement data
Duplicate rows from one-to-many events
Engagement views store one row per event, so a single buyer can have dozens of open rows.
Join naively and a customer who bought once but opened twenty emails appears twenty times.
The fix is to aggregate the engagement side before joining, for example counting opens per subscriber, or to select distinct keys when you only need membership.
Deciding whether you want one row per buyer or one row per event up front prevents inflated audience counts.
Date ranges and the six-month window
Because the views hold only recent activity, a buyer who engaged last year can look completely inactive today.
That is a property of the data, not a sign the customer is gone.
Always state the time window you are measuring, and use functions like DATEADD() against EventDate to make the range explicit in your query.
Turning the joined result into an audience
Writing to a target data extension
Once the query returns the rows you want, a query activity in Automation Studio writes the output to a target data extension.
That extension becomes a sendable audience of buyers filtered by engagement.
Schedule the query so the audience refreshes on the cadence your campaigns need, whether that is daily or weekly.
Keeping the query maintainable
Joins tend to grow as new conditions get bolted on, and a query nobody understands becomes a liability.
Keep field selection tight, comment the intent, and avoid SELECT * so the output schema stays predictable.
A clear, well-scoped join is far easier to trust than a sprawling one that happens to run.
See QAiry in action
Joining purchase and engagement data is a solved problem in SQL, but writing and maintaining those queries is still where most marketing teams get stuck.
QAiry lets you describe the audience you want in plain language and generates the SFMC-ready query for you. You can watch it work at qairy.com/product-demos or build your first audience at qairy.com/try-it-free.

