How to Deduplicate Contacts in SFMC
Duplicates inflate sends and skew reports. Here is how to find and remove duplicate SFMC contacts with SQL and a solid key strategy.
Duplicate contacts are one of those problems that stay invisible until they are not.
A campaign goes out, the same person gets two emails, and the bounce report suddenly looks worse than the work deserved.
By then the duplicates have been quietly multiplying for months.
Salesforce Marketing Cloud makes it easy to create records and harder to keep them unique. Different channels, imports, and syncs all add rows, and none of them ask whether the person already exists.
The good news is that most duplication is predictable, which means it is fixable with a clear key strategy and a few targeted queries.
This post walks through why duplicates appear, what they cost you, and how to clean them up with SQL and send-time settings without breaking your data model.
Why duplicate contacts pile up in SFMC
The same person, many identifiers
A single human can enter your account through several doors. Email Studio, MobileConnect, and synced records from Sales Cloud can each carry a different identifier for the same person.
When one channel keys off an email address and another keys off a mobile number, SFMC has no reliable way to know they are the same contact.
The result is two records that look distinct to the system but represent one person to you.
Imports and syncs that overwrite each other
Repeated imports are the other common source. A list gets uploaded, then re-uploaded a week later with slightly different formatting, and rows that should match end up as separate entries.
Synchronized Data Sources add their own layer. If a contact exists from a Marketing Cloud import and again from a Sales Cloud sync, you can end up with one record per source.
None of this is a bug. It is what happens when several processes write to the same place without agreeing on a single key.
Salesforce groups the usual causes into a short, recognizable list:
- Data models that let the same person live in multiple data extensions.
- Channels that key off different identifiers, such as an email address versus a mobile number.
- Records syncing in from other clouds with their own keys.
- Processes that reimport contacts you already loaded or deleted.
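The first cause in that list is straightforward to check for. The query below is a minimal sketch, assuming two hypothetical data extensions named Newsletter_Import and Webinar_Signups that each hold an EmailAddress field; substitute your own names. It lists addresses that appear in both.

```sql
/* Hypothetical data extensions: Newsletter_Import, Webinar_Signups.
   Tag each row with its source, stack the two sets, then keep
   only the addresses that show up in more than one source. */
SELECT
    combined.EmailAddress,
    COUNT(DISTINCT combined.SourceDE) AS SourceCount
FROM (
    SELECT EmailAddress, 'Newsletter_Import' AS SourceDE
    FROM [Newsletter_Import]
    UNION ALL
    SELECT EmailAddress, 'Webinar_Signups' AS SourceDE
    FROM [Webinar_Signups]
) AS combined
GROUP BY combined.EmailAddress
HAVING COUNT(DISTINCT combined.SourceDE) > 1
```

An overlap is not automatically a problem. It becomes one when both extensions are sendable and carry different keys for the same address.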
What duplicates actually cost you
Inflated send volume and bounce logging
Duplicates cost you Super Messages, the units SFMC bills sends against, before they cost you anything else. Two rows for one person can mean two sends, and if both point at the same bad address, you log a bounce for each attempt.
That inflates your bounce numbers and can nudge your sender reputation in the wrong direction over time.
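One way to see this effect in your own account is to look for addresses that bounced under more than one key in the same job. The sketch below assumes the standard _Bounce and _Subscribers data views are available to your business unit; in a child business unit the subscriber view may need the ENT. prefix.

```sql
/* For each send job, find email addresses that logged bounces
   under more than one SubscriberKey. Each result row is one
   bad address that was counted several times in that job. */
SELECT
    b.JobID,
    s.EmailAddress,
    COUNT(DISTINCT b.SubscriberKey) AS KeysBounced,
    COUNT(*) AS BounceRows
FROM _Bounce AS b
INNER JOIN _Subscribers AS s
    ON s.SubscriberKey = b.SubscriberKey
GROUP BY b.JobID, s.EmailAddress
HAVING COUNT(DISTINCT b.SubscriberKey) > 1
```

If this returns nothing, duplicates are probably not what is driving your bounce numbers, and the cleanup can be prioritized accordingly.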
Skewed reporting and engagement metrics
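Reporting is the quieter casualty. Open and click rates are calculated against the rows you sent to, so duplicated audiences quietly distort every percentage. As a hypothetical example, if 1,000 people occupy 1,200 rows and 300 of them each open one copy, the reported open rate is 25 percent rather than 30.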
Engagement-based segmentation then inherits the distortion. A reactivation audience built on skewed opens will target the wrong people.
Clean counts are the foundation everything else sits on.
The hidden tax on your team
There is a time cost too. Every duplicate is a record someone eventually has to investigate, reconcile, or explain in a reporting review.
Left alone, that work compounds. The longer duplicates sit, the more downstream automations and journeys depend on them, and the more careful the cleanup has to be.
Get your keys right before you dedupe
One stable SubscriberKey
Deduplication starts with the key, not the query. Salesforce recommends a SubscriberKey of data type text, set to a stable, globally unique identifier rather than an email address or mobile number.
Email addresses and phone numbers change, and when the key changes the system treats the contact as new. A durable identifier keeps one person mapped to one record across channels.
A generated unique identifier, carried consistently from your source system, is the safest choice here. It does not change when someone updates their contact details.
Aligning Contact Key and Subscriber Key
The same value should drive both SubscriberKey and ContactKey. Keeping them identical is what gives you compatibility across email, mobile, and the broader contact model.
It also has to line up with what arrives through Synchronized Data Sources. If your Sales Cloud key and your Marketing Cloud key disagree, you have manufactured a duplicate before any send goes out.
Fixing the key strategy first means the cleanup you do next actually holds.
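A quick way to gauge how far the current keys have drifted is to look for email addresses that already map to more than one SubscriberKey. This is a rough sketch against the _Subscribers data view, not a full identity audit; in a child business unit the view may need the ENT. prefix.

```sql
/* Each row in _Subscribers is one SubscriberKey, so an email
   address with a count above 1 is held under several keys.
   The MIN and MAX keys are returned only as examples to inspect. */
SELECT
    EmailAddress,
    COUNT(*) AS KeyCount,
    MIN(SubscriberKey) AS ExampleKeyA,
    MAX(SubscriberKey) AS ExampleKeyB
FROM _Subscribers
GROUP BY EmailAddress
HAVING COUNT(*) > 1
```

If the example keys show one value that looks like an email address and another that looks like a system ID, the mismatch is most likely between sources that key differently, which is the situation described above.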
Finding duplicates with SQL
Counting duplicates by attribute
Before you delete anything, see the shape of the problem. A grouped count over the attribute you suspect is duplicated tells you how widespread it is.
Group your data extension by EmailAddress (or whichever field you are auditing), apply COUNT(*), and filter with HAVING COUNT(*) > 1. The rows that come back are your duplicate clusters.
Run this as a read-only query first. You want a clear picture before anything gets removed.
The count also tells you which attribute to trust. If email duplicates dwarf key duplicates, your problem is identity resolution, not the import itself.
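The grouped count described above can be sketched as follows. Master_Contacts is a hypothetical data extension name, and EmailAddress is assumed to be the field under audit; swap in your own.

```sql
/* Audit only: list each email address that appears on more
   than one row, with the number of rows it occupies.
   [Master_Contacts] is a hypothetical data extension. */
SELECT
    EmailAddress,
    COUNT(*) AS DuplicateCount
FROM [Master_Contacts]
GROUP BY EmailAddress
HAVING COUNT(*) > 1
```

Run as a Query Activity, this writes its results into a target data extension rather than changing the source, so point it at a scratch extension with matching EmailAddress and DuplicateCount fields. To compare attributes, run it once more with SubscriberKey in place of EmailAddress and compare the two result counts.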
Keeping one row per contact
To resolve clusters down to a single survivor, a window function does the work cleanly. ROW_NUMBER() with a PARTITION BY on the field that defines a duplicate, such as the EmailAddress you just counted, lets you number the rows inside each duplicate group.
Order the partition so the record you want to keep lands at row one, then write the rows numbered greater than one into a separate data extension. That gives you a reviewable list of removals instead of an irreversible delete.
Salesforce cautions against deleting purely on SubscriberKey, since that can remove viable contacts. Relating back to the subscriber on its own ID keeps you from cutting records that still matter.
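The ROW_NUMBER() approach can be sketched like this. It assumes a hypothetical Master_Contacts data extension with SubscriberKey, EmailAddress, and ModifiedDate fields, and it treats the most recently modified row as the one to keep. The query uses a subquery rather than a WITH clause, since Query Activities are generally understood not to accept common table expressions.

```sql
/* Number the rows inside each email address, newest first.
   Row 1 is the survivor; everything numbered above 1 is
   written out as a removal candidate for review.
   [Master_Contacts] and ModifiedDate are hypothetical names. */
SELECT
    ranked.SubscriberKey,
    ranked.EmailAddress,
    ranked.ModifiedDate
FROM (
    SELECT
        SubscriberKey,
        EmailAddress,
        ModifiedDate,
        ROW_NUMBER() OVER (
            PARTITION BY EmailAddress
            ORDER BY ModifiedDate DESC, SubscriberKey ASC
        ) AS RowNum
    FROM [Master_Contacts]
) AS ranked
WHERE ranked.RowNum > 1
```

The second ORDER BY column is only a tie-breaker, so two rows with the same ModifiedDate are always ranked the same way from one run to the next. Target a separate review data extension with this query; nothing is deleted from the source.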
Deduplicating sends and building a clean audience
The send-time dedupe option
Not every duplicate needs a data fix on day one. Email Studio can de-duplicate a send by email address, and checking that option stops the same address from receiving multiple copies of one send.
Leave it unchecked and SFMC attempts a send for every row in the targeted data extension, which is exactly how one bad address turns into several logged bounces.
Treat send-time dedupe as a safety net, not the cleanup itself.
It protects a single send, but it does nothing for your stored data, your reporting, or the next campaign that targets the same messy extension.
A single sendable data extension
The durable fix is structural. Send from one well-keyed, sendable data extension that holds only the segmentation and personalization fields you actually use.
Pull your deduplicated survivors into that extension and point campaigns at it. Fewer sources writing to your send audience means fewer chances to reintroduce duplicates.
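Pulling the survivors is the mirror image of listing the removals: keep row one of each group instead of the rest. The sketch below assumes the same hypothetical Master_Contacts source and a FirstName field standing in for whatever personalization fields you actually use.

```sql
/* Keep exactly one row per email address: the most recently
   modified one. Only the fields needed for sending are selected.
   [Master_Contacts], ModifiedDate, and FirstName are hypothetical. */
SELECT
    ranked.SubscriberKey,
    ranked.EmailAddress,
    ranked.FirstName
FROM (
    SELECT
        SubscriberKey,
        EmailAddress,
        FirstName,
        ROW_NUMBER() OVER (
            PARTITION BY EmailAddress
            ORDER BY ModifiedDate DESC, SubscriberKey ASC
        ) AS RowNum
    FROM [Master_Contacts]
) AS ranked
WHERE ranked.RowNum = 1
```

Set the target to your sendable data extension with the Overwrite action, so each run replaces the audience instead of appending to it.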
It also makes audits faster. When one extension is the source of truth for sends, checking it for duplicates is a single query rather than a hunt across the account.
Then keep it honest with a scheduled query that re-runs the duplicate check, so a clean audience stays clean instead of drifting back.
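A scheduled check does not need to list every duplicate. A one-row summary appended to a log is often enough to spot drift. This sketch assumes a hypothetical sendable data extension named Send_Audience and a log data extension with matching CheckedAt, TotalRows, DistinctEmails, and ExcessRows fields.

```sql
/* One summary row per run. ExcessRows is the number of rows
   beyond one per distinct email address; it should stay at 0.
   [Send_Audience] is a hypothetical data extension.
   COUNT(DISTINCT ...) ignores NULLs, so rows with an empty
   EmailAddress also show up in ExcessRows. */
SELECT
    GETDATE() AS CheckedAt,
    COUNT(*) AS TotalRows,
    COUNT(DISTINCT EmailAddress) AS DistinctEmails,
    COUNT(*) - COUNT(DISTINCT EmailAddress) AS ExcessRows
FROM [Send_Audience]
```

Run it in an automation with the Append action on the log extension. A nonzero ExcessRows value is the signal to re-run the full duplicate audit.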
See QAiry in action
Deduplication is mostly a keys-and-queries problem, and both are easier to get right when you can describe what you want in plain language instead of hand-writing every join.
That is what QAiry is built for. Tell it "find duplicate contacts by email and keep the most recent record" and it drafts the SQL against your data extensions, ready to review. See it work at qairy.com/product-demos, or start with your own data at qairy.com/try-it-free.

