How Dirty Data Wastes Your Ad Spend (And How to Stop)

If 40% of your customer list does not match on ad platforms, you are building audiences on incomplete data and paying for targeting that does not work. Here is how to fix your match rates and stop wasting ad dollars.

March 2026·11 min read

How Custom Audiences Actually Work

Custom audiences are one of the most powerful tools in digital advertising. The concept is simple: you upload a list of your customers, which typically includes email addresses, phone numbers, and names, to an ad platform like Facebook, Google, or TikTok. The platform then matches your list against its own user database to find your customers within its ecosystem. Once matched, you can serve ads directly to those customers, build lookalike audiences based on their characteristics, or exclude them from prospecting campaigns.

The matching process is where dirty data becomes expensive. When you upload your customer list, each record is hashed (typically with SHA-256, either by you before upload or by the platform's upload tool) and compared against the hashed versions of the platform's user profiles. Hashed values either match exactly or not at all. If the email address that gets hashed does not match the email address a user registered with on the platform, character for character, there is no match. If the phone number in your file uses a different format than the platform expects, there is no match. If the name has extra spaces, inconsistent casing, or a different spelling than the user's profile, the match quality drops.
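To see why formatting matters so much, here is a minimal sketch of hash-based matching. It assumes SHA-256 over UTF-8 text, and the helper names sha256_hex and normalize_email are illustrative, not part of any platform's API.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the SHA-256 hex digest of a string encoded as UTF-8."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def normalize_email(raw: str) -> str:
    """Trim surrounding whitespace and lowercase the address."""
    return raw.strip().lower()


raw_email = " John.Smith@Gmail.Com "
clean_email = normalize_email(raw_email)

# The same mailbox produces unrelated digests when the formatting differs.
print(sha256_hex(raw_email) == sha256_hex(clean_email))               # False
# After normalization the digest equals that of the canonical address.
print(sha256_hex(clean_email) == sha256_hex("john.smith@gmail.com"))  # True
```

A single stray space or capital letter changes the whole digest, which is why an otherwise valid record can silently fail to match.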

This means that your match rate, the percentage of uploaded records that successfully match to platform users, depends heavily on the quality of your data. Some customers will never match because they simply do not have an account on the platform, but every formatting failure on top of that is avoidable. A clean, properly formatted list might achieve a 60-80% match rate. A dirty list with formatting issues, duplicates, and invalid entries might match at 20-40%. The gap between those two numbers represents wasted ad budget and missed targeting opportunities.

Why Match Rates Are Low

Bad Email Addresses

Email is the highest-value matching field because most ad platform users register with an email address. When your email data is dirty, you lose matches on the most important identifier. Common email quality issues include typos in the domain ("gmial.com", "yahooo.com"), outdated work emails from people who changed jobs, disposable email addresses that do not correspond to any ad platform account, and formatting inconsistencies like extra spaces or mixed casing.

An email like " John.Smith@Gmail.Com " with leading/trailing spaces and inconsistent casing may not match even though the underlying address is valid. Platforms normalize some of these issues during matching, but they do not catch everything. Cleaning your email data before upload eliminates these preventable match failures. The email validator catches typos, invalid domains, and syntax problems that kill your match rate.

Wrong Phone Format

Facebook, Google, and TikTok all recommend or require phone numbers in E.164 format for custom audience uploads. E.164 uses a plus sign, the country code, and the subscriber number with no spaces or punctuation: +15551234567. If your data contains phone numbers as "(555) 123-4567" or "555.123.4567" or "5551234567" without the country code, many of those numbers will fail to match.

The platform may attempt to normalize some formats, but the results are inconsistent and undocumented. Relying on the platform to fix your formatting is gambling with your ad budget. Converting all phone numbers to E.164 before upload gives every valid number the best possible chance of matching. Read our guide to E.164 format for the full explanation of why this standard matters.

Duplicates Inflate List Size Without Improving Reach

Duplicate records in your upload list inflate the apparent size of your audience without adding any actual reach. If the same customer appears three times in your list, whether through different email addresses, slight name variations, or repeated entries, the platform deduplicates on its end. Your upload shows 30,000 records, but after dedup the platform only has 20,000 unique users to match against. Your metrics look worse than they should because you are measuring match rate against an inflated denominator.
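The denominator effect is easy to check with a few lines of arithmetic. The 30,000 and 20,000 figures come from the example above; the 12,000 matched users is a hypothetical number chosen only for illustration.

```python
def match_rate(matched: int, total: int) -> float:
    """Match rate as a percentage of the given denominator."""
    return 100.0 * matched / total


uploaded_rows = 30_000     # rows in the file, duplicates included
unique_customers = 20_000  # distinct people behind those rows
matched_users = 12_000     # hypothetical count of matched platform users

# Same matched audience, two very different-looking match rates.
print(f"{match_rate(matched_users, uploaded_rows):.0f}%")     # 40%
print(f"{match_rate(matched_users, unique_customers):.0f}%")  # 60%
```

Nothing about the audience changed between the two lines; only the list it was measured against did.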

More practically, duplicates make it harder to build accurate lookalike audiences. When the platform analyzes your custom audience to find similar users, it weighs the characteristics of the matched users. If some users are over-represented due to duplicates, the lookalike model may skew toward those users' characteristics, producing a less representative and less effective lookalike audience.

Inconsistent Name Formatting

Names are used as a secondary matching field to improve match confidence. If your list contains names formatted as "JOHN SMITH", "john smith", "Smith, John", or "J. Smith", the platform's ability to use the name field for matching is degraded. While name alone rarely determines a match, it contributes to the platform's confidence score. Standardizing names to proper case with first and last in separate columns maximizes this supplementary matching value.

The Math: What Low Match Rates Actually Cost

Let's walk through a concrete example. You have a customer list of 50,000 records that you want to use for a Facebook Custom Audience campaign and a lookalike audience expansion. With dirty data, your match rate is 35%. That means Facebook matches 17,500 of your records. The other 32,500 records are effectively invisible to the platform.

After cleaning the same list, standardizing emails, formatting phones to E.164, removing duplicates, and fixing name formatting, your match rate jumps to 68%. Now Facebook matches 34,000 records. You have nearly doubled your matched audience without acquiring a single new customer. Every campaign you run against this audience is now targeting almost twice as many known customers.
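The figures in this example can be reproduced directly. This is only a check of the arithmetic above; the function name matched_records is illustrative.

```python
def matched_records(total: int, rate_percent: int) -> int:
    """Number of records matched at a given whole-percent match rate."""
    return total * rate_percent // 100


list_size = 50_000
dirty = matched_records(list_size, 35)
clean = matched_records(list_size, 68)

print(dirty, clean, list_size - dirty)  # 17500 34000 32500
print(round(clean / dirty, 2))          # 1.94
```

The ratio of 1.94 is where "nearly doubled" comes from.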

The impact on lookalike audiences is even more significant. A lookalike built from 34,000 matched customers is based on a much richer and more representative sample than one built from 17,500. The platform has more data points to identify common characteristics, producing a lookalike audience that more closely resembles your actual customer base. Better lookalikes mean higher conversion rates, lower cost per acquisition, and less wasted spend on users who were never likely to convert.

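If you are spending $10,000 per month on lookalike campaigns and the lookalike built from clean data converts at 3.5% instead of 2.5%, you get 40% more conversions from the same ad spend, assuming the same cost per click. Put another way, the clean-data audience delivers the old conversion volume for about $7,143, so roughly $2,857 of every $10,000 was previously buying nothing. Over a year, that is about $34,000 in recovered value from a data cleaning step that takes minutes.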
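The $2,857 figure corresponds to the share of the monthly budget that is no longer needed to buy the same number of conversions. A quick check of that arithmetic, under two assumptions that are not stated in the example: the conversion rate is measured per click, and cost per click is the same for both audiences.

```python
monthly_spend = 10_000.0
rate_dirty = 0.025  # conversion rate with the dirty-data lookalike
rate_clean = 0.035  # conversion rate with the clean-data lookalike

# Relative gain in conversions at the same spend.
conversion_lift = rate_clean / rate_dirty - 1

# Spend needed at the clean rate to get the dirty-rate conversion volume.
equivalent_spend = monthly_spend * rate_dirty / rate_clean
monthly_value = monthly_spend - equivalent_spend

print(f"{conversion_lift:.0%}")         # 40%
print(f"${equivalent_spend:,.0f}")      # $7,143
print(f"${monthly_value:,.0f}")         # $2,857
print(f"${monthly_value * 12:,.0f}")    # $34,286
```

If your cost per click differs between audiences, the same calculation applies with cost per conversion in place of the raw rates.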

Platform-Specific Best Practices

Facebook / Meta Custom Audiences

Facebook matches on email, phone, first name, last name, city, state, zip, country, date of birth, gender, and mobile advertiser ID. Email and phone are the highest-value fields. Facebook recommends providing as many identifiers as possible for each record, as multi-field matching significantly improves match rates. Ensure emails are lowercase and trimmed, phone numbers include the country code, and names use proper casing with first and last in separate columns. For a detailed walkthrough, see our guide to increasing Facebook audience match rates and our guide to cleaning CSVs for Facebook Ads.

Google Ads Customer Match

Google matches on email, phone, first name, last name, country, and zip code. Google requires a minimum of 1,000 matched records for a Customer Match audience to be usable. For lists close to this threshold, every additional match matters. Google explicitly requires that email addresses be lowercase before hashing. Phone numbers must include the country code in E.164 format. Google also supports mailing address matching, so including address fields can boost your match rate even if the email is missing or incorrect.

TikTok Custom Audiences

TikTok matches on email, phone, and mobile advertiser ID (IDFA/GAID). TikTok's user base skews younger, which means email addresses are more likely to be personal Gmail or iCloud addresses rather than work emails. If your list is heavy on corporate email addresses, phone matching becomes proportionally more important. Ensure all phone numbers include the country code and that email addresses are cleaned for common typos in consumer email providers.

The Pre-Upload Cleaning Workflow

Before uploading any customer list to an ad platform, run it through this cleaning workflow to maximize your match rate and minimize wasted spend.

Step 1: Validate and Clean Email Addresses

Run every email address through validation to catch typos, invalid domains, and syntax errors. Remove obvious invalid entries like "test@test.com" or "noemail@none.com". Convert all emails to lowercase. Trim leading and trailing whitespace. Fix common domain typos: "gmial.com" to "gmail.com", "yahooo.com" to "yahoo.com", "hotmal.com" to "hotmail.com". The email validator handles all of these corrections automatically.
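For readers who want to see what this step involves, here is a rough sketch in plain Python. It is not how the email validator works internally. The typo map, the placeholder list, and the deliberately loose syntax pattern are all assumptions to adapt to your own data.

```python
import re
from typing import Optional

# Hypothetical typo map built from the examples above; extend as needed.
DOMAIN_FIXES = {
    "gmial.com": "gmail.com",
    "yahooo.com": "yahoo.com",
    "hotmal.com": "hotmail.com",
}
# Placeholder addresses that should never be uploaded.
PLACEHOLDERS = {"test@test.com", "noemail@none.com"}
# Loose shape check (one "@", a dot in the domain), not a full RFC validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_email(raw: str) -> Optional[str]:
    """Return a trimmed, lowercased, typo-corrected email, or None if unusable."""
    email = raw.strip().lower()
    if not EMAIL_RE.match(email) or email in PLACEHOLDERS:
        return None
    local, domain = email.rsplit("@", 1)
    return f"{local}@{DOMAIN_FIXES.get(domain, domain)}"


print(clean_email(" John.Smith@Gmail.Com "))  # john.smith@gmail.com
print(clean_email("jane@gmial.com"))          # jane@gmail.com
print(clean_email("test@test.com"))           # None
```

A sketch like this cannot tell whether a mailbox actually exists, so it complements rather than replaces real validation.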

Step 2: Format Phone Numbers to E.164

Convert every phone number to E.164 format: +[country code][subscriber number] with no spaces, dashes, or parentheses. For U.S. numbers, this means +1XXXXXXXXXX. Remove any extensions. Flag numbers that appear to be landlines, as these will not match mobile-only platform accounts. The phone formatter converts any input format to E.164 in a single pass.
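As an illustration of the conversion, here is a minimal sketch that handles U.S. numbers only. It assumes every input is a U.S. number, strips a trailing extension, and does not attempt landline detection, which needs carrier data. The function name to_e164_us is hypothetical.

```python
import re
from typing import Optional

# Trailing extension such as "x89" or "ext. 89".
EXTENSION_RE = re.compile(r"\s*(?:ext\.?|x)\s*\d+\s*$", re.IGNORECASE)


def to_e164_us(raw: str) -> Optional[str]:
    """Convert a U.S. phone number to +1XXXXXXXXXX, or return None if malformed."""
    without_ext = EXTENSION_RE.sub("", raw.strip())
    digits = re.sub(r"\D", "", without_ext)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                  # drop an existing country code
    if len(digits) != 10:
        return None                          # too short or too long for the U.S.
    return "+1" + digits


print(to_e164_us("(555) 123-4567"))           # +15551234567
print(to_e164_us("555.123.4567"))             # +15551234567
print(to_e164_us("+1 555 123 4567 ext. 89"))  # +15551234567
print(to_e164_us("12345"))                    # None
```

For international lists, a dedicated phone-parsing library is a safer choice than extending this by hand, since numbering rules vary by country.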

Step 3: Deduplicate Records

Remove duplicate records using fuzzy matching. Two records with slightly different name spellings but the same email address are the same person. Two records with the same phone number but different email addresses might be the same person. Dedup after format standardization so that "555-123-4567" and "+15551234567" are recognized as identical. Our guide to merging CSVs and removing duplicates covers advanced dedup scenarios.
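A simplified version of this logic is sketched below. Note that it uses exact comparison on already-standardized emails and phones rather than true fuzzy matching: rows that repeat an email are dropped, while rows that share only a phone number are kept and flagged for review. The field names and the dedupe function are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def dedupe(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, Record]]]:
    """Drop rows that repeat an email; flag rows that share only a phone."""
    seen_emails: Dict[str, Record] = {}
    seen_phones: Dict[str, Record] = {}
    unique: List[Record] = []
    review: List[Tuple[Record, Record]] = []
    for rec in records:
        email, phone = rec.get("email", ""), rec.get("phone", "")
        if email and email in seen_emails:
            continue  # same email: treat as the same person
        if phone and phone in seen_phones:
            # Same phone, different email: possibly the same person.
            review.append((seen_phones[phone], rec))
        unique.append(rec)
        if email:
            seen_emails[email] = rec
        if phone:
            seen_phones.setdefault(phone, rec)
    return unique, review


rows = [
    {"name": "John Smith", "email": "john@example.com", "phone": "+15551234567"},
    {"name": "Jon Smith", "email": "john@example.com", "phone": "+15551234567"},
    {"name": "J. Smith", "email": "jsmith@example.org", "phone": "+15551234567"},
    {"name": "Ann Lee", "email": "ann@example.com", "phone": "+15559876543"},
]
unique, review = dedupe(rows)
print(len(unique), len(review))  # 3 1
```

Flagging phone-only collisions instead of merging them is a conservative choice, since households and small businesses often share a number.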

Step 4: Standardize Names

Separate first and last names into individual columns if they are combined. Convert all names to proper case (capitalize first letter, lowercase the rest). Remove titles like "Mr.", "Mrs.", "Dr." that do not match how users register on ad platforms. Trim extra whitespace and remove special characters. Consistent name formatting improves the platform's ability to use names as a supplementary matching signal.
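The steps above can be sketched as follows. This is a simple illustration with a short, hypothetical title list; it treats the first word as the first name and does not handle surnames such as "McDonald" or hyphenated names specially. If you hash names yourself before upload, check each platform's formatting spec, since the required casing may differ from proper case.

```python
from typing import Tuple

# Hypothetical list of titles to strip; extend for your data.
TITLES = {"mr", "mrs", "ms", "dr"}


def proper_case(text: str) -> str:
    """Capitalize the first letter of each word and lowercase the rest."""
    return " ".join(word.capitalize() for word in text.split())


def standardize_name(raw: str) -> Tuple[str, str]:
    """Return (first, last) in proper case with titles and extra spaces removed."""
    text = raw.strip()
    if "," in text:
        # "Smith, John" becomes "John Smith".
        last, first = [part.strip() for part in text.split(",", 1)]
        text = f"{first} {last}"
    parts = [p for p in text.split() if p.rstrip(".").lower() not in TITLES]
    if not parts:
        return "", ""
    return proper_case(parts[0]), proper_case(" ".join(parts[1:]))


print(standardize_name("  JOHN   SMITH "))  # ('John', 'Smith')
print(standardize_name("Smith, John"))      # ('John', 'Smith')
print(standardize_name("dr. jane doe"))     # ('Jane', 'Doe')
```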

Step 5: Add Missing Country Codes and Zip Codes

If your data includes addresses, ensure that country codes are in the correct format for each platform (two-letter ISO codes for most platforms). Standardize zip codes to the five-digit format for U.S. addresses. These supplementary fields improve multi-field matching and can rescue records where the email or phone alone does not produce a match.
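A small sketch of this step, limited to U.S. zip codes and a handful of country spellings. The alias table is hypothetical, and the zero-padding rule assumes that short zip codes lost their leading zeros in a spreadsheet. The uppercase output follows the ISO convention; confirm the casing each platform expects.

```python
import re
from typing import Optional

# Hypothetical aliases; extend with the spellings found in your data.
COUNTRY_ALIASES = {"usa": "US", "united states": "US", "u.s.": "US"}


def standardize_us_zip(raw: str) -> Optional[str]:
    """Return a five-digit U.S. zip code, or None if the value is unusable."""
    base = raw.strip().split("-", 1)[0].strip()  # drop a hyphenated ZIP+4 suffix
    if re.fullmatch(r"\d{9}", base):
        base = base[:5]                          # ZIP+4 written without a hyphen
    if re.fullmatch(r"\d{3,5}", base):
        return base.zfill(5)                     # restore lost leading zeros
    return None


def standardize_country(raw: str) -> Optional[str]:
    """Return a two-letter country code in uppercase, or None if unknown."""
    key = raw.strip().lower()
    if key in COUNTRY_ALIASES:
        return COUNTRY_ALIASES[key]
    if len(key) == 2 and key.isalpha():
        return key.upper()
    return None


print(standardize_us_zip("2134"))        # 02134
print(standardize_us_zip("02134-1234"))  # 02134
print(standardize_country(" usa "))      # US
print(standardize_country("ca"))         # CA
```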

NoSheet's Pre-Upload Cleaning for Ad Platforms

NoSheet was built to solve exactly this problem. Instead of manually running your customer list through five separate cleaning steps using spreadsheet formulas, scripts, and third-party tools, you upload your file to NoSheet and run the entire pre-upload workflow in a single pass. Email validation, phone E.164 formatting, deduplication, name standardization, and whitespace cleanup all happen simultaneously.

The result is a clean, platform-ready file that maximizes your match rate on Facebook, Google, TikTok, and any other platform that supports custom audience uploads. Every email is validated and lowercase. Every phone number is in E.164 format. Duplicates are removed. Names are standardized. The file is ready to upload the moment it downloads.

For teams that upload custom audiences regularly, whether weekly, bi-weekly, or monthly, integrating NoSheet into the upload workflow eliminates the recurring manual labor and ensures consistent data quality across every campaign. The time savings alone justify the investment, but the real value is in the match rate improvement and the downstream impact on campaign performance.

If you are also preparing data for specific CRM platforms before ad targeting, our guides on cleaning for HubSpot, Salesforce, and Shopify cover the platform-specific formatting requirements that affect the quality of your exports.

The Bottom Line

Every dollar you spend on ads that target custom or lookalike audiences is influenced by the quality of the data that built those audiences. Dirty data means low match rates, which means smaller audiences, weaker lookalikes, and higher cost per acquisition. Clean data means high match rates, larger audiences, stronger lookalikes, and more revenue per ad dollar spent. The cleaning step takes minutes. The impact lasts for every campaign you run on that data. Stop uploading dirty lists and start getting the match rates your ad budget deserves.

Boost Your Ad Match Rates Today

Upload your customer list and get platform-ready data with validated emails, E.164 phones, and zero duplicates. Higher match rates, lower wasted spend.
