Data Operations

How to Merge Two CSV Files and Remove Duplicates

Updated March 2026 · 9 min read

You have two CSV files. Maybe one is from your marketing team's webinar sign-ups and the other is from a purchased lead list. Maybe one is an export from HubSpot and the other from Salesforce. Maybe one is from last quarter and the other from this quarter. Whatever the source, you need to combine them into a single clean file with no duplicate records. This sounds simple. It is not. Different column orders, inconsistent formatting, partial duplicates where the same person appears with slightly different data — all of these make merge-and-dedup one of the trickiest common data operations. This guide walks through the entire process step by step, covers the pitfalls that catch people off guard, and shows how NoSheet makes the whole thing painless.

Why Merging CSVs Is Harder Than It Looks

If both files had identical column headers in the same order with perfectly consistent data formatting, merging would be trivial: concatenate the rows and done. In reality, that almost never happens. Here are the challenges you will actually face:

Different Column Orders

File A has columns in the order: Name, Email, Phone, Company. File B has: Company, Phone, Email, First Name, Last Name. You cannot just paste one below the other — the data will end up in the wrong columns. You need to align the columns first, mapping "First Name" + "Last Name" to "Name" or splitting "Name" into two fields to match File B's structure.

Different Column Names

One file calls it "Email Address," the other calls it "email," and a third calls it "E-mail." One uses "Phone Number," the other uses "Mobile." These are the same data, but no automated tool will recognize that unless you explicitly map them.

Inconsistent Data Formatting

This is where things get really messy. The same person might appear in both files, but their data looks different:

Email in File A: John.Smith@Company.com
Email in File B: john.smith@company.com
Phone in File A: (415) 555-1234
Phone in File B: +14155551234

A naive dedup comparing these strings would consider them different records. They are not. They are the same person with different formatting. You need to standardize before you deduplicate, or your "merged" file will be full of duplicates that just look slightly different.

Partial Duplicates

Sometimes a record appears in both files but with different amounts of information. File A has name and email. File B has the same email but also includes phone number and company. Do you keep both? Merge the records? Keep the more complete one? This requires a decision and a strategy, not just a simple dedup.

The Merge-and-Dedup Workflow (Step by Step)

Here is the correct order of operations for merging two CSVs and removing duplicates. Skipping steps or doing them out of order is the main reason people end up with bad results.

Step 1: Inspect Both Files

Before you do anything, look at both files. Note the column names, column order, and a sample of the data in each. Identify which columns exist in both files, which exist in only one, and which columns contain the same data under different names. This is your mapping step.

Step 2: Align the Columns

Rename columns so both files use the same headers. If File A has "Email Address" and File B has "email," pick one and rename the other. If File A has a single "Name" column and File B has "First Name" and "Last Name," decide which format to use and transform accordingly. The goal is for both files to have identical column headers in the same order before you combine them.

Step 3: Concatenate the Files

With columns aligned, you can now stack the rows from both files into a single file. In a spreadsheet, this means copying all rows from File B and pasting them below the last row of File A. In code, this is a simple pd.concat([df_a, df_b]) in pandas. The combined file will have duplicate records at this point — that is expected.

Step 4: Standardize Before Dedup

This is the step most people skip, and it is the most important one. Before you can accurately identify duplicates, you need to normalize the data in your dedup key columns:

Lowercase all email addresses. Email addresses are case-insensitive by spec (John@Gmail.com and john@gmail.com deliver to the same inbox), so lowercase everything before comparing.
Format all phone numbers to E.164. Convert (415) 555-1234, 415.555.1234, and +1 415 555 1234 all to +14155551234. See our E.164 conversion guide for details.
Trim whitespace. Leading and trailing spaces are invisible but will cause string comparisons to fail. " john@gmail.com" does not equal "john@gmail.com".
Standardize name formatting. Decide on a capitalization style (Title Case is standard for names) and apply it consistently.

Step 5: Deduplicate on Key Columns

Now that your data is standardized, you can identify true duplicates. Choose your dedup key — the column or combination of columns that uniquely identifies a record:

Email only: The most common approach. Two records with the same email address are considered the same person.
Phone only: Useful when email data is sparse or unreliable.
Email + Phone: More conservative. Only marks records as duplicates if both email and phone match. Reduces false positives but may leave some real duplicates.
Email OR Phone: More aggressive. Marks records as duplicates if either the email or the phone matches. Catches more duplicates but may produce false positives (two people sharing a company phone number, for example).

When duplicates are found, you need a resolution strategy: keep the first occurrence, keep the last occurrence, keep the most complete record, or merge fields from both records. For most use cases, keeping the first occurrence (or the more complete record) is the right choice.

Step 6: Verify Your Results

After dedup, verify the numbers. If File A had 25,000 rows and File B had 18,000 rows, your merged file should have somewhere between 18,000 (maximum overlap) and 43,000 (zero overlap) rows. If you end up with more than 43,000 rows, something went wrong with the merge. If you end up with far fewer than expected, your dedup might have been too aggressive.

Spot-check a few records that were marked as duplicates. Confirm they are truly the same person. Also spot-check records that were kept to make sure no data was lost in the merge.

The Workflow in Text Diagram Form

File A (25,000 rows) ——→ Column Alignment ——→ |

| Concatenate

File B (18,000 rows) ——→ Column Alignment ——→ |

Combined file (43,000 rows)

Standardize (lowercase emails, E.164 phones)

Deduplicate on email (7,200 dupes found)

Clean merged file (35,800 rows)

How NoSheet Handles the Entire Merge-and-Dedup Workflow

NoSheet streamlines this entire multi-step process into a guided workflow that anyone on your team can use, regardless of technical skill:

Upload both files. Drag and drop both CSVs. NoSheet reads them simultaneously and displays the column headers from each.
Auto-detect column mapping. NoSheet automatically matches columns with similar names across the two files. "Email Address" in File A gets mapped to "email" in File B. You can review and adjust the mapping before proceeding.
Apply cleaning operations. Before dedup, apply any combination of cleaning ops: lowercase emails, format phones to E.164, trim whitespace, standardize dates, remove blank rows. These all run in a single pass.
One-click dedup. Choose your dedup key column(s), select your resolution strategy (keep first, keep last, keep most complete), and click dedup. NoSheet shows you exactly how many duplicates were found and lets you review them before confirming.
Download the merged clean file. A single CSV with aligned columns, standardized data, and no duplicates. Ready for Salesforce, HubSpot, Mailchimp, or whatever comes next.

The entire process takes under two minutes for files with tens of thousands of rows. Everything runs in your browser — your contact data never touches a server.

Common Merge-and-Dedup Scenarios

Here are the situations where this workflow comes up most often:

Combining marketing lists. Your team ran three webinars this quarter. Each produced a separate CSV of registrants. You need one unified list for the follow-up email campaign, with no one receiving the email twice.
Merging CRM exports. Your company uses Salesforce for enterprise accounts and HubSpot for mid-market. An executive wants a single report of all contacts. You need to merge the exports and dedup on email.
Purchased list integration. You bought a list of 15,000 contacts from a data provider. Before loading them into your CRM, you need to merge them with your existing contacts and remove anyone who is already in your database.
Post-acquisition data merge. Your company acquired a competitor. Their customer list needs to be merged with yours. Different systems, different formats, overlapping customers.
Event consolidation. A trade show gave you a CSV of badge scans. Your booth team collected a separate list of people they had conversations with. Merge the two, dedup, and follow up with everyone exactly once.

Tips for Better Merge Results

A few practical tips that will improve your merge-and-dedup accuracy:

Always standardize before dedup. This is the single most important rule. If you skip standardization, you will miss duplicates that are formatted differently.
Use email as your primary dedup key. Email is the most reliable unique identifier for contacts. Phone numbers can be shared (company main lines), and names are not unique.
Handle empty key fields. Before dedup, remove or flag rows where the dedup key column is empty. Otherwise, all records without an email will be "deduplicated" into one record, which is probably not what you want.
Keep a record count log. Note the row count of each source file, the combined count, the number of duplicates found, and the final count. This lets you quickly verify that the math adds up and nothing was lost.
Preserve the source. Add a "Source" column to each file before merging (e.g., "Webinar Q1," "Purchased List March") so you can always trace where a record came from after the merge.

Merge and Dedup Your CSVs in Under 2 Minutes

Upload both files, map columns, clean, dedup, download. No code, no Excel, no data leaving your browser.

Try NoSheet Free