• Luc Patterson

AVOID These 3 Deduping DIY Disasters


Man holding business end of chainsaw between his legs and pulling the start chord
Even the "right tools", if used incorrectly, can have devastating results

We had to deliver the bad news:


The agency's team was removing the wrong records from the clients' data.

On one small campaign alone the agency lost $1100.


But it was worse: they had been running their data quality process incorrectly for months.


Oh, wait! It gets even worse... not only did they remove the incorrect records, they were also removing their client's best customers!


This is ugly! They lost money for themselves and even more money for the client.


Excel is very powerful but if used incorrectly disasters result.


Here are 3 common DIY disasters we uncovered:


(BEFORE stepping through ANY process, ALWAYS make a copy of your data!)


Disaster 1: Removing your best customers


If you have customers in your database that have purchased more than once from your business, they are your best customers. These are the customers that are most likely to respond to your marketing.


Oftentimes, when deduping your customer list, the entire set of duplicate records are mistakenly removed. What happens to your profits when you erase your best customers? Exactly!


How does this happen? One way to identify duplicates in Excel is to highlight the records that are duplicates based on conditional formatting. This functionality is great for VIEWING duplicates.


However, do not use conditional formatting to delete duplicates if your goal is to keep one of the records from the duplicate set.


If you do delete duplicates based on conditional formatting, you can end up removing all of the records within that duplicate set.


Disaster 2: Removing the wrong records


Although using the "Remove Duplicates" function in Excel is handy in a pinch, it may not be ideal for the best results.


Did you know that you can prioritize records to ensure you keep the best one? When using the "Remove Duplicates" function, Excel keeps the FIRST record and removes the rest in a duplicate set.


Important!: BEFORE you "Remove Duplicates", sort your file so that your best record will be the top record in the set and the one that Excel keeps.


For example, if some records have an email and others do not, you can add email address to your sorted fields. The records that have an email address will appear at the top so that when running "Remove Duplicates", the record that has an email address will remain.


Disaster 3: Not standardizing before deduping


When Excel dedupes, it is matching on the EXACT field contents. When you use the "Remove Duplicates" function, there is NO fuzzy matching. Even an extra space will cause 2 fields that look the same to not match!


So, it is critical to ensure the fields you are using for deduping are consistent and standardized. For example:


If choosing address:

  • are apartment numbers part of the address or in a separate field or mixed?

  • are some addresses incomplete?

  • is the zip code field only the 5 digit zip code or a mix of 5 digit and 9 digit?

If choosing names:

  • is there a mixture of full names and split names?

  • is there punctuation in some names but not others?

If choosing phone number:

  • do dashes exist in some records but not others?

  • are parentheses around some of the area codes?

  • do you have multiple phone fields?


What other DIY disasters are lurking in your process?

Need to upgrade your processing? Worried about your data integrity? Concerned about your job?


If you are not sure how successful your results are, you can always contact us or upload your file here for a no-cost evaluation.



IN the NEXT post: 5 Critical Steps for deduping in Excel