Finding and handling duplicate data
Why duplicate IDs and rows are a problem and how to detect and handle them.
Why duplicates are a problem
When importing a CSV into a database, duplicate primary or unique keys cause constraint errors. When merging two CSVs, duplicate keys make it ambiguous which row is "correct," leading to wrong overwrites or duplicated rows. It is therefore important to find duplicates before an import or merge and handle them according to your own rules.
Choosing the key column
First decide which column is the unique key: an ID, an email address, a product code, or a combination of columns. The single-file check suggests likely key columns, or lets you pick one yourself, so you can quickly see which rows are duplicated.
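As a rough sketch of why the key choice matters, the snippet below counts key occurrences for a single column versus a combination of columns, using made-up sample data (the `order_id` and `line` column names are hypothetical, not from the tool):

```python
import csv
import io
from collections import Counter

# Hypothetical sample data where "order_id" alone is not unique,
# but "order_id" + "line" nearly is.
SAMPLE = """order_id,line,product
1001,1,apple
1001,2,banana
1001,2,cherry
1002,1,apple
"""

def count_keys(text, key_columns):
    """Count how often each key (one column or a combination) occurs."""
    reader = csv.DictReader(io.StringIO(text))
    return Counter(tuple(row[c] for c in key_columns) for row in reader)

by_id = count_keys(SAMPLE, ["order_id"])
by_pair = count_keys(SAMPLE, ["order_id", "line"])
print([k for k, n in by_id.items() if n > 1])    # [('1001',)]
print([k for k, n in by_pair.items() if n > 1])  # [('1001', '2')]
```

A single-column key reports three "duplicate" rows for order 1001; the composite key narrows the real problem down to the two rows sharing line 2.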
How detection works
The tool flags a "duplicate ID" when the same value appears in two or more rows of the chosen column. It lists the row numbers and values so you can fix or remove the duplicates, and you can export the report for use in other tools.
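The detection step can be sketched in a few lines: group row numbers by key value and report every value that occupies more than one row. This is an illustration of the idea, not the tool's actual implementation, and the sample columns are invented:

```python
import csv
import io
from collections import defaultdict

SAMPLE = """id,email
1,a@example.com
2,b@example.com
1,c@example.com
3,a@example.com
"""

def find_duplicates(text, key_column):
    """Map each duplicated key value to the 1-based data-row numbers it occupies."""
    rows_by_value = defaultdict(list)
    for row_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        rows_by_value[row[key_column]].append(row_no)
    return {v: rows for v, rows in rows_by_value.items() if len(rows) > 1}

print(find_duplicates(SAMPLE, "id"))     # {'1': [1, 3]}
print(find_duplicates(SAMPLE, "email"))  # {'a@example.com': [1, 4]}
```

Note that the same file can be clean on one column and duplicated on another, which is why checking the column you actually intend to use as the key matters.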
How to handle duplicates
- Exact duplicates: Keep one row and delete the rest, or use your editor’s “remove duplicates” feature.
- Same key, different content: this may be bad data or multiple versions of the same record. Decide which row is correct and keep it; delete or merge the others.
- Allow duplicates: for analysis work, you may only need to know that duplicates exist, without removing them.
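A common "keep one row" strategy is keep-the-first: the first row seen for each key wins and later rows are dropped. The sketch below shows that approach with hypothetical data; note that it silently discards "same key, different content" rows too, so review conflicting rows before applying it blindly:

```python
import csv
import io

SAMPLE = """id,name
1,Alice
2,Bob
1,Alice
2,Bobby
"""

def keep_first(text, key_column):
    """Keep only the first row for each key value; later rows are dropped."""
    seen, kept = set(), []
    for row in csv.DictReader(io.StringIO(text)):
        if row[key_column] not in seen:
            seen.add(row[key_column])
            kept.append(row)
    return kept

for row in keep_first(SAMPLE, "id"):
    print(row["id"], row["name"])
# 1 Alice
# 2 Bob
```

Here the exact duplicate ("1,Alice") is harmless to drop, but the conflicting "2,Bobby" row is the "same key, different content" case and deserves a human decision first.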
Duplicates in two-file compare
When comparing an old and a new CSV with the two-file compare, duplicate keys can make diffs report rows as "changed" instead of "added/removed," because row alignment breaks. Cleaning key-column duplicates with the single-file check first makes the diff result clearer. See the CSV errors guide for more.
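To see why unique keys matter for a diff, consider a minimal key-based compare (a sketch with invented data, not the two-file compare's actual algorithm). Rows are indexed by key, so a repeated key would silently overwrite an earlier row and corrupt the result:

```python
import csv
import io

OLD = """id,name
1,Alice
2,Bob
"""
NEW = """id,name
1,Alice
3,Carol
2,Bobby
"""

def by_key(text, key_column):
    """Index rows by key. Assumes duplicates were already removed:
    a repeated key would silently overwrite the earlier row here."""
    return {row[key_column]: row for row in csv.DictReader(io.StringIO(text))}

old, new = by_key(OLD, "id"), by_key(NEW, "id")
added = sorted(new.keys() - old.keys())
removed = sorted(old.keys() - new.keys())
changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
print(added, removed, changed)  # ['3'] [] ['2']
```

With unique keys the diff classifies cleanly: id 3 was added, nothing removed, and id 2 changed. With duplicated keys, whichever row happened to come last would win, and the same comparison would report misleading changes.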
Open the tools
- Single-file check — list duplicate IDs and apply fixes
- Format & basic check — column count and structure before import
- Compare two files — review changes between versions
- CSV cleaning guide — broader cleanup workflow