CSV data cleaning
The basics of cleaning CSV data, and what this tool can do for you.
What is data cleaning?
Data cleaning (or cleansing) means finding inaccurate, inconsistent, or noisy data and then fixing or removing it so the data is ready for analysis or integration. For CSV files, that usually means removing invisible characters, trimming extra spaces, and finding duplicates.
Why it matters
Uncleaned CSV can cause:
- “Same” values that don't match because of invisible characters or extra spaces
- Primary key violations on import (for example, duplicate IDs)
- Wrong aggregates when numbers or dates are mixed with text
Cleaning the data first reduces errors and rework. See also the CSV errors guide.
Removing invisible characters
Pasted or imported data may contain zero-width spaces, control characters, or non-standard space characters. The single-file check detects these “invisible characters” and can remove them with “Fix all issues”. All processing stays in your browser.
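If you prefer to script this step, here is a minimal Python sketch of invisible-character removal. It assumes that “invisible” means Unicode control (Cc) and format (Cf) characters, which cover zero-width spaces and stray BOMs, and it maps every other space separator (including the non-breaking space U+00A0 and the ideographic space U+3000) to a plain space. The tool's own rules may differ, and the helper names `remove_invisible` and `clean_csv_text` are hypothetical.

```python
import csv
import io
import unicodedata


def remove_invisible(text: str) -> str:
    """Drop control (Cc) and format (Cf) characters except tab, CR and LF,
    and turn every space separator (Zs), e.g. U+00A0, into an ASCII space."""
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat in ("Cc", "Cf") and ch not in "\t\r\n":
            continue  # e.g. U+200B zero-width space, U+FEFF BOM, NUL
        out.append(" " if cat == "Zs" else ch)
    return "".join(out)


def clean_csv_text(data: str) -> str:
    """Apply remove_invisible to every field and re-serialize the CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in csv.reader(io.StringIO(data, newline="")):
        writer.writerow([remove_invisible(field) for field in row])
    return buf.getvalue()
```

Tabs and line breaks are kept so that quoted multi-line fields survive. Note that the zero-width joiner (U+200D) is also a format character, so emoji sequences that rely on it will be split apart.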
Trimming spaces
Leading and trailing spaces make “ A ” and “A” different values, which breaks matching. The single-file check can trim them in one click and, optionally, normalize full-width and half-width characters per column.
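A comparable Python sketch for trimming follows. It uses `str.strip()` for leading and trailing whitespace and, for the columns you choose, Unicode NFKC normalization as a stand-in for full-width/half-width conversion; that substitution is an assumption, not necessarily what the tool does. `trim_fields` and its `normalize_cols` parameter are hypothetical.

```python
import unicodedata


def trim_fields(row: list[str],
                normalize_cols: frozenset[int] = frozenset()) -> list[str]:
    """Strip leading/trailing whitespace from every field; for the column
    indexes in normalize_cols, also apply Unicode NFKC normalization."""
    cleaned = []
    for i, field in enumerate(row):
        value = field.strip()
        if i in normalize_cols:
            # NFKC maps full-width "ＡＢＣ１２３" to "ABC123" and
            # half-width katakana "ｶﾀｶﾅ" to full-width "カタカナ".
            value = unicodedata.normalize("NFKC", value).strip()
        cleaned.append(value)
    return cleaned
```

NFKC also rewrites other compatibility characters (for example “①” becomes “1”), which is one reason to apply it per column rather than to the whole file. `str.strip()` treats the ideographic space U+3000 as whitespace but not the zero-width space, so removing invisible characters first gives the most predictable result.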
Finding duplicates
Duplicate IDs or email addresses can cause database errors or incorrect merges. The duplicate data guide explains how to find and handle them. This tool detects and lists duplicates; you then edit the downloaded CSV to remove or merge them as needed.
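To list duplicates outside the tool, you can group rows by the key column and keep the values that occur more than once. This Python sketch assumes the file has a header row; `find_duplicates` and its `normalize` parameter are hypothetical names. By default each value is trimmed before comparison, so “1” and “1 ” count as the same ID.

```python
import csv
import io
from collections import defaultdict
from typing import Callable


def find_duplicates(data: str, key: str,
                    normalize: Callable[[str], str] = str.strip) -> dict[str, list[int]]:
    """Map each value of column `key` that occurs more than once to the
    record numbers where it appears (the header is record 1)."""
    positions: dict[str, list[int]] = defaultdict(list)
    reader = csv.DictReader(io.StringIO(data, newline=""))
    for recno, row in enumerate(reader, start=2):
        # Short rows give None for missing columns; treat that as empty.
        positions[normalize(row[key] or "")].append(recno)
    return {value: recs for value, recs in positions.items() if len(recs) > 1}
```

For email addresses you may want a case-insensitive comparison, e.g. `find_duplicates(data, "email", lambda s: s.strip().lower())`. Like the tool, the sketch only reports duplicates; deciding whether to remove or merge them is left to you.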
Cleaning workflow (this tool)
- Run the format check: encoding, delimiter, column count, empty lines.
- If needed, fix the encoding by converting to UTF-8 with BOM.
- Run the single-file check: upload the CSV and detect invisible characters, duplicate IDs, and extra spaces.
- Apply “Fix all issues” and download the cleaned CSV.
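The first two workflow steps can be approximated in a short Python script. This is a sketch under stated assumptions, not the tool's logic: the delimiter is guessed naively from the header line among comma, semicolon, tab and pipe; line numbers are physical lines, so a quoted field containing a line break would shift them; and the “UTF-8 BOM” output is produced with Python's `utf-8-sig` codec. `format_check` and `to_utf8_bom` are hypothetical helpers.

```python
import csv
import io

# Assumption: the delimiters worth considering.
CANDIDATE_DELIMITERS = [",", ";", "\t", "|"]


def format_check(raw: bytes) -> dict:
    """Report encoding, delimiter, empty lines and column-count mismatches."""
    report = {"encoding": "utf-8", "delimiter": None,
              "empty_lines": [], "column_mismatch": []}
    try:
        # "utf-8-sig" accepts UTF-8 with or without a leading BOM.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        report["encoding"] = "not utf-8"  # fix the encoding before anything else
        return report

    lines = text.splitlines()
    header = lines[0] if lines else ""
    # Naive heuristic: the candidate that occurs most often in the header wins
    # (ties, including "none found", fall back to the comma).
    report["delimiter"] = max(CANDIDATE_DELIMITERS, key=header.count)

    # Physical line numbers of blank or whitespace-only lines.
    report["empty_lines"] = [n for n, line in enumerate(lines, start=1)
                             if not line.strip()]

    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=report["delimiter"])
    expected = None
    for row in reader:
        if not row:
            continue  # blank line, already listed in empty_lines
        if expected is None:
            expected = len(row)  # the header defines the column count
        elif len(row) != expected:
            report["column_mismatch"].append((reader.line_num, len(row)))
    return report


def to_utf8_bom(text: str) -> bytes:
    """Encode as UTF-8 with a BOM (bytes EF BB BF) prepended."""
    return text.encode("utf-8-sig")
```

For messier files, the standard library's `csv.Sniffer` offers a more thorough delimiter guess, though it can fail on short samples whose rows have inconsistent field counts.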
Open the tools
- Single-file check: invisible characters, trimming, duplicate IDs
- Format & basic check: encoding, delimiter, column mismatch
- Encoding recovery: fix garbled text first if needed
- Duplicate data guide: decide how to handle duplicates