Encoding issues and garbled CSV
Why CSV text gets garbled and how to fix it by converting to UTF-8.
What is character encoding?
A character encoding is a mapping between characters (letters, digits, symbols) and the byte sequences used to store them in a file. When you save a text file, every character is converted to bytes using an encoding. When you open the file, those bytes are converted back to characters using — ideally — the same encoding.
CSV is plain text, so encoding matters for every single character in the file. If the encoding used to save the file does not match the encoding used to open it, characters are decoded incorrectly and display as garbled symbols, question marks, or boxes. This is what “garbled text” means.
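The mismatch can be reproduced in a few lines. This is a minimal Python sketch; the sample word is only an illustration and is not taken from any particular file.

```python
# Encode a string as UTF-8, then decode the same bytes with two different encodings.
original = "café"
utf8_bytes = original.encode("utf-8")    # b'caf\xc3\xa9': the "é" takes two bytes

garbled = utf8_bytes.decode("cp1252")    # wrong decoder: each byte becomes its own character
restored = utf8_bytes.decode("utf-8")    # matching decoder: round-trips cleanly

print(garbled)    # cafÃ©
print(restored)   # café
```

The bytes in the file never change; only the interpretation does, which is why re-reading with the right encoding recovers the text.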
Common encodings and where they appear
- UTF-8
- The universal standard for text on the web and modern systems. Supports all languages and special characters. Most APIs, databases, and modern applications expect UTF-8. Recommended for all new files.
- UTF-8 with BOM
- UTF-8 with a three-byte prefix (EF BB BF) at the start of the file. Excel uses this BOM to recognise UTF-8 files and open them correctly. Without the BOM, Excel may default to a regional encoding and garble non-ASCII characters. If your CSV will be opened in Excel, use UTF-8 BOM.
- Shift-JIS (CP932)
- The dominant encoding for Japanese text in legacy Windows software, older databases, and many Japanese government systems. Files from Japanese ERP systems or older Excel versions are frequently in Shift-JIS.
- EUC-KR / CP949
- Common encodings for Korean text in older systems, especially Windows applications. Modern Korean web content and systems use UTF-8, but exports from legacy databases or older software may still use EUC-KR or CP949.
- Windows-1252 (CP1252)
- Used for Western European languages (English, French, German, Spanish, etc.) in older Windows applications. Excel’s default encoding for CSV in Western European locales is often Windows-1252, not UTF-8.
- ISO-8859-1 (Latin-1)
- An older Western European encoding, largely superseded by Windows-1252. Some legacy Unix systems and older web exports still produce ISO-8859-1 files.
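To make the differences concrete, the sketch below prints the bytes that a few of these encodings produce for the same character. It uses Python's codec names ("cp932" for Shift-JIS, "cp1252" for Windows-1252, "latin-1" for ISO-8859-1, "utf-8-sig" for UTF-8 with BOM); the helper `hex_bytes` and the sample characters are illustrative assumptions.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the bytes of `text` in `encoding` as space-separated hex."""
    return text.encode(encoding).hex(" ")

print(hex_bytes("é", "utf-8"))       # c3 a9
print(hex_bytes("é", "cp1252"))      # e9
print(hex_bytes("é", "latin-1"))     # e9
print(hex_bytes("あ", "utf-8"))      # e3 81 82
print(hex_bytes("あ", "cp932"))      # 82 a0

# UTF-8 with BOM: the "utf-8-sig" codec prepends EF BB BF when encoding.
print(hex_bytes("é", "utf-8-sig"))   # ef bb bf c3 a9

# Windows-1252 and ISO-8859-1 differ in the 0x80-0x9F range, e.g. the euro sign.
print(hex_bytes("€", "cp1252"))      # 80
try:
    "€".encode("latin-1")
except UnicodeEncodeError:
    print("ISO-8859-1 has no euro sign")
```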
Why Excel garbles CSV text
Excel’s CSV handling is the most common source of encoding problems. Here’s what happens:
- Saving: When you use “Save As → CSV (Comma delimited)” in Excel, it saves the file in your operating system’s default regional encoding (Windows-1252 on Western Windows, Shift-JIS on Japanese Windows, and so on), not UTF-8. The file then appears garbled when opened on systems that assume UTF-8.
- Opening: When you open a UTF-8 CSV that does not have a BOM, Excel assumes the regional encoding and garbles non-ASCII characters. French accents, Japanese characters, Korean text, and special symbols all appear as garbage.
- The BOM solution: If the UTF-8 file starts with the UTF-8 BOM (EF BB BF), Excel recognises it as UTF-8 and opens it correctly. This is why “UTF-8 BOM” is the recommended format for CSV files that will be used in Excel.
To save correctly from Excel: use “Save As → CSV UTF-8 (Comma delimited)” instead of plain “CSV (Comma delimited)”. This option produces a UTF-8 BOM file.
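When the CSV is generated by a script rather than saved from Excel, the same result can be had by writing the BOM explicitly. The sketch below assumes Python's standard `csv` module; the function name `write_csv_for_excel` is hypothetical.

```python
import csv

def write_csv_for_excel(path, rows):
    """Write rows as a UTF-8 CSV that starts with a BOM, so Excel detects the encoding."""
    # "utf-8-sig" writes EF BB BF once at the start of the file.
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)
```

For example, `write_csv_for_excel("people.csv", [["name", "city"], ["Zoé", "Orléans"]])` would produce a file whose first three bytes are EF BB BF.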
How to identify a file’s encoding
There is no guaranteed way to detect encoding from the file contents alone: apart from an optional BOM, a CSV file carries no record of the encoding it was saved in. Tools use statistical analysis of byte patterns to make an educated guess. Format & basic check shows the detected encoding and a confidence level. If the detection looks wrong (garbled characters in the preview), you can override it manually in Encoding fix.
Clues that help determine the correct encoding:
- Where was the file created? Japanese system → likely Shift-JIS. Korean system → likely EUC-KR/CP949. Western Windows → likely Windows-1252.
- What application exported it? Check the application’s export settings for an encoding option.
- Does the preview look correct? Try the likely encoding and verify visually.
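The "try the likely encoding and verify visually" approach can be sketched as trial decoding. This is a rough illustration, not how the tools above detect encodings: the candidate list and the function name `candidate_decodings` are assumptions, and strict decoding only rules encodings out, it cannot confirm one.

```python
import codecs

CANDIDATES = ["utf-8", "cp932", "cp949", "cp1252"]

def candidate_decodings(data: bytes, encodings=CANDIDATES) -> dict:
    """Return {encoding: decoded text} for every encoding that decodes `data` without error."""
    if data.startswith(codecs.BOM_UTF8):
        # A UTF-8 BOM is the one strong signal stored in the file itself.
        encodings = ["utf-8-sig"]
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)   # strict decoding raises on invalid bytes
        except UnicodeDecodeError:
            pass                              # this encoding is ruled out
    return results

# Sample bytes as a Japanese legacy system might produce them.
sample = "日本語".encode("cp932")
for enc, text in candidate_decodings(sample).items():
    print(enc, repr(text))
```

More than one candidate usually survives (Windows-1252 accepts almost any byte sequence), so the remaining previews still have to be compared by eye, using the clues above about where the file came from.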
Step-by-step: fixing a garbled CSV
- Open Encoding fix: go to Encoding fix and drop your file onto the upload area. Nothing is sent to a server; the file is read entirely in your browser.
- Check the auto-detected encoding: the tool shows its best guess for the source encoding. Look at the preview of the first few lines.
- If the preview looks correct: proceed to download as UTF-8 BOM.
- If the preview is still garbled: try selecting the source encoding manually from the dropdown. For Japanese files try Shift-JIS; for Korean try EUC-KR or CP949; for Western European try Windows-1252.
- Download UTF-8 BOM: the downloaded file has the correct characters and opens correctly in Excel and in most modern tools.
- Verify: run Format & basic check on the converted file to confirm encoding is now UTF-8 and the content looks right.
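For batch or scripted conversions, the same fix can be sketched in Python once the source encoding is known. This is a minimal sketch under that assumption, not the tool's implementation; the function name `convert_to_utf8_bom` is hypothetical.

```python
def convert_to_utf8_bom(src_path, dst_path, source_encoding):
    """Re-encode a text file from `source_encoding` to UTF-8 with a BOM."""
    # newline="" on both sides keeps the file's original line endings untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()   # raises UnicodeDecodeError if the bytes are invalid in source_encoding
    if text.startswith("\ufeff"):
        text = text[1:]     # drop an existing BOM so the output does not end up with two
    with open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        dst.write(text)
```

A call such as `convert_to_utf8_bom("export.csv", "export_utf8.csv", "cp932")` would cover the Japanese case. A successful run only means the bytes were valid in the chosen encoding, so the output still needs the visual check from the last step.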
Choosing the right encoding for your workflow
- Will the file be opened in Excel? Use UTF-8 BOM. This is the only UTF-8 format Excel reliably opens without garbling.
- Will the file be imported into a database or API? Use UTF-8 without BOM. Most modern systems expect plain UTF-8; the BOM can cause parsing issues in some importers.
- Does the target system require a specific encoding? Use that encoding. Some legacy Japanese and Korean systems require Shift-JIS or EUC-KR. Check the system’s import documentation.
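The BOM parsing issue mentioned above typically shows up in the first header name. The following Python sketch illustrates it with the standard `csv` module; the sample data and the helper `header` are illustrative assumptions.

```python
import csv
import io

# Bytes as they would be saved for Excel: UTF-8 with a leading BOM.
bom_file = "name,city\r\nZoé,Orléans\r\n".encode("utf-8-sig")

def header(data: bytes, encoding: str) -> list:
    """Decode `data` with `encoding` and return the CSV header row."""
    return next(csv.reader(io.StringIO(data.decode(encoding), newline="")))

print(header(bom_file, "utf-8"))       # ['\ufeffname', 'city']
print(header(bom_file, "utf-8-sig"))   # ['name', 'city']
```

An importer that decodes as plain UTF-8 sees a column called `\ufeffname` rather than `name`, so lookups by column name fail. In Python, reading with "utf-8-sig" accepts files both with and without a BOM.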
Prevention
- Establish a single encoding standard for your team (UTF-8 BOM for Excel workflows, plain UTF-8 for everything else) and document it.
- When exporting from Excel, always use “CSV UTF-8 (Comma delimited)”, not plain “CSV (Comma delimited)”.
- Run Format & basic check on any file received from an external source before using it — catch encoding issues at the intake stage, not after import.
Open the tools
- Encoding recovery — auto-detect and convert to UTF-8 BOM
- Format & basic check — confirm detected encoding and delimiter
- Single-file check — verify data quality after conversion
- CSV errors guide — full checklist for import failures