Encoding issues and garbled CSV

Why CSV text gets garbled and how to fix it by converting to UTF-8.

What is character encoding?

A character encoding is a mapping between characters (letters, digits, symbols) and the byte sequences used to store them in a file. When you save a text file, every character is converted to bytes using an encoding. When you open the file, those bytes are converted back to characters using — ideally — the same encoding.

CSV is plain text, so encoding matters for every single character in the file. If the encoding used to save the file does not match the encoding used to open it, characters are decoded incorrectly and display as garbled symbols, question marks, or boxes. This is what “garbled text” means.
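A minimal Python sketch of this mismatch, using a made-up sample string: the text is "saved" as UTF-8 and then "opened" once with the wrong encoding (Windows-1252) and once with the right one.

```python
# Illustrative sample text (not taken from any real file).
original = "café,naïve"

# "Saving": characters are converted to bytes using UTF-8.
stored_bytes = original.encode("utf-8")

# "Opening" with the wrong encoding: every UTF-8 byte is read as a
# separate Windows-1252 character, which produces garbled text.
garbled = stored_bytes.decode("cp1252")

# "Opening" with the matching encoding restores the original text.
restored = stored_bytes.decode("utf-8")

print(garbled)   # cafÃ©,naÃ¯ve
print(restored)  # café,naïve
```

The bytes in the file never change; only the interpretation does. This is why a garbled file can usually be fixed by re-reading it with the correct source encoding.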

Common encodings and where they appear

UTF-8
The universal standard for text on the web and modern systems. Supports all languages and special characters. Most APIs, databases, and modern applications expect UTF-8. Recommended for all new files.
UTF-8 with BOM
UTF-8 with a three-byte prefix (EF BB BF) at the start of the file. Excel uses this BOM to recognise UTF-8 files and open them correctly. Without the BOM, Excel may default to a regional encoding and garble non-ASCII characters. If your CSV will be opened in Excel, use UTF-8 BOM.
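To make the BOM concrete, here is a small Python sketch with made-up sample data. Python's "utf-8-sig" codec writes the three-byte prefix when encoding and strips it when decoding; the helper name has_utf8_bom is hypothetical.

```python
import codecs

text = "name,city\nZoë,Köln\n"  # illustrative sample data

plain = text.encode("utf-8")         # UTF-8 without a BOM
with_bom = text.encode("utf-8-sig")  # same bytes with EF BB BF prepended


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


print(has_utf8_bom(plain))     # False
print(has_utf8_bom(with_bom))  # True

# "utf-8-sig" removes the BOM when present and also accepts BOM-less input.
print(with_bom.decode("utf-8-sig") == text)  # True
print(plain.decode("utf-8-sig") == text)     # True

# Plain "utf-8" keeps the BOM as an invisible U+FEFF character at the start,
# which can end up glued to the first column name.
print(with_bom.decode("utf-8").startswith("\ufeff"))  # True
```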
Shift-JIS (CP932)
The dominant encoding for Japanese text in legacy Windows software, older databases, and many Japanese government systems. Files from Japanese ERP systems or older Excel versions are frequently in Shift-JIS.
EUC-KR / CP949
Common encodings for Korean text in older systems, especially Windows applications. Modern Korean web content and systems use UTF-8, but exports from legacy databases or older software may still use EUC-KR or CP949.
Windows-1252 (CP1252)
Used for Western European languages (English, French, German, Spanish, etc.) in older Windows applications. Excel’s default encoding for CSV in Western European locales is often Windows-1252, not UTF-8.
ISO-8859-1 (Latin-1)
An older Western European encoding, largely superseded by Windows-1252. Some legacy Unix systems and older web exports still produce ISO-8859-1 files.
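The following Python sketch shows how the same character becomes different bytes in the encodings listed above. The sample characters and the helper encode_or_none are illustrative choices, and the codec names are the ones Python's standard library uses.

```python
from typing import Optional


def encode_or_none(text: str, encoding: str) -> Optional[bytes]:
    """Encode text, or return None if the encoding cannot represent it."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# Illustrative characters paired with encodings that are relevant to them.
samples = {
    "é": ["utf-8", "cp1252", "latin-1"],
    "日": ["utf-8", "cp932"],
    "한": ["utf-8", "euc-kr", "cp949"],
}

for char, encodings in samples.items():
    for encoding in encodings:
        encoded = encode_or_none(char, encoding)
        shown = encoded.hex(" ") if encoded is not None else "not representable"
        print(f"{char}  {encoding:8}  {shown}")

# Windows-1252 and ISO-8859-1 agree on most characters but not all of them:
# the euro sign exists in Windows-1252 and is missing from ISO-8859-1.
print(encode_or_none("€", "cp1252"))   # b'\x80'
print(encode_or_none("€", "latin-1"))  # None
```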

Why Excel garbles CSV text

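Excel’s CSV handling is the most common source of encoding problems. When Excel opens a CSV that has no BOM, it may assume a regional encoding (often Windows-1252 in Western European locales) instead of UTF-8, so non-ASCII characters are decoded incorrectly.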

To save correctly from Excel: use “Save As → CSV UTF-8 (Comma delimited)” instead of plain “CSV (Comma delimited)”. This option produces a UTF-8 file with a BOM.

How to identify a file’s encoding

There is no guaranteed way to detect encoding from the file contents alone: apart from an optional BOM, encoding information is not stored inside the file. Tools use statistical analysis of byte patterns to make an educated guess. The Format & basic check tool shows the detected encoding and a confidence level. If the detection looks wrong (garbled characters in the preview), you can override it manually in Encoding fix.
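As a rough illustration of why detection is only a guess, here is a Python sketch of a much simpler technique than statistical analysis: check for a BOM, then try a list of candidate encodings in order and keep the first one that decodes without errors. The function name guess_encoding and the default candidate order are assumptions for this example, and this is not how Format & basic check works internally.

```python
import codecs
from typing import Optional, Sequence


def guess_encoding(
    data: bytes,
    candidates: Sequence[str] = ("utf-8", "cp932", "cp949", "cp1252"),
) -> Optional[str]:
    """Return the first candidate that decodes the data without errors.

    This is a heuristic: several legacy encodings can decode the same bytes
    without raising an error, so the result is only a candidate and the
    decoded text should still be checked by eye.
    """
    # A UTF-8 BOM is the one reliable marker that can be stored in the file.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


print(guess_encoding("café".encode("utf-8")))    # utf-8
print(guess_encoding("日本語".encode("cp932")))   # cp932
```

Strict UTF-8 is tried first because text in a legacy encoding rarely happens to be valid UTF-8, while the reverse is not true: UTF-8 bytes often decode "successfully" as Windows-1252 and simply look garbled.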

Clues that help determine the correct encoding:

Step-by-step: fixing a garbled CSV

  1. Open Encoding fix — go to Encoding fix and drop your file onto the upload area. Nothing is sent to a server; the file is read entirely in your browser.
  2. Check the auto-detected encoding — the tool shows its best guess for the source encoding. Look at the preview of the first few lines.
  3. If the preview looks correct — proceed to download as UTF-8 BOM.
  4. If the preview is still garbled — try selecting the source encoding manually from the dropdown. For Japanese files try Shift-JIS; for Korean try EUC-KR or CP949; for Western European try Windows-1252.
  5. Download UTF-8 BOM — the downloaded file has the correct characters and will open correctly in Excel and all modern systems.
  6. Verify — run Format & basic check on the converted file to confirm encoding is now UTF-8 and the content looks right.
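For readers who prefer to script the conversion, the steps above can be approximated in Python. This is a minimal sketch, not the Encoding fix tool itself: the function name convert_to_utf8_bom, the file names, and the sample data are all hypothetical, and the source encoding must already be known.

```python
import tempfile
from pathlib import Path


def convert_to_utf8_bom(src, dst, source_encoding):
    """Re-encode a text file from source_encoding to UTF-8 with a BOM."""
    # newline="" keeps line endings (and newlines inside quoted fields)
    # exactly as they are in the source file.
    with open(src, "r", encoding=source_encoding, newline="") as fin:
        text = fin.read()  # raises UnicodeDecodeError on invalid bytes
    with open(dst, "w", encoding="utf-8-sig", newline="") as fout:
        fout.write(text)


# Usage with throwaway files: write a small Shift-JIS (CP932) CSV,
# convert it, and read back the raw bytes of the result.
sample = "名前,都市\r\n山田,東京\r\n"
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "legacy.csv"
    dst = Path(tmp) / "converted.csv"
    src.write_bytes(sample.encode("cp932"))
    convert_to_utf8_bom(src, dst, "cp932")
    converted = dst.read_bytes()

print(converted[:3])                          # b'\xef\xbb\xbf'
print(converted.decode("utf-8-sig") == sample)  # True
```

A decoding error here is a useful signal that the chosen source encoding is wrong. The absence of an error is not proof that it is right, so the preview and verification steps still apply.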

Choosing the right encoding for your workflow

Prevention
