Convert Shift_JIS files to UTF-8
Mojibake in Japanese CSV and text files usually means the file was written in Shift_JIS and read as UTF-8, or the reverse. The recipes below convert Shift_JIS to UTF-8 on each OS.
Symptoms
- CSV opened in Excel shows garbled characters like "繝・Η繝シ" (the typical look of UTF-8 text being read as Shift_JIS).
- Japanese filenames on GitHub render as mojibake.
- Terminal output collapses to question marks.
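Before converting, it can help to confirm which encoding a file is actually in. The following is a minimal Python sketch, not a robust detector: `guess_encoding` and `simulate_mojibake` are hypothetical helper names, and the check simply tries a strict UTF-8 decode first and then Python's `cp932` codec (Microsoft's Shift_JIS variant). ASCII-only data is reported as UTF-8, and some byte sequences are valid in both encodings, so treat the result as a hint.

```python
def guess_encoding(data: bytes) -> str:
    """Return the first of 'utf-8' or 'cp932' that decodes data without error."""
    for name in ("utf-8", "cp932"):
        try:
            data.decode(name)  # strict decode: raises on any invalid byte sequence
            return name
        except UnicodeDecodeError:
            continue
    return "unknown"


def simulate_mojibake(text: str) -> str:
    """Show what UTF-8 bytes look like when a reader assumes Shift_JIS (cp932)."""
    return text.encode("utf-8").decode("cp932", errors="replace")
```

For example, `guess_encoding(open("in.csv", "rb").read())` gives a quick indication of whether `in.csv` needs converting at all.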
Mac / Linux (CLI)
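# If strict SHIFT_JIS fails on Windows-only characters such as ①, try -f CP932
iconv -f SHIFT_JIS -t UTF-8 in.csv > out.csv
# With BOM (helps Excel on Windows); octal escapes work in any POSIX printf
printf '\357\273\277' > out.csv && iconv -f SHIFT_JIS -t UTF-8 in.csv >> out.csv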
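Where iconv is not installed, or the same script has to run on any OS, the conversion can be sketched in Python. This is an illustrative sketch under a few assumptions: `convert_to_utf8` is a hypothetical helper name, the file is small enough to read into memory, and the default source codec is Python's `cp932` (pass `"shift_jis"` for a strict decode). Working on raw bytes keeps the original line endings untouched.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str,
                    source_encoding: str = "cp932",
                    bom: bool = False) -> None:
    """Read src in source_encoding and write dst as UTF-8, optionally with a BOM."""
    # Decode from bytes so CRLF / LF line endings are preserved as-is.
    text = Path(src).read_bytes().decode(source_encoding)
    # "utf-8-sig" prepends the EF BB BF byte order mark; "utf-8" does not.
    target = "utf-8-sig" if bom else "utf-8"
    Path(dst).write_bytes(text.encode(target))
```

Calling `convert_to_utf8("in.csv", "out.csv", bom=True)` corresponds to the "With BOM" command above. A file that is not valid in the source encoding raises `UnicodeDecodeError` rather than being silently corrupted.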
Windows (PowerShell)
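# .NET resolves relative paths against the process directory, not the PowerShell location, so build full paths
$in = Join-Path $PWD 'in.csv'
$out = Join-Path $PWD 'out.csv'
$text = [System.Text.Encoding]::GetEncoding('shift_jis').GetString([System.IO.File]::ReadAllBytes($in))
# UTF8Encoding($true) writes a BOM; pass $false for plain UTF-8
[System.IO.File]::WriteAllText($out, $text, [System.Text.UTF8Encoding]::new($true))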
VS Code
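- Open the file and click the encoding indicator (for example "UTF-8") at the bottom-right of the status bar.
- Choose Reopen with Encoding, then Japanese (Shift JIS).
- Click the indicator again and choose Save with Encoding, then UTF-8 with BOM (or plain UTF-8 if the file is not destined for Excel).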
Which encoding to save as
- For CSVs opened in Excel: UTF-8 with BOM.
- For anything consumed by web or programmatic tools: UTF-8 (no BOM).
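If a file is already UTF-8 and only the BOM needs to change to match its destination, the bytes can be adjusted directly. A small sketch, assuming the data fits in memory; the helper names `has_utf8_bom`, `with_bom`, and `without_bom` are illustrative, while `codecs.BOM_UTF8` is the standard-library constant for the bytes EF BB BF.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def with_bom(data: bytes) -> bytes:
    """Return data with exactly one leading BOM (for CSVs opened in Excel)."""
    return data if has_utf8_bom(data) else codecs.BOM_UTF8 + data


def without_bom(data: bytes) -> bytes:
    """Return data with a leading BOM removed (for web or programmatic use)."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data
```

Both helpers are idempotent, so applying them to a file that is already in the desired form leaves it unchanged.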