
How to Solve "UnicodeDecodeError: 'charmap' codec can't decode byte" in Python

The UnicodeDecodeError: 'charmap' codec can't decode byte ... error in Python occurs when you read or decode a file (or byte string) using the wrong character encoding. It typically happens on Windows, where Python's default text encoding is often a legacy code page such as cp1252 (decoded by the 'charmap' codec) that doesn't match the file's actual encoding, which is frequently UTF-8.

This guide explains how to diagnose and fix this error.

Understanding the Error

The error message:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1: character maps to <undefined>

means:

  • 'charmap': The codec Python uses for single-byte legacy code pages such as cp1252, which is often the system's default encoding on Windows.
  • can't decode byte 0x9d: The byte that failed to decode, shown in hexadecimal.
  • in position 1: The problematic byte is at the specified position (starting from 0) within the input.
  • character maps to <undefined>: The code page's mapping table has no character for that byte, so decoding fails.
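To see the mismatch in action, the short script below encodes a right double quotation mark (U+201D) as UTF-8 and then decodes those bytes as cp1252, which reproduces the same kind of error. It is a minimal illustration; the character and the variable names are chosen only for the demo.

```python
# U+201D RIGHT DOUBLE QUOTATION MARK becomes three bytes in UTF-8: e2 80 9d
data = "\u201d".encode("utf-8")
print(data)  # b'\xe2\x80\x9d'

caught = None
try:
    # cp1252 maps 0xe2 and 0x80 to characters, but has no character for 0x9d
    data.decode("cp1252")
except UnicodeDecodeError as err:
    caught = err  # keep a reference; 'err' is unbound after the except block
    print(err)
    # 'charmap' codec can't decode byte 0x9d in position 2: character maps to <undefined>

# Decoding with the encoding the bytes were actually written in succeeds
print(data.decode("utf-8") == "\u201d")  # True
```

The position in the message counts bytes, not characters, which is why it points at the third byte (index 2) of the single quotation mark character.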

Solution 1: Specify the Correct Encoding (UTF-8)

Most modern text files are encoded in UTF-8, so the most common and reliable fix is to specify UTF-8 explicitly when opening the file:

with open('example.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()

print(lines)
  • encoding='utf-8': This tells Python to decode the file's contents as UTF-8 instead of using the platform's default encoding.
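You can check which encoding open() typically falls back to on your machine with locale.getpreferredencoding(False). The sketch below prints that default, then writes and reads a scratch file with encoding='utf-8' on both sides, so the result is the same on every platform. The temporary path is only for the demo; substitute your own file.

```python
import locale
import os
import tempfile

# Usually the encoding open() uses when encoding= is omitted (often 'cp1252' on Windows)
print("Platform default:", locale.getpreferredencoding(False))

text = "Quotes \u201clike these\u201d and caf\u00e9"

# Hypothetical scratch file, created only for this demo
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Write and read with the same explicit encoding
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, "r", encoding="utf-8") as f:
    content = f.read()

print(content == text)  # True
```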

Solution 2: Use utf-8-sig for Files with a BOM

Some files (especially those created on Windows) start with a Byte Order Mark (BOM). If you read such a file with encoding='utf-8', the BOM shows up as \ufeff at the start of your output; use utf-8-sig instead:

with open('example.txt', 'r', encoding='utf-8-sig') as f:
    lines = f.readlines()

print(lines)
  • utf-8-sig is a variant of UTF-8 that skips a leading BOM if one is present; files without a BOM are decoded exactly as with utf-8.
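The difference is easy to see on an in-memory byte string. This sketch prepends the UTF-8 BOM (codecs.BOM_UTF8, the bytes ef bb bf) to some text and decodes it both ways:

```python
import codecs

# UTF-8 BOM (b'\xef\xbb\xbf') followed by UTF-8 text
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

plain = data.decode("utf-8")    # keeps the BOM as the character U+FEFF
sig = data.decode("utf-8-sig")  # strips the BOM

print(repr(plain))  # '\ufeffhello'
print(repr(sig))    # 'hello'

# Without a BOM, utf-8-sig decodes exactly like utf-8
print(b"hello".decode("utf-8-sig"))  # hello
```

Because utf-8-sig also reads BOM-less UTF-8 correctly, it is a safe choice for reading files that may or may not have a BOM. When writing, however, utf-8-sig adds a BOM, so use plain utf-8 for output unless the consumer expects one.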

Solution 3: Handling Unknown Encodings (with errors='ignore' or chardet)

If you don't know the file's encoding, you have two main options:

errors='ignore'

You can tell Python to ignore decoding errors. This will result in data loss, but it will prevent the program from crashing.

with open('example.txt', 'r', encoding='utf-8', errors='ignore') as f:
    lines = f.readlines()

print(lines)  # Undecodable bytes are silently dropped, so some text may be missing
  • errors='ignore': This tells Python to skip any bytes it can't decode. This is a last resort as it will lose data.
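errors='ignore' is one of several error handlers that open() and bytes.decode() accept. The comparison below, which goes beyond the original examples, decodes the same invalid bytes with 'ignore', 'replace' (inserts the U+FFFD replacement character) and 'backslashreplace' (keeps the raw byte as an escape sequence), so you can see what each one does to your data. ascii() is used for printing so the output is the same on any console.

```python
# 'caf' + 0xe9 (é in cp1252/latin-1) + ' ok': 0xe9 is not valid UTF-8 here
data = b"caf\xe9 ok"

results = {
    handler: data.decode("utf-8", errors=handler)
    for handler in ("ignore", "replace", "backslashreplace")
}

for handler, text in results.items():
    print(f"{handler}: {ascii(text)}")
# ignore: 'caf ok'
# replace: 'caf\ufffd ok'
# backslashreplace: 'caf\\xe9 ok'
```

'replace' and 'backslashreplace' still lose the original meaning of the bytes, but unlike 'ignore' they leave a visible marker where the problem was.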

The chardet Library

The chardet library attempts to detect the encoding of a file. This is not foolproof, but it's often helpful.

pip install chardet

import chardet

# chardet analyzes raw bytes, so open the file in binary mode
with open('example.txt', 'rb') as f:
    rawdata = f.read()

result = chardet.detect(rawdata)
encoding = result['encoding']
confidence = result['confidence']
print(f"Detected encoding: {encoding} (confidence: {confidence})")

# chardet returns None as the encoding when it cannot make a guess
if encoding is None:
    raise ValueError("Could not detect the file's encoding")

# Open the file again in text mode with the detected encoding
with open('example.txt', 'r', encoding=encoding) as f:
    lines = f.readlines()

print(lines)
  • chardet.detect() analyzes the raw bytes and returns a dictionary with its best guess for the encoding and a confidence level.
  • Open the file in binary read mode ('rb') when using chardet.
  • Use the detected encoding when you re-open the file in text mode ('r').

Solution 4: Using Other Encodings (If You Know It)

If you know the file is encoded with a specific encoding (e.g., 'latin-1', 'cp437', 'utf-16'), use that encoding directly:

with open('example.txt', 'r', encoding='latin-1') as f:
    lines = f.readlines()

print(lines)

with open('example.txt', 'r', encoding='cp437') as f:  # Example with cp437
    lines = f.readlines()

print(lines)
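If you have a short list of plausible encodings but don't know which one applies, you can try them in order on the raw bytes. The helper below is a hypothetical sketch (decode_with_fallback is not a standard-library function, and its default candidate list is an assumption). Success only means the bytes are valid in that encoding, not that it is the right one. 'latin-1' maps every byte to a character, so it never fails and belongs at the end of the list.

```python
def decode_with_fallback(raw, encodings=("utf-8-sig", "cp1252", "latin-1")):
    """Return (text, encoding) for the first encoding that decodes raw without errors.

    Hypothetical helper for illustration; the default candidate list is an assumption.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    raise ValueError(f"none of {encodings} could decode the data")


# Usage with a file (hypothetical path):
# with open('example.txt', 'rb') as f:
#     text, used = decode_with_fallback(f.read())

print(decode_with_fallback("caf\u00e9".encode("utf-8")))   # ('café', 'utf-8-sig')
print(decode_with_fallback("caf\u00e9".encode("cp1252")))  # ('café', 'cp1252')
print(decode_with_fallback(b"\x9d"))                       # ('\x9d', 'latin-1')
```

Ordering matters: stricter encodings such as UTF-8 go first, because they reject most text written in other encodings, while permissive single-byte code pages accept almost anything.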

Finding the File's Encoding (If Unknown)

If you're unsure of the encoding, here are some ways to try and determine it:

Using the file Command (Linux/macOS)

The file command (on Linux/macOS, and available in Git Bash on Windows) can often guess the encoding:

file example.txt
  • The output names the character set it detected (for example, UTF-8); pass the matching codec name, such as 'utf-8', to open().

Using Notepad (Windows)

On Windows, Notepad can sometimes show the encoding:

  1. Open the file in Notepad.
  2. Go to "File" -> "Save As...".
  3. Look at the Encoding dropdown near the bottom of the Save As dialog. It shows the encoding Notepad detected for the file.
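From inside a script, one cheap check is whether the file starts with a byte order mark, which identifies a Unicode encoding unambiguously. This sketch uses the BOM constants from the codecs module; encoding_from_bom is a hypothetical helper, not a library function. A missing BOM tells you nothing, since most UTF-8 files have none.

```python
import codecs

# Check the 4-byte UTF-32 BOMs first: the UTF-32-LE BOM (ff fe 00 00)
# begins with the UTF-16-LE BOM (ff fe).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def encoding_from_bom(raw):
    """Return a Python codec name if raw starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


# Usage with a file (hypothetical path); the longest BOM is 4 bytes:
# with open('example.txt', 'rb') as f:
#     print(encoding_from_bom(f.read(4)))

print(encoding_from_bom(codecs.BOM_UTF8 + b"text"))  # utf-8-sig
print(encoding_from_bom("hi".encode("utf-16")))      # utf-16
print(encoding_from_bom(b"plain ascii"))             # None
```

The returned names are codecs that consume the BOM themselves ('utf-16' and 'utf-32' also read the byte order from it), so the result can be passed straight to open().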

Conclusion

The UnicodeDecodeError: 'charmap' error almost always indicates an encoding mismatch.

The best solution is to explicitly specify the correct encoding (usually UTF-8) when opening the file using encoding='utf-8' or encoding='utf-8-sig'.

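  • If the encoding is unknown, try using the chardet library to detect it, or try common encodings such as 'cp1252' or 'latin-1'. Note that 'latin-1' accepts every byte, so it never raises an error even when it is the wrong choice.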
  • As a last resort, you can ignore errors with errors='ignore', but be aware that this will result in data loss.
  • Always prioritize finding the correct encoding.