How to Resolve "UnicodeDecodeError: 'charmap' codec can't decode byte" in Python

The UnicodeDecodeError: 'charmap' codec can't decode byte ... error in Python occurs when you try to read or decode a file (or byte string) using the wrong character encoding. This typically happens on Windows, where open() called without an encoding argument falls back to the system's legacy code page (often cp1252, which Python implements with its 'charmap' codec), while the file itself is encoded differently (often as UTF-8).

This guide explains how to diagnose and fix this error.

Understanding the Error

The error message:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1: character maps to <undefined>

means:

'charmap': Python is trying to use the system's default character encoding, which is often a legacy encoding like cp1252 on Windows.
can't decode byte 0x9d: The specific byte (shown in hexadecimal) 0x9d cannot be mapped to a character in the 'charmap' encoding.
in position 1: The problematic byte is at the specified position (starting from 0) within the input.
character maps to <undefined>: The codec has no character assigned to that byte value.
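
To see the same failure without any file involved, the following minimal sketch decodes the UTF-8 bytes of a curly quote with cp1252. The sample character is an assumption chosen for illustration because its last UTF-8 byte is 0x9d; the first lines simply report which encoding your own system would use by default.

```python
import locale

# The encoding open() falls back to when no encoding= is given
# (cp1252 on many Windows setups, usually UTF-8 on Linux/macOS).
default_encoding = locale.getpreferredencoding(False)
print(f"Default text encoding: {default_encoding}")

# U+201D (right double quotation mark) is three bytes in UTF-8: e2 80 9d.
raw = "\u201d".encode("utf-8")

try:
    raw.decode("cp1252")  # cp1252 has no character assigned to byte 0x9d
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` is cleared when the block ends
    print(exc)
    print(f"Offending byte: {raw[exc.start]:#x} at position {exc.start}")
```

The exception object carries the position in its start attribute, which is handy when you want to inspect the bytes around the failure.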

Solution 1: Specify the Correct Encoding (UTF-8)

The vast majority of text files today are encoded using UTF-8. The most common and reliable solution is to explicitly specify UTF-8 encoding when opening the file:

with open('example.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()
    print(lines)

encoding='utf-8': This tells Python to use UTF-8 to decode the file's contents.
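
A self-contained way to try this is to write a small UTF-8 file and read it back. The sketch uses a temporary file and made-up sample text (both are assumptions for illustration), and it also passes encoding='utf-8' when writing, since the same platform default applies to writes.

```python
import tempfile
from pathlib import Path

# Sample text with curly quotes and accented letters (illustrative only).
sample = "She said \u201chello\u201d at the caf\u00e9\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"

    # Be explicit when writing too, so the bytes on disk are really UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Reading with the matching encoding round-trips the text exactly.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

print(text == sample)
```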

Solution 2: Use `utf-8-sig` for Files with a BOM

Some files (especially those created on Windows) might have a Byte Order Mark (BOM) at the beginning. If you see \ufeff at the start of your output, use utf-8-sig:

with open('example.txt', 'r', encoding='utf-8-sig') as f:
    lines = f.readlines()
    print(lines)

utf-8-sig is a variant of UTF-8 that specifically handles the BOM.
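
The difference between the two codecs is easiest to see side by side. This sketch writes a temporary file with a BOM (the file name and contents are assumptions for illustration) and reads it back both ways.

```python
import codecs
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "bom.txt"

    # Writing with utf-8-sig prepends the 3-byte BOM (ef bb bf).
    path.write_text("id,name\n", encoding="utf-8-sig")
    has_bom = path.read_bytes().startswith(codecs.BOM_UTF8)

    # Plain utf-8 keeps the BOM as a leading U+FEFF character.
    as_utf8 = path.read_text(encoding="utf-8")
    # utf-8-sig strips it, and also reads files that have no BOM.
    as_utf8_sig = path.read_text(encoding="utf-8-sig")

print(has_bom)
print(repr(as_utf8))
print(repr(as_utf8_sig))
```

Because utf-8-sig also accepts files without a BOM, it can be a reasonable default for reading UTF-8 files of uncertain origin.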

Solution 3: Handling Unknown Encodings (with `errors='ignore'` or `chardet`)

If you don't know the file's encoding, you have two main options:

`errors='ignore'`

You can tell Python to ignore decoding errors. This will result in data loss, but it will prevent the program from crashing.

with open('example.txt', 'r', encoding='utf-8', errors='ignore') as f:
    lines = f.readlines()
    print(lines)  # Undecodable bytes have been silently dropped

errors='ignore': This tells Python to skip any bytes it can't decode. This is a last resort as it will lose data.
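
To make the data loss concrete, this sketch decodes a byte string that is not valid UTF-8 (the sample bytes are an assumption for illustration). It also shows errors='replace', another standard Python error handler not covered above, which marks each bad byte with U+FFFD instead of dropping it, so the damage stays visible.

```python
# 0xe9 is "é" in latin-1, but it is not valid UTF-8 in this position.
data = b"caf\xe9 au lait"

ignored = data.decode("utf-8", errors="ignore")    # bad byte is dropped
replaced = data.decode("utf-8", errors="replace")  # bad byte becomes U+FFFD

print(repr(ignored))
print(repr(replaced))
```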

The `chardet` Library

The chardet library attempts to detect the encoding of a file. This is not foolproof, but it's often helpful.

pip install chardet

import chardet

with open('example.txt', 'rb') as f:  # Open in binary mode for chardet
    rawdata = f.read()
    result = chardet.detect(rawdata)
    encoding = result['encoding']
    confidence = result['confidence']
    print(f"Detected encoding: {encoding} (confidence: {confidence})")

with open('example.txt', 'r', encoding=encoding) as f: # Open again with correct encoding
    lines = f.readlines()
    print(lines)

chardet.detect() analyzes the raw bytes and returns a dictionary with its best guess for the encoding (or None if it cannot make a guess) and a confidence level between 0 and 1.
Open the file in binary read mode ('rb') when using chardet.
Use the detected encoding when you re-open the file in text mode ('r').
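
Since the detected encoding can be None or a low-confidence guess, it may be worth guarding against both before re-opening the file. The following is a sketch, not part of chardet itself: choose_encoding and read_text_guessing are hypothetical helper names, and the 0.5 threshold, the UTF-8 fallback, and the sample size are arbitrary assumptions.

```python
def choose_encoding(detection, min_confidence=0.5, fallback="utf-8"):
    """Return the detected encoding, or `fallback` if the guess is missing or weak."""
    encoding = detection.get("encoding")
    confidence = detection.get("confidence") or 0.0
    if encoding is None or confidence < min_confidence:
        return fallback
    return encoding


def read_text_guessing(path, sample_size=100_000):
    """Read a text file, guessing its encoding from the first `sample_size` bytes."""
    import chardet  # third-party: pip install chardet

    with open(path, "rb") as f:
        sample = f.read(sample_size)
    encoding = choose_encoding(chardet.detect(sample))
    with open(path, "r", encoding=encoding) as f:
        return f.read()


# choose_encoding works on any dict shaped like chardet's result.
print(choose_encoding({"encoding": "utf-8", "confidence": 0.99}))
print(choose_encoding({"encoding": None, "confidence": 0.0}))
```

Detecting from a sample rather than the whole file keeps memory use bounded for large files, at the cost of a less informed guess.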

Solution 4: Using Other Encodings (If You Know It)

If you know the file is encoded with a specific encoding (e.g., 'latin-1', 'cp437', 'utf-16'), use that encoding directly:

with open('example.txt', 'r', encoding='latin-1') as f:
    lines = f.readlines()
    print(lines)

with open('example.txt', 'r', encoding='cp437') as f: # Example with cp437
    lines = f.readlines()
    print(lines)
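
When the encoding is only suspected rather than known, one option is to try a short list of candidates in order. The sketch assumes a hypothetical helper named decode_with_fallbacks and an arbitrary candidate list. Note that latin-1 assigns a character to every byte value, so it never raises an error; it belongs last in the list, and a successful latin-1 decode does not prove the text is correct.

```python
def decode_with_fallbacks(data, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first encoding that decodes `data`."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings!r} could decode the data")


# Valid UTF-8 is accepted by the first candidate.
print(decode_with_fallbacks("caf\u00e9".encode("utf-8")))
# A lone 0xe9 byte is not valid UTF-8, so cp1252 is used instead.
print(decode_with_fallbacks(b"caf\xe9"))

# Decoding UTF-8 bytes as latin-1 "succeeds" but yields the wrong characters.
mojibake = "\u00e9".encode("utf-8").decode("latin-1")
print(repr(mojibake))
```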

Finding the File's Encoding (If Unknown)

If you're unsure of the encoding, here are some ways to try and determine it:

Using the `file` Command (Linux/macOS)

The file command (on Linux/macOS, and available in Git Bash on Windows) can often guess the encoding:

file example.txt

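The output describes the file's contents and usually names the detected encoding (such as UTF-8, ISO-8859, or ASCII). Use the matching Python codec name when opening the file.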

Using Notepad (Windows)

On Windows, Notepad can sometimes show the encoding:

Open the file in Notepad.
Go to "File" -> "Save As...".
Look at the Encoding dropdown near the bottom of the "Save As" dialog. It will show the currently detected encoding.

Conclusion

The UnicodeDecodeError: 'charmap' error almost always indicates an encoding mismatch.

The best solution is to explicitly specify the correct encoding (usually UTF-8) when opening the file using encoding='utf-8' or encoding='utf-8-sig'.

If the encoding is unknown, try using the chardet library to detect it, or try common encodings like 'latin-1'.
As a last resort, you can ignore errors with errors='ignore', but be aware that this will result in data loss.
Always prioritize finding the correct encoding.
