
How to Solve "LookupError: unknown encoding" in Python

The LookupError: unknown encoding error in Python occurs when you try to use an encoding that Python doesn't recognize. This typically happens when opening files, encoding/decoding strings, or configuring standard input/output.

This guide explains the causes of this error and provides solutions, including using valid encodings, setting environment variables, and reconfiguring sys.stdin and sys.stdout.

Understanding the Error: Invalid Encoding

The LookupError: unknown encoding error means you've specified an encoding name that Python's codec registry doesn't know. This most commonly happens in these situations:

  • Opening files: open('filename.txt', 'r', encoding='invalid-encoding')
  • Encoding/Decoding strings: 'my string'.encode('invalid-encoding') or b'my bytes'.decode('invalid-encoding')
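  • Changing standard input/output encoding: sys.stdout.reconfigure(encoding='invalid-encoding') or an invalid PYTHONIOENCODING environment variable.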

Example of the error:

# ⛔️ LookupError: unknown encoding: example
with open('example.txt', 'w', encoding='example') as my_file:  # 'example' is invalid
    my_file.write('first line' + '\n')
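The same LookupError comes from open() and from str.encode()/bytes.decode(). The sketch below triggers it in each case and catches it. The name BAD and the temporary file are illustrative choices, not part of the original example.

```python
import os
import tempfile

BAD = "example"  # deliberately invalid codec name (illustrative)
messages = []    # (where, error message) pairs

try:
    "first line".encode(BAD)
except LookupError as exc:
    messages.append(("str.encode", str(exc)))

try:
    b"first line".decode(BAD)
except LookupError as exc:
    messages.append(("bytes.decode", str(exc)))

# Use a throwaway directory so no stray file is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    try:
        with open(path, "w", encoding=BAD) as my_file:
            my_file.write("first line\n")
    except LookupError as exc:
        messages.append(("open", str(exc)))

for where, msg in messages:
    print(f"{where}: {msg}")  # e.g. "str.encode: unknown encoding: example"
```

Catching LookupError, not a broad Exception, keeps unrelated errors visible.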

Using Valid Encodings

The most direct solution is to use a valid encoding. Here are some of the most common and recommended encodings:

  • utf-8: The most widely used encoding for Unicode text. It can represent virtually any character from any language. This is generally the best default choice.
  • utf-8-sig: Same as utf-8, but when reading it skips a BOM (Byte Order Mark) at the start of the data if one is present, and when writing it adds one. Use it when reading files that might start with a BOM.
  • latin-1 (also spelled iso-8859-1 or latin1): A single-byte encoding for Western European languages. It can represent only 256 characters (code points U+0000 to U+00FF), far fewer than UTF-8.
  • ascii: Covers only 128 characters: unaccented English letters, digits, common punctuation, and control characters. ASCII text encodes to exactly the same bytes in UTF-8, so UTF-8 is a superset of ASCII. Use it only if you're certain your data contains only ASCII characters.
  • utf-16 and utf-32: Other Unicode encodings, less commonly used for file I/O than UTF-8.
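The differences between these encodings show up in the bytes they produce. The short comparison below uses only built-in str/bytes methods. It also shows that a valid encoding name which cannot represent a character raises UnicodeEncodeError, which is a different problem from LookupError.

```python
text = "caf\u00e9"  # the word "cafe" with an accented e (U+00E9)

# Same characters, different byte sequences.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
print(utf8_bytes)    # b'caf\xc3\xa9'  (the accented e takes two bytes)
print(latin1_bytes)  # b'caf\xe9'      (the accented e takes one byte)

# ASCII-only text produces identical bytes in ascii and utf-8.
print("hello".encode("ascii") == "hello".encode("utf-8"))  # True

# utf-8-sig adds a BOM when encoding and removes it when decoding.
with_bom = "hello".encode("utf-8-sig")
print(with_bom)                                   # b'\xef\xbb\xbfhello'
print(with_bom.decode("utf-8") == "\ufeffhello")  # True: plain utf-8 keeps the BOM
print(with_bom.decode("utf-8-sig") == "hello")    # True: utf-8-sig strips it

# A *valid* encoding that can't represent a character fails differently.
ascii_error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc
    print(exc.reason)  # ordinal not in range(128)
```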

Corrected Code Example:

# ✅ Specify 'utf-8' encoding
with open('example.txt', 'w', encoding='utf-8') as my_file:
    my_file.write('first line' + '\n')
    my_file.write('second line' + '\n')
    my_file.write('third line' + '\n')
  • This code writes three lines to example.txt, encoding the text as UTF-8.
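When you read the file back, pass the same encoding you wrote it with. The round trip below is a minimal sketch. It writes into a temporary directory (an assumption made so the example leaves no files behind) instead of the current working directory.

```python
import os
import tempfile

lines = ["first line", "second line", "third line"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")

    # Write with an explicit, valid encoding...
    with open(path, "w", encoding="utf-8") as my_file:
        for line in lines:
            my_file.write(line + "\n")

    # ...and read back with the same encoding.
    with open(path, "r", encoding="utf-8") as my_file:
        read_back = my_file.read().splitlines()

print(read_back == lines)  # True
```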

Where to Find a List of Valid Encodings

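Python supports many more encodings, and most have several accepted aliases (for example, utf8 and UTF-8 both resolve to utf-8). The complete list is in the "Standard Encodings" table of the official codecs module documentation.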
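To check a name in code, use codecs.lookup(). It returns a CodecInfo object for any registered codec and raises the same LookupError for unknown names. The helper name normalize_encoding is hypothetical. One caveat: the registry also contains codecs that are not text encodings (such as hex). codecs.lookup() accepts those, but open() still rejects them.

```python
import codecs


def normalize_encoding(name):
    """Return Python's canonical codec name for `name`, or None if it is unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


print(normalize_encoding("UTF8"))     # utf-8
print(normalize_encoding("latin-1"))  # iso8859-1
print(normalize_encoding("example"))  # None
```

Validating a user-supplied encoding this way lets you report a clear message (or fall back to utf-8) before any file is opened.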

Setting the PYTHONIOENCODING Environment Variable

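You can set the PYTHONIOENCODING environment variable to change the default encoding used for the standard input, output, and error streams (stdin, stdout, stderr). This is useful when a console or pipe expects a specific encoding and you don't want to modify the script. It does not change the default encoding of files opened with open(), so keep passing the encoding argument there. The value must itself be a valid encoding name: a misspelled value makes the interpreter fail at startup with LookupError: unknown encoding.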

  • Linux/macOS:

    export PYTHONIOENCODING=utf-8
  • Windows:

    setx PYTHONIOENCODING utf-8
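    setx PYTHONLEGACYWINDOWSSTDIO 1
    Note:

    PYTHONIOENCODING only sets the default encoding of the standard streams. open() and str.encode() are unaffected, and a script can still change the streams at runtime with reconfigure(). On Windows (Python 3.6+), PYTHONIOENCODING is ignored for the interactive console unless PYTHONLEGACYWINDOWSSTDIO is also set to a non-empty value, which also switches the console back to the pre-3.6 I/O layer. Redirected files and pipes do not need it. setx stores the variables for new console windows only, so open a new window after running it.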
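To confirm what PYTHONIOENCODING does, you can start a child interpreter with the variable set and have it print sys.stdout.encoding. This is a minimal sketch that assumes sys.executable points at a working interpreter. The helper run_with_ioencoding and the deliberately invalid value utf-9 are illustrative.

```python
import os
import subprocess
import sys

CHILD_CODE = "import sys; print(sys.stdout.encoding)"


def run_with_ioencoding(value):
    """Run a child interpreter with PYTHONIOENCODING=value and capture its output."""
    env = dict(os.environ, PYTHONIOENCODING=value)
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
    )


ok = run_with_ioencoding("utf-8")
print(ok.stdout.strip())  # utf-8

# A misspelled value makes the child fail while creating its standard streams.
bad = run_with_ioencoding("utf-9")
print(bad.returncode != 0)                 # True
print("unknown encoding" in bad.stderr)    # expected to mention the bad name
```

The child's stdout is a pipe here, so the Windows console caveat above does not apply.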

Reconfiguring sys.stdin, sys.stdout, and sys.stderr

In some situations, you might need to change the encoding of the standard input/output streams within your running Python script. You can do this using sys.stdin.reconfigure(), sys.stdout.reconfigure(), and sys.stderr.reconfigure() (available in Python 3.7+):

import sys

sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')
sys.stderr.reconfigure(encoding='utf-8')
  • This code switches all three streams to UTF-8. Place it at the very beginning of your script, because reconfigure() cannot change the encoding of sys.stdin once data has been read from it. It overrides whatever the environment configured, so use it only when you cannot control the environment in which your script runs.
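reconfigure() exists only on io.TextIOWrapper, and the standard streams are not always wrapper objects. For example, sys.stdout is None under pythonw.exe, and test runners or IDEs may replace it. The sketch below guards against that. reconfigure_stream and configure_std_streams are hypothetical helper names. The demonstration uses an in-memory stream so it does not touch the real console.

```python
import io
import sys


def reconfigure_stream(stream, encoding="utf-8"):
    """Switch `stream` to `encoding` if it is an io.TextIOWrapper.

    Returns True if the stream was reconfigured, False if it was skipped
    (for example None, or a replacement object such as io.StringIO).
    """
    if isinstance(stream, io.TextIOWrapper):
        stream.reconfigure(encoding=encoding)
        return True
    return False


def configure_std_streams(encoding="utf-8"):
    """Reconfigure stdin/stdout/stderr where possible; return the names changed."""
    changed = []
    for name in ("stdin", "stdout", "stderr"):
        if reconfigure_stream(getattr(sys, name), encoding):
            changed.append(name)
    return changed


# Demonstration on an in-memory stream that starts out as ASCII.
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="ascii")
reconfigure_stream(stream, "utf-8")
stream.write("caf\u00e9")  # would raise UnicodeEncodeError under ascii
stream.flush()
print(buffer.getvalue())  # b'caf\xc3\xa9'
```

In a real script you would call configure_std_streams() as its first statement, before anything reads from sys.stdin.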