How to Solve "LookupError: unknown encoding" in Python
The LookupError: unknown encoding error in Python occurs when you try to use an encoding that Python doesn't recognize. This typically happens when opening files, encoding/decoding strings, or configuring standard input/output.
This guide explains the causes of this error and provides solutions, including using valid encodings, setting environment variables, and reconfiguring sys.stdin and sys.stdout.
Understanding the Error: Invalid Encoding
The LookupError: unknown encoding error means you've specified an encoding name that Python's codec registry doesn't know. This most commonly happens in these situations:
- Opening files: open('filename.txt', 'r', encoding='invalid-encoding')
- Encoding/decoding strings: 'my string'.encode('invalid-encoding') or b'my bytes'.decode('invalid-encoding')
- Changing the standard input/output encoding.
Example of the error:
# ⛔️ LookupError: unknown encoding: example
with open('example.txt', 'w', encoding='example') as my_file:  # 'example' is invalid
    my_file.write('first line' + '\n')
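One way to catch a bad name before any file is touched is to ask the codec registry directly. The sketch below relies on codecs.lookup(), which raises the same LookupError for an unknown name. The helper name is_known_encoding is illustrative and not part of the standard library.

```python
import codecs


def is_known_encoding(name):
    """Return True if Python's codec registry recognizes `name` (hypothetical helper)."""
    try:
        codecs.lookup(name)
    except LookupError:
        # Raised as "unknown encoding: <name>" when no codec matches
        return False
    return True


print(is_known_encoding('utf-8'))    # True
print(is_known_encoding('example'))  # False
```

A check like this can be useful when the encoding name comes from user input or a configuration file rather than from a literal in your code.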
Using Valid Encodings
The most direct solution is to use a valid encoding. Here are some of the most common and recommended encodings:
- utf-8: The most widely used encoding for Unicode text. It can represent virtually any character from any language. This is generally the best default choice.
- utf-8-sig: Same as utf-8, but it automatically handles the BOM (Byte Order Mark) if present at the beginning of a file. Use this when reading files that might have a BOM.
- latin-1 (or iso-8859-1): A common encoding for Western European languages. It's a single-byte encoding, so it can't represent as many characters as UTF-8.
- ascii: A very basic encoding that only covers the standard English alphabet, numbers, and some punctuation. It's a subset of UTF-8. Use it only if you're certain your data contains only ASCII characters.
- utf-16 and utf-32: Other Unicode encodings, less commonly used for file I/O than UTF-8.
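To make the differences between these encodings concrete, the following sketch encodes the same short string with several of them and compares the resulting bytes. The sample text is an arbitrary choice for illustration.

```python
text = 'café'

utf8_bytes = text.encode('utf-8')
latin1_bytes = text.encode('latin-1')
sig_bytes = text.encode('utf-8-sig')

print(utf8_bytes)    # b'caf\xc3\xa9'  ('é' takes two bytes)
print(latin1_bytes)  # b'caf\xe9'      ('é' takes one byte)
print(sig_bytes)     # b'\xef\xbb\xbfcaf\xc3\xa9'  (BOM written first)

# 'ascii' cannot represent 'é', so encoding fails with UnicodeEncodeError
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print('ascii failed:', exc.reason)

# Decoding with 'utf-8-sig' strips a leading BOM; plain 'utf-8' keeps it as '\ufeff'
print(sig_bytes.decode('utf-8-sig') == text)            # True
print(sig_bytes.decode('utf-8').startswith('\ufeff'))   # True
```

Note that an unsupported character raises UnicodeEncodeError, which is a different problem from the LookupError caused by an unknown encoding name.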
Corrected Code Example:
# ✅ Specify 'utf-8' encoding
with open('example.txt', 'w', encoding='utf-8') as my_file:
    my_file.write('first line' + '\n')
    my_file.write('second line' + '\n')
    my_file.write('third line' + '\n')
- This code writes the file using the utf-8 encoding.
Where to Find a List of Valid Encodings
Python has a comprehensive list of supported encodings. You can find it in the "Standard Encodings" section of the codecs module page in the official documentation.
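For a quick look from within Python itself, the standard library's encodings.aliases module exposes a dictionary that maps alias names to codec names. The sketch below prints a sample of it. Treat the result as a convenience only: codecs without any alias do not appear in this dictionary, and some codecs are available only on certain platforms, so the documentation remains the authoritative list.

```python
from encodings.aliases import aliases

# `aliases` maps an alias (e.g. 'utf8') to its codec module name (e.g. 'utf_8')
codec_names = sorted(set(aliases.values()))

print(len(codec_names), 'codecs have at least one alias')
print(codec_names[:5])

# Look up what a particular alias resolves to
print(aliases['utf8'])    # utf_8
print(aliases['latin1'])  # latin_1
```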
Setting the PYTHONIOENCODING Environment Variable
You can set the PYTHONIOENCODING environment variable to override the encoding used for the standard input, output, and error streams (stdin, stdout, stderr). This is useful if your scripts consistently read or print text in a specific encoding and you don't want to reconfigure the streams in every script. It does not change the default encoding that open() uses for files.
- Linux/macOS:
export PYTHONIOENCODING=utf-8
- Windows:
setx PYTHONIOENCODING utf-8
setx PYTHONLEGACYWINDOWSSTDIO utf-8
Note: PYTHONIOENCODING only sets the default encoding of the standard streams. Files opened with open() are unaffected, so keep passing the encoding argument there. On Windows (Python 3.6 and later), Python ignores PYTHONIOENCODING for interactive console streams unless PYTHONLEGACYWINDOWSSTDIO is also set, which is why the second setx line is included. Values stored with setx apply only to newly opened command prompts, not the current one.
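To check that the variable actually takes effect, you can start a child interpreter with PYTHONIOENCODING set and ask it which encoding its stdout uses. This is a minimal sketch: the choice of latin-1 is arbitrary, and the names are compared through codecs.lookup() because the exact spelling reported by sys.stdout.encoding may differ between Python versions.

```python
import codecs
import os
import subprocess
import sys

# Copy the current environment and override the stream encoding for the child only
env = dict(os.environ, PYTHONIOENCODING='latin-1')

result = subprocess.run(
    [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
    env=env,
    capture_output=True,  # stdout is a pipe here, not an interactive console
    text=True,
    check=True,
)

reported = result.stdout.strip()
print('child stdout encoding:', reported)

# Normalize both names through the codec registry before comparing
same_codec = codecs.lookup(reported).name == codecs.lookup('latin-1').name
print(same_codec)  # True
```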
Reconfiguring sys.stdin, sys.stdout, and sys.stderr
In some situations, you might need to change the encoding of the standard input/output streams from within your running Python script. You can do this using sys.stdin.reconfigure(), sys.stdout.reconfigure(), and sys.stderr.reconfigure() (available in Python 3.7+):
import sys
sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')
sys.stderr.reconfigure(encoding='utf-8')
- This code changes the encoding of all three streams to UTF-8. Place it at the very beginning of your script, before any other input/output operations. This is a relatively drastic measure and should only be used if you cannot control the environment in which your script runs.
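The effect of reconfigure() is easier to observe on an in-memory stream than on the real standard streams. The sketch below wraps a BytesIO buffer in an io.TextIOWrapper (the same class that normally backs sys.stdout) and switches its encoding. The helper name force_utf8 is hypothetical, and the hasattr() guard is an assumption that some environments replace sys.stdout with an object lacking reconfigure().

```python
import io


def force_utf8(stream):
    """Switch `stream` to UTF-8 if it supports reconfigure() (hypothetical helper)."""
    if hasattr(stream, 'reconfigure'):
        stream.reconfigure(encoding='utf-8')
        return True
    return False


# Stand-in for sys.stdout: a text wrapper over an in-memory byte buffer
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding='ascii')
print(stream.encoding)  # ascii

changed = force_utf8(stream)
print(changed)          # True
print(stream.encoding)  # utf-8

# After the switch, non-ASCII text is written as UTF-8 bytes
stream.write('café')
stream.flush()
written = buffer.getvalue()
print(written)          # b'caf\xc3\xa9'
```

In a real script the same helper could be called with sys.stdout or sys.stderr in place of the in-memory stream.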