How to Resolve Python Error "UnicodeEncodeError: 'ascii' codec can't encode character ..."

The UnicodeEncodeError: 'ascii' codec can't encode character '...' in position X: ordinal not in range(128) is a common Python error encountered when attempting to convert a string (str) containing non-ASCII characters into a sequence of bytes (bytes) using the limited ASCII encoding. ASCII can only represent characters with values 0-127 (standard English letters, numbers, basic symbols), while modern text often uses a wider range of Unicode characters (accented letters, emojis, symbols, non-Latin scripts).

This guide explains why this encoding error occurs and provides the standard solutions, primarily using the UTF-8 encoding.

Understanding the Error: Strings, Bytes, and Encoding

  • Strings (str): Sequences of Unicode characters representing text.
  • Bytes (bytes): Sequences of raw byte values (0-255).
  • Encoding: Converting str to bytes using a specific codec (e.g., text.encode('utf-8')).
  • Decoding: Converting bytes back to str using a specific codec (e.g., byte_data.decode('utf-8')).

The UnicodeEncodeError specifically happens during the encoding process (string to bytes).
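
As a minimal sketch of the round trip described above (the sample text and variable names are illustrative assumptions, not taken from any particular program):

```python
# Illustrative sample text: "café", with the accented letter written as an escape.
text = "caf\u00e9"

# Encoding: str -> bytes. The accented letter takes two bytes in UTF-8.
data = text.encode("utf-8")
print(type(data).__name__, data)

# Decoding: bytes -> str, using the same codec that produced the bytes.
restored = data.decode("utf-8")
print(restored == text)

# A str is measured in characters, bytes in byte values.
print(len(text), len(data))
```

The string has four characters but encodes to five bytes, which is why the two types cannot be used interchangeably.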

The Cause: Encoding Non-ASCII Characters with the ASCII Codec

The ASCII encoding standard defines mappings only for byte values 0 through 127. When you try to encode a Python string that contains characters outside this range (like 'é', 'ф', '€', '’', '你好') using the 'ascii' codec, Python encounters a character it has no representation for within the ASCII standard and raises the UnicodeEncodeError.

The error message 'ascii' codec can't encode character '\uXXXX' in position Y... tells you exactly which character (\uXXXX is the Unicode code point) at which position (Y) could not be represented using ASCII.

# Error Scenario
my_string = "Don’t use ASCII for this!" # Contains '’' (U+2019)

try:
    # ⛔️ UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 3: ordinal not in range(128)
    # Attempting to encode a string with non-ASCII characters using the 'ascii' codec
    ascii_bytes = my_string.encode('ascii')
    print(ascii_bytes)
except UnicodeEncodeError as e:
    print(e)
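
The details in the message are also available as attributes on the exception object, which can help when logging or debugging. The following is a small sketch; the `details` dictionary is just an illustrative way to collect the values:

```python
text = "Don\u2019t use ASCII for this!"  # U+2019 written as an escape

details = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # Standard attributes of UnicodeEncodeError
    details = {
        "encoding": exc.encoding,                # codec that failed
        "start": exc.start,                      # index of the first bad character
        "end": exc.end,                          # index just past the last bad character
        "char": exc.object[exc.start:exc.end],   # the offending character(s)
        "reason": exc.reason,                    # explanation from the codec
    }

# ascii() keeps the printout safe even on an ASCII-only terminal
print(ascii(details))
```

Indexes are zero-based, so `start` is the same number as the "position" shown in the error message.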

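In Python 3, str.encode() defaults to UTF-8 when called without arguments, so this error usually comes from an explicit .encode('ascii') or from an operation that encodes implicitly with an ASCII default: for example open() or print() under a locale whose encoding is ASCII, a library that encodes str arguments as ASCII, or implicit conversions in Python 2.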

Solution 1: Specify a Capable Encoding (Usually utf-8)

The standard and most reliable solution is to encode your string using an encoding that can handle the full range of Unicode characters present in your string. UTF-8 is the universal standard and the recommended choice in almost all modern applications.

For str.encode()

Explicitly specify 'utf-8' (or another suitable encoding if necessary) when calling .encode().

my_string = "Don’t use ASCII for this! (é, ф)"

# ✅ Encode using UTF-8
try:
    utf8_bytes = my_string.encode('utf-8')
    print(f"Original String: '{my_string}'")
    print(f"Encoded Bytes (UTF-8): {utf8_bytes}")
    # Example Output: b'Don\xe2\x80\x99t use ASCII for this! (\xc3\xa9, \xd1\x84)'

    # For verification, decode back using UTF-8
    decoded_string = utf8_bytes.decode('utf-8')
    print(f"Decoded back: '{decoded_string}'")  # Matches original
except UnicodeEncodeError as e:
    print(f"Unexpected encoding error with utf-8: {e}")  # Should not happen for valid Unicode

For open() (Writing to Files)

When writing text containing non-ASCII characters to a file, always specify encoding='utf-8' in the open() call.

my_string = "Writing non-ASCII: résumé, 你好"
filename = "output_utf8.txt"

try:
    # ✅ Specify UTF-8 encoding when opening file for writing
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(my_string)
    print(f"Successfully wrote to '{filename}' using UTF-8.")

    # Verify by reading back with UTF-8
    with open(filename, 'r', encoding='utf-8') as f:
        content = f.read()
    print(f"Read back: '{content}'")  # Should match original

except (OSError, UnicodeError) as e:
    print(f"Error writing/reading file: {e}")

# For comparison, this fails because ASCII cannot represent 'é' or '你好':
# with open('output_ascii.txt', 'w', encoding='ascii') as f:
#     f.write(my_string)  # Raises UnicodeEncodeError

Output:

Successfully wrote to 'output_utf8.txt' using UTF-8.
Read back: 'Writing non-ASCII: résumé, 你好'

For Other Libraries (e.g., smtplib)

Libraries that accept text may encode it internally with a default codec, which can be ASCII. If such a library also accepts bytes or lets you specify an encoding, pass UTF-8 encoded bytes. smtplib's sendmail(), for example, encodes a str message with the 'ascii' codec but sends a bytes message unmodified, so encode the message content first.

import smtplib # Example context

message_subject = "Subject: Non-ASCII test éф"
message_body = "Body content with résumé and other symbols."
full_message = f"{message_subject}\n\n{message_body}"

# ✅ Encode the entire message content using UTF-8 before sending
encoded_msg_content = full_message.encode('utf-8')

# Example smtplib usage (requires server setup, actual sending omitted)
# server = smtplib.SMTP('smtp.example.com', 587)
# server.login("user", "password")
# server.sendmail('[email protected]', '[email protected]', encoded_msg_content)
# server.quit()
print("Message encoded as UTF-8 bytes, ready for sending.")
print(encoded_msg_content[:60], b"...") # Show first few bytes

Output:

Message encoded as UTF-8 bytes, ready for sending.
b'Subject: Non-ASCII test \xc3\xa9\xd1\x84\n\nBody content with r\xc3\xa9sum\xc3\xa9 and' b'...'
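
Encoding the whole message as raw UTF-8 leaves non-ASCII bytes in the Subject header, which some mail servers or clients may not handle well. One alternative, sketched here on the assumption that the standard-library email package suits your use case, is to build the message with email.message.EmailMessage, which escapes non-ASCII header text for you. The addresses and server name are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Non-ASCII test \u00e9\u0444"   # same subject text, written with escapes
msg["From"] = "sender@example.com"               # placeholder address
msg["To"] = "recipient@example.com"              # placeholder address
msg.set_content("Body content with r\u00e9sum\u00e9 and other symbols.")

# Serialize and look at the header section only: it is pure ASCII,
# with the subject wrapped in an RFC 2047 encoded word.
raw = msg.as_bytes()
headers = raw.split(b"\n\n", 1)[0]
print(headers.decode("ascii"))

# Sending is omitted; with a configured server it might look like:
# import smtplib
# with smtplib.SMTP("smtp.example.com", 587) as server:
#     server.starttls()
#     server.login("user", "password")
#     server.send_message(msg)
```

send_message() takes the message object directly, so no manual .encode() call is needed in this variant.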

Solution 2: Handle Encoding Errors (errors parameter - Use Cautiously)

The .encode() method (and open() when writing) accepts an errors parameter to specify how to handle characters that cannot be encoded by the chosen codec. Using this with 'ascii' usually means losing or altering data and is generally not recommended.

Using errors='ignore' (Data Loss)

Silently discards characters that cannot be encoded in ASCII.

my_string = "Don’t use ASCII for this! (é, ф)"

# ⚠️ Encoding with ASCII and ignoring errors - DATA LOSS!
ascii_bytes_ignored = my_string.encode('ascii', errors='ignore')
print(f"Encoded ('ascii', ignore): {ascii_bytes_ignored}")
# Output: Encoded ('ascii', ignore): b'Dont use ASCII for this! (, )'
# Note: Non-ASCII chars are lost!

# Decode back to see the loss
print(f"Decoded back: '{ascii_bytes_ignored.decode('ascii')}'")
# Output: Decoded back: 'Dont use ASCII for this! (, )'

Output:

Encoded ('ascii', ignore): b'Dont use ASCII for this! (, )'
Decoded back: 'Dont use ASCII for this! (, )'

Using errors='replace' (Replacement Character)

Replaces each unencodable character with a placeholder (a ? when encoding).

my_string = "Don’t use ASCII for this! (é, ф)"

# ⚠️ Encoding with ASCII and replacing errors - DATA ALTERED!
ascii_bytes_replaced = my_string.encode('ascii', errors='replace')
print(f"Encoded ('ascii', replace): {ascii_bytes_replaced}")
# Output: Encoded ('ascii', replace): b'Don?t use ASCII for this! (?, ?)'

print(f"Decoded back: '{ascii_bytes_replaced.decode('ascii')}'")
# Output: Decoded back: 'Don?t use ASCII for this! (?, ?)'

Output:

Encoded ('ascii', replace): b'Don?t use ASCII for this! (?, ?)'
Decoded back: 'Don?t use ASCII for this! (?, ?)'

Other Error Handlers (xmlcharrefreplace, backslashreplace)

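These replace unencodable characters with XML character references (e.g., &#8217;) or Python backslash escapes (e.g., \u2019). The resulting bytes are pure ASCII and no information is lost, but the output contains escape sequences instead of the original characters, so it is only useful when the consumer knows how to interpret them (for example, XML or HTML for xmlcharrefreplace).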
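
A short sketch of both handlers follows. The sample string is illustrative, and the reversal steps (html.unescape and the unicode_escape codec) are one possible way to recover the text, shown only to demonstrate that the escapes preserve the original characters:

```python
import html

text = "Don\u2019t (\u00e9)"  # illustrative sample with two non-ASCII characters

# XML/HTML numeric character references
xml_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(xml_bytes)      # b'Don&#8217;t (&#233;)'

# Python-style backslash escapes
escaped_bytes = text.encode("ascii", errors="backslashreplace")
print(escaped_bytes)  # b'Don\\u2019t (\\xe9)'

# Both forms can be turned back into the original text by a consumer
# that understands the escape syntax.
from_xml = html.unescape(xml_bytes.decode("ascii"))
from_escapes = escaped_bytes.decode("unicode_escape")
print(from_xml == text, from_escapes == text)  # True True
```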

Again, using a capable encoding like UTF-8 (Solution 1) is almost always preferable to using error handlers with 'ascii'.

Solution 3: Check and Set Environment Variables (Advanced/System-Level)

In some cases, particularly on Linux/macOS or in minimal deployment environments (containers, cron jobs, or sessions running under the C or POSIX locale), Python derives the encoding of its standard streams, and of open() calls without an encoding argument, from the system locale. If that locale implies ASCII, a plain print() of a non-ASCII string can raise this error even though the code never mentions 'ascii', especially on Python versions before 3.7.

  • PYTHONIOENCODING: Setting this environment variable forces Python's standard streams (stdin, stdout, stderr) to use a specific encoding. Setting it to utf-8 can sometimes help in environments with misconfigured locales.
    # Linux/macOS
    export PYTHONIOENCODING=utf-8
    # Windows (Command Prompt)
    set PYTHONIOENCODING=utf-8
    # Windows (PowerShell)
    $env:PYTHONIOENCODING = 'utf-8'
    Set this before running your Python script.
  • LANG / LC_ALL (Linux/macOS): These system locale variables influence default encodings. Ensure LANG is set to a UTF-8 locale (e.g., en_US.UTF-8). LC_ALL overrides LANG; ensure it's either unset or also set to a UTF-8 locale.
    # Check current settings
    echo $LANG
    echo $LC_ALL

    # Set LANG (example)
    export LANG='en_US.UTF-8'
    # Unset LC_ALL (if problematic)
    unset LC_ALL
    You might need to generate the locale (sudo locale-gen en_US.UTF-8) or install language packs (sudo apt-get install language-pack-en) on some Linux systems.

Modifying environment variables is usually a system-level fix rather than an application-level one. Prefer specifying encoding explicitly in your code (utf-8).
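
Before changing any environment variable, it can help to see what Python has actually picked up. The following sketch collects the relevant values using standard-library calls only; the `report` dictionary is an illustrative name:

```python
import locale
import sys

report = {
    # Encoding used by print(); may be None if stdout has been replaced
    "stdout": getattr(sys.stdout, "encoding", None),
    # Locale-derived encoding that open() falls back on when none is given
    "preferred": locale.getpreferredencoding(False),
    # Encoding used for file names
    "filesystem": sys.getfilesystemencoding(),
    # True when Python's UTF-8 mode is active (Python 3.7+)
    "utf8_mode": bool(sys.flags.utf8_mode),
}

for name, value in report.items():
    print(f"{name}: {value}")
```

If `stdout` or `preferred` reports something like ANSI_X3.4-1968 or US-ASCII, the locale is the likely culprit and the variables above are worth checking.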

Debugging the Error

  1. Identify the Operation: Is the error from str.encode() or from open()/write()/another function implicitly encoding?
  2. Check the String: Examine the string content. Does it contain any characters beyond A-Z, a-z, 0-9, and basic punctuation/symbols? If yes, it requires an encoding like UTF-8, not ASCII. Use print(ascii(my_string)) to display every non-ASCII character as an escape sequence (repr() leaves printable non-ASCII characters as they are in Python 3).
  3. Verify Encoding Argument: Check the encoding='...' argument being used. Is it explicitly 'ascii'? If omitted, what is Python's default in that context? (Assume UTF-8 is needed).
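
For the second step, a small helper can list exactly which characters fall outside ASCII. This is a sketch; `find_non_ascii` is a hypothetical name, not a standard-library function:

```python
import unicodedata


def find_non_ascii(text):
    """Return (index, escaped character, Unicode name) for each non-ASCII character."""
    return [
        (index, ascii(char), unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(text)
        if ord(char) > 127
    ]


hits = find_non_ascii("Don\u2019t use ASCII (\u00e9)")
for hit in hits:
    print(hit)
```

On Python 3.7 and later, text.isascii() gives a quick yes/no answer when the individual characters are not needed.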

Conclusion

The UnicodeEncodeError: 'ascii' codec can't encode character... occurs when trying to convert a string containing non-ASCII characters into bytes using the limited 'ascii' encoding.

  • The standard and recommended solution is to use a Unicode-capable encoding, primarily utf-8, when encoding strings to bytes (my_string.encode('utf-8')) or writing text files (open(..., encoding='utf-8')).

  • Avoid using errors='ignore' or errors='replace' with the 'ascii' codec unless data loss or alteration is acceptable.

  • Modifying system environment variables is a more advanced fix that is usually unnecessary when encodings are specified explicitly in the Python code itself.

  • Always prefer utf-8 for encoding modern text data.