How to Solve http.client.InvalidURL: "URL can't contain control characters" in Python

The http.client.InvalidURL: URL can't contain control characters error (often seen with libraries like urllib or http.client) occurs when a URL string contains characters that are not allowed in URLs, such as spaces, newlines, or other control characters.

This guide explains the causes of this error and presents the correct methods for encoding URLs to avoid it.

Understanding the Error: URL Encoding

URLs have a specific, restricted set of allowed characters. Spaces, newlines, and many other characters are not allowed directly within a URL. To include these characters, they must be URL-encoded (also called "percent-encoded"). Encoding replaces each unsafe character with a % followed by two hexadecimal digits for every byte of its UTF-8 encoding. For ASCII characters that is a single byte, equal to the character's ASCII value.

For example:

  • Space becomes %20
  • Newline (\n) becomes %0A
  • Question mark (?) becomes %3F (if it's within the path; it's allowed as a separator between the path and query)
  • Double quotes " become %22
  • And so on.

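The http.client.InvalidURL error occurs when http.client finds an unencoded space or ASCII control character in the request target before sending the request. The check groups the space together with the control characters, which is why the message says "control characters" even when the only offending character is a space.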
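The mappings listed above can be checked with urllib.parse.quote(). This is a small sketch: the `examples` dictionary is only an illustrative name, and safe="" is passed so that no character (not even /) is exempt from encoding.

```python
from urllib.parse import quote

# safe="" means nothing is left unencoded except letters, digits and "_.-~".
examples = {ch: quote(ch, safe="") for ch in [" ", "\n", "?", '"', "é"]}

for ch, encoded in examples.items():
    print(repr(ch), "->", encoded)
```

The non-ASCII character é is included to show that one character can expand to more than one %xx group, because each byte of its UTF-8 encoding is encoded separately.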

Common Causes and Solutions

Spaces in the URL

The most common cause is a space in the URL:

import urllib.request

# ⛔️ http.client.InvalidURL: URL can't contain control characters. '/ab cd' (found at least ' ')
# url = 'http://www.python.org/ab cd'
# with urllib.request.urlopen(url) as f:
#     print(f.read(300))

Solution: Replace spaces with %20 (or, in some cases, +, but only in the query string). The best way to do this is with urllib.parse.quote(), which we'll cover in detail below. Don't just use replace() directly, as that won't handle other invalid characters.
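As a rough illustration of the difference between %20 and +, and of why replace() falls short, the sketch below compares the standard-library helpers. The variable names and the sample strings are made up for the example.

```python
from urllib.parse import quote, quote_plus, urlencode

# Path segments: spaces become %20.
path_segment = quote("ab cd")

# Query values: quote_plus() and urlencode() use + for spaces.
query_value = quote_plus("ab cd")
query_string = urlencode({"q": "ab cd", "lang": "en"})

# A plain replace() only handles the space; the tab is left untouched.
naive = "ab cd\tef".replace(" ", "%20")

print(path_segment)   # ab%20cd
print(query_value)    # ab+cd
print(query_string)   # q=ab+cd&lang=en
print(repr(naive))    # 'ab%20cd\tef'
```

The tab that survives the replace() call is a control character, so a URL built this way would still be rejected.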

Newline Characters and Other Control Characters

Newline and carriage-return characters (\n, \r), tabs (\t), and other control characters are also invalid in URLs. These often appear accidentally when you construct URLs from multi-line strings:

import urllib.request

# ⛔️ http.client.InvalidURL - contains newlines and spaces
# url = """
# http://www.python.org/ab cd
# """
# with urllib.request.urlopen(url) as f:
#     print(f.read(300))

# ✅ Better: single line, no newlines. But the space still needs encoding!
url = "http://www.python.org/ab cd"

Solution:

  • Ensure your URLs are single-line strings. Avoid using triple-quoted strings (""") for URLs unless you explicitly remove newlines.
  • Use urllib.parse.quote() (as described below) to properly encode all unsafe characters, not just spaces.
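The two points above can be combined in a short sketch. It assumes the stray whitespace sits only at the start and end of the string (as it does with a triple-quoted literal), so strip() is enough; `raw`, `cleaned`, and `safe_url` are illustrative names.

```python
from urllib.parse import quote, urlsplit, urlunsplit

raw = """
http://www.python.org/ab cd
"""

# Drop the leading and trailing newlines left by the triple-quoted string.
cleaned = raw.strip()

# Encode only the path, leaving the scheme and host untouched.
parts = urlsplit(cleaned)
safe_url = urlunsplit(parts._replace(path=quote(parts.path)))

print(safe_url)  # http://www.python.org/ab%20cd
```

Whitespace in the middle of a URL is a different situation: strip() won't remove it, and you need to decide whether it is part of the data (encode it) or an accident (remove it).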

Encoding URLs with urllib.parse.quote() and urlparse()

The correct and robust way to handle potentially invalid characters in a URL is to use the urllib.parse module. Specifically, use urlparse() to break the URL into components and quote() to encode the path (and query, if necessary).

import urllib.request
from urllib.parse import urlparse, quote

url = 'http://www.python.org/ab cd ef'

parsed_url = urlparse(url) # Break the URL into components

# Construct the URL, quoting the path
url = parsed_url.scheme + '://' + parsed_url.netloc + quote(parsed_url.path)
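# If there is also a query, keep its '=' and '&' separators unencoded
if parsed_url.query:
    url += '?' + quote(parsed_url.query, safe='=&')  # Handle query parameters

print(url)  # Output: http://www.python.org/ab%20cd%20ef

# Now the URL is accepted by http.client. If the page doesn't exist,
# urlopen() raises urllib.error.HTTPError instead of InvalidURL.
with urllib.request.urlopen(url) as f:
    print(f.read(300))  # Output (will vary): b'<!doctype html>...'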

  • urlparse(url): Breaks the URL into its components (scheme, netloc, path, params, query, fragment). This is essential because you need to encode the path and query parts separately. You don't want to encode the :// or the ? that separates the path and query.
  • quote(parsed_url.path): This is the key part. quote() percent-encodes only the path portion of the URL, replacing spaces and other unsafe characters with their %xx equivalents.
  • Reassembling the URL: We carefully reconstruct the URL, using the encoded path (and query, if present).
  • The if statement appends the query string only when the original URL actually contains one.
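The steps above can be wrapped in a reusable function. The following is a minimal sketch, not a complete solution: `encode_url` is a hypothetical helper name, example.com is a placeholder host, and the choice of `safe` characters is an assumption you may need to adjust for your own URLs.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(url: str) -> str:
    """Hypothetical helper: percent-encode the path, query and fragment."""
    parts = urlsplit(url.strip())
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        # Keep "/" as the path separator; keep "%" so existing escapes
        # such as %20 are not encoded a second time.
        quote(parts.path, safe="/%"),
        # Keep the "=" and "&" separators and existing "+" / "%" escapes.
        quote(parts.query, safe="=&%+"),
        quote(parts.fragment, safe="%"),
    ))


print(encode_url("http://www.python.org/ab cd ef"))
print(encode_url("http://example.com/a b?q=x y&lang=en"))
print(encode_url("http://example.com/a%20b"))
```

Treating % as safe makes the function idempotent for URLs that are already encoded, at the cost of leaving a literal % in the input unencoded. If your input can contain raw percent signs, drop % from the `safe` arguments.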

Conclusion

The http.client.InvalidURL error, specifically the URL can't contain control characters variant, is almost always due to spaces, newlines, or other invalid characters within the URL string.

Never try to fix this by simply using replace() to remove spaces. That's a brittle and incomplete solution.

The correct approach is to use urllib.parse.urlparse() to break down the URL and urllib.parse.quote() to properly encode the path and query components. This ensures your URLs are valid and your requests work correctly.