How to Decode URL Parameters in Python
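URL encoding (also known as percent-encoding) replaces characters that are not allowed, or that have a special meaning, in a URL with a % followed by two hexadecimal digits giving the byte value. Non-ASCII characters are first encoded as bytes (usually UTF-8), and each byte is escaped separately.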
This guide explains how to decode URL parameters in Python, reversing this process. We'll focus on the urllib.parse.unquote() and urllib.parse.unquote_plus() functions, handle double-encoding, and briefly touch on using the requests library.
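To see that decoding is simply the inverse of encoding, here is a minimal round-trip sketch. It uses urllib.parse.quote(), the standard-library encoder that the rest of this guide does not otherwise cover, on an illustrative string chosen for this example.

```python
from urllib.parse import quote, unquote

original = 'doc?page=1&offset=10 café'

# quote() percent-encodes reserved characters (?, =, &), the space,
# and each UTF-8 byte of the non-ASCII 'é'.
encoded = quote(original)
print(encoded)  # doc%3Fpage%3D1%26offset%3D10%20caf%C3%A9

# unquote() reverses the process and restores the original text.
decoded = unquote(encoded)
print(decoded == original)  # True
```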
Decoding URL Parameters with urllib.parse.unquote()
The urllib.parse.unquote() function is the standard and recommended way to decode URL-encoded strings in Python:
from urllib.parse import unquote
url = 'https://tutorialreference.com/doc%3Fpage%3D1%26offset%3D10'
decoded_url = unquote(url)
print(decoded_url) # Output: https://tutorialreference.com/doc?page=1&offset=10
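unquote(url) replaces each %xx escape with the character it represents. For example, %3F becomes ?, %3D becomes =, and %26 becomes &.
- unquote() decodes escapes as UTF-8 by default (controlled by its encoding parameter), so a multi-byte sequence such as %C3%A9 becomes the single character é.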
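The encoding behaviour is easiest to see with a few small inputs. The sketch below, using made-up strings, shows the default UTF-8 decoding, what the default errors='replace' does with a byte that is not valid UTF-8, how to pass a different charset, and urllib.parse.unquote_to_bytes(), which returns raw bytes without decoding them as text.

```python
from urllib.parse import unquote, unquote_to_bytes

# Multi-byte UTF-8 escapes (%C3%A9) are combined into one character.
utf8_text = unquote('caf%C3%A9')                     # 'café'

# %E9 on its own is not valid UTF-8; with the default errors='replace'
# it becomes the replacement character U+FFFD.
replaced = unquote('caf%E9')                         # 'caf\ufffd'

# If the sender used another charset, pass it explicitly.
latin1_text = unquote('caf%E9', encoding='latin-1')  # 'café'

# unquote_to_bytes() skips text decoding and returns the raw bytes.
raw = unquote_to_bytes('caf%E9')                     # b'caf\xe9'

print(utf8_text, latin1_text, repr(replaced), raw)
```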
Handling Plus Signs (+) as Spaces with unquote_plus()
In HTML form encoding (application/x-www-form-urlencoded), spaces are encoded as plus signs (+). urllib.parse.unquote() does not convert + to a space; for that, use urllib.parse.unquote_plus():
from urllib.parse import unquote_plus
url = 'https://tutorialreference.com/doc%3Fpage%3D1+%26+offset%3D10' # + instead of space
result = unquote_plus(url, encoding='utf-8')
print(result) # Output: https://tutorialreference.com/doc?page=1 & offset=10
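unquote_plus() behaves like unquote(), but also replaces plus signs with spaces. This is what you need when decoding form data or query strings submitted by HTML forms; a literal plus sign in such data is sent as %2B, so it still decodes correctly.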
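When the goal is to read individual query parameters, a common approach is to split the URL first and let urllib.parse.parse_qs() decode each key and value; it applies the same +-to-space and %xx decoding as unquote_plus(). This sketch uses urlsplit() and parse_qs(), which the guide does not otherwise mention, with an illustrative URL. It also shows why decoding the whole query string before splitting it is risky.

```python
from urllib.parse import parse_qs, unquote, urlsplit

url = 'https://tutorialreference.com/doc?q=rock%26roll&name=John+Doe&tag=a&tag=b'

# Split off the query string, then let parse_qs() decode each key and
# value. Repeated keys are collected into a list.
params = parse_qs(urlsplit(url).query)
print(params)
# {'q': ['rock&roll'], 'name': ['John Doe'], 'tag': ['a', 'b']}

# Decoding the whole query first turns %26 into a real '&', so the
# value of q is cut in two and the 'roll' fragment is silently dropped.
broken = parse_qs(unquote(urlsplit(url).query))
print(broken)
# {'q': ['rock'], 'name': ['John Doe'], 'tag': ['a', 'b']}
```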
Decoding Double-Encoded Parameters
Sometimes parameters are encoded twice: the % of each escape is itself encoded as %25, so %3F becomes %253F. In that case, call unquote() (or unquote_plus()) twice:
from urllib.parse import unquote
url = 'https://tutorialreference.com/doc%253Fpage%253D1%2526offset%253D10'
result = unquote(unquote(url)) # Call unquote() twice
print(result) # Output: https://tutorialreference.com/doc?page=1&offset=10
- Each call to unquote() decodes one level of encoding.
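If you don't know in advance how many layers of encoding were applied, one option is to keep decoding until the string stops changing. The helper below, unquote_repeatedly(), is hypothetical (it is not part of urllib.parse), and its max_rounds cap is an illustrative choice. It also demonstrates the trade-off: repeated decoding can over-decode a value whose decoded form legitimately contains text that looks like an escape.

```python
from urllib.parse import unquote


def unquote_repeatedly(value: str, max_rounds: int = 5) -> str:
    """Hypothetical helper: call unquote() until the string stops changing.

    max_rounds caps how many layers of encoding are removed.
    """
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:  # nothing left that unquote() can decode
            break
        value = decoded
    return value


double = 'https://tutorialreference.com/doc%253Fpage%253D1%2526offset%253D10'
print(unquote_repeatedly(double))
# https://tutorialreference.com/doc?page=1&offset=10

# The risk: 'discount%2541' is the single encoding of 'discount%41',
# but repeated decoding also turns the literal %41 into 'A'.
print(unquote_repeatedly('discount%2541'))  # discountA
```

When you know the data was encoded exactly twice, calling unquote() twice is the safer choice.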
Using requests.utils.unquote() (If you already have requests)
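If requests is already installed, you can also call requests.utils.unquote(). It is the standard-library urllib.parse.unquote() re-exported by requests, so it behaves identically; there is no reason to install requests just for this.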
import requests
url = 'https://tutorialreference.com/doc%3Fpage%3D1%26offset%3D10'
result = requests.utils.unquote(url)
print(result) # Output: https://tutorialreference.com/doc?page=1&offset=10
- requests.utils.unquote() decodes the string by replacing each %xx escape with its corresponding character. Like urllib.parse.unquote(), it leaves plus signs unchanged.