How to Join Base URLs with Other URLs in Python

Constructing complete URLs by combining base URLs and path components is a common task in web development and data processing.

This guide shows how to use the urllib.parse.urljoin() and posixpath.join() functions in Python for this purpose, and covers best practices and common pitfalls.

Joining Base URLs with urljoin()

The urllib.parse.urljoin() function constructs a full (absolute) URL by combining a base URL with another URL, which may be relative or absolute.

from urllib.parse import urljoin

base_url = 'https://tutorialreference.com'
path = 'images/static/cat.jpg'
result = urljoin(base_url, path)
print(result) # Output: https://tutorialreference.com/images/static/cat.jpg
  • urljoin() treats the first argument as the base and resolves the second argument against it, so here the path is appended to the base URL.
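Since the second argument may be relative or absolute, it can help to see both cases side by side. The following is a minimal sketch; the /docs/ base path and the example.org address are placeholders chosen for illustration, not part of the original example.

```python
from urllib.parse import urljoin

# Hypothetical base URL with a trailing slash (illustration only)
base_url = 'https://tutorialreference.com/docs/'

# A relative second argument is resolved against the base
relative = urljoin(base_url, 'images/cat.jpg')
print(relative)  # Output: https://tutorialreference.com/docs/images/cat.jpg

# An absolute second argument replaces the base entirely
absolute = urljoin(base_url, 'https://example.org/dog.png')
print(absolute)  # Output: https://example.org/dog.png
```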

If you have multiple path components, use the posixpath.join() method to combine them before passing them to urljoin():

import posixpath
from urllib.parse import urljoin

base_url = 'https://tutorialreference.com'

path_1 = 'images'
path_2 = 'static'
path_3 = 'cat.jpg'

path = posixpath.join(path_1, path_2, path_3)
print(path) # Output: images/static/cat.jpg

result = urljoin(base_url, path)
print(result) # Output: https://tutorialreference.com/images/static/cat.jpg
  • posixpath.join() combines the path components into a single path, which can then be passed to urljoin().
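When the number of components is not known in advance, the same approach works with a list unpacked into posixpath.join(). This is a small sketch; the segments list and its values are made up for illustration.

```python
import posixpath
from urllib.parse import urljoin

base_url = 'https://tutorialreference.com'

# Hypothetical list of path segments, e.g. built at runtime
segments = ['images', 'static', '2024', 'cat.jpg']

# Unpack the list so each item becomes one path component
path = posixpath.join(*segments)
print(path)  # Output: images/static/2024/cat.jpg

result = urljoin(base_url, path)
print(result)  # Output: https://tutorialreference.com/images/static/2024/cat.jpg
```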

Joining URL Path Components with urljoin()

The urljoin() function can also join URL path components when the first argument is only a path rather than a full URL. This is useful when constructing URL paths programmatically.

from urllib.parse import urljoin

print(urljoin('/global/images/', 'static/dog.png')) # Output: /global/images/static/dog.png
  • urljoin() resolves the second path against the first, which acts as the base path. Because the base ends with a forward slash, the second path is appended to it.

Understanding Relative Paths in urljoin()

Note that urljoin() can behave unexpectedly when the base path doesn't end with a forward slash. Consider the following example:

from urllib.parse import urljoin
print(urljoin('/global/images', 'static/dog.png')) # Output: /global/static/dog.png
  • Because the base doesn't end with a forward slash, urljoin() treats images as a file name rather than a directory, removes it, and resolves the second path against /global/.

To avoid these issues, make sure that the base part of your URL ends with a forward slash:

from urllib.parse import urljoin
print(urljoin('/global/images/', 'static/dog.png')) # Output: /global/images/static/dog.png
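If the base comes from configuration or user input, the trailing slash can be added defensively before joining. The helper below, with_trailing_slash(), is a hypothetical name introduced for this sketch, and it assumes the base has no query string or fragment.

```python
from urllib.parse import urljoin

def with_trailing_slash(base):
    """Hypothetical helper: append '/' to base if it is missing."""
    return base if base.endswith('/') else base + '/'

# Base URL without a trailing slash (illustration only)
base = 'https://tutorialreference.com/global/images'

joined = urljoin(with_trailing_slash(base), 'static/dog.png')
print(joined)  # Output: https://tutorialreference.com/global/images/static/dog.png
```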

Also note that urljoin() behaves differently if the second path starts with a forward slash:

from urllib.parse import urljoin
print(urljoin('/global/images', '/static/dog.png')) # Output: /static/dog.png
  • When the second component starts with a forward slash, it is treated as starting from the root, so the path of the first argument is discarded.
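If the intent is to append the second path rather than replace the first, one possible workaround is to strip the leading slash before joining. This is a sketch of that idea, not the only way to handle it; the variable names are illustrative.

```python
from urllib.parse import urljoin

base = '/global/images/'
path = '/static/dog.png'

# With the leading slash, the base path is discarded
replaced = urljoin(base, path)
print(replaced)  # Output: /static/dog.png

# Stripping the leading slash makes the path relative to the base
kept = urljoin(base, path.lstrip('/'))
print(kept)  # Output: /global/images/static/dog.png
```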

Joining URL Path Components with posixpath.join()

The posixpath.join() method is more predictable and is a reliable alternative for joining URL path components:

import posixpath

print(posixpath.join('/global/images', 'static/dog.png')) # Output: /global/images/static/dog.png
print(posixpath.join('/global/images/', 'static/dog.png')) # Output: /global/images/static/dog.png
print(posixpath.join('/global/images', '/static/dog.png')) # Output: /static/dog.png
print(posixpath.join('/global', 'images', 'static', 'dog.png')) # Output: /global/images/static/dog.png
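  • posixpath.join() concatenates path components using the / character and keeps the last segment of the first argument whether or not it ends with a slash. As the third example shows, a component that starts with / still discards everything before it.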
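One more difference worth knowing: urljoin() resolves .. segments, while posixpath.join() leaves them in place unless posixpath.normpath() is applied afterwards. The sketch below uses a made-up relative path to show this.

```python
import posixpath
from urllib.parse import urljoin

base = '/global/images/'
relative = '../static/dog.png'  # illustrative path with a parent reference

# urljoin() resolves the '..' segment against the base
resolved = urljoin(base, relative)
print(resolved)  # Output: /global/static/dog.png

# posixpath.join() only concatenates and keeps '..' as-is
literal = posixpath.join(base, relative)
print(literal)  # Output: /global/images/../static/dog.png

# posixpath.normpath() collapses the '..' segment afterwards
normalized = posixpath.normpath(literal)
print(normalized)  # Output: /global/static/dog.png
```

Note that posixpath.normpath() is designed for file system paths, so it is safest to apply it to the path portion only. On a full URL it would also collapse the // that follows the scheme.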