How to Manipulate Path Components in Python
Working with file paths often requires extracting specific parts of the path or removing prefixes and suffixes.
This guide explores techniques in Python for manipulating path components: getting the last part of a path and removing components from the front or end of a path, using both the pathlib and os.path modules.
Getting the Last Part of a Path
To extract the last part of a file path, representing either a filename or a folder name, you can use a variety of techniques.
Using pathlib.PurePath().name
The pathlib module provides a convenient way to get the last component of a path using the name attribute:
import pathlib
path = '/home/tomnolan/Desktop/last/'
last_part = pathlib.PurePath(path).name
print(last_part) # Output: last
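- PurePath(path) creates a path object.
- The .name attribute returns the last component of the given path string, ignoring any trailing slash.
- PurePath uses the path flavor of the platform the code runs on, so on Linux or macOS it does not split Windows-style paths on backslashes. Use pathlib.PureWindowsPath when you need to parse Windows paths on another platform.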
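Because PurePath picks its flavor from the machine it runs on, the same string can produce a different .name on different systems. The sketch below pins the flavor explicitly with PurePosixPath and PureWindowsPath; the variable names are illustrative only.

```python
from pathlib import PurePosixPath, PureWindowsPath

windows_string = 'C:\\Users\\tomnolan\\Desktop\\example.txt'

posix_name = PurePosixPath('/home/tomnolan/Desktop/last/').name
windows_name = PureWindowsPath(windows_string).name
# A POSIX path object treats backslashes as ordinary characters,
# so the whole Windows string counts as a single component.
unsplit_name = PurePosixPath(windows_string).name

print(posix_name)    # Output: last
print(windows_name)  # Output: example.txt
print(unsplit_name)  # Output: C:\Users\tomnolan\Desktop\example.txt
```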
Using ntpath for Windows-Style Paths on Linux
If you have to process Windows-style paths in a Linux environment, use the ntpath module:
import ntpath
def get_last_path(path):
    head, tail = ntpath.split(path)
    return tail or ntpath.basename(head)
last_path = get_last_path('C:\\Users\\tomnolan\\Desktop\\example.txt')
print(last_path) # Output: example.txt
- The function uses ntpath.split(), which splits the path into two strings: the head (everything before the last separator) and the tail (everything after it).
- If the tail is empty, which means that the last character was / or \, the function uses ntpath.basename() to return the last path component of the head, if any.
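As a quick check of the fallback described above, the following sketch runs the same function on a path with a trailing backslash, a path written with forward slashes, and a bare filename. The input paths are made up for illustration.

```python
import ntpath

def get_last_path(path):
    head, tail = ntpath.split(path)
    # tail is empty when the path ends with a separator,
    # so fall back to the last component of head.
    return tail or ntpath.basename(head)

trailing = get_last_path('C:\\Users\\tomnolan\\Desktop\\')
forward = get_last_path('C:/Users/tomnolan/Desktop/example.txt')
bare = get_last_path('example.txt')

print(trailing)  # Output: Desktop
print(forward)   # Output: example.txt
print(bare)      # Output: example.txt
```

ntpath accepts both / and \ as separators, which is why the second call also works.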
Using os.path.basename()
Alternatively, you can use the os.path.basename() function to extract the last part of a path, after stripping any trailing slashes:
import os
path = '/home/tomnolan/Desktop/last/'
last_path = os.path.basename(os.path.normpath(path))
print(last_path) # Output: last
- os.path.normpath() removes trailing slashes, which gives consistent behavior whether or not the path ends with /.
- os.path.basename() then returns the last component of the normalized path. Without the normpath() call, a path ending in / would yield an empty string.
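To see why the normpath() call matters, this short comparison runs basename() on the same path with and without normalization. The variable names are illustrative.

```python
import os

path = '/home/tomnolan/Desktop/last/'

# basename() returns whatever follows the final separator,
# which is nothing when the path ends with a slash.
without_normpath = os.path.basename(path)
with_normpath = os.path.basename(os.path.normpath(path))

print(repr(without_normpath))  # Output: ''
print(with_normpath)           # Output: last
```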
Removing the Last Path Component
To remove the last part of a path, you can use either the pathlib or the os.path module.
Using pathlib.Path().parent
The pathlib.Path class provides a convenient way to remove the last component using the parent attribute, which returns a new Path object:
from pathlib import Path
absolute_path = '/home/tomnolan/Desktop/python/main.py'
result = Path(absolute_path).parent
print(result) # Output: /home/tomnolan/Desktop/python
absolute_path = '/home/tomnolan/Desktop/'
result = Path(absolute_path).parent
print(result) # Output: /home/tomnolan
- Path() creates a path object.
- The parent attribute returns a new path containing everything up to, but not including, the last component. A trailing slash is ignored.
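Since parent returns another path object, it can be chained to remove more than one component, and the parents sequence gives indexed access to the same ancestors. The sketch below uses PurePosixPath so the output is the same on every platform.

```python
from pathlib import PurePosixPath

path = PurePosixPath('/home/tomnolan/Desktop/python/main.py')

one_up = path.parent
two_up = path.parent.parent
# parents[0] is the same as parent, parents[1] is the grandparent.
two_up_indexed = path.parents[1]
# The root is its own parent, so chaining never fails.
root_parent = PurePosixPath('/').parent

print(one_up)          # Output: /home/tomnolan/Desktop/python
print(two_up)          # Output: /home/tomnolan/Desktop
print(two_up_indexed)  # Output: /home/tomnolan/Desktop
print(root_parent)     # Output: /
```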
Using os.path.dirname()
The os.path.dirname() function can also remove the last path component, but it’s important to strip trailing slashes first using os.path.normpath():
import os
absolute_path = '/home/tomnolan/Desktop/python/main.py'
result = os.path.dirname(os.path.normpath(absolute_path))
print(result) # Output: /home/tomnolan/Desktop/python
absolute_path = '/home/tomnolan/Desktop/'
result = os.path.dirname(os.path.normpath(absolute_path))
print(result) # Output: /home/tomnolan
- os.path.normpath() strips trailing slashes from the path.
- os.path.dirname() returns the directory component (everything before the last path separator), effectively removing the last component.
Removing a Path Prefix
To remove a path prefix, you can use the os.path.relpath() function or the pathlib.Path().relative_to() method.
Using os.path.relpath()
The os.path.relpath() function returns the relative path from the start path to the path:
import os
absolute_path = '/home/tomnolan/Desktop/python/main.py'
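prefix = '/home/tomnolan'
without_prefix = os.path.relpath(absolute_path, prefix)
print(without_prefix) # Output: Desktop/python/main.py
- The path is returned relative to the second argument, which is the starting directory. relpath() only computes the route between the two paths and does not check that the first one actually lies under the second; if it does not, the result contains .. components.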
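One thing to watch for is that os.path.relpath() never fails when the start directory is not really a prefix of the path. The sketch below contrasts the two cases; /home/alice is a made-up directory used only for illustration, and the output comments assume Linux or macOS.

```python
import os

absolute_path = '/home/tomnolan/Desktop/python/main.py'

# The start directory is a true prefix of the path.
inside = os.path.relpath(absolute_path, '/home/tomnolan')
# The start directory is a sibling, so relpath() climbs up with '..'.
outside = os.path.relpath(absolute_path, '/home/alice')

print(inside)   # Output: Desktop/python/main.py
print(outside)  # Output: ../tomnolan/Desktop/python/main.py
```

If a leading .. is not acceptable in your case, check for it explicitly or use the pathlib approach, which raises an error instead.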
Using pathlib.PurePath().relative_to()
The pathlib module also offers a way to remove a prefix:
from pathlib import PurePath
absolute_path = '/home/tomnolan/Desktop/python/main.py'
a_path = PurePath(absolute_path)
without_prefix = str(a_path.relative_to('/home/tomnolan'))
print(without_prefix) # Output: Desktop/python/main.py
- The relative_to() method computes a_path relative to /home/tomnolan, and raises ValueError if a_path is not located under that prefix.
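Because relative_to() raises ValueError when the prefix does not match, it is often wrapped in a small helper. The following is a minimal sketch; strip_prefix is a hypothetical name, /home/alice is a made-up directory, and PurePosixPath is used so the result is the same on every platform.

```python
from pathlib import PurePosixPath

def strip_prefix(path, prefix):
    """Return path without prefix, or None if path is not under prefix."""
    try:
        return PurePosixPath(path).relative_to(prefix)
    except ValueError:
        return None

matched = strip_prefix('/home/tomnolan/Desktop/python/main.py', '/home/tomnolan')
unmatched = strip_prefix('/home/tomnolan/Desktop/python/main.py', '/home/alice')

print(matched)    # Output: Desktop/python/main.py
print(unmatched)  # Output: None
```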
Removing the First Folder from a Path
To remove the first folder from a path, use the Path class to get the path components with parts, then use relative_to() to remove the first 2 parts of the path: the root (/) and the first folder.
Using Path().relative_to() and Slicing
from pathlib import Path
absolute_path = '/home/tomnolan/Desktop/python/main.py'
a_path = Path(absolute_path)
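result = a_path.relative_to(Path(*a_path.parts[:2]))
print(result) # Output: tomnolan/Desktop/python/main.py
- The parts attribute of a Path object returns the components of the path in a tuple. For an absolute path, the first element is the root (/).
- We used slice notation to select a tuple of the first 2 components, ('/', 'home').
- The iterable unpacking operator * passes those components as separate arguments to Path(), which joins them into /home; relative_to() then removes that prefix. Passing several positional arguments directly to relative_to() is deprecated since Python 3.12.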
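The same idea can be generalized to drop any number of leading folders. The helper below is a sketch under stated assumptions: drop_leading_folders is a hypothetical name, it rebuilds the path from the remaining parts rather than calling relative_to(), and it uses PurePosixPath so the output does not depend on the platform.

```python
from pathlib import PurePosixPath

def drop_leading_folders(path, count=1):
    """Drop the root (if any) and the first `count` folders from a POSIX path."""
    p = PurePosixPath(path)
    # For an absolute path, parts[0] is the root, so skip one extra element.
    skip = count + (1 if p.anchor else 0)
    return PurePosixPath(*p.parts[skip:])

one_dropped = drop_leading_folders('/home/tomnolan/Desktop/python/main.py')
two_dropped = drop_leading_folders('/home/tomnolan/Desktop/python/main.py', count=2)
relative_input = drop_leading_folders('home/tomnolan/main.py')

print(one_dropped)     # Output: tomnolan/Desktop/python/main.py
print(two_dropped)     # Output: Desktop/python/main.py
print(relative_input)  # Output: tomnolan/main.py
```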
Using os.path.join() and split()
An alternative way to remove the first path component is using str.split() with os.path.join():
import os
absolute_path = '/home/tomnolan/Desktop/python/main.py'
result = os.path.join(*(absolute_path.split(os.path.sep)[2:]))
print(result) # Output: tomnolan/Desktop/python/main.py
- absolute_path.split(os.path.sep) splits the path into a list of components based on the system's path separator (/ on Linux and macOS, \ on Windows). Because the path starts with a separator, the first element is an empty string.
- The [2:] slice then selects all elements from index 2, skipping the empty string and the first folder.
- The asterisk (*) unpacks the sliced list into os.path.join(), which joins the path components using the path separator.
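The reason for starting the slice at index 2 becomes clearer when the intermediate list is printed. The sketch below splits on a literal / so the output is the same on every platform, and also shows a string-only variant built on str.partition(); drop_first_folder is a hypothetical helper name.

```python
absolute_path = '/home/tomnolan/Desktop/python/main.py'

# The leading separator produces an empty string at index 0.
components = absolute_path.split('/')
print(components)
# Output: ['', 'home', 'tomnolan', 'Desktop', 'python', 'main.py']

def drop_first_folder(path, sep='/'):
    """Remove any leading separator and the first folder from a path string."""
    # partition() splits at the first separator only; keep what follows it.
    _, _, rest = path.lstrip(sep).partition(sep)
    return rest

from_absolute = drop_first_folder(absolute_path)
from_relative = drop_first_folder('home/tomnolan/main.py')

print(from_absolute)  # Output: tomnolan/Desktop/python/main.py
print(from_relative)  # Output: tomnolan/main.py
```

Stripping the leading separator first means the helper behaves the same for absolute and relative inputs, whereas a fixed [2:] slice would drop two folders from a relative path.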