How to Merge Text Files in Python
This guide explains how to combine multiple text files into a single output file in Python. We'll cover efficient methods using loops, shutil.copyfileobj(), fileinput, and glob for handling multiple files based on patterns. We'll also address important considerations like handling large files and ensuring proper line endings.
Merging Files with Loops (Basic Approach)
The fundamental approach involves opening an output file and then iterating through the input files, writing their contents to the output.
Handling Large Files (Line-by-Line)
For large files, process the input line by line so that only one line is held in memory at a time, instead of loading the entire file content at once.
file_paths = ['file-1.txt', 'file-2.txt']  # List of input file paths

with open('output-file.txt', 'w', encoding='utf-8') as output_file:
    for file_path in file_paths:
        with open(file_path, 'r', encoding='utf-8') as input_file:
            for line in input_file:  # Iterate line by line
                output_file.write(line)
        output_file.write('\n')  # Add a newline after each file
- with open('output-file.txt', 'w', encoding='utf-8') as output_file: Opens the output file in write mode ('w'). The with statement ensures the file is automatically closed, even if errors occur. Always specify encoding='utf-8' (or your desired encoding) for correct text handling.
- for file_path in file_paths: Loops through each input file path.
- with open(file_path, 'r', encoding='utf-8') as input_file: Opens each input file in read mode.
- for line in input_file: Efficiently iterates over the lines of the input file without loading the entire file into memory.
- output_file.write(line): Writes each line to the output file.
- output_file.write('\n'): Placed outside the inner loop, this adds a final newline after each file's contents are written (optional, but good practice).
Handling Smaller Files (Reading Entire Contents)
If you're certain your files are small enough to fit comfortably in memory, you can simplify the code slightly by reading the entire file content at once:
file_paths = ['file-1.txt', 'file-2.txt']

with open('output-file.txt', 'w', encoding='utf-8') as output_file:
    for file_path in file_paths:
        with open(file_path, 'r', encoding='utf-8') as input_file:
            output_file.write(input_file.read())
        output_file.write('\n')  # Add newline after each file
input_file.read() reads the entire content of the input file into a single string.
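For small files, the same idea can also be written with pathlib. The following is a minimal sketch, not part of the original examples: merge_small_files is a hypothetical helper name, and unlike the loop above it places a newline between files only, not after the last one.

```python
from pathlib import Path


def merge_small_files(file_paths, output_path):
    """Hypothetical helper: merge small text files by reading each one fully."""
    # read_text() opens, reads, and closes the file in a single call.
    contents = [Path(p).read_text(encoding="utf-8") for p in file_paths]
    # Join with a newline so each file's content starts on a new line.
    Path(output_path).write_text("\n".join(contents), encoding="utf-8")
```

Because every file is held in memory at the same time, this variant only makes sense when the combined size of the inputs is small.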
Adding Newlines Between Files
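In both of the previous examples, the output_file.write('\n') line adds a newline character after each input file is processed. This ensures that the content of each input file starts on a new line in the output file, even when a file does not end with a newline of its own. If a file already ends with a newline, the extra one produces a blank line between files. If you don't want this separation, remove that line.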
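If blank lines between files are unwanted but each file should still start on a new line, one option is to add the newline only when it is missing. This is a minimal sketch under that assumption; merge_with_clean_breaks is a hypothetical name, not something from the examples above.

```python
def merge_with_clean_breaks(file_paths, output_path):
    """Hypothetical helper: merge line by line, adding a newline only if missing."""
    with open(output_path, "w", encoding="utf-8") as output_file:
        for file_path in file_paths:
            last_line = ""
            with open(file_path, "r", encoding="utf-8") as input_file:
                for line in input_file:
                    output_file.write(line)
                    last_line = line
            # Add a separator only if the file had content that did not
            # already end with a newline; empty files add nothing.
            if last_line and not last_line.endswith("\n"):
                output_file.write("\n")
```

The helper still reads one line at a time, so it keeps the memory profile of the line-by-line approach.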
Merging Files with shutil.copyfileobj() (Efficient for Large Files)
The shutil.copyfileobj() function provides a highly efficient way to copy the contents of one file-like object to another. This is particularly well-suited for merging large files:
import shutil

file_paths = ['file-1.txt', 'file-2.txt']

with open('output-file.txt', 'wb') as output_file:  # Open in binary write mode
    for file_path in file_paths:
        with open(file_path, 'rb') as input_file:  # Open in binary read mode
            shutil.copyfileobj(input_file, output_file)
        output_file.write(b'\n')  # Add newline (as bytes)
- with open('output-file.txt', 'wb') as output_file: We open the output file in binary write mode ('wb'). copyfileobj() copies between any two file-like objects; binary mode is used here so the bytes are copied unchanged, with no decoding or newline translation.
- shutil.copyfileobj(input_file, output_file): Efficiently copies the contents of input_file to output_file in chunks. This is memory-efficient.
- output_file.write(b'\n'): Because we're in binary mode, we append a newline as bytes (b'\n').
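The chunk size used for copying can be set through the optional third argument of shutil.copyfileobj(). The sketch below assumes a hypothetical helper name (merge_binary) and an illustrative default of 1 MiB; it writes no separator, so the output is an exact byte-for-byte concatenation of the inputs.

```python
import shutil


def merge_binary(file_paths, output_path, chunk_size=1024 * 1024):
    """Hypothetical helper: concatenate files byte for byte, in chunks."""
    with open(output_path, "wb") as output_file:
        for file_path in file_paths:
            with open(file_path, "rb") as input_file:
                # The third argument is the buffer length used for each read.
                shutil.copyfileobj(input_file, output_file, chunk_size)
```

A larger chunk size may reduce the number of read and write calls at the cost of more memory per copy. Whether it helps in practice depends on the files and the storage involved.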
Merging Files with fileinput (Concise Iteration)
The fileinput module simplifies iterating over lines from multiple input files:
import fileinput

file_paths = ['file-1.txt', 'file-2.txt']

# The encoding= argument of fileinput.input() requires Python 3.10 or newer
with open('output-file.txt', 'w', encoding='utf-8') as output_file, \
        fileinput.input(files=file_paths, encoding='utf-8') as input_files:
    for line in input_files:
        output_file.write(line)
- fileinput.input(files=file_paths): Creates an iterator that yields lines from all files in file_paths, one after another, so a single loop covers the lines of every input file. This is very concise and handles opening/closing files automatically.
Merging Files Matching a Pattern with glob
The glob module allows you to find files matching a wildcard pattern. This is useful for merging all files of a certain type (e.g., all .txt files) in a directory:
import glob

file_paths = glob.glob('text-files/*.txt')
print(file_paths)  # Output: (a list of paths that match the pattern)

with open('output-file.txt', 'w', encoding='utf-8') as output_file:
    for file_path in file_paths:
        with open(file_path, 'r', encoding='utf-8') as input_file:
            # Write '\n', not os.linesep: text mode already translates it
            # to the platform's line ending
            output_file.write(input_file.read() + '\n')
- file_paths = glob.glob('text-files/*.txt'): Looks for all files in the directory text-files that end with .txt and returns their paths in a list. *.txt means "any filename ending in .txt". The order of the returned paths depends on the file system, so wrap the call in sorted() if the merge order matters.
- The rest of the code is similar to the previous examples, iterating through the found file paths.
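Two practical concerns with pattern-based merging are the order of the matched files and the chance that the output file itself matches the pattern. The sketch below addresses both under stated assumptions: merge_matching is a hypothetical helper name, files are merged in sorted path order, and the output path is skipped if it appears among the matches.

```python
import glob
import os


def merge_matching(pattern, output_path):
    """Hypothetical helper: merge files matching pattern, in sorted order."""
    output_abs = os.path.abspath(output_path)
    # Sort for a stable order, and skip the output file itself in case it
    # also matches the pattern.
    file_paths = [
        p for p in sorted(glob.glob(pattern))
        if os.path.abspath(p) != output_abs
    ]
    with open(output_path, "w", encoding="utf-8") as output_file:
        for file_path in file_paths:
            with open(file_path, "r", encoding="utf-8") as input_file:
                output_file.write(input_file.read())
            output_file.write("\n")  # Newline after each file
    return file_paths
```

Returning the list of merged paths makes it easy to log or verify which files ended up in the output.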