How to Merge Text Files in Python

This guide explains how to combine multiple text files into a single output file in Python. We'll cover efficient methods using loops, shutil.copyfileobj(), fileinput, and glob for handling multiple files based on patterns. We'll also address important considerations like handling large files and ensuring proper line endings.

Merging Files with Loops (Basic Approach)

The fundamental approach involves opening an output file and then iterating through the input files, writing their contents to the output.

Handling Large Files (Line-by-Line)

For large files, process the input line by line rather than loading an entire file into memory at once. Only one line is held in memory at a time, so memory use stays low regardless of file size.

file_paths = ['file-1.txt', 'file-2.txt']  # List of input file paths

with open('output-file.txt', 'w', encoding='utf-8') as output_file:
    for file_path in file_paths:
        with open(file_path, 'r', encoding='utf-8') as input_file:
            for line in input_file:  # Iterate line by line
                output_file.write(line)
        output_file.write('\n')  # Add a newline after each file
  • with open('output-file.txt', 'w', encoding='utf-8') as output_file:: Opens the output file in write mode ('w'). The with statement ensures the file is automatically closed, even if errors occur. Always specify encoding='utf-8' (or your desired encoding) for correct text handling.
  • for file_path in file_paths:: Loops through each input file path.
  • with open(file_path, 'r', encoding='utf-8') as input_file:: Opens each input file in read mode.
  • for line in input_file:: Efficiently iterates over the lines of the input file without loading the entire file into memory.
  • output_file.write(line): Writes each line to the output file.
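  • output_file.write('\n'), placed after the inner with block, adds a newline after each file's contents. This keeps the next file from starting mid-line when a file lacks a trailing newline. If a file already ends with a newline, the result is an empty line between files.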

Handling Smaller Files (Reading Entire Contents)

If you're certain your files are small enough to fit comfortably in memory, you can simplify the code slightly by reading the entire file content at once:

file_paths = ['file-1.txt', 'file-2.txt']

with open('output-file.txt', 'w', encoding='utf-8') as output_file:
    for file_path in file_paths:
        with open(file_path, 'r', encoding='utf-8') as input_file:
            output_file.write(input_file.read())
        output_file.write('\n')  # Add newline after each file

  • input_file.read() reads the entire content of the input file into a single string.

Adding Newlines Between Files

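In both of the previous examples, the output_file.write('\n') line adds a newline character after each input file is processed. This ensures that the content of each input file starts on a new line in the output file, even when the previous file does not end with a newline. When a file already ends with a newline, the extra one shows up as an empty line between files. If you don't want this separation, remove that line.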
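If you want each file to start on a new line but without empty lines between files, one option is to add the newline only when a file does not already end with one. The following is a minimal sketch of that idea; the function name merge_with_single_newlines is a hypothetical helper, not part of any library.

```python
def merge_with_single_newlines(file_paths, output_path):
    """Concatenate text files, adding a newline only where one is missing.

    Hypothetical helper for illustration.
    """
    with open(output_path, 'w', encoding='utf-8') as output_file:
        for file_path in file_paths:
            last_line = ''
            with open(file_path, 'r', encoding='utf-8') as input_file:
                for line in input_file:  # Still line by line, so large files are fine
                    output_file.write(line)
                    last_line = line
            # Add a separator only if the file had content that
            # did not already end with a newline.
            if last_line and not last_line.endswith('\n'):
                output_file.write('\n')
```

Calling merge_with_single_newlines(['file-1.txt', 'file-2.txt'], 'output-file.txt') would then produce the same output as the line-by-line example, minus the extra empty lines. Empty input files contribute nothing to the output.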

Merging Files with shutil.copyfileobj() (Efficient for Large Files)

The shutil.copyfileobj() function provides a highly efficient way to copy the contents of one file-like object to another. This is particularly well-suited for merging large files:

import shutil

file_paths = ['file-1.txt', 'file-2.txt']

with open('output-file.txt', 'wb') as output_file:  # Open in binary write mode
    for file_path in file_paths:
        with open(file_path, 'rb') as input_file:  # Open in binary read mode
            shutil.copyfileobj(input_file, output_file)
        output_file.write(b'\n')  # Add newline (as bytes)
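  • with open('output-file.txt', 'wb') as output_file:: Opens the output file in binary write mode ('wb'). copyfileobj() accepts text-mode file objects too, but binary mode copies the raw bytes without decoding and re-encoding them, which is faster and leaves the content unchanged. The input and output files must be opened in the same kind of mode (both binary or both text).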
  • shutil.copyfileobj(input_file, output_file): Efficiently copies the contents of input_file to output_file in chunks. This is memory-efficient.
  • output_file.write(b'\n'): Because we're in binary mode, we append a newline as bytes (b'\n').
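The chunk size that copyfileobj() uses can be passed as an optional third argument (length). The sketch below wraps the approach in a function and exposes that argument; the function name merge_binary and the 1 MiB default are illustrative assumptions, not recommendations from the shutil documentation. It writes no separator, so the output is a byte-for-byte concatenation of the inputs.

```python
import shutil


def merge_binary(file_paths, output_path, chunk_size=1024 * 1024):
    """Concatenate files byte for byte using shutil.copyfileobj().

    Hypothetical helper; chunk_size is the buffer length in bytes
    handed to copyfileobj() for each read/write step.
    """
    with open(output_path, 'wb') as output_file:
        for file_path in file_paths:
            with open(file_path, 'rb') as input_file:
                # Copies in chunks of chunk_size bytes, so memory use
                # stays bounded even for very large inputs.
                shutil.copyfileobj(input_file, output_file, chunk_size)
```

Because the bytes are copied unchanged, this only yields a valid text file if all inputs share the same encoding.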

Merging Files with fileinput (Concise Iteration)

The fileinput module simplifies iterating over lines from multiple input files:

import fileinput

file_paths = ['file-1.txt', 'file-2.txt']

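# openhook makes fileinput read the inputs as UTF-8 instead of the platform default
with open('output-file.txt', 'w', encoding='utf-8') as output_file, \
     fileinput.input(files=file_paths,
                     openhook=fileinput.hook_encoded('utf-8')) as input_files:
    for line in input_files:
        output_file.write(line)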
  • fileinput.input(files=file_paths): Creates an iterator that yields lines from all files in file_paths, one after another, as if they were a single input. This is very concise and handles opening/closing files automatically.
  • for line in input_files:: Because the files are presented as one stream, a single loop covers every line of every file. No separator is added between files, so if a file does not end with a newline, its last line runs into the first line of the next file.
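fileinput also tracks which file the current line came from, through the isfirstline() and filename() methods. As a sketch of how that could be used, the function below writes a header line before each file's contents; the function name merge_with_headers and the header format are made up for illustration. It also guards against an empty list, because fileinput falls back to the command-line arguments or standard input when it is given no files.

```python
import fileinput


def merge_with_headers(file_paths, output_path):
    """Merge text files, writing a header line before each file's contents.

    Hypothetical helper; the '# --- name ---' header format is arbitrary.
    """
    file_paths = list(file_paths)
    with open(output_path, 'w', encoding='utf-8') as output_file:
        if not file_paths:
            # With no files, fileinput would read sys.argv[1:] or stdin.
            return
        with fileinput.input(
            files=file_paths,
            openhook=fileinput.hook_encoded('utf-8'),  # Read inputs as UTF-8
        ) as input_files:
            previous_line = '\n'
            for line in input_files:
                if input_files.isfirstline():
                    # Make sure the header starts on its own line.
                    if not previous_line.endswith('\n'):
                        output_file.write('\n')
                    output_file.write(f'# --- {input_files.filename()} ---\n')
                output_file.write(line)
                previous_line = line
```

Note that an empty input file yields no lines, so it gets no header in this sketch.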

Merging Files Matching a Pattern with glob

The glob module allows you to find files matching a wildcard pattern. This is useful for merging all files of a certain type (e.g., all .txt files) in a directory:

import glob

# glob.glob() returns matches in arbitrary order; sort for a predictable merge order
file_paths = sorted(glob.glob('text-files/*.txt'))
print(file_paths) # Output: (a list of paths that match the pattern)

with open('output-file.txt', 'w', encoding='utf-8') as output_file:
    for file_path in file_paths:
        with open(file_path, 'r', encoding='utf-8') as input_file:
            # Write '\n', not os.linesep: text mode already translates '\n'
            # to the platform's line ending
            output_file.write(input_file.read() + '\n')
  • glob.glob('text-files/*.txt'): Finds all files in the text-files directory that match the pattern and returns their paths as a list. *.txt means "any filename ending in .txt".
  • The order of the returned list is not guaranteed. If the order of the merged content matters, sort the list before looping over it.
  • The rest of the code is similar to the previous examples, iterating through the found file paths.
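One thing to watch for with patterns is the output file itself matching the pattern, for example when it is written into the same directory as the inputs. The following is a rough sketch of one way to handle that, combined with a sorted merge order; the function name merge_matching is hypothetical, and comparing absolute paths is a simplification that does not account for symbolic links.

```python
import glob
import os


def merge_matching(pattern, output_path):
    """Merge all files matching a glob pattern, in sorted order.

    Hypothetical helper. Returns the list of files that were merged.
    """
    output_abs = os.path.abspath(output_path)
    # Collect the inputs first, skipping directories and the output file
    # itself in case it also matches the pattern.
    file_paths = sorted(
        path for path in glob.glob(pattern)
        if os.path.isfile(path) and os.path.abspath(path) != output_abs
    )
    with open(output_path, 'w', encoding='utf-8') as output_file:
        for file_path in file_paths:
            with open(file_path, 'r', encoding='utf-8') as input_file:
                content = input_file.read()  # Assumes files fit in memory
            output_file.write(content)
            # Add a newline only if the file did not end with one.
            if content and not content.endswith('\n'):
                output_file.write('\n')
    return file_paths
```

For example, merge_matching('text-files/*.txt', 'text-files/merged.txt') would merge every other .txt file in text-files without reading merged.txt back into itself.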