Python Pandas: How to Read CSVs with Multiple or Mixed Delimiters
CSV files are a common data exchange format, but they don't always adhere to a strict single-delimiter convention. You might encounter files where data fields are separated by commas in some rows, semicolons in others, or even a mix of different characters. Standard parsing methods often fail with such files.
This guide explores how to use pandas.read_csv() with a regular expression for its sep parameter, so you can reliably load these tricky, multi-delimiter CSV files into DataFrames. We'll also touch upon a pure Python alternative for scenarios where Pandas might not be an option.
The Challenge: Inconsistent Delimiters in CSV Files
Ideally, a CSV (Comma Separated Values) file uses a single, consistent character (like a comma) to separate data fields. However, real-world data can be messy. You might receive files where:
- Different rows use different delimiters.
- A single row uses a mix of delimiters.
- Delimiters are not standard commas (e.g., semicolons, tabs, pipes, or other symbols).
Attempting to read such files with pd.read_csv() using a single, fixed delimiter in the sep argument will lead to incorrect parsing, with data being clumped into single columns or misaligned.
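As a quick illustration, here's a minimal sketch (using an in-memory string rather than a real file) of what that mis-parsing looks like when a semicolon-delimited row is read with the default comma separator:
import io
import pandas as pd

# Hypothetical sample: the header uses commas, but the data row uses semicolons
raw = "first_name,last_name,date\nAlice;Smith;2025-01-05\n"

# The default sep=',' finds no commas in the data row, so the whole row
# lands in the first column and the remaining columns become NaN
df_bad = pd.read_csv(io.StringIO(raw))
print(df_bad)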
Pandas Solution: read_csv() with Regex Delimiters
Pandas provides a robust solution by allowing the sep (or delimiter) argument of pd.read_csv() to accept a regular expression. This gives you the flexibility to define multiple possible delimiters.
Let's assume we have an employees.csv file with mixed delimiters:
employees.csv:
first_name,last_name,date
Alice;Smith;2025-01-05
Tom Nolan 2025-03-25
Carl@Lemon@2024-01-24
This file uses commas (,), semicolons (;), spaces, and at-symbols (@) as delimiters.
The sep Argument and the OR Operator (|)
You can specify multiple delimiters by constructing a regular expression pattern where each delimiter is separated by the pipe | character, which acts as an OR operator.
import pandas as pd
# Define the path to your CSV file
file_path = 'employees.csv'
# Use a regex with '|' to specify multiple delimiters
df = pd.read_csv(
    file_path,
    sep=r',|;|@| ',   # Delimiters: comma, semicolon, at-symbol, or space
    encoding='utf-8',
    engine='python'   # Important: see next section
)
print("DataFrame read with multiple delimiters using '|':")
print(df)
Output:
DataFrame read with multiple delimiters using '|':
first_name last_name date
0 Alice Smith 2025-01-05
1 Tom Nolan 2025-03-25
2 Carl Lemon 2024-01-24
sep=r',|;|@| ': The r'' prefix denotes a raw string, which is good practice for regular expressions. This pattern tells Pandas to split on a comma, OR a semicolon, OR an @ symbol, OR a space.
Crucial: Setting engine='python'
When you use a regular expression for the sep argument (especially if it's more complex than a single character or \s+), you should specify engine='python'. The default 'c' engine is faster but does not support regex separators.
If you omit engine='python' with a regex separator, you'll likely see a ParserWarning:
ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators... you can avoid this warning by specifying engine='python'.
To silence this warning and make the parser choice explicit, always include engine='python':
df = pd.read_csv(
    file_path,
    sep=r',|;|@| ',
    encoding='utf-8',
    engine='python'  # Explicitly use the Python parsing engine
)
Handling Spaces and Whitespace as Delimiters
- Specific space: As shown above, a literal space can be included as one of the alternatives, e.g. sep=r',|;|@| '.
- General whitespace (\s+): If your file might use any kind of whitespace (spaces, tabs, etc.) as delimiters, possibly several in a row, \s+ is a more robust choice. \s matches any whitespace character, and + means "one or more occurrences."
import pandas as pd
# Assuming employees.csv might have tabs or multiple spaces as delimiters
df_whitespace = pd.read_csv(
    'employees.csv',     # Using the same CSV as before
    sep=r',|;|@|\s+',    # Delimiters: comma, semicolon, at-symbol, OR one or more whitespace characters
    engine='python',
    encoding='utf-8'
)
print("DataFrame read with '\\s+' for whitespace:")
print(df_whitespace)
Output (the same as before, since only single spaces were used, but now robust to tabs or multiple spaces):
DataFrame read with '\s+' for whitespace:
first_name last_name date
0 Alice Smith 2025-01-05
1 Tom Nolan 2025-03-25
2 Carl Lemon 2024-01-24
Using Character Classes ([]) for Single-Character Delimiters
If all your potential delimiters are single characters, you can use a regex character class [...]. Any character inside the brackets will be treated as a possible delimiter.
import pandas as pd
file_path = 'employees.csv'
# Use a character class for single-character delimiters
df_char_class = pd.read_csv(
    file_path,
    sep=r'[ ,;@]',   # Delimiters: space, comma, semicolon, or at-symbol
    engine='python',
    encoding='utf-8'
)
print("DataFrame read with character class delimiters:")
print(df_char_class)
Output:
DataFrame read with character class delimiters:
first_name last_name date
0 Alice Smith 2025-01-05
1 Tom Nolan 2025-03-25
2 Carl Lemon 2024-01-24
This character class approach is only suitable if each delimiter is a single character. If you have multi-character delimiters (e.g., _DELIM_), you must use the OR operator (|) method.
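For example, here's a minimal sketch (using an in-memory string and a made-up _DELIM_ separator, not the employees.csv file above) of the OR-operator approach handling a multi-character delimiter:
import io
import pandas as pd

# Hypothetical data mixing commas and a multi-character _DELIM_ separator
raw = "first_name,last_name,date\nAlice_DELIM_Smith_DELIM_2025-01-05\n"

# A character class would split on each individual character;
# the OR operator treats _DELIM_ as one delimiter
df_multi = pd.read_csv(
    io.StringIO(raw),
    sep=r',|_DELIM_',
    engine='python'
)
print(df_multi)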
Alternative: Pure Python Parsing with re.split()
If you're not using Pandas, or need to process a CSV file line by line with multiple delimiters before forming a DataFrame, Python's built-in re module with re.split() is an effective solution.
import re
file_path = 'employees.csv'
parsed_data = []
with open(file_path, 'r', encoding='utf-8') as csvfile:
    for i, line in enumerate(csvfile):
        # Remove trailing newline character before splitting
        cleaned_line = line.strip()
        # Split using the OR operator for delimiters
        # values = re.split(r',|;|@| ', cleaned_line)
        # Or, using a character class for single-character delimiters
        values = re.split(r'[ ,;@]', cleaned_line)
        parsed_data.append(values)
        print(f"Line {i}: {values}")
Output:
Line 0: ['first_name', 'last_name', 'date']
Line 1: ['Alice', 'Smith', '2025-01-05']
Line 2: ['Tom', 'Nolan', '2025-03-25']
Line 3: ['Carl', 'Lemon', '2024-01-24']
This method gives you fine-grained control but requires manual handling to construct a DataFrame if that's your ultimate goal.
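If a DataFrame is still the end goal, one way to assemble it from the parsed rows (assuming the first row is the header, as in the example above) might look like this:
import pandas as pd

# parsed_data comes from the loop above; its first entry is the header row
df_manual = pd.DataFrame(parsed_data[1:], columns=parsed_data[0])
print(df_manual)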
Choosing Your Approach
- pandas.read_csv() with a regex sep: This is generally the preferred and most convenient method if your end goal is a Pandas DataFrame. It's powerful and integrates directly into the Pandas ecosystem. Remember engine='python'.
- re.split() (pure Python): Use this if you don't have Pandas as a dependency, need to perform complex pre-processing on each line before DataFrame creation, or are working in an environment where memory is extremely constrained for large files (though Pandas also has chunking options for large files; see the sketch below).
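If file size is the main concern, chunked reading can be combined with the regex separator. A minimal sketch (the chunk size here is an arbitrary choice):
import pandas as pd

# Read the file in chunks of 10,000 rows, still splitting on the regex delimiters
reader = pd.read_csv(
    'employees.csv',
    sep=r'[ ,;@]',
    engine='python',
    encoding='utf-8',
    chunksize=10000
)
for chunk in reader:
    # Process each chunk independently (aggregate, filter, write out, etc.)
    print(chunk.shape)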
Conclusion
Dealing with CSV files that have multiple or inconsistent delimiters is a common data cleaning challenge. By leveraging the power of regular expressions within the sep argument of pandas.read_csv() (and setting engine='python'), you can robustly parse these files into well-structured DataFrames. Whether using the OR operator | for varied delimiters or character classes [] for sets of single-character delimiters, Pandas offers the flexibility needed for real-world data. For non-Pandas workflows, re.split() provides a solid pure Python alternative.