Skip to main content

Python Pandas: How to Select DataFrame Columns Starting With a Specific String

When working with Pandas DataFrames, especially those with many columns or dynamically generated names, you often need to select a subset of columns based on a naming pattern. A common requirement is to select all columns whose names begin with a particular prefix or string.

This guide explains several effective methods to select Pandas DataFrame columns that start with a given string, using list comprehensions, Index.str.startswith(), and DataFrame.filter(regex=...).

The Goal: Selecting Columns by Prefix

Given a Pandas DataFrame, we want to create a new DataFrame (or a view) that contains only those columns whose names (headers) begin with a specified string pattern. For example, selecting all columns that start with "Sales_" or "User_".

Example DataFrame

import pandas as pd

data = {
'CustomerID': [101, 102, 103, 104],
'CustomerName': ['Alice WidgetCorp', 'Bob Solutions Ltd', 'Charlie Innovations', 'Diana Services Inc'],
'Region': ['North', 'South', 'North', 'West'],
'Sales_Q1': [1500, 2200, 1800, 2500],
'Sales_Q2': [1700, 2000, 1900, 2800],
'Support_Tickets': [2, 1, 3, 1],
'sales_rep': ['John', 'Jane', 'Mike', 'Anna'] # Lowercase 'sales'
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
CustomerID CustomerName Region Sales_Q1 Sales_Q2 \
0 101 Alice WidgetCorp North 1500 1700
1 102 Bob Solutions Ltd South 2200 2000
2 103 Charlie Innovations North 1800 1900
3 104 Diana Services Inc West 2500 2800

Support_Tickets sales_rep
0 2 John
1 1 Jane
2 3 Mike
3 1 Anna

Let's say we want to select all columns starting with "Customer" or "Sales".

Method 1: Using List Comprehension with str.startswith() (Pythonic & Clear)

This approach iterates through the column names and uses the built-in string method startswith().

Case-Sensitive Selection

import pandas as pd

df_example = pd.DataFrame({
'CustomerID': [101], 'CustomerName': ['Alice'], 'Region': ['North'],
'Sales_Q1': [1500], 'Sales_Q2': [1700], 'Support_Tickets': [2]
})

prefix_to_match = 'Sales'

# ✅ Create a list of column names that start with the prefix
selected_column_names_lc = [
col_name for col_name in df_example.columns
if col_name.startswith(prefix_to_match)
]
print(f"Column names starting with '{prefix_to_match}': {selected_column_names_lc}\n")

# Use this list to select the columns from the DataFrame
df_selected_lc = df_example[selected_column_names_lc]
print(f"DataFrame with columns starting with '{prefix_to_match}':")
print(df_selected_lc)

Output:

Column names starting with 'Sales': ['Sales_Q1', 'Sales_Q2']

DataFrame with columns starting with 'Sales':
Sales_Q1 Sales_Q2
0 1500 1700

Case-Insensitive Selection

To perform a case-insensitive match, convert both the column name and the prefix to a consistent case (e.g., lowercase) before comparison.

import pandas as pd

df_example = pd.DataFrame({
'Sales_Q1': [1500], 'sales_rep': ['John'], 'CustomerID': [101]
})

prefix_lower = 'sales' # Search for 'sales' case-insensitively

selected_column_names_lc_ci = [
col_name for col_name in df_example.columns
if col_name.lower().startswith(prefix_lower)
]
print(f"Column names starting with '{prefix_lower}' (case-insensitive): {selected_column_names_lc_ci}\n")

df_selected_lc_ci = df_example[selected_column_names_lc_ci]
print(f"DataFrame with columns starting with '{prefix_lower}' (case-insensitive):")
print(df_selected_lc_ci)

Output:

Column names starting with 'sales' (case-insensitive): ['Sales_Q1', 'sales_rep']

DataFrame with columns starting with 'sales' (case-insensitive):
Sales_Q1 sales_rep
0 1500 John

Method 2: Using df.columns.str.startswith() with df.loc or Direct Indexing (Vectorized)

Pandas Index objects (like df.columns) have a .str accessor that provides vectorized string methods.

Creating a Boolean Mask

df.columns.str.startswith(prefix) returns a boolean array indicating which column names start with the prefix.

import pandas as pd

df_example = pd.DataFrame({
'CustomerID': [101], 'CustomerName': ['Alice'], 'Sales_Q1': [1500]
})

prefix_to_match = 'Customer'

# ✅ Create a boolean mask for column names
column_mask = df_example.columns.str.startswith(prefix_to_match)
print(f"Boolean mask for columns starting with '{prefix_to_match}':")
print(column_mask)

Output:

Boolean mask for columns starting with 'Customer':
[ True True False]

Use df.loc[:, column_mask] to select all rows (:) and the columns where the mask is True.

import pandas as pd

df_example = pd.DataFrame({
'CustomerID': [101], 'CustomerName': ['Alice'], 'Sales_Q1': [1500]
})
prefix_to_match = 'Customer'
column_mask = df_example.columns.str.startswith(prefix_to_match)

# ✅ Select columns using .loc and the boolean mask
df_selected_loc = df_example.loc[:, column_mask]

print(f"DataFrame with columns starting with '{prefix_to_match}' (using .loc):")
print(df_selected_loc)

Output:

DataFrame with columns starting with 'Customer' (using .loc):
CustomerID CustomerName
0 101 Alice

Applying the Mask with Direct Bracket Indexing

You can also use the boolean mask directly with df[df.columns[column_mask]].

import pandas as pd

df_selected_direct = df_example[df_example.columns[column_mask]]
print(f"DataFrame with columns starting with '{prefix_to_match}' (direct indexing):")
print(df_selected_direct) # Same output as .loc

Output:

DataFrame with columns starting with 'Customer' (using .loc):
CustomerID CustomerName
0 101 Alice

While df.loc[:, column_mask] is generally preferred for clarity and explicit indexing, df[df.columns[column_mask]] also works because df.columns[column_mask] returns an Index of the selected column names.

note

For case-insensitive matching with str.startswith() on an Index, you'd need to convert the index to a Series or list first to use .lower(), or use flags=re.IGNORECASE if using str.contains() with a regex pattern (see Method 3).

Method 3: Using DataFrame.filter(regex=...) (Powerful for Patterns)

The DataFrame.filter() method can subset rows or columns. Using its regex parameter with axis=1 allows powerful pattern matching on column names. To match columns starting with a string, use the caret ^ in the regex to anchor the pattern to the beginning of the name.

import pandas as pd

df_example = pd.DataFrame({
'CustomerID': [101], 'CustomerName': ['Alice'], 'Sales_Q1': [1500], 'sales_rep': ['John']
})

prefix_to_match_exact_case = 'Sales'
# Regex: r'^Sales' matches columns that begin with "Sales" (case-sensitive)
df_selected_filter_cs = df_example.filter(regex=r'^' + prefix_to_match_exact_case, axis=1)
print(f"Columns starting with '{prefix_to_match_exact_case}' (filter, case-sensitive):")
print(df_selected_filter_cs)
print()

# For case-insensitive matching with regex, use the (?i) flag in the pattern
prefix_any_case = 'sales'
df_selected_filter_ci = df_example.filter(regex=r'(?i)^' + prefix_any_case, axis=1)
# Or, you can use re.IGNORECASE if constructing pattern carefully:
# import re
# df_selected_filter_ci_re = df.filter(regex=re.compile(r'^' + prefix_any_case, flags=re.IGNORECASE), axis=1)

print(f"Columns starting with '{prefix_any_case}' (filter, case-insensitive):")
print(df_selected_filter_ci)

Output:

Columns starting with 'Sales' (filter, case-sensitive):
Sales_Q1
0 1500

Columns starting with 'sales' (filter, case-insensitive):
Sales_Q1 sales_rep
0 1500 John
  • axis=1: Specifies that the filter should be applied to column names.
  • regex=r'^Prefix': The ^ asserts the start of the string.
  • regex=r'(?i)^Prefix': The (?i) flag within the regex pattern makes the match case-insensitive.

Choosing the Right Method

  • List comprehension with col.startswith(prefix): Very Pythonic, clear, and good for simple prefix matching. Easy to adapt for case-insensitivity.
  • df.columns.str.startswith(prefix) then .loc: Vectorized and efficient. Good for boolean mask operations. Case-insensitivity is a bit more involved directly on the Index's str.startswith.
  • df.filter(regex=r'^Prefix', axis=1): Most powerful if you need more complex starting patterns beyond a simple prefix (e.g., "starts with 'Sales_' followed by a digit"). Handles case-insensitivity well with regex flags.

For simple prefix matching, the list comprehension or df.columns.str.startswith() are often preferred for readability. For more complex patterns, filter(regex=...) excels.

Conclusion

Selecting Pandas DataFrame columns that start with a specific string can be achieved in several ways:

  1. Use a list comprehension iterating over df.columns with col_name.startswith(prefix).
  2. Create a boolean mask using df.columns.str.startswith(prefix) and apply it with df.loc[:, mask].
  3. Employ df.filter(regex=r'^' + prefix, axis=1) for a concise and powerful regex-based solution.

Remember to handle case sensitivity as needed by converting strings to a consistent case or using appropriate flags/parameters (case=False for str.contains if used, or re.IGNORECASE/(?i) for regex).

These methods provide robust control over selecting columns based on their naming patterns.