Python Pandas: Read Specific Columns from Excel File (`read_excel`, `usecols`)

When working with Excel files in Pandas using pd.read_excel(), you often don't need to load all columns from the spreadsheet, especially if the file is large or contains many irrelevant fields. Pandas provides the versatile usecols parameter to specify exactly which columns you want to read, allowing you to select columns by their letter identifiers (A, B, C...), integer positions (0, 1, 2...), or their actual names from the header row.

This guide explains how to effectively use the usecols parameter to read specific columns from an Excel file into a Pandas DataFrame.

Why Read Specific Columns?

Memory Efficiency: Loading only necessary columns reduces the memory footprint of your DataFrame, crucial for large Excel files.
Performance: Reading fewer columns can speed up the data loading process.
Focus: Simplifies the DataFrame by including only relevant data for your analysis.
Data Cleaning: Avoids loading columns that might have problematic data types or require extensive cleaning if they are not needed.

Prerequisites: Installing `openpyxl` (or other engines)

To read .xlsx files (the modern Excel format), Pandas typically uses the openpyxl library as its engine. If you need to read older .xls files, xlrd is often used. Ensure you have the necessary engine installed:

pip install pandas openpyxl
# For .xls support, you might also need xlrd:
pip install xlrd

note

Remember to use pip3 or python -m pip if that's standard for your environment.

Example Excel File (`data.xlsx`)

Assume we have an Excel file named data.xlsx with the following content:

first_name (A)	last_name (B)	date (C)	age (D)	experience (E)	salary (F)
Alice	Smith	2023-01-15	29	2	55000
Tom	Nolan	2023-03-20	30	3	62000
Carl	Jones	2022-07-10	31	4	68000
Diana	Adams	2023-05-01	32	5	75000

(Column letters A-F are shown for reference).

Selecting Columns by Excel Letters or Integer Positions

The usecols parameter in pd.read_excel(filepath_or_buffer, usecols=...) can accept various formats for specifying columns.

Using a Comma-Separated String of Letters (e.g., `"A,C,E"`)

You can provide a string of Excel column letters, separated by commas.

import pandas as pd

file_path = 'data.xlsx' # Path to your Excel file

# ✅ Select columns A (first_name), C (date), and E (experience)
df_letters = pd.read_excel(file_path, usecols="A,C,E")

print("DataFrame with columns A, C, E (by letter):")
print(df_letters)

Output (column names are taken from the header row of the Excel file):

  first_name        date  experience
    Alice  2023-01-15           2
      Tom  2023-03-20           3
     Carl  2022-07-10           4
    Diana  2023-05-01           5

Using a List of Integer Positions (e.g., `[0, 2, 4]`)

Provide a list of 0-based integer column indices.

import pandas as pd
file_path = 'data.xlsx'

# ✅ Select columns at position 0 (first_name), 2 (date), and 4 (experience)
df_positions = pd.read_excel(file_path, usecols=[0, 2, 4])

print("DataFrame with columns at positions 0, 2, 4:")
print(df_positions)

Output:

  first_name        date  experience
    Alice  2023-01-15           2
      Tom  2023-03-20           3
     Carl  2022-07-10           4
    Diana  2023-05-01           5

Using Excel-Style Ranges (e.g., `"A:C"`, `"A,D:F"`)

You can specify ranges of columns using Excel's colon notation (e.g., "A:C" for columns A through C, inclusive). You can also combine ranges and individual columns.

import pandas as pd
file_path = 'data.xlsx'

# ✅ Select columns A through C (inclusive)
df_range_AC = pd.read_excel(file_path, usecols="A:C")
print("\nDataFrame with columns A through C (A:C):")
print(df_range_AC)
# Output:
#   first_name last_name        date
# 0      Alice     Smith  2023-01-15
# ...

# ✅ Select columns A through B, AND column E
df_range_ABE = pd.read_excel(file_path, usecols="A:B,E")
print("\nDataFrame with columns A:B and E:")
print(df_range_ABE)
# Output:
#   first_name last_name  experience
# 0      Alice     Smith           2
# ...

# ✅ Select multiple ranges: A through B, AND D through E
df_multi_range = pd.read_excel(file_path, usecols="A:B,D:E")
print("\nDataFrame with columns A:B and D:E:")
print(df_multi_range)
# Output:
#   first_name last_name  age  experience
# 0      Alice     Smith   29           2
# ...

note

Ranges like "A:C" are inclusive of both start and end columns.

Using Python's `range()` for Positional Slices

You can pass a Python range() object to usecols to select a sequence of columns by position. Remember that range(start, stop) is exclusive of stop.

import pandas as pd
file_path = 'data.xlsx'

# ✅ Select columns at position 0 up to (but not including) position 3
# This means columns 0, 1, 2 (A, B, C)
df_py_range = pd.read_excel(file_path, usecols=range(0, 3))
# Or simply range(3) if starting from 0

print("DataFrame with columns using range(0, 3):")
print(df_py_range)

Output:

DataFrame with columns using range(0, 3):
  first_name last_name        date
0      Alice     Smith  2023-01-15
...

Selecting Columns by Their Names (Header Row)

If your Excel file has a header row, you can pass a list of column names (strings) that you want to read. This is often the most robust method as it doesn't depend on column letters or positions, which might change.

import pandas as pd
file_path = 'data.xlsx'

# ✅ Select columns by their names from the header row
columns_to_read_by_name = ['first_name', 'age', 'salary']
df_names = pd.read_excel(file_path, usecols=columns_to_read_by_name)

print("DataFrame with specific columns selected by name:")
print(df_names)

Output:

DataFrame with specific columns selected by name:
  first_name  age  salary
0      Alice   29   55000
1        Tom   30   62000
2       Carl   31   68000
3      Diana   32   75000

The names in the usecols list must exactly match the column names in the Excel file's header row (case-sensitive).
The header row itself is determined by the header parameter of read_excel (defaults to 0, the first row).

Important Notes on `usecols` Behavior

Default: If usecols is None (the default), all columns are parsed.
Callable: usecols can also be a function that takes a column name string and returns True if the column should be parsed, False otherwise. This allows for more dynamic column selection logic.
Error Handling: If you specify column names in usecols that do not exist in the Excel file, Pandas might raise a ValueError (e.g., "Usecols do not match columns"). Ensure your specified names/letters/indices are correct.

Conclusion

The usecols parameter in pandas.read_excel() provides powerful flexibility for reading only specific columns from an Excel file, which is essential for efficient data loading and focused analysis. You can select columns:

By Excel letter strings: "A,C", "B:D,F"
By 0-based integer positions: [0, 2], range(3)
By column names from the header row: ['Column Name 1', 'Another Column'] (Recommended for robustness if headers exist)

Choose the method that best suits the structure of your Excel file and your selection criteria. Using column names is generally the most reliable if your Excel file has a consistent header row.

Why Read Specific Columns?​

Prerequisites: Installing openpyxl (or other engines)​

Example Excel File (data.xlsx)​

Selecting Columns by Excel Letters or Integer Positions​

Using a Comma-Separated String of Letters (e.g., "A,C,E")​

Using a List of Integer Positions (e.g., [0, 2, 4])​

Using Excel-Style Ranges (e.g., "A:C", "A,D:F")​

Using Python's range() for Positional Slices​

Selecting Columns by Their Names (Header Row)​

Important Notes on usecols Behavior​

Conclusion​

Table of Contents

Why Read Specific Columns?

Prerequisites: Installing `openpyxl` (or other engines)

Example Excel File (`data.xlsx`)

Selecting Columns by Excel Letters or Integer Positions

Using a Comma-Separated String of Letters (e.g., `"A,C,E"`)

Using a List of Integer Positions (e.g., `[0, 2, 4]`)

Using Excel-Style Ranges (e.g., `"A:C"`, `"A,D:F"`)

Using Python's `range()` for Positional Slices

Selecting Columns by Their Names (Header Row)

Important Notes on `usecols` Behavior

Conclusion