Python Pandas: Select First N or Last N Columns of DataFrame
Selecting a specific number of columns from the beginning or end of a Pandas DataFrame is a common requirement when you want to focus on a subset of your data, prepare data for a specific function, or simply display part of it. Pandas provides integer-location based indexing with iloc, along with other convenient methods, for this kind of column selection.
This guide explains how to select the first N, last N, or exclude the last N columns of a Pandas DataFrame.
The Goal: Subsetting Columns by Position
Given a Pandas DataFrame, we want to create a new DataFrame that contains:
- Only the first N columns from the original.
- Only the last N columns from the original.
- All columns except for the last N columns.
This selection is based on the position of the columns, not their names.
Example DataFrame
import pandas as pd
data = {
    'Col_A': [1, 2, 3, 4, 5],
    'Col_B': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
    'Col_C': [10.1, 20.2, 30.3, 40.4, 50.5],
    'Col_D': [True, False, True, False, True],
    'Col_E': ['X', 'Y', 'Z', 'X', 'Y']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Output:
Original DataFrame:
Col_A Col_B Col_C Col_D Col_E
0 1 Mon 10.1 True X
1 2 Tue 20.2 False Y
2 3 Wed 30.3 True Z
3 4 Thu 40.4 False X
4 5 Fri 50.5 True Y
Select the FIRST N Columns
Using DataFrame.iloc[:, :N] (Recommended)
The DataFrame.iloc indexer selects by integer position, using the syntax df.iloc[row_slicer, column_slicer].
- : for row_slicer selects all rows.
- :N for column_slicer selects columns from the beginning (position 0) up to (but not including) position N.
import pandas as pd
df_example = pd.DataFrame({
'Col_A': [1], 'Col_B': ['Mon'], 'Col_C': [10.1], 'Col_D': [True], 'Col_E': ['X']
})
# N = number of first columns to select
n_first = 3
# ✅ Select all rows (:) and the first N columns (:n_first)
df_first_n_cols = df_example.iloc[:, :n_first]
print(f"Selecting the first {n_first} columns using .iloc:")
print(df_first_n_cols)
Output:
Selecting the first 3 columns using .iloc:
Col_A Col_B Col_C
0 1 Mon 10.1
Creating a Reusable Function
If you select the first N columns often, a small helper function is convenient:
import pandas as pd
def select_first_n_columns(dataframe, n):
    """Selects the first N columns of a DataFrame."""
    if n <= 0:
        return pd.DataFrame(index=dataframe.index)  # Return empty DataFrame with same index
    if n > len(dataframe.columns):
        return dataframe.copy()  # Or raise error/warning if n is too large
    return dataframe.iloc[:, :n]
df_example = pd.DataFrame({
'Col_A': [1,2], 'Col_B': ['Mon','Tue'], 'Col_C': [10.1,20.2], 'Col_D': [True,False], 'Col_E': ['X','Y']
})
df_first_2 = select_first_n_columns(df_example, 2)
print("Using reusable function for first 2 columns:")
print(df_first_2)
print()
df_first_1 = select_first_n_columns(df_example, 1)
print("Using reusable function for first 1 column:")
print(df_first_1)
Output:
Using reusable function for first 2 columns:
Col_A Col_B
0 1 Mon
1 2 Tue
Using reusable function for first 1 column:
Col_A
0 1
1 2
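A side note on the bounds checks in the function above: a plain iloc slice already tolerates an n larger than the number of columns, so the explicit check mostly documents intent. The sketch below illustrates this with a hypothetical two-column frame (df_small is an illustrative name, not part of the earlier examples) and contrasts it with a scalar position, which is not clamped.

```python
import pandas as pd

# Hypothetical two-column frame, used only for illustration
df_small = pd.DataFrame({'Col_A': [1, 2], 'Col_B': ['Mon', 'Tue']})

# A slice is clamped to the available columns, so asking for 10 returns both
clamped = df_small.iloc[:, :10]
print(clamped.shape)  # (2, 2)

# A scalar position past the last column is NOT clamped: it raises IndexError
try:
    df_small.iloc[:, 10]
    raised = False
except IndexError as exc:
    raised = True
    print(f"IndexError: {exc}")
```

Whether an oversized n should be accepted silently or rejected is a design choice; the function above accepts it and returns a copy.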
Select the LAST N Columns
Using DataFrame.iloc[:, -N:] (Recommended)
Negative indexing in slicing means "count from the end."
- : for row_slicer selects all rows.
- -N: for column_slicer selects columns from the Nth-to-last position up to the end.
import pandas as pd
df_example = pd.DataFrame({
'Col_A': [1], 'Col_B': ['Mon'], 'Col_C': [10.1], 'Col_D': [True], 'Col_E': ['X']
})
# N = number of last columns to select
n_last = 3
# ✅ Select all rows (:) and the last N columns (-n_last:)
df_last_n_cols = df_example.iloc[:, -n_last:]
print(f"Selecting the last {n_last} columns using .iloc:")
print(df_last_n_cols)
Output:
Selecting the last 3 columns using .iloc:
Col_C Col_D Col_E
0 10.1 True X
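One edge case is worth keeping in mind when N comes from a variable: in Python, -0 equals 0, so df.iloc[:, -0:] is the same as df.iloc[:, 0:] and returns every column rather than none. The following is a minimal sketch of a guard, assuming you want N = 0 to mean "no columns"; select_last_n_columns and df_demo are hypothetical names introduced here for illustration.

```python
import pandas as pd

def select_last_n_columns(dataframe, n):
    """Hypothetical helper: return the last n columns; n <= 0 returns no columns."""
    if n <= 0:
        # An empty column slice keeps the index but drops every column
        return dataframe.iloc[:, :0]
    return dataframe.iloc[:, -n:]

df_demo = pd.DataFrame({'Col_A': [1], 'Col_B': ['Mon'], 'Col_C': [10.1]})

# The plain slice with N = 0 keeps all three columns
print(df_demo.iloc[:, -0:].shape)                           # (1, 3)

# The guarded helper returns zero columns for N = 0
print(select_last_n_columns(df_demo, 0).shape)              # (1, 0)
print(select_last_n_columns(df_demo, 2).columns.tolist())   # ['Col_B', 'Col_C']
```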
Using DataFrame.columns Slicing
You can slice the df.columns Index object to get the names of the last N columns, and then use these names to select from the DataFrame.
import pandas as pd
df_example = pd.DataFrame({
'Col_A': [1], 'Col_B': ['Mon'], 'Col_C': [10.1], 'Col_D': [True], 'Col_E': ['X']
})
n_last = 2
# Get the names of the last N columns
last_n_column_names = df_example.columns[-n_last:]
print(f"Names of last {n_last} columns: {last_n_column_names.tolist()}\n")
# ✅ Select columns by these names
df_last_n_cols_by_name = df_example[last_n_column_names]
print(f"Selecting the last {n_last} columns using df.columns slicing:")
print(df_last_n_cols_by_name)
Output:
Names of last 2 columns: ['Col_D', 'Col_E']
Selecting the last 2 columns using df.columns slicing:
Col_D Col_E
0 True X
While this works, df.iloc[:, -N:] is generally more direct for positional selection.
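The two approaches can also give different results when column names are not unique, because selecting by name returns every column that carries that name. A small sketch, using a hypothetical frame (df_dup) with a repeated column name:

```python
import pandas as pd

# Hypothetical frame in which 'Col_A' appears twice
df_dup = pd.DataFrame([[1, 2, 3]], columns=['Col_A', 'Col_B', 'Col_A'])

# Positional selection returns exactly the last column
by_position = df_dup.iloc[:, -1:]

# Name-based selection returns every column named 'Col_A'
by_name = df_dup[df_dup.columns[-1:]]

print(by_position.shape)  # (1, 1)
print(by_name.shape)      # (1, 2)
```

If duplicate column names are possible in your data, iloc is the safer choice for strictly positional selection.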
EXCLUDE the Last N Columns (Select All BUT Last N)
To select all columns except for the last N, use a negative stop index in the column slicer.
- : for row_slicer selects all rows.
- :-N for column_slicer selects columns from the beginning up to (but not including) the Nth-to-last column.
import pandas as pd
df_example = pd.DataFrame({
'Col_A': [1], 'Col_B': ['Mon'], 'Col_C': [10.1], 'Col_D': [True], 'Col_E': ['X']
})
# N = number of last columns to EXCLUDE
n_exclude_last = 2
# ✅ Select all rows (:) and all columns UP TO the last N (:-n_exclude_last)
df_exclude_last_n = df_example.iloc[:, :-n_exclude_last]
print(f"Excluding the last {n_exclude_last} columns using .iloc:")
print(df_exclude_last_n)
Output:
Excluding the last 2 columns using .iloc:
Col_A Col_B Col_C
0 1 Mon 10.1
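As with the last-N selection, N = 0 behaves differently from what the pattern suggests: df.iloc[:, :-0] is the same as df.iloc[:, :0], which returns no columns at all instead of keeping all of them. The sketch below avoids the negative stop by computing an explicit stop position; drop_last_n_columns and df_demo are hypothetical names used only for illustration.

```python
import pandas as pd

def drop_last_n_columns(dataframe, n):
    """Hypothetical helper: return all but the last n columns; n <= 0 drops nothing."""
    # Compute a non-negative stop position so that n = 0 keeps every column
    keep = max(len(dataframe.columns) - max(n, 0), 0)
    return dataframe.iloc[:, :keep]

df_demo = pd.DataFrame({'Col_A': [1], 'Col_B': ['Mon'], 'Col_C': [10.1]})

# The plain slice with N = 0 returns an empty selection
print(df_demo.iloc[:, :-0].shape)                         # (1, 0)

# The helper keeps all columns for N = 0 and drops two for N = 2
print(drop_last_n_columns(df_demo, 0).shape)              # (1, 3)
print(drop_last_n_columns(df_demo, 2).columns.tolist())   # ['Col_A']
```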
Conclusion
Pandas DataFrame.iloc provides a powerful and concise way to select columns based on their integer positions:
- To select the FIRST N columns: Use df.iloc[:, :N].
- To select the LAST N columns: Use df.iloc[:, -N:].
- To EXCLUDE the last N columns (select all columns except the last N): Use df.iloc[:, :-N].
Remember that iloc is purely integer-location based, so it selects columns by their order (positions 0, 1, 2, ... from the start, or -1, -2, ... from the end), regardless of their names. Slicing df.columns to get the names first and then selecting by name is an alternative, but it is usually less direct for positional tasks.