Python Pandas: How to Select DataFrame Rows Based on a List of Indices
Selecting specific rows from a Pandas DataFrame based on their numerical positions (indices) is a fundamental operation in data manipulation. You might have a list of row indices that you want to extract from a larger DataFrame for further analysis or processing.
This guide demonstrates several effective methods to select DataFrame rows using a list of integer indices, including iloc, index.isin(), loc, take(), and query().
The Task: Selecting Rows by Positional Indices
Given a Pandas DataFrame and a list of integers representing the desired row positions (e.g., [0, 2, 5]), we want to create a new DataFrame containing only the rows at these specified positions. It's important to distinguish this from selecting by label-based indices, which might not be integers or might not be sequential. Here, we focus on zero-based integer positions.
Example DataFrame:
import pandas as pd
data = {
'product_id': ['P101', 'P102', 'P103', 'P104', 'P105'],
'category': ['Electronics', 'Books', 'Home', 'Electronics', 'Apparel'],
'price': [299.99, 19.95, 45.50, 799.00, 32.00],
'in_stock': [True, True, False, True, True]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Output:
Original DataFrame:
   product_id     category   price  in_stock
0        P101  Electronics  299.99      True
1        P102        Books   19.95      True
2        P103         Home   45.50     False
3        P104  Electronics  799.00      True
4        P105      Apparel   32.00      True
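To make the position-versus-label distinction concrete, here is a minimal sketch. The DataFrame named shuffled and its index labels are made up for illustration and are not part of the example above.

```python
import pandas as pd

# Hypothetical DataFrame whose integer labels are not in positional order
shuffled = pd.DataFrame(
    {'product_id': ['P101', 'P102', 'P103']},
    index=[2, 0, 1],
)

# Position 0 is the first physical row, whatever its label is
by_position = shuffled.iloc[[0]]

# Label 0 is whichever row carries the index label 0
by_label = shuffled.loc[[0]]

print(by_position['product_id'].tolist())
print(by_label['product_id'].tolist())
```

With a default index the two selections coincide, which is why the difference is easy to overlook.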
Method 1: Using DataFrame.iloc (Position-Based Indexer - Recommended)
The DataFrame.iloc indexer is designed specifically for integer-position based selection. It's the most direct and idiomatic way to select rows (and/or columns) by their integer positions.
import pandas as pd
df = pd.DataFrame({
'product_id': ['P101', 'P102', 'P103', 'P104', 'P105'],
'category': ['Electronics', 'Books', 'Home', 'Electronics', 'Apparel'],
'price': [299.99, 19.95, 45.50, 799.00, 32.00],
'in_stock': [True, True, False, True, True]
})
list_of_indices_to_select = [0, 3, 4] # Select first, fourth, and fifth rows
# ✅ Select rows using .iloc with the list of indices
selected_rows_iloc = df.iloc[list_of_indices_to_select]
print("Selected rows using .iloc:")
print(selected_rows_iloc)
Output:
Selected rows using .iloc:
   product_id     category   price  in_stock
0        P101  Electronics  299.99      True
3        P104  Electronics  799.00      True
4        P105      Apparel   32.00      True
- df.iloc[list_of_indices_to_select]: Directly selects the rows at the integer positions specified in the list.
- df.iloc[list_of_indices_to_select, :]: The colon (:) indicates that all columns should be selected for the chosen rows. It is implied when only row indices are provided.
- iloc raises an IndexError if any index in the list is out of bounds.
To select all columns for the specified rows, you can also use:
selected_rows_iloc_all_cols = df.iloc[list_of_indices_to_select, :]
print("Selected rows using .iloc (all columns specified):")
print(selected_rows_iloc_all_cols)
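Because iloc raises an IndexError for out-of-bounds positions, a list that comes from user input or another dataset may need checking first. The sketch below shows one possible guard; the names requested_positions and valid_positions are illustrative, and silently dropping invalid positions is an assumption that may not suit every use case.

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': ['P101', 'P102', 'P103', 'P104', 'P105'],
    'price': [299.99, 19.95, 45.50, 799.00, 32.00],
})

requested_positions = [0, 3, 7]  # 7 is out of bounds for a 5-row DataFrame

try:
    df.iloc[requested_positions]
except IndexError as exc:
    print(f"IndexError: {exc}")

# One possible guard: keep only positions that exist in the DataFrame.
# Negative positions count from the end, so -len(df) is the first row.
valid_positions = [p for p in requested_positions if -len(df) <= p < len(df)]

selected = df.iloc[valid_positions]
print(selected)
```

If an invalid position signals a real problem upstream, letting the IndexError propagate is usually the safer choice.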
Method 2: Using DataFrame.index.isin()
This method checks whether each of the DataFrame's index labels is present in your list of indices. It matches labels, not positions, so it is equivalent to positional selection only when the DataFrame has the default integer index (0, 1, 2, ...). It also works with other index types if your list_of_indices contains labels from that index. The result keeps the DataFrame's own row order, regardless of the order of the list.
import pandas as pd
df = pd.DataFrame({
'product_id': ['P101', 'P102', 'P103', 'P104', 'P105'],
'category': ['Electronics', 'Books', 'Home', 'Electronics', 'Apparel'],
'price': [299.99, 19.95, 45.50, 799.00, 32.00],
'in_stock': [True, True, False, True, True]
})
list_of_indices_to_select = [0, 3, 4]
# Create a boolean NumPy array: True where the index label is in the list
boolean_mask = df.index.isin(list_of_indices_to_select)
print(f"Boolean mask from .isin():\n{boolean_mask}")
# ✅ Use the boolean mask to select rows
selected_rows_isin = df[boolean_mask]
print("Selected rows using .index.isin():")
print(selected_rows_isin)
Output:
Boolean mask from .isin():
[ True False False True True]
Selected rows using .index.isin():
   product_id     category   price  in_stock
0        P101  Electronics  299.99      True
3        P104  Electronics  799.00      True
4        P105      Apparel   32.00      True
- df.index.isin(list_of_indices_to_select): Returns a boolean NumPy array indicating, for each row, whether its index label is in list_of_indices_to_select.
- df[boolean_mask]: Standard boolean indexing to select rows where the mask is True.
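One practical difference from iloc is how order and repeats in the list are treated. The following sketch compares the two on a list that is out of order and contains a repeated position; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': ['P101', 'P102', 'P103', 'P104', 'P105'],
    'price': [299.99, 19.95, 45.50, 799.00, 32.00],
})

positions = [4, 0, 0]  # out of order, with a repeated position

# Boolean masking keeps the DataFrame's own row order and returns each row once
via_isin = df[df.index.isin(positions)]

# iloc follows the list exactly: same order, repeats included
via_iloc = df.iloc[positions]

print(via_isin['product_id'].tolist())
print(via_iloc['product_id'].tolist())
```

If the order of the list matters, or rows should be repeated, iloc or take() is the better fit.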
Method 3: Using DataFrame.loc with Index Slicing
df.loc is primarily for label-based indexing. If your DataFrame's index consists of the integer labels you wish to select (e.g., a default RangeIndex), you can pass the list to it directly. To select by a list of positions with loc when the index might be something else (e.g., strings), first look up the index labels at those positions, then pass those labels to loc.
import pandas as pd
df = pd.DataFrame({
'product_id': ['P101', 'P102', 'P103', 'P104', 'P105'],
'category': ['Electronics', 'Books', 'Home', 'Electronics', 'Apparel'],
'price': [299.99, 19.95, 45.50, 799.00, 32.00],
'in_stock': [True, True, False, True, True]
})
list_of_positions = [0, 3, 4]
# Get the actual index LABELS at these integer positions
index_labels_at_positions = df.index[list_of_positions]
print(f"Index labels at positions {list_of_positions}: {index_labels_at_positions}")
# ✅ Use these labels with .loc
selected_rows_loc = df.loc[index_labels_at_positions]
print("Selected rows using .loc with derived index labels:")
print(selected_rows_loc)
Output (pandas 2.x; versions before 2.0 print Int64Index instead of Index):
Index labels at positions [0, 3, 4]: Index([0, 3, 4], dtype='int64')
Selected rows using .loc with derived index labels:
   product_id     category   price  in_stock
0        P101  Electronics  299.99      True
3        P104  Electronics  799.00      True
4        P105      Apparel   32.00      True
This is slightly more indirect than iloc for purely positional selection but is useful if you have positions and need to use .loc for some reason (e.g., when also selecting columns by label). If the DataFrame index is already just [0, 1, 2, ...], then df.loc[list_of_indices] works directly.
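The case of combining row positions with column labels can be sketched as follows. Using product_id as a string index is an assumption made here for illustration, and the approach assumes the index labels are unique, since loc returns every row that matches a label.

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': ['P101', 'P102', 'P103', 'P104', 'P105'],
    'category': ['Electronics', 'Books', 'Home', 'Electronics', 'Apparel'],
    'price': [299.99, 19.95, 45.50, 799.00, 32.00],
}).set_index('product_id')  # index labels are now strings, not positions

list_of_positions = [0, 3, 4]

# Translate positions into labels, then select rows and columns by label
labels = df.index[list_of_positions]
subset = df.loc[labels, ['price']]

print(subset)
```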
Method 4: Using DataFrame.take()
The DataFrame.take() method is specifically designed to select elements along an axis using their integer positions.
import pandas as pd
df = pd.DataFrame({
'product_id': ['P101', 'P102', 'P103', 'P104', 'P105'],
'category': ['Electronics', 'Books', 'Home', 'Electronics', 'Apparel'],
'price': [299.99, 19.95, 45.50, 799.00, 32.00],
'in_stock': [True, True, False, True, True]
})
list_of_indices_to_select = [0, 3, 4]
# ✅ Select rows using .take()
# Default axis is 0 (rows)
selected_rows_take = df.take(list_of_indices_to_select)
print("Selected rows using .take():")
print(selected_rows_take)
Output:
Selected rows using .take():
   product_id     category   price  in_stock
0        P101  Electronics  299.99      True
3        P104  Electronics  799.00      True
4        P105      Apparel   32.00      True
- df.take(list_of_indices_to_select): Returns a new DataFrame with rows at the specified integer positions.
- df.take(indices, axis=1) would select columns by position.
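As a brief sketch of the axis argument, the example below takes columns by position and also uses a negative position to pick the last row. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': ['P101', 'P102', 'P103', 'P104', 'P105'],
    'category': ['Electronics', 'Books', 'Home', 'Electronics', 'Apparel'],
    'price': [299.99, 19.95, 45.50, 799.00, 32.00],
    'in_stock': [True, True, False, True, True],
})

# Positions 0 and 2 along axis=1 pick the first and third columns
first_and_third_cols = df.take([0, 2], axis=1)

# Negative positions count from the end, so -1 is the last row
last_row = df.take([-1])

print(first_and_third_cols.columns.tolist())
print(last_row['product_id'].tolist())
```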
Method 5: Using DataFrame.query() with index
If your DataFrame's index consists of the integers you want to select by (e.g., a default RangeIndex), you can use DataFrame.query() by referring to the special index field.
import pandas as pd
df = pd.DataFrame({
'product_id': ['P101', 'P102', 'P103', 'P104', 'P105'],
'category': ['Electronics', 'Books', 'Home', 'Electronics', 'Apparel'],
'price': [299.99, 19.95, 45.50, 799.00, 32.00],
'in_stock': [True, True, False, True, True]
})
list_of_indices_to_select = [0, 3, 4] # These are also the index labels in this case
# ✅ Query based on the index values
# The '@' prefix allows referencing a local variable in the query string
query_string = 'index in @list_of_indices_to_select'
selected_rows_query = df.query(query_string)
print("Selected rows using .query() on index:")
print(selected_rows_query)
Output:
Selected rows using .query() on index:
   product_id     category   price  in_stock
0        P101  Electronics  299.99      True
3        P104  Electronics  799.00      True
4        P105      Apparel   32.00      True
- index in @list_of_indices_to_select: The query string.
- index refers to the DataFrame's index.
- @list_of_indices_to_select allows the query to access the Python list variable.
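A query string can combine the index test with conditions on columns, which is one reason to reach for query() here. The sketch below adds a price filter; the threshold of 100 is an arbitrary value chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': ['P101', 'P102', 'P103', 'P104', 'P105'],
    'price': [299.99, 19.95, 45.50, 799.00, 32.00],
})

list_of_indices_to_select = [0, 3, 4]

# Keep rows whose index label is in the list AND whose price exceeds 100
expensive_selected = df.query(
    'index in @list_of_indices_to_select and price > 100'
)

print(expensive_selected)
```

Row 4 is in the list but is filtered out by the price condition.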
Choosing the Right Method
- df.iloc[list_of_indices]: Most recommended for purely integer-position based row selection. It's explicit, idiomatic, and designed for this purpose.
- df.take(list_of_indices): Also very good for positional selection and clear in its intent.
- df[df.index.isin(list_of_indices)]: Useful, especially if you are already working with boolean masks or if your list_of_indices contains labels that match a non-integer index. For default integer indices, it works well.
- df.loc[df.index[list_of_positions]]: More verbose for simple positional selection. Primarily used when you need label-based selection and derive labels from positions.
- df.query('index in @list_of_indices'): A good option if you prefer the query string syntax and your list contains the actual index labels you want to match (which would be integer positions if you have a default index).
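As a quick sanity check, the sketch below runs all five approaches on the same default-index DataFrame and compares the index labels of each result. It assumes a sorted list without duplicates; with a non-default index, or an unordered or repeating list, the label-based approaches may differ from the positional ones.

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': ['P101', 'P102', 'P103', 'P104', 'P105'],
    'price': [299.99, 19.95, 45.50, 799.00, 32.00],
})

list_of_indices = [0, 3, 4]

results = {
    'iloc': df.iloc[list_of_indices],
    'take': df.take(list_of_indices),
    'isin': df[df.index.isin(list_of_indices)],
    'loc': df.loc[df.index[list_of_indices]],
    'query': df.query('index in @list_of_indices'),
}

# Collect the index labels each method returned
selected_labels = {name: result.index.tolist() for name, result in results.items()}

for name, labels in selected_labels.items():
    print(f"{name}: {labels}")
```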
Conclusion
Pandas offers several flexible ways to select DataFrame rows based on a list of integer indices.
- For direct positional selection, DataFrame.iloc is generally the most direct and preferred method.
- DataFrame.take() is another excellent choice specifically designed for positional selection.
- DataFrame.index.isin() provides a boolean masking approach that works well with default integer indices.
- DataFrame.query() can be used if your list of indices matches the actual labels of your DataFrame's index.
Choose the method that best fits the clarity of your code and the nature of your DataFrame's index. iloc is often the go-to for straightforward selection by integer position.