Python Pandas: How to Repeat DataFrame Rows N Times
Repeating rows in a Pandas DataFrame is a useful operation for various data manipulation and preparation tasks, such as data augmentation, upsampling, or creating datasets for specific analytical techniques. You might need to repeat each row a fixed number of times, or repeat rows based on values in another column.
This guide demonstrates several effective methods to repeat DataFrame rows in Pandas, primarily using DataFrame.index.repeat(), numpy.repeat(), and pd.concat().
Why Repeat DataFrame Rows?
- Upsampling/Oversampling: In machine learning, repeating rows of minority classes can help balance datasets.
- Data Generation: Creating larger datasets for testing or simulation by duplicating existing entries.
- Exploding Lists/Counts: If a column represents a count or a list of items, you might want to "explode" each row so that it's repeated for each item or count.
- Time Series Expansion: Repeating observations for different time periods.
Example DataFrame:
import pandas as pd
import numpy as np # For some examples
data = {
'Product ID': ['A101', 'B202', 'C303'],
'Category': ['Electronics', 'Books', 'Home Goods'],
'Price': [299.99, 19.95, 45.50]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Output:
Original DataFrame:
Product ID Category Price
0 A101 Electronics 299.99
1 B202 Books 19.95
2 C303 Home Goods 45.50
Method 1: Using DataFrame.index.repeat() and .loc (Recommended)
This is often the most idiomatic and efficient Pandas way to repeat rows a fixed number of times.
Repeating Each Row a Fixed Number of Times
The df.index.repeat(N) method creates a new Index where each original index label is repeated N times. You then use this new Index with df.loc[] to select and duplicate the rows.
import pandas as pd
df_original = pd.DataFrame({
'Product ID': ['A101', 'B202', 'C303'],
'Category': ['Electronics', 'Books', 'Home Goods'],
'Price': [299.99, 19.95, 45.50]
})
# Number of times to repeat each row
repetitions = 3
# Create a new index with repeated original indices
repeated_index = df_original.index.repeat(repetitions)
print(f"Repeated Index:\n{repeated_index}\n")
# ✅ Use .loc to select rows based on the repeated index
df_repeated_rows = df_original.loc[repeated_index]
print(f"DataFrame with each row repeated {repetitions} times:")
print(df_repeated_rows)
Output:
Repeated Index:
Index([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype='int64')
DataFrame with each row repeated 3 times:
Product ID Category Price
0 A101 Electronics 299.99
0 A101 Electronics 299.99
0 A101 Electronics 299.99
1 B202 Books 19.95
1 B202 Books 19.95
1 B202 Books 19.95
2 C303 Home Goods 45.50
2 C303 Home Goods 45.50
2 C303 Home Goods 45.50
Notice that the index in the resulting DataFrame also contains repeated values.
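One caveat worth sketching: label-based selection with .loc assumes the index labels are unique. If a label occurs more than once, each repeated label matches every row that carries it, so the result can contain more rows than intended. A position-based variant with .iloc sidesteps this. The frame df_dupe and its labels below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index contains a duplicate label ('x').
df_dupe = pd.DataFrame(
    {"Product ID": ["A101", "B202", "C303"], "Price": [299.99, 19.95, 45.50]},
    index=["x", "x", "y"],
)

repetitions = 2

# Build repeated *positions* (0, 0, 1, 1, 2, 2) instead of repeated labels.
positions = np.repeat(np.arange(len(df_dupe)), repetitions)

# .iloc selects by position, so duplicate labels cannot multiply the result.
df_by_position = df_dupe.iloc[positions]
print(df_by_position)
```

When the index is known to be unique, as in the examples in this guide, the .loc form and this .iloc form select the same rows.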
Resetting the Index After Repetition
If you want a new, unique, sequential index (0, 1, 2, ...) for the resulting DataFrame, use reset_index(drop=True).
import pandas as pd
df_original = pd.DataFrame({
'Product ID': ['A101', 'B202', 'C303'],
'Category': ['Electronics', 'Books', 'Home Goods'],
'Price': [299.99, 19.95, 45.50]
})
repetitions = 2
# ✅ Repeat rows and then reset the index
df_repeated_reset_index = df_original.loc[df_original.index.repeat(repetitions)].reset_index(drop=True)
print(f"DataFrame with rows repeated {repetitions} times and index reset:")
print(df_repeated_reset_index)
Output:
DataFrame with rows repeated 2 times and index reset:
Product ID Category Price
0 A101 Electronics 299.99
1 A101 Electronics 299.99
2 B202 Books 19.95
3 B202 Books 19.95
4 C303 Home Goods 45.50
5 C303 Home Goods 45.50
drop=True prevents the old (repeated) index from being added as a new column.
Method 2: Repeating Rows Based on Values in Another Column
You can pass a Series or list-like object of non-negative integers (of the same length as the DataFrame's index) to df.index.repeat(), where each value specifies how many times the corresponding row should be repeated.
import pandas as pd
data_with_counts = {
'Item': ['Apple', 'Banana', 'Cherry'],
'Category': ['Fruit', 'Fruit', 'Fruit'],
'RepeatCount': [1, 3, 2] # Repeat Apple 1x, Banana 3x, Cherry 2x
}
df_counts = pd.DataFrame(data_with_counts)
print("Original DataFrame for conditional repeat:")
print(df_counts)
print()
# ✅ Repeat rows based on the 'RepeatCount' column
df_conditional_repeat = df_counts.loc[df_counts.index.repeat(df_counts['RepeatCount'])].reset_index(drop=True)
print("DataFrame with rows repeated based on 'RepeatCount':")
print(df_conditional_repeat)
Output:
Original DataFrame for conditional repeat:
Item Category RepeatCount
0 Apple Fruit 1
1 Banana Fruit 3
2 Cherry Fruit 2
DataFrame with rows repeated based on 'RepeatCount':
Item Category RepeatCount
0 Apple Fruit 1
1 Banana Fruit 3
2 Banana Fruit 3
3 Banana Fruit 3
4 Cherry Fruit 2
5 Cherry Fruit 2
This is a convenient way to "explode" rows based on a count column: each row appears once per unit of its count.
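A count of 0 is a case worth checking before relying on this pattern: a row repeated zero times is dropped from the result. The sketch below uses made-up data with a zero count, and also drops the count column afterwards, which is an optional step rather than something the method requires.

```python
import pandas as pd

# Hypothetical data: 'Banana' has a count of 0 and should disappear.
df_counts = pd.DataFrame({
    "Item": ["Apple", "Banana", "Cherry"],
    "RepeatCount": [1, 0, 2],
})

expanded = (
    df_counts.loc[df_counts.index.repeat(df_counts["RepeatCount"])]
    .drop(columns="RepeatCount")   # the count is no longer meaningful per row
    .reset_index(drop=True)
)
print(expanded)
```

If dropping rows is not what you want, filter or adjust the counts before repeating.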
Method 3: Using numpy.repeat()
NumPy's repeat() function can operate on the DataFrame's underlying NumPy array values.
Basic Usage
np.repeat(df.values, N, axis=0) repeats each row N times and returns a plain NumPy array, so you then need to rebuild a DataFrame from it.
import pandas as pd
import numpy as np
df_original = pd.DataFrame({
'Product ID': ['A101', 'B202', 'C303'],
'Category': ['Electronics', 'Books', 'Home Goods'],
'Price': [299.99, 19.95, 45.50]
})
repetitions = 2
# Get the DataFrame values as a NumPy array
df_values = df_original.values
# Repeat the rows (axis=0)
repeated_values = np.repeat(df_values, repetitions, axis=0)
# ✅ Create a new DataFrame from the repeated NumPy array
df_repeated_numpy = pd.DataFrame(repeated_values)
print(f"DataFrame repeated using np.repeat (raw, no columns yet):")
print(df_repeated_numpy)
Output:
DataFrame repeated using np.repeat (raw, no columns yet):
0 1 2
0 A101 Electronics 299.99
1 A101 Electronics 299.99
2 B202 Books 19.95
3 B202 Books 19.95
4 C303 Home Goods 45.5
5 C303 Home Goods 45.5
Reassigning Column Names
The new DataFrame created from the NumPy array won't have the original column names. You need to assign them.
import pandas as pd
import numpy as np
df_original = pd.DataFrame({
'Product ID': ['A101', 'B202', 'C303'],
'Category': ['Electronics', 'Books', 'Home Goods'],
'Price': [299.99, 19.95, 45.50]
})
repetitions = 2
# Get the DataFrame values as a NumPy array
df_values = df_original.values
# Repeat the rows (axis=0)
repeated_values = np.repeat(df_values, repetitions, axis=0)
# Create a new DataFrame from the repeated NumPy array
df_repeated_numpy = pd.DataFrame(repeated_values)
# ✅ Assign original column names
df_repeated_numpy.columns = df_original.columns
print(f"DataFrame from np.repeat with original column names:")
print(df_repeated_numpy)
Output:
DataFrame from np.repeat with original column names:
Product ID Category Price
0 A101 Electronics 299.99
1 A101 Electronics 299.99
2 B202 Books 19.95
3 B202 Books 19.95
4 C303 Home Goods 45.5
5 C303 Home Goods 45.5
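This method discards the original index labels entirely: the rebuilt DataFrame gets a fresh 0, 1, 2, ... index. Because df.values on mixed-type columns is an object array, the rebuilt columns also have object dtype, which is why Price prints as 45.5 rather than 45.50 above.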
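If you go the df.values route with mixed column types, it is worth inspecting the dtypes of the rebuilt DataFrame. A minimal sketch, assuming you want the original dtypes back, casts with astype() using the original frame's dtypes; the names df_raw and df_restored are illustrative only.

```python
import numpy as np
import pandas as pd

df_original = pd.DataFrame({
    "Product ID": ["A101", "B202", "C303"],
    "Category": ["Electronics", "Books", "Home Goods"],
    "Price": [299.99, 19.95, 45.50],
})

# Mixed column types force df.values into a single object array.
repeated_values = np.repeat(df_original.values, 2, axis=0)
df_raw = pd.DataFrame(repeated_values, columns=df_original.columns)
print(df_raw.dtypes)

# Cast each column back to the dtype it had in the original frame.
df_restored = df_raw.astype(df_original.dtypes.to_dict())
print(df_restored.dtypes)
```

The extra cast is one reason the index-based approach from Method 1 is usually simpler for DataFrames with mixed types.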
Combining np.repeat() with .loc
You can also use np.repeat(df.index, N) to generate the repeated index, similar to df.index.repeat().
import pandas as pd
import numpy as np
df_original = pd.DataFrame({
'Product ID': ['A101', 'B202', 'C303'],
'Category': ['Electronics', 'Books', 'Home Goods'],
'Price': [299.99, 19.95, 45.50]
})
repetitions = 2
# Generate repeated index using np.repeat
repeated_idx_np = np.repeat(df_original.index, repetitions)
# Use .loc with the NumPy-generated repeated index
df_np_loc_repeat = df_original.loc[repeated_idx_np].reset_index(drop=True)
print(f"DataFrame repeated using np.repeat(df.index, N) and .loc:")
print(df_np_loc_repeat)
Output (same as the df.index.repeat() method):
DataFrame repeated using np.repeat(df.index, N) and .loc:
Product ID Category Price
0 A101 Electronics 299.99
1 A101 Electronics 299.99
2 B202 Books 19.95
3 B202 Books 19.95
4 C303 Home Goods 45.50
5 C303 Home Goods 45.50
This is very similar to Method 1 and often just as efficient.
Method 4: Using pd.concat()
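You can concatenate a list containing the same DataFrame N times. Because this stacks whole copies end to end, you need an extra sort by index if you want the repetitions grouped by original row. Note that sorting by index only reproduces the original row order when the original index was already sorted.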
import pandas as pd
df_original = pd.DataFrame({
'Product ID': ['A101', 'B202', 'C303'],
'Category': ['Electronics', 'Books', 'Home Goods'],
'Price': [299.99, 19.95, 45.50]
})
repetitions = 2
# Create a list of N copies of the DataFrame
list_of_df_copies = [df_original] * repetitions
print(f"List of DataFrame copies:\n{list_of_df_copies}\n")
# Concatenate them
df_concatenated = pd.concat(list_of_df_copies)
print(f"Concatenated DataFrame (unsorted):")
print(df_concatenated)
print()
# ✅ Sort by index to group repetitions, then reset index
df_repeated_concat = df_concatenated.sort_index().reset_index(drop=True)
print(f"DataFrame repeated {repetitions} times using pd.concat(), sorted, and index reset:")
print(df_repeated_concat)
Output:
List of DataFrame copies:
[ Product ID Category Price
0 A101 Electronics 299.99
1 B202 Books 19.95
2 C303 Home Goods 45.50, Product ID Category Price
0 A101 Electronics 299.99
1 B202 Books 19.95
2 C303 Home Goods 45.50]
Concatenated DataFrame (unsorted):
Product ID Category Price
0 A101 Electronics 299.99
1 B202 Books 19.95
2 C303 Home Goods 45.50
0 A101 Electronics 299.99
1 B202 Books 19.95
2 C303 Home Goods 45.50
DataFrame repeated 2 times using pd.concat(), sorted, and index reset:
Product ID Category Price
0 A101 Electronics 299.99
1 A101 Electronics 299.99
2 B202 Books 19.95
3 B202 Books 19.95
4 C303 Home Goods 45.50
5 C303 Home Goods 45.50
This method is generally less direct and potentially less efficient for simple row repetition compared to df.index.repeat().
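Where pd.concat() does fit naturally is when you want the whole DataFrame tiled as a block (A, B, C, A, B, C) rather than each row repeated in place. In that case no sort is needed, and ignore_index=True produces a clean sequential index in the same call. A short sketch; df_tiled is an illustrative name.

```python
import pandas as pd

df_original = pd.DataFrame({
    "Product ID": ["A101", "B202", "C303"],
    "Category": ["Electronics", "Books", "Home Goods"],
    "Price": [299.99, 19.95, 45.50],
})

# Stack two full copies end to end and renumber the index 0..5.
df_tiled = pd.concat([df_original] * 2, ignore_index=True)
print(df_tiled)
```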
Choosing the Right Method
- df.loc[df.index.repeat(N)] (or df.loc[np.repeat(df.index, N)]): Generally recommended for its clarity, efficiency, and idiomatic Pandas style, whether you repeat all rows a fixed number of times or use a Series of counts. It also preserves column names and data types.
- np.repeat(df.values, N, axis=0) followed by pd.DataFrame(...): Can be efficient, especially if you're already working with NumPy arrays. Requires manual reassignment of column names, and mixed-type DataFrames come back with object dtype columns.
- pd.concat([df] * N): Works, but is more verbose and usually less efficient for this specific task, since grouping the repetitions by original row involves an explicit sort.
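Relative performance depends on DataFrame size, dtypes, and library versions, so it is worth measuring on your own data. The following is a rough timing sketch using the standard-library timeit module. The synthetic DataFrame, the function names, and the run counts are arbitrary choices for illustration, and no particular ranking is implied.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic, all-numeric data; size and seed are arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.integers(0, 100, 10_000), "b": rng.random(10_000)})
n = 5


def via_index_repeat():
    return df.loc[df.index.repeat(n)].reset_index(drop=True)


def via_numpy_values():
    # df.values upcasts the integer column to float here.
    return pd.DataFrame(np.repeat(df.values, n, axis=0), columns=df.columns)


def via_concat():
    # mergesort is stable, so copies of a row keep their relative order.
    return pd.concat([df] * n).sort_index(kind="mergesort").reset_index(drop=True)


for func in (via_index_repeat, via_numpy_values, via_concat):
    seconds = timeit.timeit(func, number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```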
Conclusion
Repeating rows in a Pandas DataFrame can be achieved in several ways:
- The most common and often most efficient method is using df.loc[df.index.repeat(repetitions)]. This can also take a Series of repetition counts for variable repetitions per row. Remember to call .reset_index(drop=True) if you need a clean sequential index.
- Using numpy.repeat on df.values or df.index provides an alternative, especially if you are already in a NumPy-heavy workflow.
- pd.concat can be used but is generally less direct for this specific task.
Choose the method that best combines readability, efficiency, and fits your specific requirements for how rows should be repeated.