Python Pandas: How to Repeat DataFrame Rows N Times

Repeating rows in a Pandas DataFrame is a useful operation for various data manipulation and preparation tasks, such as data augmentation, upsampling, or creating datasets for specific analytical techniques. You might need to repeat each row a fixed number of times, or repeat rows based on values in another column.

This guide demonstrates several effective methods to repeat DataFrame rows in Pandas, primarily using DataFrame.index.repeat(), numpy.repeat(), and pd.concat().

Why Repeat DataFrame Rows?

Upsampling/Oversampling: In machine learning, repeating rows of minority classes can help balance datasets.
Data Generation: Creating larger datasets for testing or simulation by duplicating existing entries.
Exploding Lists/Counts: If a column represents a count or a list of items, you might want to "explode" each row so that it's repeated for each item or count.
Time Series Expansion: Repeating observations for different time periods.

Example DataFrame:

import pandas as pd
import numpy as np # For some examples

data = {
    'Product ID': ['A101', 'B202', 'C303'],
    'Category': ['Electronics', 'Books', 'Home Goods'],
    'Price': [299.99, 19.95, 45.50]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
  Product ID     Category   Price
0       A101  Electronics  299.99
1       B202        Books   19.95
2       C303   Home Goods   45.50

Method 1: Using `DataFrame.index.repeat()` and `.loc` (Recommended)

This is often the most idiomatic and efficient Pandas way to repeat rows a fixed number of times.

Repeating Each Row a Fixed Number of Times

The df.index.repeat(N) method creates a new Index where each original index label is repeated N times. You then use this new Index with df.loc[] to select and duplicate the rows.

import pandas as pd

df_original = pd.DataFrame({
    'Product ID': ['A101', 'B202', 'C303'],
    'Category': ['Electronics', 'Books', 'Home Goods'],
    'Price': [299.99, 19.95, 45.50]
})

# Number of times to repeat each row
repetitions = 3

# Create a new index with repeated original indices
repeated_index = df_original.index.repeat(repetitions)
print(f"Repeated Index:\n{repeated_index}\n")

# ✅ Use .loc to select rows based on the repeated index
df_repeated_rows = df_original.loc[repeated_index]

print(f"DataFrame with each row repeated {repetitions} times:")
print(df_repeated_rows)

Output:

Repeated Index:
Index([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype='int64')

DataFrame with each row repeated 3 times:
  Product ID     Category   Price
0       A101  Electronics  299.99
0       A101  Electronics  299.99
0       A101  Electronics  299.99
1       B202        Books   19.95
1       B202        Books   19.95
1       B202        Books   19.95
2       C303   Home Goods   45.50
2       C303   Home Goods   45.50
2       C303   Home Goods   45.50

Note: The index in the resulting DataFrame also contains repeated values.

Resetting the Index After Repetition

If you want a new, unique, sequential index (0, 1, 2, ...) for the resulting DataFrame, use reset_index(drop=True).

import pandas as pd

df_original = pd.DataFrame({
    'Product ID': ['A101', 'B202', 'C303'],
    'Category': ['Electronics', 'Books', 'Home Goods'],
    'Price': [299.99, 19.95, 45.50]
})
repetitions = 2

# ✅ Repeat rows and then reset the index
df_repeated_reset_index = df_original.loc[df_original.index.repeat(repetitions)].reset_index(drop=True)

print(f"DataFrame with rows repeated {repetitions} times and index reset:")
print(df_repeated_reset_index)

Output:

DataFrame with rows repeated 2 times and index reset:
  Product ID     Category   Price
0       A101  Electronics  299.99
1       A101  Electronics  299.99
2       B202        Books   19.95
3       B202        Books   19.95
4       C303   Home Goods   45.50
5       C303   Home Goods   45.50

drop=True prevents the old (repeated) index from being added as a new column.
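Label-based selection with .loc works cleanly when the index labels are unique. If the index itself contains duplicate labels, each repeated label can match more than one row, so the result may hold more rows than intended. A minimal sketch of a positional alternative with .iloc follows; the frame df_dup and its index labels are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are NOT unique ("x" appears twice)
df_dup = pd.DataFrame(
    {"Product ID": ["A101", "B202", "C303"], "Price": [299.99, 19.95, 45.50]},
    index=["x", "x", "y"],
)
repetitions = 2

# Repeat row positions (0, 1, 2) rather than index labels
positions = np.arange(len(df_dup)).repeat(repetitions)

# .iloc selects by position, so duplicate labels do not matter
df_repeated_by_position = df_dup.iloc[positions].reset_index(drop=True)

print(df_repeated_by_position)
```

Because the selection is positional, each original row appears exactly `repetitions` times regardless of what the index looks like.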

Method 2: Repeating Rows Based on Values in Another Column

You can pass a Series or list-like object (of the same length as the DataFrame's index) to df.index.repeat() where each value specifies how many times the corresponding row should be repeated.

import pandas as pd

data_with_counts = {
    'Item': ['Apple', 'Banana', 'Cherry'],
    'Category': ['Fruit', 'Fruit', 'Fruit'],
    'RepeatCount': [1, 3, 2] # Repeat Apple 1x, Banana 3x, Cherry 2x
}
df_counts = pd.DataFrame(data_with_counts)
print("Original DataFrame for conditional repeat:")
print(df_counts)
print()

# ✅ Repeat rows based on the 'RepeatCount' column
df_conditional_repeat = df_counts.loc[df_counts.index.repeat(df_counts['RepeatCount'])].reset_index(drop=True)

print("DataFrame with rows repeated based on 'RepeatCount':")
print(df_conditional_repeat)

Output:

Original DataFrame for conditional repeat:
     Item Category  RepeatCount
0   Apple    Fruit            1
1  Banana    Fruit            3
2  Cherry    Fruit            2

DataFrame with rows repeated based on 'RepeatCount':
     Item Category  RepeatCount
0   Apple    Fruit            1
1  Banana    Fruit            3
2  Banana    Fruit            3
3  Banana    Fruit            3
4  Cherry    Fruit            2
5  Cherry    Fruit            2

Note: This is a convenient way to "explode" rows based on a count column.
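Two details are worth checking when repeating by a count column: a count of 0 removes the row from the result, and it is sometimes useful to number the copies of each original row. The sketch below assumes the counts are non-negative integers; the column name CopyNumber is made up for illustration.

```python
import pandas as pd

df_counts = pd.DataFrame({
    "Item": ["Apple", "Banana", "Cherry"],
    "RepeatCount": [0, 2, 1],  # a count of 0 drops the row entirely
})

# Repeat each index label according to its count, then select with .loc
repeated = df_counts.loc[df_counts.index.repeat(df_counts["RepeatCount"])]

# Number the copies of each original row (0, 1, ...) by grouping on the
# repeated index labels; to_numpy() sidesteps alignment on a duplicated index
copy_number = repeated.groupby(level=0).cumcount().to_numpy()

df_numbered = repeated.assign(CopyNumber=copy_number).reset_index(drop=True)
print(df_numbered)
```

The numbering has to happen before reset_index(drop=True), since it relies on the repeated index labels to tell which copies belong together.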

Method 3: Using `numpy.repeat()`

NumPy's repeat() function can operate on the DataFrame's underlying NumPy array values.

Basic Usage

np.repeat(df.values, N, axis=0) repeats the rows. You then need to reconstruct a DataFrame.

import pandas as pd
import numpy as np

df_original = pd.DataFrame({
    'Product ID': ['A101', 'B202', 'C303'],
    'Category': ['Electronics', 'Books', 'Home Goods'],
    'Price': [299.99, 19.95, 45.50]
})
repetitions = 2

# Get the DataFrame values as a NumPy array
df_values = df_original.values

# Repeat the rows (axis=0)
repeated_values = np.repeat(df_values, repetitions, axis=0)

# ✅ Create a new DataFrame from the repeated NumPy array
df_repeated_numpy = pd.DataFrame(repeated_values)

print("DataFrame repeated using np.repeat (raw, no columns yet):")
print(df_repeated_numpy)

Output:

DataFrame repeated using np.repeat (raw, no columns yet):
      0            1       2
0  A101  Electronics  299.99
1  A101  Electronics  299.99
2  B202        Books   19.95
3  B202        Books   19.95
4  C303   Home Goods    45.5
5  C303   Home Goods    45.5

Reassigning Column Names

The new DataFrame created from the NumPy array won't have the original column names. You need to assign them.

import pandas as pd
import numpy as np

df_original = pd.DataFrame({
    'Product ID': ['A101', 'B202', 'C303'],
    'Category': ['Electronics', 'Books', 'Home Goods'],
    'Price': [299.99, 19.95, 45.50]
})
repetitions = 2

# Get the DataFrame values as a NumPy array
df_values = df_original.values

# Repeat the rows (axis=0)
repeated_values = np.repeat(df_values, repetitions, axis=0)

# Create a new DataFrame from the repeated NumPy array
df_repeated_numpy = pd.DataFrame(repeated_values)

# ✅ Assign original column names
df_repeated_numpy.columns = df_original.columns

print("DataFrame from np.repeat with original column names:")
print(df_repeated_numpy)

Output:

DataFrame from np.repeat with original column names:
  Product ID     Category   Price
0       A101  Electronics  299.99
1       A101  Electronics  299.99
2       B202        Books   19.95
3       B202        Books   19.95
4       C303   Home Goods    45.5
5       C303   Home Goods    45.5

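Note: This method discards the original index labels altogether; the result gets a fresh 0, 1, 2, ... index. Because df.values on a mixed-type DataFrame is an object array, every column in the result also has dtype object, which is why 45.50 prints as 45.5.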
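If the original column dtypes matter, one option is to cast the rebuilt frame back with astype(). The sketch below assumes a mixed-type frame like the example above, where the round trip through NumPy leaves the rebuilt columns as generic object columns; the names df_raw and df_restored are illustrative.

```python
import numpy as np
import pandas as pd

df_original = pd.DataFrame({
    "Product ID": ["A101", "B202", "C303"],
    "Category": ["Electronics", "Books", "Home Goods"],
    "Price": [299.99, 19.95, 45.50],
})

# Repeat the underlying values; mixed column types force an object array
repeated_values = np.repeat(df_original.to_numpy(), 2, axis=0)

# Rebuild the frame with the original column names
df_raw = pd.DataFrame(repeated_values, columns=df_original.columns)

# Cast each column back to the dtype it had in the original frame
df_restored = df_raw.astype(df_original.dtypes.to_dict())

print(df_raw.dtypes)       # Price is object here
print(df_restored.dtypes)  # Price is float64 again
```

Without the cast, numeric operations on Price would run on Python objects rather than on a float64 column.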

Combining `np.repeat()` with `.loc`

You can also use np.repeat(df.index, N) to generate the repeated index, similar to df.index.repeat().

import pandas as pd
import numpy as np

df_original = pd.DataFrame({
    'Product ID': ['A101', 'B202', 'C303'],
    'Category': ['Electronics', 'Books', 'Home Goods'],
    'Price': [299.99, 19.95, 45.50]
})
repetitions = 2

# Generate repeated index using np.repeat
repeated_idx_np = np.repeat(df_original.index, repetitions)

# Use .loc with the NumPy-generated repeated index
df_np_loc_repeat = df_original.loc[repeated_idx_np].reset_index(drop=True)

print("DataFrame repeated using np.repeat(df.index, N) and .loc:")
print(df_np_loc_repeat)

Output: (Same as df.index.repeat() method)

DataFrame repeated using np.repeat(df.index, N) and .loc:
  Product ID     Category   Price
0       A101  Electronics  299.99
1       A101  Electronics  299.99
2       B202        Books   19.95
3       B202        Books   19.95
4       C303   Home Goods   45.50
5       C303   Home Goods   45.50

This is very similar to Method 1 and often just as efficient.

Method 4: Using `pd.concat()`

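You can concatenate a list containing the same DataFrame N times. On its own this stacks whole copies one after another (A, B, C, A, B, C), so an extra sort by index is needed if you want the repetitions grouped by original row. That sort only reproduces the original row order when the original index is unique and already sorted.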

import pandas as pd

df_original = pd.DataFrame({
    'Product ID': ['A101', 'B202', 'C303'],
    'Category': ['Electronics', 'Books', 'Home Goods'],
    'Price': [299.99, 19.95, 45.50]
})
repetitions = 2

# Create a list of N copies of the DataFrame
list_of_df_copies = [df_original] * repetitions
print(f"List of DataFrame copies:\n{list_of_df_copies}\n")

# Concatenate them
df_concatenated = pd.concat(list_of_df_copies)
print("Concatenated DataFrame (unsorted):")
print(df_concatenated)
print()

# ✅ Sort by index to group repetitions, then reset index
df_repeated_concat = df_concatenated.sort_index().reset_index(drop=True)

print(f"DataFrame repeated {repetitions} times using pd.concat(), sorted, and index reset:")
print(df_repeated_concat)

Output:

List of DataFrame copies:
[  Product ID     Category   Price
0       A101  Electronics  299.99
1       B202        Books   19.95
2       C303   Home Goods   45.50,   Product ID     Category   Price
0       A101  Electronics  299.99
1       B202        Books   19.95
2       C303   Home Goods   45.50]

Concatenated DataFrame (unsorted):
  Product ID     Category   Price
0       A101  Electronics  299.99
1       B202        Books   19.95
2       C303   Home Goods   45.50
0       A101  Electronics  299.99
1       B202        Books   19.95
2       C303   Home Goods   45.50

DataFrame repeated 2 times using pd.concat(), sorted, and index reset:
  Product ID     Category   Price
0       A101  Electronics  299.99
1       A101  Electronics  299.99
2       B202        Books   19.95
3       B202        Books   19.95
4       C303   Home Goods   45.50
5       C303   Home Goods   45.50

Note: This method is generally less direct and potentially less efficient for simple row repetition compared to df.index.repeat().
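Where pd.concat() does fit naturally is when the whole DataFrame should be tiled, that is, repeated as a block rather than row by row. A minimal sketch, using ignore_index=True so that no sort or reset_index step is needed; the name df_tiled is illustrative.

```python
import pandas as pd

df_original = pd.DataFrame({
    "Product ID": ["A101", "B202", "C303"],
    "Category": ["Electronics", "Books", "Home Goods"],
    "Price": [299.99, 19.95, 45.50],
})
repetitions = 2

# Stack whole copies of the frame; ignore_index=True renumbers rows 0..n-1
df_tiled = pd.concat([df_original] * repetitions, ignore_index=True)

print(df_tiled)
```

The rows come out in the order A101, B202, C303, A101, B202, C303, which differs from the grouped order produced by df.index.repeat().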

Choosing the Right Method

df.loc[df.index.repeat(N)] (or df.loc[np.repeat(df.index, N)]): Generally recommended for its clarity, efficiency, and idiomatic Pandas style for repeating all rows a fixed number of times or based on a count Series. It also preserves the DataFrame structure and data types well.
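np.repeat(df.values, N, axis=0) followed by pd.DataFrame(...): Can be efficient, especially if you're already working with NumPy arrays. Requires manual reassignment of column names, drops the original index, and turns every column into dtype object when the DataFrame has mixed column types.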
pd.concat([df] * N): Works, but is more verbose and usually less efficient for this specific task as it involves an explicit sort.
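As a sanity check, the three approaches can be compared directly on the example data. The sketch below is only an illustration, not a benchmark; it assumes a default, unique, sorted index and casts the NumPy result back to the original dtypes so the frames are comparable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Product ID": ["A101", "B202", "C303"],
    "Category": ["Electronics", "Books", "Home Goods"],
    "Price": [299.99, 19.95, 45.50],
})
n = 2

# Method 1: repeat index labels and select with .loc
via_index = df.loc[df.index.repeat(n)].reset_index(drop=True)

# Method 3: repeat raw values, rebuild the frame, restore column dtypes
via_numpy = pd.DataFrame(
    np.repeat(df.to_numpy(), n, axis=0), columns=df.columns
).astype(df.dtypes.to_dict())

# Method 4: concatenate copies, then sort by index to group the repeats
via_concat = pd.concat([df] * n).sort_index().reset_index(drop=True)

# Raises an AssertionError if any pair of results differs
pd.testing.assert_frame_equal(via_index, via_numpy)
pd.testing.assert_frame_equal(via_index, via_concat)
print("All three approaches produced the same DataFrame")
```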

Conclusion

Repeating rows in a Pandas DataFrame can be achieved in several ways:

The most common and often most efficient method is using df.loc[df.index.repeat(repetitions)]. This can also take a Series of repetition counts for variable repetitions per row. Remember to .reset_index(drop=True) if you need a clean sequential index.
Using numpy.repeat on df.values or df.index provides an alternative, especially if already in a NumPy-heavy workflow.
pd.concat can be used but is generally less direct for this specific task.

Choose the method that best balances readability and efficiency for the way your rows need to be repeated.
