Python Pandas: Concatenate Strings from Multiple Rows after GroupBy

When working with Pandas DataFrames, a common data aggregation task after grouping is to concatenate string values from multiple rows within each group into a single string (or a list of strings). For example, you might want to compile a comma-separated list of all products purchased by each customer or all tasks assigned to an employee on a given date.

This guide explains how to effectively use groupby() in conjunction with methods like transform(), apply(), and agg() to concatenate strings from multiple rows.

The Goal: String Aggregation within Groups

Given a Pandas DataFrame, we want to:

Group the rows based on the values in one or more columns.
For each group, take the string values from another specified column.
Concatenate these string values into a single string (e.g., separated by commas, spaces, or another delimiter) or into a list of strings for each group.

Example DataFrame

import pandas as pd

data = {
    'OrderID': [101, 101, 102, 103, 103, 103, 104],
    'Customer': ['Alice', 'Alice', 'Bob', 'Charlie', 'Charlie', 'Charlie', 'Alice'],
    'Product': ['Apple', 'Banana', 'Orange', 'Apple', 'Milk', 'Bread', 'Orange'],
    'Category': ['Fruit', 'Fruit', 'Fruit', 'Fruit', 'Dairy', 'Bakery', 'Fruit']
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
   OrderID Customer Product Category
    101    Alice   Apple    Fruit
    101    Alice  Banana    Fruit
    102      Bob  Orange    Fruit
    103  Charlie   Apple    Fruit
    103  Charlie    Milk    Dairy
    103  Charlie   Bread   Bakery
    104    Alice  Orange    Fruit

We might want to get a comma-separated list of all 'Product's for each 'OrderID' or 'Customer'.

Method 1: Using `groupby().transform(','.join)` (Adds Concatenated String to Original DataFrame Shape)

The transform() method is useful when you want to perform a group-wise calculation and then broadcast the result back to a Series or DataFrame that has the same index as your original DataFrame.

Grouping by a Single Column

Let's concatenate 'Product' names for each 'OrderID'.

import pandas as pd

df_example = pd.DataFrame({
    'OrderID': [101, 101, 102, 103, 103],
    'Product': ['Apple', 'Banana', 'Orange', 'Apple', 'Milk']
})

# Group by 'OrderID', then transform the 'Product' column by joining strings
# This creates a new Series with the same index as df_example
df_example['Products_In_Order'] = df_example.groupby('OrderID')['Product'].transform(lambda x: ','.join(x))

print("DataFrame with concatenated products per order (using transform):")
print(df_example)
print()

# To get unique rows after transform:
df_unique_orders = df_example[['OrderID', 'Products_In_Order']].drop_duplicates().reset_index(drop=True)
print("Unique orders with concatenated products:")
print(df_unique_orders)

Output:

DataFrame with concatenated products per order (using transform):
   OrderID Product Products_In_Order
0      101   Apple      Apple,Banana
1      101  Banana      Apple,Banana
2      102  Orange            Orange
3      103   Apple        Apple,Milk
4      103    Milk        Apple,Milk

Unique orders with concatenated products:
   OrderID Products_In_Order
0      101      Apple,Banana
1      102            Orange
2      103        Apple,Milk

df.groupby('OrderID')['Product']: Selects the 'Product' Series for each 'OrderID' group. .transform(lambda x: ','.join(x)): For each group's 'Product' Series (x), it joins all strings with a comma. The result is aligned back to the original DataFrame's index. drop_duplicates(): Necessary if you want one row per group in the final output after using transform this way.

Grouping by Multiple Columns

Concatenate 'Product' for each ('Customer', 'Category') combination.

import pandas as pd

data = {
    'OrderID': [101, 101, 102, 103, 103, 103, 104],
    'Customer': ['Alice', 'Alice', 'Bob', 'Charlie', 'Charlie', 'Charlie', 'Alice'],
    'Product': ['Apple', 'Banana', 'Orange', 'Apple', 'Milk', 'Bread', 'Orange'],
    'Category': ['Fruit', 'Fruit', 'Fruit', 'Fruit', 'Dairy', 'Bakery', 'Fruit']
}

df = pd.DataFrame(data)
df_multi_group = df.copy()

df_multi_group['Products_Per_Customer_Category'] = df_multi_group.groupby(
    ['Customer', 'Category']
)['Product'].transform(lambda x: ' | '.join(x)) # Using a different separator

print("DataFrame with concatenated products per customer & category (transform):")
print(df_multi_group)
print()

df_unique_cust_cat = df_multi_group[['Customer', 'Category', 'Products_Per_Customer_Category']].drop_duplicates().reset_index(drop=True)
print(df_unique_cust_cat)

Output:

DataFrame with concatenated products per customer & category (transform):
   OrderID Customer Product Category Products_Per_Customer_Category
    101    Alice   Apple    Fruit        Apple | Banana | Orange
    101    Alice  Banana    Fruit        Apple | Banana | Orange
    102      Bob  Orange    Fruit                         Orange
    103  Charlie   Apple    Fruit                          Apple
    103  Charlie    Milk    Dairy                           Milk
    103  Charlie   Bread   Bakery                          Bread
    104    Alice  Orange    Fruit        Apple | Banana | Orange

  Customer Category Products_Per_Customer_Category
  Alice    Fruit        Apple | Banana | Orange
    Bob    Fruit                         Orange
Charlie    Fruit                          Apple
Charlie    Dairy                           Milk
Charlie   Bakery                          Bread

Method 2: Using `groupby().apply(','.join)` (Creates an Aggregated Series/DataFrame)

The apply() method on a GroupBy object can also be used. When applied to a grouped Series (like df.groupby('ColA')['ColB']), it passes each group (as a Series) to the function.

Grouping by a Single Column

import pandas as pd

df_example = pd.DataFrame({
    'OrderID': [101, 101, 102, 103, 103],
    'Product': ['Apple', 'Banana', 'Orange', 'Apple', 'Milk']
})

# Group by 'OrderID' and apply ','.join to the 'Product' Series of each group
aggregated_products_apply = df_example.groupby('OrderID')['Product'].apply(lambda x: ','.join(x))
# This returns a Series where the index is 'OrderID' and values are concatenated strings

print("Concatenated products per order (using apply):")
print(aggregated_products_apply)
print()

# To get a DataFrame:
aggregated_products_df_apply = aggregated_products_apply.reset_index(name='ProductList_String')
print("As a DataFrame:")
print(aggregated_products_df_apply)

Output:

Concatenated products per order (using apply):
OrderID
101    Apple,Banana
102          Orange
103      Apple,Milk
Name: Product, dtype: object

As a DataFrame:
   OrderID ProductList_String
0      101       Apple,Banana
1      102             Orange
2      103         Apple,Milk

Grouping by Multiple Columns

The grouping columns become a MultiIndex in the resulting Series.

import pandas as pd

data = {
    'OrderID': [101, 101, 102, 103, 103, 103, 104],
    'Customer': ['Alice', 'Alice', 'Bob', 'Charlie', 'Charlie', 'Charlie', 'Alice'],
    'Product': ['Apple', 'Banana', 'Orange', 'Apple', 'Milk', 'Bread', 'Orange'],
    'Category': ['Fruit', 'Fruit', 'Fruit', 'Fruit', 'Dairy', 'Bakery', 'Fruit']
}

df = pd.DataFrame(data)
df_full = df.copy()

aggregated_multi_apply = df_full.groupby(['Customer', 'Category'])['Product'].apply(lambda x: ' | '.join(sorted(x.unique())))
# Using sorted(x.unique()) to get unique products in a defined order per group

print("Concatenated unique products per Customer & Category (apply):")
print(aggregated_multi_apply.reset_index(name='Product_Summary'))

Output:

Concatenated products per order (using apply):
OrderID
101    Apple,Banana
102          Orange
103      Apple,Milk
Name: Product, dtype: object

As a DataFrame:
   OrderID ProductList_String
0      101       Apple,Banana
1      102             Orange
2      103         Apple,Milk

Applying `list` to get a list of strings per group

If you want a list of strings instead of a single concatenated string:

import pandas as pd

data = {
    'OrderID': [101, 101, 102, 103, 103, 103, 104],
    'Customer': ['Alice', 'Alice', 'Bob', 'Charlie', 'Charlie', 'Charlie', 'Alice'],
    'Product': ['Apple', 'Banana', 'Orange', 'Apple', 'Milk', 'Bread', 'Orange'],
    'Category': ['Fruit', 'Fruit', 'Fruit', 'Fruit', 'Dairy', 'Bakery', 'Fruit']
}

df = pd.DataFrame(data)
df_full = df.copy()

list_of_products_per_group = df_full.groupby('OrderID')['Product'].apply(list)
print("List of products per OrderID (apply(list)):")
print(list_of_products_per_group)

Output:

List of products per OrderID (apply(list)):
OrderID
101         [Apple, Banana]
102                [Orange]
103    [Apple, Milk, Bread]
104                [Orange]
Name: Product, dtype: object

Method 3: Using `groupby().agg(','.join)` or `agg(list)` (Similar to `apply`)

The agg() (or aggregate()) method is designed for applying one or more aggregation functions. You can pass ''.join (or a lambda for a specific separator) or list.

import pandas as pd

df_example = pd.DataFrame({
    'OrderID': [101, 101, 102, 103, 103],
    'Product': ['Apple', 'Banana', 'Orange', 'Apple', 'Milk']
})

# --- Using agg with a custom join function ---
# Aggregating products into a single string per OrderID with a custom separator (',')
aggregated_products_agg_join = df_example.groupby('OrderID')['Product'].agg(lambda x: ','.join(x))

print("Concatenated products per order (using agg):")
print(aggregated_products_agg_join.reset_index(name='Products_Agg'))

# --- Using agg with 'list' to aggregate as lists ---
aggregated_products_agg_list = df_example.groupby('OrderID')['Product'].agg(list)
print("\nList of products per order (using agg(list)):")
print(aggregated_products_agg_list.reset_index(name='Product_List_Agg'))

# --- Named Aggregation for multiple columns (Modern Syntax) ---
# Since 'Date' and 'Name' don't exist in the current DataFrame, we'll perform a simpler example
# Aggregating by 'OrderID' with the join function for 'Product'
result_named_agg = df_example.groupby(['OrderID'], as_index=False).agg(
    Joined_Products=('Product', lambda x: ','.join(x))
)

print("\nNamed Aggregation (Modern Pandas Syntax):")
print(result_named_agg)

Output:

Concatenated products per order (using agg):
   OrderID  Products_Agg
0      101  Apple,Banana
1      102        Orange
2      103    Apple,Milk

List of products per order (using agg(list)):
   OrderID Product_List_Agg
0      101  [Apple, Banana]
1      102         [Orange]
2      103    [Apple, Milk]

Named Aggregation (Modern Pandas Syntax):
   OrderID Joined_Products
0      101    Apple,Banana
1      102          Orange
2      103      Apple,Milk

Choosing the Right Method

groupby().transform(separator.join): Use when you want the concatenated string added as a new column to your original DataFrame, aligned with its original index. You'll then likely drop_duplicates() to get one row per group.
groupby().apply(separator.join) or groupby().agg(separator.join): Use when you want a new Series (or DataFrame after reset_index()) where the index represents the groups and the values are the concatenated strings. These are direct aggregation methods.
- The apply() method is often more flexible for custom functions on groups, while agg() is powerful for applying multiple aggregations or named aggregations. For simply joining strings or creating lists, both work well.
Use list instead of separator.join with apply or agg if you want a list of strings per group rather than a single concatenated string.

Conclusion

Concatenating strings from multiple rows within groups in a Pandas DataFrame is a common data summarization technique.

df.groupby(group_cols)[str_col].transform(lambda x: separator.join(x)) is suitable if you need the concatenated string aligned with the original DataFrame's rows (often followed by drop_duplicates()).
df.groupby(group_cols)[str_col].apply(lambda x: separator.join(x)) or df.groupby(group_cols)[str_col].agg(lambda x: separator.join(x)) are excellent for producing an aggregated Series or DataFrame with one concatenated string per group. Substitute list for the lambda function if you need lists of strings.

Choose the method based on whether you need to modify the original DataFrame structure via transform or create a new aggregated summary. Remember to select the specific string column you want to concatenate after the groupby() call.

The Goal: String Aggregation within Groups​

Example DataFrame​

Method 1: Using groupby().transform(','.join) (Adds Concatenated String to Original DataFrame Shape)​

Grouping by a Single Column​

Grouping by Multiple Columns​

Method 2: Using groupby().apply(','.join) (Creates an Aggregated Series/DataFrame)​

Grouping by a Single Column​

Grouping by Multiple Columns​

Applying list to get a list of strings per group​

Method 3: Using groupby().agg(','.join) or agg(list) (Similar to apply)​

Choosing the Right Method​

Conclusion​

Table of Contents

The Goal: String Aggregation within Groups

Example DataFrame

Method 1: Using `groupby().transform(','.join)` (Adds Concatenated String to Original DataFrame Shape)

Grouping by a Single Column

Grouping by Multiple Columns

Method 2: Using `groupby().apply(','.join)` (Creates an Aggregated Series/DataFrame)

Grouping by a Single Column

Grouping by Multiple Columns

Applying `list` to get a list of strings per group

Method 3: Using `groupby().agg(','.join)` or `agg(list)` (Similar to `apply`)

Choosing the Right Method

Conclusion