Python Pandas: Convert Nested Dictionary to DataFrame
Nested dictionaries are a common way to represent structured hierarchical data in Python. Pandas provides flexible methods to convert these nested structures into DataFrames, which are essential for tabular data analysis and manipulation. The key is often how you want the levels of the dictionary to map to the rows and columns of the DataFrame.
This guide explains how to convert various forms of nested dictionaries into Pandas DataFrames, primarily using pd.DataFrame.from_dict() and the pd.DataFrame() constructor, along with techniques for handling different nesting patterns.
Understanding Nested Dictionaries for DataFrame Creation
A nested dictionary is a dictionary where some of its values are themselves dictionaries. For example:
data_rows = {
    'record1': {'name': 'Alice', 'age': 30, 'city': 'New York'},
    'record2': {'name': 'Bob', 'age': 24, 'city': 'London'}
}
# Here, 'record1' and 'record2' are outer keys.
# {'name': ..., 'age': ..., 'city': ...} are inner dictionaries.
Pandas needs to know how to map these levels to DataFrame rows, columns, and index.
Common Nested Dictionary Structures
Outer Keys as Rows, Inner Keys as Columns
This is a very common structure for representing multiple records: each outer key is a unique row identifier (which will become the DataFrame index), and the keys of each inner dictionary are the attributes that become the column names.
Example:
data_structure_1 = {
    'record_001': {
        'name': 'Alice Wonderland',
        'age': 30,
        'department': 'Engineering',
        'city': 'New York'
    },
    'record_002': {
        'name': 'Bob The Builder',
        'age': 24,
        'department': 'Construction',
        'city': 'London'
    },
    'record_003': {
        'name': 'Charlie Chaplin',
        'age': 45,
        'department': 'Entertainment',
        'city': 'Paris'
    }
}
- Outer keys ('record_001', 'record_002', 'record_003') are unique identifiers for each entity (e.g., a person, a product, an observation).
- Inner dictionaries contain the attributes (like 'name', 'age') and their corresponding values for each outer key.
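As a minimal sketch of how this structure maps onto a DataFrame, the snippet below converts a shortened, two-record version of the example. The variable names records and df_records are illustrative and not part of the original example.

```python
import pandas as pd

# Assumption: a shortened version of data_structure_1 with two records.
records = {
    "record_001": {"name": "Alice Wonderland", "age": 30, "city": "New York"},
    "record_002": {"name": "Bob The Builder", "age": 24, "city": "London"},
}

# orient="index": outer keys become the row index, inner keys become columns.
df_records = pd.DataFrame.from_dict(records, orient="index")

print(df_records.index.tolist())    # the outer keys
print(df_records.columns.tolist())  # the inner keys
```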
Outer Keys as Columns, Inner Keys as Row Index
This layout is less common for record-style data, but it converts just as directly. Here, the outer keys form the DataFrame columns, and the inner keys form the row index.
This structure organizes data primarily by attribute (outer keys), with each attribute having values corresponding to different entities (inner keys).
Example:
data_structure_2 = {
    'Feature_A': {
        'item_X': 10,
        'item_Y': 15,
        'item_Z': 20
    },
    'Feature_B': {
        'item_X': 100,
        'item_Y': 150,
        'item_Z': 220 # Note: item_Z has a different value for Feature_B
    },
    'Feature_C': {
        'item_X': True,
        'item_Y': False,
        'item_Z': True
    }
}
- Outer keys ('Feature_A', 'Feature_B', 'Feature_C') represent the main attributes or variables you are tracking.
- Inner dictionaries map identifiers ('item_X', 'item_Y') to the values for that specific feature.
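A short sketch of converting this attribute-first layout, assuming the plain pd.DataFrame() constructor is acceptable for the job. The names features and df_features are illustrative.

```python
import pandas as pd

# Same shape as data_structure_2: outer keys are features, inner keys are items.
features = {
    "Feature_A": {"item_X": 10, "item_Y": 15, "item_Z": 20},
    "Feature_B": {"item_X": 100, "item_Y": 150, "item_Z": 220},
    "Feature_C": {"item_X": True, "item_Y": False, "item_Z": True},
}

# Outer keys become columns; inner keys become the row index.
df_features = pd.DataFrame(features)

print(df_features.columns.tolist())  # the feature names
print(df_features.index.tolist())    # the item identifiers
```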
Dictionary with List Values
In this structure each key maps to a list of values. It is often used when the "columns" of your conceptual table are known, and each "column" contains a list of values, one for each "row". It can hold multiple observations per identifier, and identifier columns can later be combined into a MultiIndex.
Example:
data_structure_3 = {
    'Experiment_ID': ['Exp1', 'Exp2', 'Exp1', 'Exp3', 'Exp2'],
    'Measurement_Type': ['Temp', 'Pressure', 'Temp', 'Humidity', 'Pressure'],
    'Value': [25.5, 101.2, 26.1, 60.3, 100.9],
    'Unit': ['Celsius', 'kPa', 'Celsius', '%', 'kPa']
}
- Keys ('Experiment_ID', 'Measurement_Type', etc.) directly map to what will become column names in the DataFrame.
- Values are lists, where each list contains all the entries for that column. All lists must be of the same length for this to work directly with pd.DataFrame().
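The following sketch shows one way to load this column-oriented dictionary and, optionally, promote two of its columns to a two-level row index with set_index(). Which columns should form the index is an assumption made for illustration; the names measurements, df_measurements, and df_indexed are likewise illustrative.

```python
import pandas as pd

measurements = {
    "Experiment_ID": ["Exp1", "Exp2", "Exp1", "Exp3", "Exp2"],
    "Measurement_Type": ["Temp", "Pressure", "Temp", "Humidity", "Pressure"],
    "Value": [25.5, 101.2, 26.1, 60.3, 100.9],
    "Unit": ["Celsius", "kPa", "Celsius", "%", "kPa"],
}

# Each key becomes a column; each list supplies that column's values.
df_measurements = pd.DataFrame(measurements)

# Optional: use two columns as a MultiIndex (assumed choice of index columns).
df_indexed = df_measurements.set_index(["Experiment_ID", "Measurement_Type"])

print(df_measurements.shape)
print(df_indexed.index.names)
```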
Method 1: pd.DataFrame.from_dict(nested_dict, orient='index') (Outer Keys to Rows)
This is typically the most straightforward way when your nested dictionary structure has outer keys intended as row identifiers and inner keys as column headers.
import pandas as pd
student_data_nested = {
    'student_A': {'Math': 90, 'Science': 85, 'History': 78},
    'student_B': {'Math': 75, 'Science': 92, 'History': 88},
    'student_C': {'Math': 88, 'Science': 80, 'History': 95}
}
# ✅ orient='index' makes outer keys ('student_A', etc.) the row index.
# Inner keys ('Math', etc.) become column names.
df_from_dict_orient_index = pd.DataFrame.from_dict(student_data_nested, orient='index')
print("DataFrame from nested dict (orient='index'):")
print(df_from_dict_orient_index)
Output:
DataFrame from nested dict (orient='index'):
           Math  Science  History
student_A    90       85       78
student_B    75       92       88
student_C    88       80       95
orient='index': Tells Pandas to treat the keys of the input dict as the row labels (index).
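Real data is often ragged, with some records missing a key. The sketch below, using a made-up scores dictionary, suggests what to expect in that case: the columns are the union of all inner keys, and missing entries are filled with NaN (which typically turns an otherwise integer column into floats).

```python
import pandas as pd

# Assumption: hypothetical data where each student lacks one subject.
scores = {
    "student_A": {"Math": 90, "Science": 85},
    "student_B": {"Math": 75, "History": 88},
}

df_scores = pd.DataFrame.from_dict(scores, orient="index")

print(df_scores)
# Count the gaps introduced by the missing keys, per column.
print(df_scores.isna().sum())
```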
Method 2: pd.DataFrame.from_dict(nested_dict, orient='columns') (Outer Keys to Columns - Default)
With orient='columns' (which is the default for from_dict), the outer keys of your nested_dict will become the DataFrame's column names. The keys of the inner dictionaries will become the DataFrame's index.
import pandas as pd
# Same data as before, but note how it's interpreted differently
student_data_nested = {
    'student_A': {'Math': 90, 'Science': 85, 'History': 78},
    'student_B': {'Math': 75, 'Science': 92, 'History': 88},
    'student_C': {'Math': 88, 'Science': 80, 'History': 95}
}
# ✅ orient='columns' (or omitting orient with this dict structure)
df_from_dict_orient_cols = pd.DataFrame.from_dict(student_data_nested, orient='columns')
print("DataFrame from nested dict (orient='columns'):")
print(df_from_dict_orient_cols)
Output:
DataFrame from nested dict (orient='columns'):
         student_A  student_B  student_C
Math            90         75         88
Science         85         92         80
History         78         88         95
This is useful if your nested dictionary is structured "by column."
Method 3: pd.DataFrame(nested_dict) (Outer Keys to Columns)
Passing a dictionary of dictionaries directly to the pd.DataFrame() constructor behaves like orient='columns': the outer keys become columns and the inner keys become the index.
import pandas as pd
# Using the same student_data_nested
student_data_nested = {
    'student_A': {'Math': 90, 'Science': 85, 'History': 78},
    'student_B': {'Math': 75, 'Science': 92, 'History': 88},
    'student_C': {'Math': 88, 'Science': 80, 'History': 95}
}
df_constructor = pd.DataFrame(student_data_nested)
print("DataFrame from nested dict using pd.DataFrame() constructor:")
print(df_constructor)
Output: (Same as orient='columns')
DataFrame from nested dict using pd.DataFrame() constructor:
         student_A  student_B  student_C
Math            90         75         88
Science         85         92         80
History         78         88         95
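To check that the constructor and from_dict agree for a dictionary of dictionaries, a quick comparison along the following lines can help. The variable names are illustrative.

```python
import pandas as pd

student_data_nested = {
    "student_A": {"Math": 90, "Science": 85, "History": 78},
    "student_B": {"Math": 75, "Science": 92, "History": 88},
    "student_C": {"Math": 88, "Science": 80, "History": 95},
}

df_ctor = pd.DataFrame(student_data_nested)
df_default = pd.DataFrame.from_dict(student_data_nested)  # orient defaults to 'columns'
df_columns = pd.DataFrame.from_dict(student_data_nested, orient="columns")

# All three calls build the same frame for this input.
print(df_ctor.equals(df_default) and df_ctor.equals(df_columns))  # prints True
```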
Using .T (Transpose) to Flip Rows and Columns
If the pd.DataFrame() constructor gives you columns where you wanted rows (or vice-versa), you can use the transpose attribute .T to swap rows and columns.
import pandas as pd
student_data_nested = {
    'student_A': {'Math': 90, 'Science': 85, 'History': 78},
    'student_B': {'Math': 75, 'Science': 92, 'History': 88},
    'student_C': {'Math': 88, 'Science': 80, 'History': 95}
}
df_constructor = pd.DataFrame(student_data_nested)
# If pd.DataFrame(nested_dict) results in outer keys as columns,
# and you wanted them as rows, transpose it.
df_transposed = df_constructor.T # .T is the transpose accessor
print("Transposed DataFrame (to get outer keys as rows):")
print(df_transposed)
Output:
Transposed DataFrame (to get outer keys as rows):
           Math  Science  History
student_A    90       85       78
student_B    75       92       88
student_C    88       80       95
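One caveat with transposing: when the inner dictionaries mix types (for example strings and numbers), each column built by the constructor holds mixed values, so the transposed columns come out with the generic object dtype. The sketch below uses a made-up people dictionary and infer_objects() to recover numeric dtypes; building the frame with orient='index' avoids the issue in the first place.

```python
import pandas as pd

# Assumption: hypothetical records mixing strings and integers.
people = {
    "record1": {"name": "Alice", "age": 30},
    "record2": {"name": "Bob", "age": 24},
}

# Constructor + transpose: every column ends up as object dtype.
df_t = pd.DataFrame(people).T
print(df_t.dtypes)

# infer_objects() attempts to restore a better dtype per column.
df_fixed = df_t.infer_objects()
print(df_fixed.dtypes)

# Building by rows directly keeps per-column dtypes from the start.
df_direct = pd.DataFrame.from_dict(people, orient="index")
print(df_direct.dtypes)
```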
Handling Nested Dictionaries with List Values
If your inner dictionaries contain lists as values, pd.DataFrame.from_dict(..., orient='index') will create columns where each cell contains a list.
import pandas as pd
data_with_lists = {
    'SensorA': {'Temperature': [22, 23, 22], 'Humidity': [60, 62, 61]},
    'SensorB': {'Temperature': [25, 25, 26], 'Humidity': [55, 54, 55]}
}
df_lists = pd.DataFrame.from_dict(data_with_lists, orient='index')
print("DataFrame from dict with list values:")
print(df_lists)
print()
# To expand these lists into separate rows (if desired), you can use .stack().explode()
# or call DataFrame.explode() on the list columns.
df_exploded = df_lists.stack().explode().reset_index()
df_exploded.columns = ['Sensor', 'Metric', 'Value']
print("Exploded DataFrame:")
print(df_exploded.head()) # Show first few rows
Output:
DataFrame from dict with list values:
          Temperature      Humidity
SensorA  [22, 23, 22]  [60, 62, 61]
SensorB  [25, 25, 26]  [55, 54, 55]
Exploded DataFrame:
    Sensor       Metric  Value
0  SensorA  Temperature     22
1  SensorA  Temperature     23
2  SensorA  Temperature     22
3  SensorA     Humidity     60
4  SensorA     Humidity     62
Further processing with explode() or other methods might be needed depending on the desired final structure.
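If you would rather keep one column per metric and one row per reading, a possible alternative is to explode several columns at once. This sketch assumes pandas 1.3 or newer (where explode() accepts a list of columns) and that the lists in each row have matching lengths; df_long is an illustrative name.

```python
import pandas as pd

data_with_lists = {
    "SensorA": {"Temperature": [22, 23, 22], "Humidity": [60, 62, 61]},
    "SensorB": {"Temperature": [25, 25, 26], "Humidity": [55, 54, 55]},
}
df_lists = pd.DataFrame.from_dict(data_with_lists, orient="index")

# Explode both list columns together so readings stay paired row by row.
df_long = (
    df_lists.explode(["Temperature", "Humidity"])
    .rename_axis("Sensor")  # name the index before turning it into a column
    .reset_index()
)

print(df_long)
```

The exploded columns keep the object dtype, so a follow-up astype() call may be needed before doing numeric work.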
Converting Deeply Nested Dictionaries (Custom Logic/List Comprehensions)
If your dictionary is more deeply nested or has an irregular structure not directly supported by from_dict's orient parameter, you might need to first "flatten" the dictionary into a list of simpler dictionaries or a list of lists using custom Python logic (often list comprehensions or loops) before passing it to pd.DataFrame().
import pandas as pd
deeply_nested_dict = {
    'dept_sales': {
        'employee_101': {'name': 'Alice', 'region': 'North', 'sales': 1000},
        'employee_102': {'name': 'Bob', 'region': 'South', 'sales': 1500}
    },
    'dept_hr': {
        'employee_201': {'name': 'Charlie', 'region': 'North', 'role': 'Manager'},
        'employee_202': {'name': 'David', 'region': 'West', 'role': 'Assistant'}
    }
}
# Flatten into a list of dictionaries
flattened_data = []
for dept_name, employees in deeply_nested_dict.items():
    for emp_id, emp_details in employees.items():
        record = {'department': dept_name, 'employee_id': emp_id}
        record.update(emp_details) # Add all inner details
        flattened_data.append(record)
df_deep_flat = pd.DataFrame(flattened_data)
print("DataFrame from deeply nested dict after custom flattening:")
print(df_deep_flat[['department', 'employee_id', 'name', 'region', 'sales', 'role']]) # Select relevant columns
Output:
DataFrame from deeply nested dict after custom flattening:
   department   employee_id     name  region   sales       role
0  dept_sales  employee_101    Alice   North  1000.0        NaN
1  dept_sales  employee_102      Bob   South  1500.0        NaN
2     dept_hr  employee_201  Charlie   North     NaN    Manager
3     dept_hr  employee_202    David    West     NaN  Assistant
This approach gives you full control over how the nested structure is translated into rows and columns.
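For a regular three-level dictionary like this one, an alternative sketch is to build one DataFrame per outer key and combine them with pd.concat(), which accepts a mapping and uses its keys as the outer index level. The level names passed to names are chosen here for illustration.

```python
import pandas as pd

deeply_nested_dict = {
    "dept_sales": {
        "employee_101": {"name": "Alice", "region": "North", "sales": 1000},
        "employee_102": {"name": "Bob", "region": "South", "sales": 1500},
    },
    "dept_hr": {
        "employee_201": {"name": "Charlie", "region": "North", "role": "Manager"},
        "employee_202": {"name": "David", "region": "West", "role": "Assistant"},
    },
}

# One frame per department (employees as rows), stacked under a MultiIndex.
df_multi = pd.concat(
    {
        dept: pd.DataFrame.from_dict(employees, orient="index")
        for dept, employees in deeply_nested_dict.items()
    },
    names=["department", "employee_id"],  # assumed level names
)

print(df_multi)
# reset_index() turns both index levels back into ordinary columns.
print(df_multi.reset_index())
```

The explicit loop remains the more flexible option when the nesting depth varies between branches.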
Setting Index Names After Conversion
After creating a DataFrame where dictionary keys became the index, you can name the index using df.index.name. If you built the DataFrame with pd.concat() and passed its names argument, the index name might already be set.
import pandas as pd
student_data_nested = {
    'student_A': {'Math': 90, 'Science': 85, 'History': 78},
    'student_B': {'Math': 75, 'Science': 92, 'History': 88},
    'student_C': {'Math': 88, 'Science': 80, 'History': 95}
}
df_from_dict_orient_index = pd.DataFrame.from_dict(student_data_nested, orient='index')
# Set the name of the index
df_from_dict_orient_index.index.name = 'StudentID'
print("DataFrame with named index:")
print(df_from_dict_orient_index)
Output:
DataFrame with named index:
           Math  Science  History
StudentID
student_A    90       85       78
student_B    75       92       88
student_C    88       80       95
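As a chainable alternative to assigning df.index.name, rename_axis() can name the index inline, and reset_index() then turns the named index into a regular column. A brief sketch, with illustrative variable names:

```python
import pandas as pd

student_data_nested = {
    "student_A": {"Math": 90, "Science": 85, "History": 78},
    "student_B": {"Math": 75, "Science": 92, "History": 88},
    "student_C": {"Math": 88, "Science": 80, "History": 95},
}

# Name the index as part of the same expression that builds the frame.
df_named = pd.DataFrame.from_dict(student_data_nested, orient="index").rename_axis("StudentID")

# Move the named index into an ordinary column.
df_flat = df_named.reset_index()

print(df_named.index.name)
print(df_flat.columns.tolist())
```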
Conclusion
Pandas offers several ways to convert nested dictionaries into DataFrames, depending on how the dictionary's structure should map to rows and columns:
- pd.DataFrame.from_dict(my_dict, orient='index'): Ideal when outer dictionary keys should become the DataFrame's row index and inner keys its columns.
- pd.DataFrame(my_dict) or pd.DataFrame.from_dict(my_dict, orient='columns'): Use when outer dictionary keys should become DataFrame columns and inner keys the row index. You can use .T to transpose if the initial orientation is not what you need.
- Custom Flattening Logic (e.g., list comprehensions) + pd.DataFrame(): Necessary for more complex or irregularly nested dictionaries to transform the data into a list of records (dictionaries) or list of lists suitable for DataFrame creation.
Understanding the orient parameter of from_dict and the default behavior of the DataFrame constructor is key to efficiently converting your nested dictionary data into a structured Pandas DataFrame.