Python Pandas: How to Create Dictionary from Two DataFrame Columns (Key-Value Pairs)

Converting two columns of a Pandas DataFrame into a Python dictionary, where one column provides the keys and the other provides the corresponding values, is a common data transformation task. This is useful for creating lookup tables, mapping values, or preparing data for functions that expect dictionary input.

This guide explains several effective methods to create a dictionary from two DataFrame columns in Pandas, using zip(), pd.Series().to_dict(), and DataFrame.set_index().

The Goal: Mapping One Column to Another as Key-Value Pairs

Given a Pandas DataFrame with at least two columns, we want to select one column to serve as the keys for our new dictionary and another column to serve as the values. Each corresponding pair of values from these two columns will form a key-value pair in the resulting dictionary.

Example: If we have:

Key_Col   Value_Col
'A'       1
'B'       2

We want to produce: {'A': 1, 'B': 2}.

Example DataFrame

import pandas as pd

data = {
'CountryCode': ['US', 'CA', 'GB', 'DE', 'JP'],
'CountryName': ['United States', 'Canada', 'United Kingdom', 'Germany', 'Japan'],
'Population': [330, 38, 67, 83, 126] # In millions
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
  CountryCode     CountryName  Population
0          US   United States         330
1          CA          Canada          38
2          GB  United Kingdom          67
3          DE         Germany          83
4          JP           Japan         126

Let's aim to create a dictionary mapping 'CountryCode' (keys) to 'CountryName' (values).

Method 1: Using zip() and dict()

This is often the most Pythonic and direct way.

  1. Select the two columns you want to use for keys and values. These are Pandas Series.
  2. Use the built-in zip() function to pair up corresponding elements from these two Series. zip() creates an iterator of tuples.
  3. Pass this iterator of tuples directly to the dict() constructor.
import pandas as pd

df_example = pd.DataFrame({
'CountryCode': ['US', 'CA', 'GB', 'DE', 'JP'],
'CountryName': ['United States', 'Canada', 'United Kingdom', 'Germany', 'Japan']
})

key_column = 'CountryCode'
value_column = 'CountryName'

# ✅ Zip the key and value columns and pass to dict()
country_code_to_name_dict = dict(zip(df_example[key_column], df_example[value_column]))

print(f"Dictionary from '{key_column}' (keys) and '{value_column}' (values):")
print(country_code_to_name_dict)
print()

# Inspect the intermediate (key, value) tuples that zip() produces
zipped_pairs = zip(df_example[key_column], df_example[value_column])
print(list(zipped_pairs))

Output:

Dictionary from 'CountryCode' (keys) and 'CountryName' (values):
{'US': 'United States', 'CA': 'Canada', 'GB': 'United Kingdom', 'DE': 'Germany', 'JP': 'Japan'}

[('US', 'United States'), ('CA', 'Canada'), ('GB', 'United Kingdom'), ('DE', 'Germany'), ('JP', 'Japan')]

This method is concise, readable, and generally efficient.
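
As noted in the introduction, one common use of such a dictionary is as a lookup table. The sketch below assumes a second, hypothetical DataFrame named orders with a 'Code' column (neither appears in the original example) and uses Series.map() to translate codes into names. Codes that are missing from the dictionary come back as NaN.

```python
import pandas as pd

df_example = pd.DataFrame({
    'CountryCode': ['US', 'CA', 'GB'],
    'CountryName': ['United States', 'Canada', 'United Kingdom'],
})
code_to_name = dict(zip(df_example['CountryCode'], df_example['CountryName']))

# Hypothetical data to translate; 'FR' is deliberately absent from the lookup.
orders = pd.DataFrame({'Code': ['CA', 'US', 'FR']})

# Series.map() looks up each code in the dictionary; unknown codes become NaN.
orders['Name'] = orders['Code'].map(code_to_name)

print(orders)
```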

Swapping Key and Value Columns

To make 'CountryName' the key and 'CountryCode' the value, simply reverse the order in zip():

import pandas as pd

df_example = pd.DataFrame({
'CountryCode': ['US', 'CA', 'GB', 'DE', 'JP'],
'CountryName': ['United States', 'Canada', 'United Kingdom', 'Germany', 'Japan']
})

key_column = 'CountryCode'
value_column = 'CountryName'

country_name_to_code_dict = dict(zip(df_example[value_column], df_example[key_column]))
print(f"Dictionary from '{value_column}' (keys) and '{key_column}' (values):")
print(country_name_to_code_dict)

Output:

Dictionary from 'CountryName' (keys) and 'CountryCode' (values):
{'United States': 'US', 'Canada': 'CA', 'United Kingdom': 'GB', 'Germany': 'DE', 'Japan': 'JP'}

Method 2: Using pd.Series(values, index=keys).to_dict()

You can create a Pandas Series where one DataFrame column forms the Series' data and another forms its index. Then, convert this Series to a dictionary using Series.to_dict().

import pandas as pd

df_example = pd.DataFrame({
'CountryCode': ['US', 'CA', 'GB', 'DE', 'JP'],
'CountryName': ['United States', 'Canada', 'United Kingdom', 'Germany', 'Japan']
})

key_column = 'CountryCode'
value_column = 'CountryName'

# ✅ Create a Series using 'CountryCode' as index and 'CountryName' as values
# Note: .values passes the raw NumPy array, so values are assigned by position.
# Passing the Series itself would reindex it against the new labels and yield NaN.
temp_series = pd.Series(df_example[value_column].values, index=df_example[key_column])
print("Temporary Series created:")
print(temp_series)
print()

# Convert this Series to a dictionary
series_to_dict_result = temp_series.to_dict()

print(f"Dictionary from Series using '{key_column}' as index and '{value_column}' as values:")
print(series_to_dict_result)

Output:

Temporary Series created:
CountryCode
US     United States
CA            Canada
GB    United Kingdom
DE           Germany
JP             Japan
dtype: object

Dictionary from Series using 'CountryCode' as index and 'CountryName' as values:
{'US': 'United States', 'CA': 'Canada', 'GB': 'United Kingdom', 'DE': 'Germany', 'JP': 'Japan'}

This is also a valid approach, effectively using the Series as an intermediate step to structure the key-value mapping.
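
The reason for passing .values rather than the column itself can be shown with a small sketch. When a Series is given as the data together with a new index, Pandas aligns on labels instead of assigning by position. The names misaligned and aligned below are illustrative only, and .to_numpy() is used as an equivalent of .values.

```python
import pandas as pd

df_example = pd.DataFrame({
    'CountryCode': ['US', 'CA'],
    'CountryName': ['United States', 'Canada'],
})

# Passing the Series itself: pandas reindexes it from labels 0, 1 to 'US', 'CA',
# finds no matching labels, and fills every value with NaN.
misaligned = pd.Series(df_example['CountryName'], index=df_example['CountryCode'])

# Passing the raw array: values are assigned by position, as intended.
aligned = pd.Series(df_example['CountryName'].to_numpy(), index=df_example['CountryCode'])

print(misaligned.isna().all())  # True
print(aligned.to_dict())        # {'US': 'United States', 'CA': 'Canada'}
```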

Method 3: Using DataFrame.set_index() and Series.to_dict()

  1. Set one column as the DataFrame's index using df.set_index('KeyColumnName').
  2. Select the other column (which is now a Series with the desired index).
  3. Call .to_dict() on this Series.
import pandas as pd

df_example = pd.DataFrame({
'CountryCode': ['US', 'CA', 'GB', 'DE', 'JP'],
'CountryName': ['United States', 'Canada', 'United Kingdom', 'Germany', 'Japan'],
'Population': [330,38,67,83,126]
})

key_column = 'CountryCode'
value_column = 'CountryName'

# ✅ Set 'CountryCode' as index, then select 'CountryName' Series and convert to dict
set_index_dict_result = df_example.set_index(key_column)[value_column].to_dict()
# df.set_index('CountryCode') would look like:
# CountryName Population
# CountryCode
# US United States 330
# ...
# Then df.set_index('CountryCode')['CountryName'] selects the 'CountryName' Series.

print(f"Dictionary using set_index('{key_column}') and selecting '{value_column}':")
print(set_index_dict_result)

Output:

Dictionary using set_index('CountryCode') and selecting 'CountryName':
{'US': 'United States', 'CA': 'Canada', 'GB': 'United Kingdom', 'DE': 'Germany', 'JP': 'Japan'}

This method is also idiomatic in Pandas. Select the value column before calling .to_dict(): calling it on the whole indexed DataFrame returns a nested dictionary of the form {column: {key: value}}, so you would need an extra lookup, e.g. df.set_index('key_col').to_dict()['value_col'], to get the same flat mapping.
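
The following minimal sketch compares calling .to_dict() on the selected column with calling it on the whole indexed DataFrame, which returns a nested dictionary under the default orient. The variable names indexed, nested, and flat are illustrative only.

```python
import pandas as pd

df_example = pd.DataFrame({
    'CountryCode': ['US', 'CA'],
    'CountryName': ['United States', 'Canada'],
    'Population': [330, 38],
})

indexed = df_example.set_index('CountryCode')

# Whole DataFrame: one inner dictionary per remaining column.
nested = indexed.to_dict()

# Single column selected first: a flat key-to-value dictionary.
flat = indexed['CountryName'].to_dict()

print(nested)
print(flat)
print(nested['CountryName'] == flat)  # True
```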

Note on other methods:

  • DataFrame.to_records(index=False): This converts the DataFrame to a NumPy record array. Passing list(df[['key_col', 'value_col']].to_records(index=False)) to dict() will work if the keys are unique.
  • pd.MultiIndex.from_frame(df[['key_col', 'value_col']]): This creates a MultiIndex. Passing list(...) of this to dict() also creates the dictionary.

These methods are generally less direct and more verbose for the simple task of creating a dictionary from two columns compared to zip() or set_index().to_dict().
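
For completeness, the two alternatives above might look as follows. This is a sketch on a reduced version of the example DataFrame; the names pair, from_records, and from_multiindex are illustrative only.

```python
import pandas as pd

df_example = pd.DataFrame({
    'CountryCode': ['US', 'CA'],
    'CountryName': ['United States', 'Canada'],
})
pair = df_example[['CountryCode', 'CountryName']]

# Each record in the NumPy record array unpacks into a (key, value) pair.
from_records = dict(list(pair.to_records(index=False)))

# Iterating a MultiIndex yields plain tuples, which dict() accepts directly.
from_multiindex = dict(list(pd.MultiIndex.from_frame(pair)))

print(from_records)
print(from_multiindex)
```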

Considerations for Duplicate Keys

If the column chosen for dictionary keys contains duplicate values, the resulting dictionary keeps only the value from the last occurrence of each key, because dictionaries cannot have duplicate keys. This applies both to dict(zip(...)) and to Series.to_dict().

import pandas as pd

df_duplicates = pd.DataFrame({
'Key': ['A', 'B', 'A', 'C'],
'Value': [1, 2, 3, 4]
})

dict_with_duplicates = dict(zip(df_duplicates['Key'], df_duplicates['Value']))
print(f"Dictionary from columns with duplicate keys ('A'): {dict_with_duplicates}")

Output:

Dictionary from columns with duplicate keys ('A'): {'A': 3, 'B': 2, 'C': 4}

If you need to handle duplicate keys differently (e.g., aggregate values into a list), you'll need more advanced logic, typically involving groupby().
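
As a minimal sketch of two such strategies, the example below first collects every value for a key into a list with groupby(), then shows a "first occurrence wins" variant using drop_duplicates(). The names key_to_values and first_wins are illustrative, and which policy is appropriate depends on your data.

```python
import pandas as pd

df_duplicates = pd.DataFrame({
    'Key': ['A', 'B', 'A', 'C'],
    'Value': [1, 2, 3, 4],
})

# Collect all values for each key into a list instead of keeping only the last.
key_to_values = df_duplicates.groupby('Key')['Value'].agg(list).to_dict()
print(key_to_values)  # {'A': [1, 3], 'B': [2], 'C': [4]}

# Alternatively, keep the first occurrence of each key rather than the last.
first_wins = df_duplicates.drop_duplicates('Key').set_index('Key')['Value'].to_dict()
print(first_wins)  # {'A': 1, 'B': 2, 'C': 4}
```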

Conclusion

Creating a Python dictionary from two Pandas DataFrame columns is a straightforward task with several idiomatic solutions:

  1. dict(zip(df['key_column'], df['value_column'])): This is generally the most Pythonic, concise, and recommended method for its readability and efficiency.
  2. pd.Series(df['value_column'].values, index=df['key_column']).to_dict(): Constructs an intermediate Series with the desired key-value mapping before converting to a dictionary.
  3. df.set_index('key_column')['value_column'].to_dict(): Sets one column as the index, then converts the value column (now a Series with the correct index) to a dictionary.

Choose the method that you find most intuitive. The zip() approach is often favored for its directness. Be mindful that with all three methods, duplicate keys are resolved by keeping the last value (last one wins).