Home > front end >  How to map/join some index column of a multiindex pandas DataFrame?
How to map/join some index column of a multiindex pandas DataFrame?

Time:09-16

Lets say I have a DataFrame df with a multi index ['siec', 'geo'] (shown in italic):

siec geo value
a DE 1
a FR 2

and a mapping DataFrame mapping_df from geo to id_region with a single index ['geo']:

geo id_region
DE 10
FR 20

=> How can I join/merge/replace the index column 'geo' of df with the values of the column 'id_region' from mapping_df?

Expected result with new multi index ['siec', 'id_region']:

siec id_region value
a 10 1
a 20 2

I tried following code:

import pandas as pd

df = pd.DataFrame([{'siec': 'a', 'geo': 'DE', 'value': 1}, {'siec': 'a', 'geo': 'FR', 'value': 1}])
df.set_index(['siec', 'geo'], inplace=True)

mapping_df = pd.DataFrame([{'geo': 'DE', 'id_region': 10}, {'geo': 'FR', 'id_region': 20}])
mapping_df.set_index(['geo'], inplace=True)

joined_data = df.join(mapping_df)

merged_data = df.merge(mapping_df, left_index=True, right_index=True)

but it does not do what I want. It adds an additional column and keeps the old index.

siec geo value id_region
a DE 1 10
a FR 2 20

=> Is there a convenient method for my use case or would I need to manually correct the index after a joining step?

As a workaround, I could reindex() the DataFrames, do some joining manipulations and then reintroduce some multi index.

However, I would like to avoid switching back and forth between the indexed and non-indexed versions of the DataFrames if possible (?).

CodePudding user response:

Try as follows.

df.index = df.index.set_levels(
    df.index.get_level_values(1).map(mapping_df['id_region']), level=1)\
    .set_names('id_region', level=1)

print(df)

                value
siec id_region       
a    10             1
     20             2
  • Related