Lets say I have a DataFrame df
with a multi index ['siec', 'geo'] (shown in italic):
siec | geo | value |
---|---|---|
a | DE | 1 |
a | FR | 2 |
and a mapping DataFrame mapping_df
from geo
to id_region
with a single index ['geo']:
geo | id_region |
---|---|
DE | 10 |
FR | 20 |
=> How can I join/merge/replace the index column 'geo' of df
with the values of the column 'id_region' from mapping_df
?
Expected result with new multi index ['siec', 'id_region']:
siec | id_region | value |
---|---|---|
a | 10 | 1 |
a | 20 | 2 |
I tried following code:
import pandas as pd
df = pd.DataFrame([{'siec': 'a', 'geo': 'DE', 'value': 1}, {'siec': 'a', 'geo': 'FR', 'value': 1}])
df.set_index(['siec', 'geo'], inplace=True)
mapping_df = pd.DataFrame([{'geo': 'DE', 'id_region': 10}, {'geo': 'FR', 'id_region': 20}])
mapping_df.set_index(['geo'], inplace=True)
joined_data = df.join(mapping_df)
merged_data = df.merge(mapping_df, left_index=True, right_index=True)
but it does not do what I want. It adds an additional column and keeps the old index.
siec | geo | value | id_region |
---|---|---|---|
a | DE | 1 | 10 |
a | FR | 2 | 20 |
=> Is there a convenient method for my use case or would I need to manually correct the index after a joining step?
As a workaround, I could reindex()
the DataFrames, do some joining manipulations and then reintroduce some multi index.
However, I would like to avoid switching back and forth between the indexed and non-indexed versions of the DataFrames if possible (?).
CodePudding user response:
Try as follows.
- Use
MultiIndex.get_level_values
to select only level1
(or:geo
) and applyIndex.map
withmapping_df['id_region']
as mapper. - Wrap the result inside
MultiIndex.set_levels
to overwrite level1
. - Finally, chain
Index.set_names
to rename the level (or useMultiIndex.rename
).
df.index = df.index.set_levels(
df.index.get_level_values(1).map(mapping_df['id_region']), level=1)\
.set_names('id_region', level=1)
print(df)
value
siec id_region
a 10 1
20 2