Home > Mobile >  Retain MultiIndex on Left Merge in Pandas
Retain MultiIndex on Left Merge in Pandas

Time:09-06

I'm performing a left join via merge between two DataFrames, the left one has a MultiIndex, the right has a single index. How can I retain the MultiIndex after the join?

import pandas as pd
import numpy as np

df1 = pd.DataFrame({
  'i1': np.arange(0, 100),
  'i2': np.arange(0, 100),
  'c': np.random.randint(0 , 10, 100)
}).set_index(['i1', 'i2'])

df2 = pd.DataFrame({
  'i1': np.arange(0, 100),
  'd': np.random.randint(0 , 10, 100)
}).set_index('i1')

merged = df1.merge(df2, how="left", on="i1")


merged.index # => Int64Index i1 but I want the i1, i2 index to remain

I hoped this might be as simple as reassigning the old index, but it looks like there are no ordering guarantees after a merge.

While I could avoid using indexing all together, I need them. The dataset I'm using is large, and I need the speed up gained when merging with indexes.

CodePudding user response:

merge()

df1.merge(df2, left_index=True, right_index=True, how='left')
###
       c  d
i1 i2      
0  0   6  2
1  1   8  6
2  2   7  7
3  3   9  6
4  4   7  5
...   .. ..
95 95  5  9
96 96  9  4
97 97  8  5
98 98  4  2
99 99  1  4

join()

df1.join(df2, on='i1', how='left')
###
       c  d
i1 i2      
0  0   6  2
1  1   8  6
2  2   7  7
3  3   9  6
4  4   7  5
...   .. ..
95 95  5  9
96 96  9  4
97 97  8  5
98 98  4  2
99 99  1  4

combine_first()

df1.combine_first(df2)
###
       c  d
i1 i2      
0  0   6  2
1  1   8  6
2  2   7  7
3  3   9  6
4  4   7  5
...   .. ..
95 95  5  9
96 96  9  4
97 97  8  5
98 98  4  2
99 99  1  4
  • Related