I'm performing a left join via merge between two DataFrames, the left one has a MultiIndex, the right has a single index. How can I retain the MultiIndex after the join?
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
'i1': np.arange(0, 100),
'i2': np.arange(0, 100),
'c': np.random.randint(0 , 10, 100)
}).set_index(['i1', 'i2'])
df2 = pd.DataFrame({
'i1': np.arange(0, 100),
'd': np.random.randint(0 , 10, 100)
}).set_index('i1')
merged = df1.merge(df2, how="left", on="i1")
merged.index # => Int64Index i1 but I want the i1, i2 index to remain
I hoped this might be as simple as reassigning the old index, but it looks like there are no ordering guarantees after a merge.
While I could avoid using indexing all together, I need them. The dataset I'm using is large, and I need the speed up gained when merging with indexes.
CodePudding user response:
merge()
df1.merge(df2, left_index=True, right_index=True, how='left')
###
c d
i1 i2
0 0 6 2
1 1 8 6
2 2 7 7
3 3 9 6
4 4 7 5
... .. ..
95 95 5 9
96 96 9 4
97 97 8 5
98 98 4 2
99 99 1 4
join()
df1.join(df2, on='i1', how='left')
###
c d
i1 i2
0 0 6 2
1 1 8 6
2 2 7 7
3 3 9 6
4 4 7 5
... .. ..
95 95 5 9
96 96 9 4
97 97 8 5
98 98 4 2
99 99 1 4
combine_first()
df1.combine_first(df2)
###
c d
i1 i2
0 0 6 2
1 1 8 6
2 2 7 7
3 3 9 6
4 4 7 5
... .. ..
95 95 5 9
96 96 9 4
97 97 8 5
98 98 4 2
99 99 1 4