I have two dataframes df1 and df2. How do I obtain df3 using pd.merge or any other function?
What I tried?
df3=df1.merge(df2, on='A', how='left')
This gives me df3 with number of rows same as df2. But, what I want is number of rows in df1 and df3 are same. df3 should look exactly as in the example provided.
df1
A B
1 1
2 1
3 2
4 2
5 3
6 7
7 1
8 1
9 9
10 3
df2
A C
1 XY
1 XY
2 XY
2 XY
2 XY
3 XY
3 XY
4 XY
4 XY
5 XY
5 XY
6 AB
6 AB
7 AB
8 AB
9 AB
9 AB
9 AB
10 MN
10 MN
df3
A B C
1 1 XY
2 1 XY
3 2 XY
4 2 XY
5 3 XY
6 7 AB
7 1 AB
8 1 AB
9 9 AB
10 3 MN
CodePudding user response:
Use drop_duplicates
to keep one instance of (A, B)
:
>>> df1.merge(df2, on='A', how='left').drop_duplicates(['A', 'B'], ignore_index=True)
A B C
0 1 1 XY
2 2 1 XY
5 3 2 XY
7 4 2 XY
9 5 3 XY
11 6 7 AB
13 7 1 AB
14 8 1 AB
15 9 9 AB
18 10 3 MN
CodePudding user response:
map works too
# map df2.C to df1.A via df2.A
df1["C"] = df1.A.map(df2.drop_duplicates().set_index('A').C)
df1