Home > OS >  VLOOKUP using pandas without repetition
VLOOKUP using pandas without repetition

Time:06-04

I have two dataframes df1 and df2. How do I obtain df3 using pd.merge or any other function?

What I tried?

df3=df1.merge(df2, on='A', how='left')

This gives me df3 with number of rows same as df2. But, what I want is number of rows in df1 and df3 are same. df3 should look exactly as in the example provided.

df1

A   B
1   1
2   1
3   2
4   2
5   3
6   7
7   1
8   1
9   9
10  3

df2

A   C
1   XY
1   XY
2   XY
2   XY
2   XY
3   XY
3   XY
4   XY
4   XY
5   XY
5   XY
6   AB
6   AB
7   AB
8   AB
9   AB
9   AB
9   AB
10  MN
10  MN

df3

A   B   C
1   1   XY
2   1   XY
3   2   XY
4   2   XY
5   3   XY
6   7   AB
7   1   AB
8   1   AB
9   9   AB
10  3   MN

CodePudding user response:

Use drop_duplicates to keep one instance of (A, B):

>>> df1.merge(df2, on='A', how='left').drop_duplicates(['A', 'B'], ignore_index=True)
     A  B   C
0    1  1  XY
2    2  1  XY
5    3  2  XY
7    4  2  XY
9    5  3  XY
11   6  7  AB
13   7  1  AB
14   8  1  AB
15   9  9  AB
18  10  3  MN

CodePudding user response:

map works too

# map df2.C to df1.A via df2.A
df1["C"] = df1.A.map(df2.drop_duplicates().set_index('A').C)
df1

enter image description here

  • Related