I am trying to use pandas for a data analysis, used merge
for performing a vlookup, the two data sets are as below,
data1 =
acc_name tier content group gcode acc_ID
abc 3 55 b 111 R-DDD
def 4 45 c 222 X-TTT
xyz 4 60 a 333 S-UUU
abc 4 4 b 112 R-DDD
xyz 4 6 a 331 X-TTT
def 4 10 c 221 S-UUU
data2=
Accountn type status
xyz internal Active
def external Active
abc internal Inactive
the code I used is
data1 = pd.read_csv('data1.csv')
data2 = pd.read_csv('data2.csv')
data1.rename(columns={'acc_name':'Accountn'},inplace = True)
final = pd.merge(data1,data2[['Accountn','status']],on=['Accountn'],how='left')
The Output I am getting is:
final =
Accountn tier content group gcode acc_ID status
abc 3 55 b 111 R-DDD Inactive
abc 3 55 b 111 R-DDD Inactive
abc 3 55 b 111 R-DDD Inactive
def 4 45 c 222 X-TTT Active
def 4 45 c 222 X-TTT Active
def 4 45 c 222 X-TTT Active
xyz 4 60 a 333 S-UUU Active
xyz 4 60 a 333 S-UUU Active
xyz 4 60 a 333 S-UUU Active
abc 4 4 b 112 R-DDD Inactive
abc 4 4 b 112 R-DDD Inactive
abc 4 4 b 112 R-DDD Inactive
xyz 4 6 a 331 X-TTT Active
xyz 4 6 a 331 X-TTT Active
xyz 4 6 a 331 X-TTT Active
def 4 10 c 221 S-UUU Active
def 4 10 c 221 S-UUU Active
def 4 10 c 221 S-UUU Active
The O/P I want is,
Accountn tier content group gcode acc_ID status
abc 3 55 b 111 R-DDD Inactive
def 4 45 c 222 X-TTT Active
xyz 4 60 a 333 S-UUU Active
abc 4 4 b 112 R-DDD Inactive
xyz 4 6 a 331 X-TTT Active
def 4 10 c 221 S-UUU Active
I dont know what is wrong in my code
CodePudding user response:
Remove duplicates in df2
by column for join, here Accountn
, so no duplicates in output:
final = pd.merge(data1,data2[['Accountn','status']].drop_duplicates('Accountn'),
on=['Accountn'],
how='left')