I have two data frames.
import pandas as pd

df1 = pd.DataFrame({'vin': ['aaa', 'aaa', 'aaa', 'bbb', 'ccc', 'ccc', 'ddd', 'eee', 'eee', 'fff'], 'module': ['ABS', 'ABS', 'IPMA', 'BCCM', 'HPOC', 'ABS', 'ABS', 'HPOC', 'ABS', 'ABS']})
df2 = pd.DataFrame({'vin': ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff']})
For each 'vin' in df2, I want to pull the matching values of the 'module' column from df1. The challenge is that I want all the values for a vin in a single cell, separated by commas. I tried the command below.
df_merge = pd.merge(df2, df1[['module', 'vin']], on='vin', how='left')
The problem with this line is that it returns one row per matching module, so each vin is spread over multiple rows, which I don't want.
My expected output looks like this:
df2 = pd.DataFrame({'vin':['aaa','bbb','ccc','ddd'],'module':['ABS,ABS,IPMA','BCCM','HPOC,ABS','ABS']})
CodePudding user response:
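Try the code below. It collects the modules for each vin into a list, merges that onto df2, and then strips the list punctuation from the string form of the list.
df_merge = pd.merge(df2, df1.groupby('vin')['module'].apply(list), on='vin', how='left')
# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
df_merge['module'] = df_merge['module'].astype(str).str.replace(r"\[|\]|'| ", "", regex=True)
df_merge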
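Output:
vin | module |
---|---|
aaa | ABS,ABS,IPMA |
bbb | BCCM |
ccc | HPOC,ABS |
ddd | ABS |
eee | HPOC,ABS |
fff | ABS |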
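As a variation on this approach, the modules can be joined into a string directly inside the groupby, which avoids converting a list to text and stripping the brackets afterwards. This is only a sketch, assuming every module value is a string; the name `modules_per_vin` is made up for the illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'vin': ['aaa', 'aaa', 'aaa', 'bbb', 'ccc', 'ccc', 'ddd', 'eee', 'eee', 'fff'],
    'module': ['ABS', 'ABS', 'IPMA', 'BCCM', 'HPOC', 'ABS', 'ABS', 'HPOC', 'ABS', 'ABS'],
})
df2 = pd.DataFrame({'vin': ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff']})

# Join the modules of each vin into one comma-separated string.
# `modules_per_vin` is a hypothetical name, not taken from the answer above.
modules_per_vin = df1.groupby('vin')['module'].agg(','.join).reset_index()

# A left merge keeps every vin from df2, in df2's order.
df_merge = df2.merge(modules_per_vin, on='vin', how='left')
print(df_merge)
```

Because the separator is passed to `join` explicitly, module names that happen to contain spaces, quotes, or brackets are left intact, which the string-replace approach would not guarantee.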
CodePudding user response:
You can simply do:
df2.merge(df1, how='left').groupby('vin').agg({'module': lambda x: ', '.join(x)})
It gives you:
vin | module |
---|---|
aaa | ABS, ABS, IPMA |
bbb | BCCM |
ccc | HPOC, ABS |
ddd | ABS |
eee | HPOC, ABS |
fff | ABS |
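One caveat with joining after the merge: a vin in df2 that has no rows in df1 gets a missing value in 'module', and `', '.join` then raises a TypeError. The sketch below drops missing values before joining and passes `as_index=False` so that vin stays a regular column. The extra vin 'zzz' and the helper `join_modules` are assumptions added for the illustration, not part of the question's data.

```python
import pandas as pd

df1 = pd.DataFrame({
    'vin': ['aaa', 'aaa', 'aaa', 'bbb', 'ccc', 'ccc'],
    'module': ['ABS', 'ABS', 'IPMA', 'BCCM', 'HPOC', 'ABS'],
})
# 'zzz' is a made-up vin with no match in df1.
df2 = pd.DataFrame({'vin': ['aaa', 'bbb', 'ccc', 'zzz']})


def join_modules(modules):
    """Join the non-missing module names with ', ' (hypothetical helper)."""
    return ', '.join(modules.dropna())


result = (
    df2.merge(df1, on='vin', how='left')
       .groupby('vin', as_index=False)
       .agg({'module': join_modules})
)
print(result)
```

With this data, 'zzz' ends up with an empty string in 'module' instead of causing an error. Whether an empty string or a missing value is the better placeholder depends on how the result is used downstream.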