Comparing the columns df1_user
and df2_user
. If multiple values of df2_user
is found in df1_user
then merge those values in a new column with a delimiter.
I have a dataframe (df1) look like this:
df1:
df1_ID df1_Tag df1_Info df1_user
1 Test1 Pass 100,200,300
2 Test2 Pass 400,500,600,700
3 Test3 Fail 800,900,1000,1100,1200
4 Test4 Pass 1300
5 Test5 Pass 1400,1500
df2:
df2_user
100
300
500
600
700
1100
1200
1300
1400
1600
The result dataframe should look like this:
df3:
df1_ID df1_Tag df1_Info df3_user
1 Test1 Pass 100,300
2 Test2 Pass 500,600,700
3 Test3 Fail 1100,1200
4 Test4 Pass 1300
5 Test5 Pass 1400
The code looks like this:
for name in df2['df2_user'].to_list():
df1.loc[ df1['df1_user'].str.contains(name), 'df3_user' ] = name
I am having issues merging the multiple values with a delimiter.
CodePudding user response:
Use custom lambda function for test if match splitted values in set from column df2['df2_user']
:
s = set(df2['df2_user'].astype(str))
f = lambda x: (','.join(y for y in x.split(',') if y in s))
df1['df3_user'] = df1['df1_user'].apply(f)
print (df1)
df1_ID df1_Tag df1_Info df1_user df3_user
0 1 Test1 Pass 100,200,300 100,300
1 2 Test2 Pass 400,500,600,700 500,600,700
2 3 Test3 Fail 800,900,1000,1100,1200 1100,1200
3 4 Test4 Pass 1300 1300
4 5 Test5 Pass 1400,1500 1400
Or use DataFrame.explode
with filter rows by Series.isin
, last join back by GroupBy.agg
with join
:
df1['df3_user'] = (df1.assign(df1_user = df1['df1_user'].str.split(','))
.explode('df1_user')
.loc[lambda x: x['df1_user'].isin(df2['df2_user'].astype(str))]
.groupby(level=0)['df1_user']
.agg(','.join))
print (df1)
df1_ID df1_Tag df1_Info df1_user df3_user
0 1 Test1 Pass 100,200,300 100,300
1 2 Test2 Pass 400,500,600,700 500,600,700
2 3 Test3 Fail 800,900,1000,1100,1200 1100,1200
3 4 Test4 Pass 1300 1300
4 5 Test5 Pass 1400,1500 1400