df1
Place Location
Delhi,Punjab,Jaipur Delhi,Punjab,Noida,Lucknow
Delhi,Punjab,Jaipur Delhi,Bhopal,Jaipur,Rajkot
Delhi,Punjab,Kerala Delhi,Jaipur,Madras
df2
Target1 Target2 Strength
Jaipur Rajkot 0.94
Jaipur Punjab 0.84
Jaipur Noida 0.62
Jaipur Jodhpur 0.59
Punjab Amritsar 0.97
Punjab Delhi 0.85
Punjab Bhopal 0.91
Punjab Jodhpur 0.75
Kerala Varkala 0.85
Kerala Kochi 0.88
The task is to match 'Place' value with 'Location' values and assign score '1' in case of direct match and refer df2 in case of indirect match and assign strength score from that. For Ex: In Row1 Delhi and Punjab are direct match as both are present in 'Place' and 'Location' wherein Jaipur is there in 'Place' but not in 'Location. So, Jaipur will be iterated in df2 Target1 and try to find the corresponding 'Location' values of Row1 in Target2. In df2 Jaipur is related to Punjab and Noida which there in ROW1 Location values. So, corresponding to Jaipur, Punjab strength will be alloted as 0.84 is higher than Noida's 0.62. Final score is calculated as (1 1 0.84)/3 i.e sum of direct and indirect matches divided by number of 'Place' items.
Expected output is :
Place Location Avg. Score
Delhi,Punjab,Jaipur Delhi,Punjab,Noida,Lucknow (1 1 0.84)/3 = 0.95
Delhi,Punjab,Jaipur Delhi,Bhopal,Jaipur,Rajkot (1 0.91 1)/3 = 0.97
Delhi,Punjab,Kerala Delhi,Jaipur,Madras (1 0.85 0)/3 = 0.67
My try
data1 = df1['Place'].to_list()
data2 = df1['Location'].to_list()
dict3 = {}
exac_match = []
for el in data1:
#print(el)
el=[x.strip() for x in el.split(',')]
for ell in data2:
ell=[x.strip() for x in ell.split(',')]
dict1 = {}
dict2 = {}
for elll in el:
if elll in ell:
#print("Exact match:::", elll)
dict1[elll]=1
dict2[elll]=elll
CodePudding user response:
Use:
#convert splitted values of df1['Place'] to rows
df = df1.assign(Place = df1['Place'].str.split(',')).explode('Place').reset_index()
#test if match Place in Location (splitted values)
mask = [a in b for a, b in zip(df.Place, df['Location'].str.split(','))]
#filter matched and remove duplicates, assign 1 to final column
df11 = df[mask].drop_duplicates(['index','Place','Location']).assign(final=1)
#filter not matched rows (indirect match) and join df2
df12 = df[~np.array(mask)].merge(df2, left_on='Place', right_on='Target1')
#test if Target2 in Location
mask = [a in b for a, b in zip(df12.Target2, df12['Location'].str.split(','))]
#get maximal Strength per Place
df12 = df12[mask].copy()
df12 = (df12.loc[df12.groupby(['index','Place'])['Strength'].idxmax()]
.assign(final = lambda x: x['Strength']))
#join together
df3 = pd.concat([df11, df12[['index','Place','final','Location']]])
#join to exploded DataFrame with replace NaN to 0 in final column
df = df.merge(df3, how='left', on=['index','Place']).fillna({'final':0})
print (df)
index Place Location_x Location_y \
0 0 Delhi Delhi,Punjab,Noida,Lucknow Delhi,Punjab,Noida,Lucknow
1 0 Punjab Delhi,Punjab,Noida,Lucknow Delhi,Punjab,Noida,Lucknow
2 0 Jaipur Delhi,Punjab,Noida,Lucknow Delhi,Punjab,Noida,Lucknow
3 1 Delhi Delhi,Bhopal,Jaipur,Rajkot Delhi,Bhopal,Jaipur,Rajkot
4 1 Punjab Delhi,Bhopal,Jaipur,Rajkot Delhi,Bhopal,Jaipur,Rajkot
5 1 Jaipur Delhi,Bhopal,Jaipur,Rajkot Delhi,Bhopal,Jaipur,Rajkot
6 2 Delhi Delhi,Jaipur,Madras Delhi,Jaipur,Madras
7 2 Punjab Delhi,Jaipur,Madras Delhi,Jaipur,Madras
8 2 Kerala Delhi,Jaipur,Madras NaN
final
0 1.00
1 1.00
2 0.84
3 1.00
4 0.91
5 1.00
6 1.00
7 0.85
8 0.00
#last aggregate mean and assign to df1['Score']
df1['Score'] = df.groupby('index')['final'].mean()
print (df1)
Place Location Score
0 Delhi,Punjab,Jaipur Delhi,Punjab,Noida,Lucknow 0.946667
1 Delhi,Punjab,Jaipur Delhi,Bhopal,Jaipur,Rajkot 0.970000
2 Delhi,Punjab,Kerala Delhi,Jaipur,Madras 0.616667