I have two dataframes:
df1:
Lower | Upper | Rank |
---|---|---|
0 | 7 | first |
8 | 14 | second |
15 | 23 | third |
df2:
Score |
---|
5 |
3 |
8 |
13 |
20 |
I want a third result dataframe df3 such that if the score in df2 is between the lower and upper values of df1, that row gets assigned the corresponding rank from df1
Score | Rank |
---|---|
5 | first |
3 | first |
8 | second |
13 | second |
20 | third |
CodePudding user response:
Try this.
df1 = pd.DataFrame( [[0,7,'first'],[8,14,'second'],[15,23,'third']], columns = ['Lower', 'Upper', 'Rank'])
df2 = pd.DataFrame( [5,3,8,13,20], columns = ['Score'])
result = []
for index, val in df2.iterrows():
for id, rank in df1.iterrows():
if val['Score'] >= rank['Lower'] and val['Score'] <= rank['Upper']:
result.append(rank['Rank'])
break
df_result = pd.DataFrame(columns = ['Score','Rank'])
df_result['Score'] = df2['Score']
df_result['Rank'] = result
CodePudding user response:
A faster option is to use pd.cut
as suggested by @psidom:
# create an intervalIndex, luckily your data does not have overlaps
intervals = pd.IntervalIndex.from_arrays(df1.Lower, df1.Upper, closed="both")
# map intervals to `Rank` column
mapping = pd.Series(df1.Rank.array, index=intervals)
# compute new column:
outcome = pd.cut(df2.Score, bins=intervals).map(mapping)
df2.assign(Rank = outcome)
Score Rank
0 5 first
1 3 first
2 8 second
3 13 second
4 20 third