I have a DataFrame df1
containing daily time-series of IDs
and Scores
in different countries C
. For the countries, I have an additional DataFrame df2
which contains for each country 4 quartiles Q
with quantile scores Q_Scores
.
df1:
Date ID C Score
20220102 A US 12.6
20220103 A US 11.3
20220104 A US 13.2
20220105 A US 14.5
20220102 B US 9.8
20220103 B US 19.8
20220104 B US 12.3
20220105 B US 15.1
20220102 C GB 13.5
20220103 C GB 14.5
20220104 C GB 11.5
20220105 C GB 14.8
df2:
Date C Q Q_Score
20220102 US 1 10
20220103 US 2 13
20220104 US 3 16
20220105 US 4 20
20220102 GB 1 12
20220103 GB 2 13
20220104 GB 3 14
20220105 GB 4 15
I try to lookup the quartile scores Q_Score
and create df3
with an additional column called Q_Scores
. A specific score should lookup the next bigger quartile score from df2
for a specific country. For example:
- 20220104 / A / US: Score = 13.2 --> next bigger quartile score on that date in the US is 16 --> Q-Score: 16
df3:
Date ID C Score Q_Score
20220102 A US 12.6 13
20220103 A US 11.3 13
20220104 A US 13.2 16
20220105 A US 14.5 16
20220102 B US 9.8 10
20220103 B US 19.8 20
20220104 B US 12.3 13
20220105 B US 15.1 16
20220102 C GB 13.5 14
20220103 C GB 14.5 15
20220104 C GB 11.5 12
20220105 C GB 14.8 15
Because the Score
and Q_Score
don't match, I wasn't able to do it with a simple pd.merge()
.
CodePudding user response:
You can use pd.merge_asof
, but you need some processing:
# two data must have the same data type
df2['Q_Score'] = df2['Q_Score'].astype('float64')
# keys must be sorted
pd.merge_asof(df1.sort_values('Score'),
df2.drop(['Date','Q'], axis=1).sort_values('Q_Score'),
by=['C'],
left_on='Score',
right_on='Q_Score',
direction='forward'
).sort_values(['ID','Date'])
Output:
Date ID C Score Q_Score
4 20220102 A US 12.6 13.0
1 20220103 A US 11.3 13.0
5 20220104 A US 13.2 16.0
7 20220105 A US 14.5 16.0
0 20220102 B US 9.8 10.0
11 20220103 B US 19.8 20.0
3 20220104 B US 12.3 13.0
10 20220105 B US 15.1 16.0
6 20220102 C GB 13.5 14.0
8 20220103 C GB 14.5 15.0
2 20220104 C GB 11.5 12.0
9 20220105 C GB 14.8 15.0