Given two dataframes df_1 and df_2, how can I aggregate the values of df_2 into the rows of df_1 such that date in df_1 is between open and close in df_2?
print df_1
date A B
0 2021-11-01 0.020228 0.026572
1 2021-11-02 0.057780 0.175499
2 2021-11-03 0.098808 0.620986
3 2021-11-04 0.158789 1.014819
4 2021-11-05 0.038129 2.384590
print df_2
open close location division size
0 2021-11-07 2021-11-14 LDN Alpha 120
1 2021-11-01 2021-11-14 PRS Alpha 450
2 2021-10-14 2021-11-27 HK Beta 340
I have tried this solution to join my dataframes; now I need to find a way to aggregate. What I have done so far is:
df_2.index = pd.IntervalIndex.from_arrays(df_2['open'],df_2['close'],closed='both')
df_1['events'] = df_1['date'].apply(lambda x : df_2.iloc[df_2.index.get_loc(x)])
print(calls['code'].iloc[0].groupby(['location', 'division'])['size'].sum())
location division
LDN Alpha 421.0
LDN Beta 515.0
NY Alpha 369.0
PRQ Alpha 132.0
Gamma 110.0
I need something that looks like this:
date A B LDN_Alpha LDN_Beta LDN_Gamma PRS_Alpha ...
0 2021-11-01 0.020228 0.026572 120 300 0 530
1 2021-11-02 0.057780 0.175499 ...
2 2021-11-03 0.098808 0.620986
3 2021-11-04 0.158789 1.014819
4 2021-11-05 0.038129 2.384590
Where the created columns are the sum of size grouped by location and division.
CodePudding user response:
The idea is to first expand each row of df_2 into one row per date between open and close, attach the original columns from df_2, and then use DataFrame.pivot_table with DataFrame.join:
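import pandas as pd

df_1['date'] = pd.to_datetime(df_1['date'])

# map every date in [open, close] to the index label of its df_2 row
s = pd.concat([pd.Series(r.Index, pd.date_range(r.open, r.close)) for r in df_2.itertuples()])
# swap index and values so the dates can be joined to df_2 on its index
df = df_2.join(pd.Series(s.index, s).rename('date'))
# sum size per date for each (location, division) pair
df = df.pivot_table(index='date',
                    columns=['location', 'division'],
                    values='size',
                    aggfunc='sum',
                    fill_value=0)
# flatten the MultiIndex columns, e.g. ('LDN', 'Alpha') -> 'LDN_Alpha'
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
df = df_1.join(df, on='date')
print(df)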
date A B HK_Beta LDN_Alpha PRS_Alpha
0 2021-11-01 0.020228 0.026572 340 0 450
1 2021-11-02 0.057780 0.175499 340 0 450
2 2021-11-03 0.098808 0.620986 340 0 450
3 2021-11-04 0.158789 1.014819 340 0 450
4 2021-11-05 0.038129 2.384590 340 0 450
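
For comparison, here is a minimal self-contained sketch of a different route to the same kind of result: a cross join followed by a filter, instead of expanding the date ranges. It is not the method used above, just one possible alternative. The sample frames are rebuilt by hand from the printed output in the question, the names pairs, wide and result are made up for illustration, and merge(how="cross") assumes pandas 1.2 or newer.

```python
import pandas as pd

# Sample frames rebuilt from the printed output in the question.
df_1 = pd.DataFrame({
    "date": pd.to_datetime(["2021-11-01", "2021-11-02", "2021-11-03",
                            "2021-11-04", "2021-11-05"]),
    "A": [0.020228, 0.057780, 0.098808, 0.158789, 0.038129],
    "B": [0.026572, 0.175499, 0.620986, 1.014819, 2.384590],
})
df_2 = pd.DataFrame({
    "open": pd.to_datetime(["2021-11-07", "2021-11-01", "2021-10-14"]),
    "close": pd.to_datetime(["2021-11-14", "2021-11-14", "2021-11-27"]),
    "location": ["LDN", "PRS", "HK"],
    "division": ["Alpha", "Alpha", "Beta"],
    "size": [120, 450, 340],
})

# Pair every date with every event, then keep pairs where open <= date <= close.
pairs = df_1[["date"]].merge(df_2, how="cross")
pairs = pairs[pairs["date"].between(pairs["open"], pairs["close"])]

# Sum size per date, with one column per (location, division) combination.
wide = pairs.pivot_table(index="date", columns=["location", "division"],
                         values="size", aggfunc="sum", fill_value=0)
wide.columns = [f"{loc}_{div}" for loc, div in wide.columns]

# The left join keeps every row of df_1; dates without any event get 0.
result = df_1.join(wide, on="date").fillna(0)
print(result)
```

One difference to be aware of: this variant only creates columns for combinations that match at least one date in df_1, so with the sample data there is no LDN_Alpha column, whereas the date-range approach above produces it filled with zeros. The cross join also builds len(df_1) * len(df_2) intermediate rows, which may be too much for large inputs.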