How to aggregate a dataframe into a row

Given two dataframes df_1 and df_2, how can I aggregate the values of df_2 into the rows of df_1 such that date in df_1 falls between open and close in df_2?

print(df_1)

  date          A          B
0 2021-11-01    0.020228   0.026572
1 2021-11-02    0.057780   0.175499
2 2021-11-03    0.098808   0.620986
3 2021-11-04    0.158789   1.014819
4 2021-11-05    0.038129   2.384590


print(df_2)

  open        close       location     division     size    
0 2021-11-07  2021-11-14  LDN          Alpha        120
1 2021-11-01  2021-11-14  PRS          Alpha        450
2 2021-10-14  2021-11-27  HK           Beta         340

I have tried this solution for joining my dataframes; now I need to find a way to aggregate. What I have so far is:

df_2.index = pd.IntervalIndex.from_arrays(df_2['open'], df_2['close'], closed='both')
df_1['events'] = df_1['date'].apply(lambda x: df_2.iloc[df_2.index.get_loc(x)])


print(df_1['events'].iloc[0].groupby(['location', 'division'])['size'].sum())

location  division              
LDN       Alpha                     421.0
LDN       Beta                      515.0
NY        Alpha                     369.0
PRQ       Alpha                     132.0
          Gamma                     110.0
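
For reference, the per-date lookup can be sketched on the sample data as follows. This is only an illustration, assuming the sample df_2 shown earlier. It uses IntervalIndex.contains rather than get_loc, because contains returns a plain boolean mask even when the intervals overlap or when no interval matches. The names intervals, day, mask and totals are made up for the example.

```python
import pandas as pd

# Sample df_2 copied from the question
df_2 = pd.DataFrame({
    "open": pd.to_datetime(["2021-11-07", "2021-11-01", "2021-10-14"]),
    "close": pd.to_datetime(["2021-11-14", "2021-11-14", "2021-11-27"]),
    "location": ["LDN", "PRS", "HK"],
    "division": ["Alpha", "Alpha", "Beta"],
    "size": [120, 450, 340],
})

# One closed interval [open, close] per row of df_2
intervals = pd.IntervalIndex.from_arrays(df_2["open"], df_2["close"], closed="both")

# Boolean mask with one flag per df_2 row: True where the day lies inside the interval
day = pd.Timestamp("2021-11-01")
mask = intervals.contains(day)

# Sum of size per (location, division) for the rows active on that day
totals = df_2[mask].groupby(["location", "division"])["size"].sum()
print(totals)
```

Repeating this for every date gives the per-date sums, but it runs one groupby per row of df_1, which is why a single vectorised reshape is preferable for larger frames.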

I need something that looks like this:

  date          A          B          LDN_Alpha   LDN_Beta   LDN_Gamma   PRS_Alpha   ...
0 2021-11-01    0.020228   0.026572   120         300        0           530
1 2021-11-02    0.057780   0.175499   ...
2 2021-11-03    0.098808   0.620986
3 2021-11-04    0.158789   1.014819
4 2021-11-05    0.038129   2.384590

where the new columns hold the sum of size, grouped by location and division.

Answer:

The idea is to first expand each row of df_2 into one row per date between open and close, attach the original df_2 columns, and then use DataFrame.pivot_table followed by DataFrame.join:

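import pandas as pd

# make sure the join key is a real datetime column
df_1['date'] = pd.to_datetime(df_1['date'])

# one entry per day between open and close; the values are the df_2 row labels
s = pd.concat([pd.Series(r.Index, pd.date_range(r.open, r.close)) for r in df_2.itertuples()])
# swap index and values so the dates can be joined onto df_2 by row label
df = df_2.join(pd.Series(s.index, s).rename('date'))

# sum size per date for every (location, division) pair
df = df.pivot_table(index='date',
                    columns=['location', 'division'],
                    values='size',
                    aggfunc='sum',
                    fill_value=0)
# flatten the MultiIndex columns to names like LDN_Alpha
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')

df = df_1.join(df, on='date')
print(df)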
        date         A         B  HK_Beta  LDN_Alpha  PRS_Alpha
0 2021-11-01  0.020228  0.026572      340          0        450
1 2021-11-02  0.057780  0.175499      340          0        450
2 2021-11-03  0.098808  0.620986      340          0        450
3 2021-11-04  0.158789  1.014819      340          0        450
4 2021-11-05  0.038129  2.384590      340          0        450
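
An alternative that avoids building one row per calendar day is a cross join followed by a range filter. The following is a minimal, self-contained sketch rather than part of the answer above. It assumes pandas 1.2 or later (for how="cross") and the sample data from the question; the names pairs, wide and result are made up for the example.

```python
import pandas as pd

# Sample frames copied from the question
df_1 = pd.DataFrame({
    "date": pd.to_datetime(["2021-11-01", "2021-11-02", "2021-11-03",
                            "2021-11-04", "2021-11-05"]),
    "A": [0.020228, 0.057780, 0.098808, 0.158789, 0.038129],
    "B": [0.026572, 0.175499, 0.620986, 1.014819, 2.384590],
})
df_2 = pd.DataFrame({
    "open": pd.to_datetime(["2021-11-07", "2021-11-01", "2021-10-14"]),
    "close": pd.to_datetime(["2021-11-14", "2021-11-14", "2021-11-27"]),
    "location": ["LDN", "PRS", "HK"],
    "division": ["Alpha", "Alpha", "Beta"],
    "size": [120, 450, 340],
})

# Pair every date with every df_2 row, then keep the pairs
# where the date lies inside [open, close] (both ends inclusive)
pairs = df_1[["date"]].merge(df_2, how="cross")
pairs = pairs[pairs["date"].between(pairs["open"], pairs["close"])]

# Sum size per date for every (location, division) pair and flatten the column names
wide = pairs.pivot_table(index="date", columns=["location", "division"],
                         values="size", aggfunc="sum", fill_value=0)
wide.columns = [f"{loc}_{div}" for loc, div in wide.columns]

# Dates that match no interval are missing from `wide`, so fill them with 0 after the join
result = df_1.join(wide, on="date")
result[wide.columns] = result[wide.columns].fillna(0).astype(int)
print(result)
```

One difference from the output above: a location/division pair that is never active on any date in df_1 (LDN_Alpha in the sample) gets no column here, because it is filtered out before the pivot. The cross join also creates len(df_1) * len(df_2) intermediate rows, so it suits small or moderately sized frames.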