Home > Enterprise >  Joining dataframe on itself with different filtering
Joining dataframe on itself with different filtering

Time:10-18

I have a df that follows this structure:

store       day               type              sales       orders

amazon      2021-10-10        web               10          1
amazon      2021-10-10        retail            500         50
facebook    2021-10-10        retail            300         50
facebook    2021-09-05        retail            10          50
apple       2021-09-01        web               5           1
uber        2021-08-01        web               50          1
uber        2021-08-01        retail            60          1
...

I am trying to have a df_res that would have sales & orders by store, day & type weekly so that the output would look like so:

day             type      sales_amazon        orders_amazon       sales_facebook       orders_facebook     sales_apple        orders_apple        sales_uber      orders_uber
2021-08-01      web       0                   0                   0                    0                   0                  0                   50              1     
2021-08-01      rtail     0                   0                   0                    0                   0                  0                   60              1    
2021-10-10      web       10                  1                   0                    0                   0                  0                   0               0      
2021-10-10      retail    500                 50                  300                  50                  0                  0                   0               0 
...   

I tried:

# main df to be joined on
df_res = df[df.store.isin(['amazon'])].groupby(['store','type', pd.Grouper(key = 'day', freq = 'W-MON', 
                                               label = 'right')])[['store','day','orders','sales','type']].sum().reset_index()

# merging on main df each store df
for branch in ['facebook', 'apple', 'apple', 'uber']:
    df_res = df_res.merge(df[df.store.isin([branch])].groupby(['store','type', pd.Grouper(key = 'day', freq = 'W-MON',
                          label = 'right')])[['store','day','orders','sales','type']].sum().reset_index(), 
                          on =['day','type'], suffixes= [f'_{branch}', f'_{branch}'], how = 'outer')

But this does not produce the structure I desire, I tried using join but that throws a different length error because there are occasions when there is no sale for a particular date & type combination for a given store.

CodePudding user response:

You can pivot and rework the MultiIndex:

df2 = (df.pivot_table(index=['day', 'type'], columns='store',
                      values=['sales', 'orders'], fill_value=0)
         .sort_index(axis=1, level=1)
       )

df2.columns = df2.columns.map('_'.join)
df2.reset_index()

output:

          day    type  orders_amazon  sales_amazon  orders_apple  sales_apple  orders_facebook  sales_facebook  orders_uber  sales_uber
0  2021-08-01  retail              0             0             0            0                0               0            1          60
1  2021-08-01     web              0             0             0            0                0               0            1          50
2  2021-09-01     web              0             0             1            5                0               0            0           0
3  2021-09-05  retail              0             0             0            0               50              10            0           0
4  2021-10-10  retail             50           500             0            0               50             300            0           0
5  2021-10-10     web              1            10             0            0                0               0            0           0
  • Related