Home > Enterprise >  group/merge/pivot data by varied weight ranges in Pandas
group/merge/pivot data by varied weight ranges in Pandas

Time:05-25

Is there a way in Pandas to fit in the value according to weight ranges when pivoting the dataframe? I see some answers with setting bins but these are varied weight ranges depending on how the data is entered.

Here's my dataset.

import pandas as pd
df = pd.DataFrame({'tier': [1,1,1,1,1,1,1,1,1], 
    'services': ["A","A","A","A","A","A","A","A","A"],
    'weight_start': [1,61,161,201,1,1,61,161,201],
    'weight_end': [60,160,200,500,500,60,160,200,500],
    'location': [1,1,1,1,2,3,3,3,3],
    'discount': [70,30,10,0,0,60,20,5,0]})
pivot_df = df.pivot(index=['tier','services','weight_start','weight_end'],columns='location',values='discount')
display(pivot_df)

Output: df

Desired Output: desired df

Since location 2 is 0 percent covering the ranges 1 to 500, I want it to populate 0 based on the ranges prescribed for tier 1 service A instead of having its own row.

Edit: Mozway's answer works when there is one service. When I added a second service, the dataframe ungrouped.

Here's the new dataset with service B.

import pandas as pd
df = pd.DataFrame({'tier': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1], 
    'services': ["A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B"],
    'weight_start': [1,61,161,201,1,1,61,161,201,1,1,81,101,1,61,161,201],
    'weight_end': [60,160,200,500,500,60,160,200,500,500,80,100,200,60,160,200,500],
    'location': [1,1,1,1,2,3,3,3,3,1,2,2,2,3,3,3,3],
    'discount': [70,30,10,0,0,60,20,5,0,50,70,50,10,65,55,45,5]})
pivot_df = df.pivot(index=['tier','services','weight_start','weight_end'],columns='location',values='discount')
display(pivot_df)

Output:

                                                  location  1    2    3
tier    services    weight_start    weight_end          
1              A               1            60           70.0   NaN   60.0
                                            500          NaN    0.0   NaN
                              61            160          30.0   NaN   20.0
                             161            200          10.0   NaN   5.0
                             201            500          0.0    NaN   0.0
               B               1            60           NaN    NaN   65.0
                                            80           NaN    70.0  NaN
                                            500          50.0   NaN   NaN
                              61            160          NaN    NaN   55.0
                              81            100          NaN    50.0    NaN
                             101            200          NaN    10.0    NaN
                             161            200          NaN    NaN    45.0
                             201            500          NaN    NaN    5.0

Desired Output:

                                                  location  1    2    3
tier    services    weight_start    weight_end          
1              A               1            60           70.0   0.0   60.0
                              61            160          30.0   0.0   20.0
                             161            200          10.0   0.0   5.0
                             201            500          0.0    0.0   0.0
               B               1            60           50     70    65.0
                                            80           50     70.0  55
                              61            160          50     NaN   55.0
                              81            100          50     50.0  55
                             101            200          50     10.0  NaN
                             161            200          50     10     45.0
                             201            500          50     NaN    5.0

CodePudding user response:

This will work

data = (df.set_index(['tier','services','weight_start','weight_end'])
    .pivot(columns='location')['discount']
    .reset_index()
    .rename_axis(None, axis=1)
 )

CodePudding user response:

IIUC, you can (temporarily) exclude the columns with 0/nan and check if all remaining values are only NaNs per row. If so, drop those rows:

mask = ~pivot_df.loc[:, pivot_df.any()].isna().all(1)
out = pivot_df[mask].fillna(0)

output:

location                                  1    2     3
tier services weight_start weight_end                 
1    A        1            60          70.0  0.0  60.0
              61           160         30.0  0.0  20.0
              161          200         10.0  0.0   5.0
              201          500          0.0  0.0   0.0

per group:

def drop(d):
    mask = ~ d.loc[:, d.any()].isna().all(1)
    return d[mask].fillna(0)
out = pivot_df.groupby(['services']).apply(drop)

output:

location                                           1     2     3
services tier services weight_start weight_end                  
A        1    A        1            60          70.0   0.0  60.0
                       61           160         30.0   0.0  20.0
                       161          200         10.0   0.0   5.0
                       201          500          0.0   0.0   0.0
B        1    B        1            60           0.0   0.0  65.0
                                    80           0.0  70.0   0.0
                                    500         50.0   0.0   0.0
                       61           160          0.0   0.0  55.0
                       81           100          0.0  50.0   0.0
                       101          200          0.0  10.0   0.0
                       161          200          0.0   0.0  45.0
                       201          500          0.0   0.0   5.0

  • Related