Home > Enterprise >  Pandas Groupby and generate "duplicate" columns for each groupby value
Pandas Groupby and generate "duplicate" columns for each groupby value

Time:10-24

I have a vertical data frame that I am looking to make more horizontal by "duplicating" columns for each item in the groupby column. I have the following data frame:

    pd.DataFrame({'posteam': {0: 'ARI', 1: 'ARI', 2: 'ARI', 3: 'ARI', 4: 'ARI'},
     'offense_grouping': {0: 'personnel_00',
      1: 'personnel_01',
      2: 'personnel_02',
      3: 'personnel_10',
      4: 'personnel_11'},
     'snap_ct': {0: 1, 1: 6, 2: 4, 3: 396, 4: 1441},
     'personnel_epa': {0: 0.1539720594882965,
      1: 0.7805194854736328,
      2: -0.2678736448287964,
      3: 0.1886662095785141,
      4: 0.005721719935536385}})

And in its current state, there are 5 duplicate values in the 'posteam' column and 5 different values in the 'offense_grouping' column. Ideally, I would like to group by 'posteam' (so the team only has one row) and by 'offense_grouping'. Each 'offense_grouping' value is corresponded with 'snap_ct' and 'personnel_epa' values. I would like the end result of this group to be something like this:

posteam personnel_00_snap_ct personnel_00_personnel_epa personnel_01_snap_ct personnel_01_personnel_epa personnel_02_snap_ct personnel_02_personnel_epa
ARI 1 .1539... 6 .7805... 4 -.2679

And so on. How can this be achieved?

CodePudding user response:

Given the data you provide, the following would give the expected result. But there might be more complex cases in your data.

z = (
    df
    .set_index(['posteam', 'offense_grouping'])
    .unstack('offense_grouping')
    .swaplevel(axis=1)
    .sort_index(axis=1, ascending=[True, False])
)

# or, alternatively (might be better if you have multiple values
# for some given indices./columns):

z = (
    df
    .pivot_table(index='posteam', columns='offense_grouping', values=['snap_ct', 'personnel_epa'])
    .swaplevel(axis=1)
    .sort_index(axis=1, ascending=[True, False])
)
>>> z
offense_grouping personnel_00               personnel_01                \
                      snap_ct personnel_epa      snap_ct personnel_epa   
posteam                                                                  
ARI                         1      0.153972            6      0.780519   

offense_grouping personnel_02               personnel_10                \
                      snap_ct personnel_epa      snap_ct personnel_epa   
posteam                                                                  
ARI                         4     -0.267874          396      0.188666   

offense_grouping personnel_11                
                      snap_ct personnel_epa  
posteam                                      
ARI                      1441      0.005722 

Then you can join the two levels of columns:

res = z.set_axis([f'{b}_{a}' for a, b in z.columns], axis=1)
>>> res
         snap_ct_personnel_00  personnel_epa_personnel_00  snap_ct_personnel_01  personnel_epa_personnel_01  snap_ct_personnel_02  personnel_epa_personnel_02  snap_ct_personnel_10  personnel_epa_personnel_10  snap_ct_personnel_11  personnel_epa_personnel_11
posteam                                                                                                                                                                                                                                                          
ARI      1                     0.153972                    6                     0.780519                    4                    -0.267874                    396                   0.188666                    1441                  0.005722                  
​```
  • Related