Pandas Groupby and generate "duplicate" columns for each groupby value

I have a vertical data frame that I am looking to make more horizontal by "duplicating" columns for each item in the groupby column. I have the following data frame:

    import pandas as pd

    df = pd.DataFrame({'posteam': {0: 'ARI', 1: 'ARI', 2: 'ARI', 3: 'ARI', 4: 'ARI'},
                       'offense_grouping': {0: 'personnel_00',
                                            1: 'personnel_01',
                                            2: 'personnel_02',
                                            3: 'personnel_10',
                                            4: 'personnel_11'},
                       'snap_ct': {0: 1, 1: 6, 2: 4, 3: 396, 4: 1441},
                       'personnel_epa': {0: 0.1539720594882965,
                                         1: 0.7805194854736328,
                                         2: -0.2678736448287964,
                                         3: 0.1886662095785141,
                                         4: 0.005721719935536385}})

In its current state, the 'posteam' column holds the same value in all 5 rows and the 'offense_grouping' column holds 5 different values. Ideally, I would like one row per 'posteam', with each 'offense_grouping' value spread out into its own columns. Each 'offense_grouping' value has a corresponding 'snap_ct' and 'personnel_epa' value. I would like the end result to look something like this:

posteam	personnel_00_snap_ct	personnel_00_personnel_epa	personnel_01_snap_ct	personnel_01_personnel_epa	personnel_02_snap_ct	personnel_02_personnel_epa
ARI	1	.1539...	6	.7805...	4	-.2679

And so on. How can this be achieved?

CodePudding user response:

Given the data you provided, the following gives the expected result, though your full data may contain more complex cases.

z = (
    df
    .set_index(['posteam', 'offense_grouping'])
    .unstack('offense_grouping')
    .swaplevel(axis=1)
    .sort_index(axis=1, ascending=[True, False])
)

# or, alternatively (might be better if you have multiple values
# for a given index/column pair, since pivot_table aggregates them,
# using the mean by default):

z = (
    df
    .pivot_table(index='posteam', columns='offense_grouping', values=['snap_ct', 'personnel_epa'])
    .swaplevel(axis=1)
    .sort_index(axis=1, ascending=[True, False])
)
>>> z
offense_grouping personnel_00               personnel_01                \
                      snap_ct personnel_epa      snap_ct personnel_epa   
posteam                                                                  
ARI                         1      0.153972            6      0.780519   

offense_grouping personnel_02               personnel_10                \
                      snap_ct personnel_epa      snap_ct personnel_epa   
posteam                                                                  
ARI                         4     -0.267874          396      0.188666   

offense_grouping personnel_11                
                      snap_ct personnel_epa  
posteam                                      
ARI                      1441      0.005722
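To illustrate why `pivot_table` can be the safer choice, here is a minimal sketch using a made-up frame (`dup`, not from the question) in which one `('posteam', 'offense_grouping')` pair occurs twice. The aggregation functions chosen here (sum for `snap_ct`, mean for `personnel_epa`) are assumptions for illustration only; pick whatever makes sense for your data.

```python
import pandas as pd

# Hypothetical data: the ('ARI', 'personnel_00') pair appears twice.
dup = pd.DataFrame({
    'posteam': ['ARI', 'ARI', 'ARI'],
    'offense_grouping': ['personnel_00', 'personnel_00', 'personnel_01'],
    'snap_ct': [1, 3, 6],
    'personnel_epa': [0.10, 0.30, 0.78],
})

# unstack needs a unique index, so the duplicated pair raises a ValueError.
try:
    dup.set_index(['posteam', 'offense_grouping']).unstack('offense_grouping')
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table aggregates duplicates instead (mean by default).
# Summing snap_ct and averaging personnel_epa is an assumed choice.
wide = dup.pivot_table(
    index='posteam',
    columns='offense_grouping',
    values=['snap_ct', 'personnel_epa'],
    aggfunc={'snap_ct': 'sum', 'personnel_epa': 'mean'},
)
```

From here the same `.swaplevel(axis=1)` and `.sort_index(...)` steps apply as above.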

Then you can join the two levels of columns:

res = z.set_axis([f'{b}_{a}' for a, b in z.columns], axis=1)
>>> res
         snap_ct_personnel_00  personnel_epa_personnel_00  snap_ct_personnel_01  personnel_epa_personnel_01  snap_ct_personnel_02  personnel_epa_personnel_02  snap_ct_personnel_10  personnel_epa_personnel_10  snap_ct_personnel_11  personnel_epa_personnel_11
posteam                                                                                                                                                                                                                                                          
ARI      1                     0.153972                    6                     0.780519                    4                    -0.267874                    396                   0.188666                    1441                  0.005722                  
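The question asked for names with the grouping first (for example `personnel_00_snap_ct`) and for 'posteam' as a regular column. A minimal sketch of that variant, rebuilding the frame from the question so it runs on its own; the loop variable names `grp` and `val` are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'posteam': ['ARI'] * 5,
    'offense_grouping': ['personnel_00', 'personnel_01', 'personnel_02',
                         'personnel_10', 'personnel_11'],
    'snap_ct': [1, 6, 4, 396, 1441],
    'personnel_epa': [0.1539720594882965, 0.7805194854736328,
                      -0.2678736448287964, 0.1886662095785141,
                      0.005721719935536385],
})

z = (
    df
    .set_index(['posteam', 'offense_grouping'])
    .unstack('offense_grouping')
    .swaplevel(axis=1)
    .sort_index(axis=1, ascending=[True, False])
)

# After swaplevel, each column label is (offense_grouping, value_name),
# so joining in that order puts the grouping first.
res = (
    z
    .set_axis([f'{grp}_{val}' for grp, val in z.columns], axis=1)
    .reset_index()  # turn 'posteam' back into an ordinary column
)
```

The only changes from the version above are the order inside the f-string and the trailing `reset_index()`.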