I have a vertical data frame that I am looking to make more horizontal by "duplicating" columns for each item in the groupby column. I have the following data frame:
pd.DataFrame({'posteam': {0: 'ARI', 1: 'ARI', 2: 'ARI', 3: 'ARI', 4: 'ARI'},
'offense_grouping': {0: 'personnel_00',
1: 'personnel_01',
2: 'personnel_02',
3: 'personnel_10',
4: 'personnel_11'},
'snap_ct': {0: 1, 1: 6, 2: 4, 3: 396, 4: 1441},
'personnel_epa': {0: 0.1539720594882965,
1: 0.7805194854736328,
2: -0.2678736448287964,
3: 0.1886662095785141,
4: 0.005721719935536385}})
And in its current state, there are 5 duplicate values in the 'posteam' column and 5 different values in the 'offense_grouping' column. Ideally, I would like to group by 'posteam' (so the team only has one row) and by 'offense_grouping'. Each 'offense_grouping' value is corresponded with 'snap_ct' and 'personnel_epa' values. I would like the end result of this group to be something like this:
posteam | personnel_00_snap_ct | personnel_00_personnel_epa | personnel_01_snap_ct | personnel_01_personnel_epa | personnel_02_snap_ct | personnel_02_personnel_epa |
---|---|---|---|---|---|---|
ARI | 1 | .1539... | 6 | .7805... | 4 | -.2679 |
And so on. How can this be achieved?
CodePudding user response:
Given the data you provide, the following would give the expected result. But there might be more complex cases in your data.
z = (
df
.set_index(['posteam', 'offense_grouping'])
.unstack('offense_grouping')
.swaplevel(axis=1)
.sort_index(axis=1, ascending=[True, False])
)
# or, alternatively (might be better if you have multiple values
# for some given indices./columns):
z = (
df
.pivot_table(index='posteam', columns='offense_grouping', values=['snap_ct', 'personnel_epa'])
.swaplevel(axis=1)
.sort_index(axis=1, ascending=[True, False])
)
>>> z
offense_grouping personnel_00 personnel_01 \
snap_ct personnel_epa snap_ct personnel_epa
posteam
ARI 1 0.153972 6 0.780519
offense_grouping personnel_02 personnel_10 \
snap_ct personnel_epa snap_ct personnel_epa
posteam
ARI 4 -0.267874 396 0.188666
offense_grouping personnel_11
snap_ct personnel_epa
posteam
ARI 1441 0.005722
Then you can join the two levels of columns:
res = z.set_axis([f'{b}_{a}' for a, b in z.columns], axis=1)
>>> res
snap_ct_personnel_00 personnel_epa_personnel_00 snap_ct_personnel_01 personnel_epa_personnel_01 snap_ct_personnel_02 personnel_epa_personnel_02 snap_ct_personnel_10 personnel_epa_personnel_10 snap_ct_personnel_11 personnel_epa_personnel_11
posteam
ARI 1 0.153972 6 0.780519 4 -0.267874 396 0.188666 1441 0.005722
```