I need to calculate Total Hours and Hours by Status per Week using Python / Pandas GROUP BY.
Id Week Status Hours
1 01/10/2022 - 01/16/2022 On 5
2 01/10/2022 - 01/16/2022 Off 2
3 01/17/2022 - 01/23/2022 Off 6
4 01/17/2022 - 01/23/2022 On 1
5 01/17/2022 - 01/23/2022 On 5
6 01/03/2022 - 01/09/2022 On 10
7 01/10/2022 - 01/16/2022 Off 9
8 01/03/2022 - 01/09/2022 On 3
9 01/24/2022 - 01/30/2022 Off 4
10 01/24/2022 - 01/30/2022 On 7
test_data = {'Id': [1,2,3,4,5,6,7,8,9,10],
'Week': ['01/10/2022 - 01/16/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/17/2022 - 01/23/2022', '01/17/2022 - 01/23/2022', '01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/03/2022 - 01/09/2022', '01/24/2022 - 01/30/2022', '01/24/2022 - 01/30/2022'],
'Status': ['On', 'Off', 'Off', 'On', 'On', 'On', 'Off', 'On', 'Off', 'On'],
'Hours': [5,2,6,1,5,10,9,3,4,7]}
test_df = pd.DataFrame(data=test_data)
I can get Total Hours by each Week:
test_df.groupby(by=['Week'], as_index=False).agg({"Hours": "sum"})
But I don't know how to also group by Status, so it will be 2 additional columns (On Status Hours and Off Status Hours)
If I add Status column just to the groupby part, it creates extra rows (I understand why)
test_df.groupby(by=['Week', 'Status'], as_index=False).agg({"Hours": "sum"})
Output I want:
Week | Total Hours | On Status Hours | Off Status Hours |
---|---|---|---|
01/03/2022 - 01/09/2022 | 13 | 13 | 0 |
01/10/2022 - 01/16/2022 | 16 | 5 | 11 |
01/17/2022 - 01/23/2022 | 12 | 6 | 6 |
01/24/2022 - 01/30/2022 | 11 | 7 | 4 |
CodePudding user response:
You can use:
(test_df
.groupby(['Week', 'Status'])['Hours']
.sum()
.unstack(1, fill_value=0)
.add_suffix(' Status Hours')
.assign(**{'Total Hours': lambda d: d.sum(1)})
)
Output:
Status Off Status Hours On Status Hours Total Hours
Week
01/03/2022 - 01/09/2022 0 13 13
01/10/2022 - 01/16/2022 11 5 16
01/17/2022 - 01/23/2022 6 6 12
01/24/2022 - 01/30/2022 4 7 11
CodePudding user response:
You can use pd.pivot_table
to get your result:
x = pd.pivot_table(
test_df,
index="Week",
columns="Status",
values="Hours",
aggfunc="sum",
fill_value=0,
).add_suffix(" Status Hours")
x["Total Hours"] = x.sum(axis=1)
print(x)
Prints:
Status Off Status Hours On Status Hours Total Hours
Week
01/03/2022 - 01/09/2022 0 13 13
01/10/2022 - 01/16/2022 11 5 16
01/17/2022 - 01/23/2022 6 6 12
01/24/2022 - 01/30/2022 4 7 11