I have a DataFrame with employee time sheets, total hour worked as well as overtime hours.
example = pd.DataFrame({'Employee': ["Alex", "Alex", "Alex", "Bob", "Peter", "Peter"], 'date': ['2021-01-01', '2021-01-01', '2021-01-03', '2021-01-02', '2021-01-01', '2021-01-02'],
'Total Hour': [1.5, 2.2, 7, 1, 3, 6], 'Overtime': [1.5, 0, 1.2, 2.3, 1.7, 5]})
print(example)
Employee date Total Hour Overtime
0 Alex 2021-01-01 1.5 1.5
1 Alex 2021-01-01 2.2 0.0
2 Alex 2021-01-03 7.0 1.2
3 Bob 2021-01-02 1.0 2.3
4 Peter 2021-01-01 3.0 1.7
5 Peter 2021-01-02 6.0 5.0
I want to create a monthly DataFrame with all the days of the month and fill only available timesheets in it, something like this:
2021-01-01 2021-01-02 2021-01-03 2021-01-04 2021-01-05 ... 2021-01-31
Employee
0 Alex Total Hour 3.7 7.0
1 Alex Overtime 1.5 1.2
2 Bob Total Hour 1
3 Bob Overtime 2.3
4 Peter Total Hour 3.0 6
5 Peter Overtime 1.7 5
So as employees enter their hours, this will become more complete.
I tried to figure it out but I think I am missing something very fundamental here.
CodePudding user response:
Let's use pivot_table
then stack
level 0 to convert column MultiIndex to row MultiIndex:
result_df = example.pivot_table(
index='Employee',
columns='date',
values=['Total Hour', 'Overtime'],
aggfunc='sum'
).stack(level=0)
Or equivalently with groupby sum
then stack
unstack
to swap column and row MultiIndexes:
result_df = example.groupby(['Employee', 'date']).sum().stack().unstack(level=1)
result_df
:
date 2021-01-01 2021-01-02 2021-01-03
Employee
Alex Overtime 1.5 NaN 1.2
Total Hour 3.7 NaN 7.0
Bob Overtime NaN 2.3 NaN
Total Hour NaN 1.0 NaN
Peter Overtime 1.7 5.0 NaN
Total Hour 3.0 6.0 NaN
Either approach can be followed with reindex
to order level 1 so that Overtime appears first and Total Hour appears second per Employee. Then rename_axis
and reset_index
for some clean up of index and column labels:
result_df = result_df.reindex(
['Overtime', 'Total Hour'], level=1
).rename_axis(
index=['Employee', 'Hours'], columns=None
).reset_index()
result_df
:
Employee Hours 2021-01-01 2021-01-02 2021-01-03
0 Alex Overtime 1.5 NaN 1.2
1 Alex Total Hour 3.7 NaN 7.0
2 Bob Overtime NaN 2.3 NaN
3 Bob Total Hour NaN 1.0 NaN
4 Peter Overtime 1.7 5.0 NaN
5 Peter Total Hour 3.0 6.0 NaN