I got the following df which is kind of an absence/vacation table of employees:
Index Name Start End
0 Eric 20.07.2022 20.07.2022
1 Ric 21.07.2022 17.08.2022
2 Seth 15.07.2022 21.07.2022
3 Eric 16.07.2022 16.07.2022
...
The final df should look like this and should give information whether an employee is on vacation or not:
Name 15.07.2022 16.07.2022 17.07.2022 18.07.2022 19.07.2022 20.07.2022 21.07.2022 22.07.2022 ... 17.08.2022
Eric No Yes No No No Yes No No No
Ric No No No No No No Yes Yes Yes
Seth Yes Yes Yes Yes Yes Yes Yes No No
Any suggestions how to do this? I tried it with explode
and groupby
, but failed so far.
CodePudding user response:
You can use:
# ensure datetime
df[['Start', 'End']] = df[['Start', 'End']].apply(pd.to_datetime, dayfirst=True)
out = (df
# assign range of dates in the desired format
.assign(date=[pd.date_range(s, e, freq='D').strftime('%d.%m.%Y')
for s,e in zip(df['Start'], df['End'])],
value='Yes' # assign default "Yes"
)
.explode('date') # create rows
.pivot_table(index='Name', columns='date', values='value',
# get first "Yes", fill missing with "No"
aggfunc='first', fill_value='No')
# cosmetic changes
.rename_axis(columns=None).reset_index()
)
output:
Name 01.08.2022 02.08.2022 03.08.2022 04.08.2022 05.08.2022 06.08.2022 07.08.2022 08.08.2022 09.08.2022 10.08.2022 11.08.2022 12.08.2022 13.08.2022 14.08.2022 15.07.2022 15.08.2022 16.07.2022 16.08.2022 17.07.2022 17.08.2022 18.07.2022 19.07.2022 20.07.2022 21.07.2022 22.07.2022 23.07.2022 24.07.2022 25.07.2022 26.07.2022 27.07.2022 28.07.2022 29.07.2022 30.07.2022 31.07.2022
0 Eric No No No No No No No No No No No No No No No No Yes No No No No No Yes No No No No No No No No No No No
1 Ric Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes No Yes No Yes No Yes No No No Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
2 Seth No No No No No No No No No No No No No No Yes No Yes No Yes No Yes Yes Yes Yes No No No No No No No No No No