I have a dataframe that has columns like these:
Date earnings workingday length_week first_wday_week last_wdayweek
01.01.2000 10000 1 1
02.01.2000 0 0 1
03.01.2000 0 0 2
04.01.2000 0 0 2
05.01.2000 0 0 2
06.01.2000 23000 1 2
07.01.2000 1000 1 2
08.01.2000 0 0 2
09.01.2000 0 0 2
..
..
..
30.01.2000 0 0 0
31.01.2000 0 1 3
01.02.2000 0 1 3
02.02.2000 2500 1 3
workingday indicates whether there are earnings on that particular day. I am trying to generate the last three columns from the date.
length_week : number of working days in that week
first_working_day_of_week : 1 if it is the first working day of the week
last_working_day_of_week : 1 if it is the last working day of the week
Can anyone help me with this?
CodePudding user response:
I first changed the format of your date column, as pd.to_datetime couldn't infer the right date format:
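import numpy as np
import pandas as pd
df.Date = df.Date.str.replace('.', '-', regex=False)  # literal dot; with regex=True '.' matches any character
df.Date = pd.to_datetime(df.Date, format='%d-%m-%Y')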
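As an alternative that skips the string replacement entirely, the dotted strings can be parsed by giving pd.to_datetime a format that contains the dots. This is only a sketch; the small sample frame below is made up to mirror the Date column in the question.

```python
import pandas as pd

# Hypothetical sample mirroring the question's Date column (day.month.year strings).
df = pd.DataFrame({"Date": ["01.01.2000", "02.01.2000", "31.01.2000", "01.02.2000"]})

# An explicit format with dots parses the strings directly.
df["Date"] = pd.to_datetime(df["Date"], format="%d.%m.%Y")

print(df["Date"].dt.strftime("%Y-%m-%d").tolist())
```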
Then use isocalendar so that we can work with weeks and days more easily:
df[['year', 'week', 'weekday']] = df.Date.dt.isocalendar()
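To see what isocalendar returns, here is a small sketch on three made-up dates. Note that the ISO year can differ from the calendar year around New Year: 01.01.2000 was a Saturday and belongs to ISO week 52 of 1999, which is why grouping on year and week together is the safer choice.

```python
import pandas as pd

# Three illustrative dates: a Saturday, the following Monday, and the Sunday after that.
dates = pd.Series(
    pd.to_datetime(["01.01.2000", "03.01.2000", "09.01.2000"], format="%d.%m.%Y")
)

# isocalendar() returns a DataFrame with the columns year, week and day (Monday = 1).
iso = dates.dt.isocalendar()
print(iso)
```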
Now length_week is just the sum of workingday for each separate week:
df['length_week'] = df.groupby(['year', 'week']).workingday.transform('sum')
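A small self-contained sketch of that step, on made-up rows, shows how transform('sum') writes the per-week total back onto every row of the week instead of collapsing the group:

```python
import pandas as pd

# Hypothetical rows: four days of one ISO week and the Monday of the next week.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["03.01.2000", "04.01.2000", "06.01.2000", "07.01.2000", "10.01.2000"],
        format="%d.%m.%Y",
    ),
    "workingday": [0, 0, 1, 1, 1],
})

iso = df["Date"].dt.isocalendar()
df["year"] = iso["year"]
df["week"] = iso["week"]

# transform keeps the original row count, so the result can be assigned as a column.
df["length_week"] = df.groupby(["year", "week"])["workingday"].transform("sum")
print(df[["Date", "workingday", "length_week"]])
```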
and we can get frst_worday_week with idxmax:
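# index label of the first 1 in each week, broadcast to every row of that week
min_indexes = df.groupby(['year', 'week']).workingday.transform('idxmax')
# also require workingday == 1, so a week without working days is not flagged
df['frst_worday_week'] = np.where((df.index == min_indexes) & (df.workingday == 1), 1, 0)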
Lastly, last_workdayweek is similar but a bit trickier. idxmax returns the first occurrence of the maximum and we need the last one, so we reverse each week inside groupby:
max_indexes = df.groupby(['year', 'week']).\
    workingday.transform(lambda x: x.iloc[::-1].idxmax())
# also require workingday == 1, so a week without working days is not flagged
df['last_workdayweek'] = np.where((df.index == max_indexes) & (df.workingday == 1), 1, 0)
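If you would rather avoid idxmax, one alternative (not the approach above, just a sketch on made-up data) is a running count of working days within each week: the first working day is where the count reaches 1, and the last is where it reaches length_week. Weeks without any working day stay at 0 in both columns.

```python
import pandas as pd

# Hypothetical data: one full week with two working days, then two days of a week with none.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["03.01.2000", "04.01.2000", "05.01.2000", "06.01.2000", "07.01.2000",
         "08.01.2000", "09.01.2000", "10.01.2000", "11.01.2000"],
        format="%d.%m.%Y",
    ),
    "workingday": [0, 0, 0, 1, 1, 0, 0, 0, 0],
})

iso = df["Date"].dt.isocalendar()
df["year"] = iso["year"]
df["week"] = iso["week"]

weeks = df.groupby(["year", "week"])["workingday"]
df["length_week"] = weeks.transform("sum")

# Running number of working days seen so far within each week.
running = weeks.cumsum()
is_working = df["workingday"].eq(1)

df["frst_worday_week"] = (is_working & running.eq(1)).astype(int)
df["last_workdayweek"] = (is_working & running.eq(df["length_week"])).astype(int)
print(df[["Date", "workingday", "length_week", "frst_worday_week", "last_workdayweek"]])
```

This assumes the rows are sorted by Date, since the running count follows row order.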