Extract week columns from date in pandas

I have a dataframe that has columns like these:

Date          earnings    workingday   length_week  first_wday_week last_wdayweek            

01.01.2000    10000              1              1                            
02.01.2000    0                  0              1                                 
03.01.2000    0                  0              2                                   
04.01.2000    0                  0              2                                   
05.01.2000    0                  0              2                                   
06.01.2000    23000              1              2                                  
07.01.2000    1000               1              2                                  
08.01.2000    0                  0              2                                   
09.01.2000    0                  0              2                                  
..
..
..
30.01.2000    0                  0              0                                   
31.01.2000     0                 1              3                                   
01.02.2000     0                 1              3                                  
02.02.2000    2500               1              3

workingday indicates whether earnings are present on that particular day. I am trying to generate the last three columns from the date:

length_week : the number of working days in that week

first_wday_week : 1 if it is the first working day of the week

last_wdayweek : 1 if it is the last working day of the week

Can anyone help me with this?

CodePudding user response：

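I first replaced the dots in your Date column with dashes, as pd.to_datetime couldn't infer the right date format. Note that the result has to be assigned back, and that regex=False is needed so that '.' is treated as a literal dot rather than a pattern matching every character:

import numpy as np
import pandas as pd
df.Date = df.Date.str.replace('.', '-', regex=False)
df.Date = pd.to_datetime(df.Date, format='%d-%m-%Y')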
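As an alternative sketch, the dots may not need replacing at all: pd.to_datetime accepts a format string that contains them. The two-row frame named sample below is made-up data in the question's day.month.year layout, not the asker's real dataframe.

```python
import pandas as pd

# Hypothetical sample in the question's day.month.year layout
sample = pd.DataFrame({"Date": ["01.01.2000", "02.02.2000"]})

# An explicit format avoids any guessing about day/month order
sample["Date"] = pd.to_datetime(sample["Date"], format="%d.%m.%Y")
print(sample["Date"].dt.strftime("%Y-%m-%d").tolist())
```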

Then use isocalendar so that we can work with weeks and days more easily:

df[['year', 'week', 'weekday']] = df.Date.dt.isocalendar()
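A small sketch of why both the ISO year and the ISO week are kept: around New Year the ISO year can differ from the calendar year. 1 January 2000 was a Saturday, so it belongs to ISO week 52 of 1999, while 3 January 2000 opens week 1 of 2000. The series name dates is only for illustration.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["01.01.2000", "03.01.2000"], format="%d.%m.%Y"))

# isocalendar() returns a DataFrame with the columns year, week and day
iso = dates.dt.isocalendar()
print(iso)
```

Grouping by week alone would merge week 52 of different years, which is why the groupby calls in this answer use both columns.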

Now length_week is just the sum of workingday within each separate week:

df['length_week'] = df.groupby(['year', 'week']).workingday.transform('sum')
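To illustrate what transform('sum') does here, the toy frame below (made-up values, with a plain week column standing in for the year/week pair) shows that the per-week total is broadcast back onto every row of that week instead of collapsing the group to one row.

```python
import pandas as pd

# Hypothetical toy data: week 1 has two working days, week 2 has none
toy = pd.DataFrame({
    "week": [1, 1, 1, 2, 2],
    "workingday": [1, 0, 1, 0, 0],
})

# transform keeps the original row count and repeats each group's sum
toy["length_week"] = toy.groupby("week")["workingday"].transform("sum")
print(toy["length_week"].tolist())  # [2, 2, 2, 0, 0]
```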

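and we can get first_wday_week with idxmax. transform returns a Series here, so it is compared to the index directly. idxmax also returns the first row of a week that has no working day at all, so the row must additionally be a working day:

first_idx = df.groupby(['year', 'week']).workingday.transform('idxmax')
df['first_wday_week'] = np.where((df.index == first_idx) & (df.workingday == 1), 1, 0)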

Lastly, last_wdayweek is similar but a bit trickier. We need the last occurrence of the maximum, so we reverse each week inside groupby before calling idxmax:

last_idx = df.groupby(['year', 'week']).workingday.transform(lambda x: x[::-1].idxmax())
# as above, only rows that are working days can be flagged
df['last_wdayweek'] = np.where((df.index == last_idx) & (df.workingday == 1), 1, 0)
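For comparison, here is a self-contained sketch of a different technique that avoids idxmax: a running count of working days within each week. It is not the approach used in this answer, and it assumes the rows are sorted by date. The data is a hypothetical reconstruction of the first nine days from the question's table.

```python
import pandas as pd

# Hypothetical sample: 1 to 9 January 2000 with the question's workingday values
df = pd.DataFrame({
    "Date": pd.date_range("2000-01-01", periods=9, freq="D"),
    "workingday": [1, 0, 0, 0, 0, 1, 1, 0, 0],
})

iso = df["Date"].dt.isocalendar()
weeks = df.groupby([iso["year"], iso["week"]])["workingday"]

df["length_week"] = weeks.transform("sum")
running = weeks.cumsum()  # working days seen so far within the week

is_work = df["workingday"] == 1
# first working day: the running count has just reached 1
df["first_wday_week"] = (is_work & (running == 1)).astype(int)
# last working day: the running count has just reached the weekly total
df["last_wdayweek"] = (is_work & (running == df["length_week"])).astype(int)
print(df)
```

Because both flags require workingday == 1, a week without any working day gets 0 in both columns.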