Home > Software engineering >  Record Average Consecutive Days by Group
Record Average Consecutive Days by Group

Time:12-02

I have a tibble with the following columns with over a million rows:

   Emp_ID Start_Date Consec_Days
    <int> <date>           <dbl>
 1   4    2018-02-19          NA
 2   4    2018-02-20           1
 3   4    2018-02-21           1
 4   3    2016-12-26          NA
 5   3    2017-01-02           0
 6   3    2017-01-03           1
 7   3    2017-01-04           1
 8   3    2017-01-06           0
 9   3    2017-01-07           1
10   3    2017-01-09           0

Start_Date is the day an employee goes to work. I'd like to create a column that calculates the average number of consecutive days worked for each unique employee ID. Each NA value denotes the very first day a particular employee went to work. The problem is I don't know how to account for the NA values and 0's in the Consec_Days column to get an average of consecutive days.

I've tried referencing the following posts but haven't been able to make it work on my end:

How to find mean of n consecutive days in each group r

Any help would be greatly appreciated!

CodePudding user response:

Something like this?

df %>% 
  group_by(Emp_ID) %>% 
  mutate(average_cons_days = sum(Consec_Days, na.rm = TRUE)/n())
   Emp_ID Start_Date Consec_Days average_cons_days
    <int> <chr>            <int>             <dbl>
 1      4 2018-02-19          NA             0.667
 2      4 2018-02-20           1             0.667
 3      4 2018-02-21           1             0.667
 4      3 2016-12-26          NA             0.429
 5      3 2017-01-02           0             0.429
 6      3 2017-01-03           1             0.429
 7      3 2017-01-04           1             0.429
 8      3 2017-01-06           0             0.429
 9      3 2017-01-07           1             0.429
10      3 2017-01-09           0             0.429
  • Related