Home > Mobile >  Is there an R function that calculates the amount of days since the most recent date in a data frame
Is there an R function that calculates the amount of days since the most recent date in a data frame

Time:12-17

I have the data frame below and want to calculate the number of days since the student's last (most recent) absence from the present date and add that to the original data frame.

 Student ID       Absent Date       Subject        

    4567           08/30/2018          M
    4567           09/22/2019          M
    8345           09/01/2019          S
    8345           03/30/2019         PE         
    8345           07/18/2017          M
    5601           01/08/2019         SS

This is the desired output:

 Student ID       Absent Date       Subject       # of Days Since Last Absence            

    4567           08/30/2018          M                 816
    4567           09/22/2019          M                 816
    8345           09/01/2019          S                 837
    8345           03/30/2019         PE                 837        
    8345           07/18/2017          M                 837
    5601           01/08/2019         SS                 1073

Thank you for any help.

CodePudding user response:

Package dplyr and a max on the date will give you the answer. current date in R is Sys.Date()

library(dplyr)

df1 %>% 
  group_by(Student_ID) %>% 
  mutate(days_since_last_absence = Sys.Date() -  max(as.Date(Absent_Date, format = "%m/%d/%Y")))


# A tibble: 6 x 4
# Groups:   Student_ID [3]
  Student_ID Absent_Date Subject days_since_last_absence
       <int> <chr>       <chr>   <drtn>                 
1       4567 08/30/2018  M        816 days              
2       4567 09/22/2019  M        816 days              
3       8345 09/01/2019  S        837 days              
4       8345 03/30/2019  PE       837 days              
5       8345 07/18/2017  M        837 days              
6       5601 01/08/2019  SS      1073 days  

data:

df1 <- structure(list(Student_ID = c(4567L, 4567L, 8345L, 8345L, 8345L, 
5601L), Absent_Date = c("08/30/2018", "09/22/2019", "09/01/2019", 
"03/30/2019", "07/18/2017", "01/08/2019"), Subject = c("M", "M", 
"S", "PE", "M", "SS")), class = "data.frame", row.names = c(NA, 
-6L))

CodePudding user response:

Assuming your dataframe is called df, in base R, try

df$difference<-as.numeric(difftime(Sys.Date(),as.POSIXct(paste(df[,2]),format="%m/%d/%Y"),tz="UTC"))

tz has to be adjusted for your location.

  • Related