Home > database >  Imputing date based on next(or previous) available date grouped by another column
Imputing date based on next(or previous) available date grouped by another column

Time:02-17

I have a dataframe that looks like this:

ID grp    date1
a   1   2014-03-27
a   2   2014-04---
b   1   2014-03-24
b   2   2014-04---
b   3   2014-05---
c   1   2014-04---
c   2   2014-04-30

The goal is to fill in the missing day for a given ID using either an earlier or later date and add/subtract 28 from that. The date1 variable was originally in the dataframe as a character type. I have tried to code it as follows:

mutate(rowwise(df),
       newdate = case_when( str_count(date1, pattern = "\\W") >2 ~ lag(as.Date.character(date1, "%Y-%m-%d"),1)   days(28)))

But I need to incorporate it by ID by grp.

An example of my data could be made like this:

ID  <- c("a","a","b","b","b","c","c","d","d","d","e","e","f","f")
grp <- c(1,2,1,2,3,1,2,1,2,3,1,2,1,2)
date1 <- c("2014-03-27","2014-04---", "2014-03-24","2014-04---","2014-05---","2014-03---","2014-04-30","2013-02-01","2013-03---","2013-03---","2014-05-02","2014-06---","2014-01---","2014-02-20")
df <- as.data.frame(cbind(ID,grp,date1))

The output I am after would look like:

ID  grp      date1        date2
  a    1    2014-03-27  2014-03-27
  a    1    2014-04---  2014-04-24
  b    1    2014-03-24  2014-03-24
  b    2    2014-04---  2014-04-21
  b    3    2014-05---  2014-05-19
  c    1    2014-04---  2104-04-02
  c    2    2014-04-30  2014-04-30

I have also started to make a long conditional, but I wanted to ask here and see if anyone new of a more straight forward way to do it, instead of explicitly writing out all of the possible conditions.

mutate(rowwise(df),
              newdate = case_when(
                              grp == 1 & str_count(date1, pattern = "\\W") >2 & !is.na(lead(date1,1)  ~ lead(date1,1) - days(28),
                              grp == 2 & str_count(date1, pattern = "\\W") >2 & !is.na(lead(date1,1)) ~ lead(date1,1) - days(28),
                              grp == 3 & str_count(date1, pattern = "\\W") >2 & ...)))

CodePudding user response:

Function to fill dates forward and backwards

filldates <- function(dates) {
  m = which(!is.na(dates))
  if(length(m)>0 & length(m)!=length(dates)) {
    if(m[1]>1) for(i in seq(m,1,-1)) if(is.na(dates[i])) dates[i]=dates[i 1]-28
    if(sum(is.na(dates))>0) for(i in seq_along(dates)) if(is.na(dates[i])) dates[i] = dates[i-1] 28
  } 
  return(dates)
}

Usage:

data %>%
  arrange(ID, grp) %>%
  group_by(ID) %>%
  mutate(date2=filldates(as.Date(date1,"%Y-%m-%d")))

Ouput:

  ID      grp date1      date2   
  <chr> <dbl> <chr>      <date>    
1 a         1 2014-03-27 2014-03-27
2 a         2 2014-04--- 2014-04-24
3 b         1 2014-03-24 2014-03-24
4 b         2 2014-04--- 2014-04-21
5 b         3 2014-05--- 2014-05-19
6 c         1 2014-03--- 2014-04-02
7 c         2 2014-04-30 2014-04-30

CodePudding user response:

An option using purrr::accumulate().

library(tidyverse)

center <- df %>%
  group_by(ID) %>%
  mutate(helpDate = ymd(str_replace(date1, '---', '-01')),
         refDate = max(ymd(date1), na.rm = T))

backward <- center %>%
  filter(refDate == max(helpDate)) %>%
  mutate(date2 = accumulate(refDate, ~ . - days(28), .dir = 'backward'))

forward <- center %>%
  filter(refDate == min(helpDate)) %>%
  mutate(date2 = accumulate(refDate, ~ .   days(28)))

bind_rows(forward, backward) %>%
  ungroup() %>%
  mutate(date2 = as_date(date2)) %>%
  select(-c('helpDate', 'refDate'))

# # A tibble: 7 x 4
#   ID      grp date1      date2     
#   <chr> <int> <chr>      <date>    
# 1 a         1 2014-03-27 2014-03-27
# 2 a         2 2014-04--- 2014-04-24
# 3 b         1 2014-03-24 2014-03-24
# 4 b         2 2014-04--- 2014-04-21
# 5 b         3 2014-05--- 2014-05-19
# 6 c         1 2014-03--- 2014-04-02
# 7 c         2 2014-04-30 2014-04-30
  • Related