I have a dataframe that looks like this:
ID grp date1
a 1 2014-03-27
a 2 2014-04---
b 1 2014-03-24
b 2 2014-04---
b 3 2014-05---
c 1 2014-04---
c 2 2014-04-30
The goal is to fill in the missing day for a given ID using either an earlier or later date and add/subtract 28 from that. The date1 variable was originally in the dataframe as a character type. I have tried to code it as follows:
mutate(rowwise(df),
newdate = case_when( str_count(date1, pattern = "\\W") >2 ~ lag(as.Date.character(date1, "%Y-%m-%d"),1) days(28)))
But I need to incorporate it by ID by grp.
An example of my data could be made like this:
ID <- c("a","a","b","b","b","c","c","d","d","d","e","e","f","f")
grp <- c(1,2,1,2,3,1,2,1,2,3,1,2,1,2)
date1 <- c("2014-03-27","2014-04---", "2014-03-24","2014-04---","2014-05---","2014-03---","2014-04-30","2013-02-01","2013-03---","2013-03---","2014-05-02","2014-06---","2014-01---","2014-02-20")
df <- as.data.frame(cbind(ID,grp,date1))
The output I am after would look like:
ID grp date1 date2
a 1 2014-03-27 2014-03-27
a 1 2014-04--- 2014-04-24
b 1 2014-03-24 2014-03-24
b 2 2014-04--- 2014-04-21
b 3 2014-05--- 2014-05-19
c 1 2014-04--- 2104-04-02
c 2 2014-04-30 2014-04-30
I have also started to make a long conditional, but I wanted to ask here and see if anyone new of a more straight forward way to do it, instead of explicitly writing out all of the possible conditions.
mutate(rowwise(df),
newdate = case_when(
grp == 1 & str_count(date1, pattern = "\\W") >2 & !is.na(lead(date1,1) ~ lead(date1,1) - days(28),
grp == 2 & str_count(date1, pattern = "\\W") >2 & !is.na(lead(date1,1)) ~ lead(date1,1) - days(28),
grp == 3 & str_count(date1, pattern = "\\W") >2 & ...)))
CodePudding user response:
Function to fill dates forward and backwards
filldates <- function(dates) {
m = which(!is.na(dates))
if(length(m)>0 & length(m)!=length(dates)) {
if(m[1]>1) for(i in seq(m,1,-1)) if(is.na(dates[i])) dates[i]=dates[i 1]-28
if(sum(is.na(dates))>0) for(i in seq_along(dates)) if(is.na(dates[i])) dates[i] = dates[i-1] 28
}
return(dates)
}
Usage:
data %>%
arrange(ID, grp) %>%
group_by(ID) %>%
mutate(date2=filldates(as.Date(date1,"%Y-%m-%d")))
Ouput:
ID grp date1 date2
<chr> <dbl> <chr> <date>
1 a 1 2014-03-27 2014-03-27
2 a 2 2014-04--- 2014-04-24
3 b 1 2014-03-24 2014-03-24
4 b 2 2014-04--- 2014-04-21
5 b 3 2014-05--- 2014-05-19
6 c 1 2014-03--- 2014-04-02
7 c 2 2014-04-30 2014-04-30
CodePudding user response:
An option using purrr::accumulate()
.
library(tidyverse)
center <- df %>%
group_by(ID) %>%
mutate(helpDate = ymd(str_replace(date1, '---', '-01')),
refDate = max(ymd(date1), na.rm = T))
backward <- center %>%
filter(refDate == max(helpDate)) %>%
mutate(date2 = accumulate(refDate, ~ . - days(28), .dir = 'backward'))
forward <- center %>%
filter(refDate == min(helpDate)) %>%
mutate(date2 = accumulate(refDate, ~ . days(28)))
bind_rows(forward, backward) %>%
ungroup() %>%
mutate(date2 = as_date(date2)) %>%
select(-c('helpDate', 'refDate'))
# # A tibble: 7 x 4
# ID grp date1 date2
# <chr> <int> <chr> <date>
# 1 a 1 2014-03-27 2014-03-27
# 2 a 2 2014-04--- 2014-04-24
# 3 b 1 2014-03-24 2014-03-24
# 4 b 2 2014-04--- 2014-04-21
# 5 b 3 2014-05--- 2014-05-19
# 6 c 1 2014-03--- 2014-04-02
# 7 c 2 2014-04-30 2014-04-30