I need to find the previous date for which value is not NA and then also use the value on that row. I have tried to use shift, but I have met a problem because shift works well for row 9 but not for when there are consecutive non-NAs on type, such as on rows 5,6.
dtihave = data.table(date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-03", "2020-04-02", "2020-05-09", "2020-06-10", "2020-07-18", "2020-08-23", "2020-09-09")),
type = c(1,1,NA,NA,1,1,NA,NA,1),
value = c(7,NA,6,8,NA,NA,5,9,NA))
> dtihave
date type value
1: 2020-01-01 1 7
2: 2020-02-01 1 NA
3: 2020-03-03 NA 6
4: 2020-04-02 NA 8
5: 2020-05-09 1 NA
6: 2020-06-10 1 NA
7: 2020-07-18 NA 5
8: 2020-08-23 NA 9
9: 2020-09-09 1 NA
dtiwant = data.table(date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-03", "2020-04-02", "2020-05-09", "2020-06-10", "2020-07-18", "2020-08-23", "2020-09-09")),
type = c(1,1,NA,NA,1,1,NA,NA,1),
value = c(7,NA,6,8,NA,NA,5,9,NA),
iwantdate = c(NA, as.Date("2020-01-01"), NA, NA, as.Date("2020-04-02"), as.Date("2020-04-02"), NA, NA, as.Date("2020-08-23")),
iwantvalue = c(NA,7,NA,NA,8,8,NA,NA,9))
dtiwant[, iwantdate := as.Date(iwantdate, origin = "1970-01-01")]
> dtiwant
date type value iwantdate iwantvalue
1: 2020-01-01 1 7 <NA> NA
2: 2020-02-01 1 NA 2020-01-01 7
3: 2020-03-03 NA 6 <NA> NA
4: 2020-04-02 NA 8 <NA> NA
5: 2020-05-09 1 NA 2020-04-02 8
6: 2020-06-10 1 NA 2020-04-02 8
7: 2020-07-18 NA 5 <NA> NA
8: 2020-08-23 NA 9 <NA> NA
9: 2020-09-09 1 NA 2020-08-23 9
My current progress using shift, but I need row 6's iwantdate = "2020-04-02". The number of shifts I need to make is unknown, so I can not just use n=2 in shift.
dtprogress = copy(dtihave)
dtprogress[, iwantdate := ifelse(!is.na(type) & is.na(value), shift(date), NA)]
dtprogress[, iwantdate := ifelse(!is.na(type) & !is.na(value), date, iwantdate)]
dtprogress[, iwantdate := as.Date(iwantdate, origin = "1970-01-01")]
> dtprogress
date type value iwantdate
1: 2020-01-01 1 7 2020-01-01
2: 2020-02-01 1 NA 2020-01-01
3: 2020-03-03 NA 6 <NA>
4: 2020-04-02 NA 8 <NA>
5: 2020-05-09 1 NA 2020-04-02
6: 2020-06-10 1 NA 2020-05-09
7: 2020-07-18 NA 5 <NA>
8: 2020-08-23 NA 9 <NA>
9: 2020-09-09 1 NA 2020-08-23
CodePudding user response:
You could do:
dtihave[, idx := cummax((!is.na(value)) * .I) * NA^!is.na(value)][,
c('want_date', 'want_value') := lapply(.SD, '[', idx),
.SDcols = c('date', 'value')][, idx:=NULL]
dtihave
date type value want_date want_value
1: 2020-01-01 1 7 <NA> NA
2: 2020-02-01 1 NA 2020-01-01 7
3: 2020-03-03 NA 6 <NA> NA
4: 2020-04-02 NA 8 <NA> NA
5: 2020-05-09 1 NA 2020-04-02 8
6: 2020-06-10 1 NA 2020-04-02 8
7: 2020-07-18 NA 5 <NA> NA
8: 2020-08-23 NA 9 <NA> NA
9: 2020-09-09 1 NA 2020-08-23 9
with tidyverse. Hopefully this solves the grouping. ie just add %>%group_by(...)
before mutate
and you are good to go
dtihave %>%
mutate(val_na = !is.na(value),
idx = nafill(na_if(row_number() * val_na, 0), "locf"),
idx = idx * NA ^ val_na,
date1 = date[idx], value1 = value[idx],
val_na = NULL, idx = NULL)
CodePudding user response:
You can use lag
to get previous values, e.g.
library(dplyr)
dtihave %>%
mutate(iwantdate = ifelse(is.na(value), lag(date), NA) %>% as.Date(., origin = "1970-01-01"),
iwantvalue = ifelse(is.na(value), lag(value), NA))
date type value iwantdate iwantvalue
1: 2020-01-01 1 7 <NA> NA
2: 2020-02-01 1 NA 2020-01-01 7
3: 2020-03-03 NA 6 <NA> NA
4: 2020-04-02 NA 8 <NA> NA
5: 2020-05-09 1 NA 2020-04-02 8
6: 2020-06-10 1 NA 2020-05-09 NA
7: 2020-07-18 NA 5 <NA> NA
8: 2020-08-23 NA 9 <NA> NA
9: 2020-09-09 1 NA 2020-08-23 9