Home > Mobile >  How to search upwards a column for a value based on whether another column is NA or not?
How to search upwards a column for a value based on whether another column is NA or not?

Time:04-28

I need to find the previous date for which value is not NA and then also use the value on that row. I have tried to use shift, but I have met a problem because shift works well for row 9 but not for when there are consecutive non-NAs on type, such as on rows 5,6.

dtihave = data.table(date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-03", "2020-04-02", "2020-05-09", "2020-06-10", "2020-07-18", "2020-08-23", "2020-09-09")),
                     type = c(1,1,NA,NA,1,1,NA,NA,1),
                     value = c(7,NA,6,8,NA,NA,5,9,NA))

> dtihave
         date type value
1: 2020-01-01    1     7
2: 2020-02-01    1    NA
3: 2020-03-03   NA     6
4: 2020-04-02   NA     8
5: 2020-05-09    1    NA
6: 2020-06-10    1    NA
7: 2020-07-18   NA     5
8: 2020-08-23   NA     9
9: 2020-09-09    1    NA
dtiwant = data.table(date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-03", "2020-04-02", "2020-05-09", "2020-06-10", "2020-07-18", "2020-08-23", "2020-09-09")),
                     type = c(1,1,NA,NA,1,1,NA,NA,1),
                     value = c(7,NA,6,8,NA,NA,5,9,NA),
                     iwantdate = c(NA, as.Date("2020-01-01"), NA, NA, as.Date("2020-04-02"), as.Date("2020-04-02"), NA, NA, as.Date("2020-08-23")),
                     iwantvalue = c(NA,7,NA,NA,8,8,NA,NA,9))
dtiwant[, iwantdate := as.Date(iwantdate, origin = "1970-01-01")]

> dtiwant
         date type value  iwantdate iwantvalue
1: 2020-01-01    1     7       <NA>         NA
2: 2020-02-01    1    NA 2020-01-01          7
3: 2020-03-03   NA     6       <NA>         NA
4: 2020-04-02   NA     8       <NA>         NA
5: 2020-05-09    1    NA 2020-04-02          8
6: 2020-06-10    1    NA 2020-04-02          8
7: 2020-07-18   NA     5       <NA>         NA
8: 2020-08-23   NA     9       <NA>         NA
9: 2020-09-09    1    NA 2020-08-23          9

My current progress using shift, but I need row 6's iwantdate = "2020-04-02". The number of shifts I need to make is unknown, so I can not just use n=2 in shift.

dtprogress = copy(dtihave)
dtprogress[, iwantdate := ifelse(!is.na(type) & is.na(value), shift(date), NA)]
dtprogress[, iwantdate := ifelse(!is.na(type) & !is.na(value), date, iwantdate)]
dtprogress[, iwantdate := as.Date(iwantdate, origin = "1970-01-01")]

> dtprogress
         date type value  iwantdate
1: 2020-01-01    1     7 2020-01-01
2: 2020-02-01    1    NA 2020-01-01
3: 2020-03-03   NA     6       <NA>
4: 2020-04-02   NA     8       <NA>
5: 2020-05-09    1    NA 2020-04-02
6: 2020-06-10    1    NA 2020-05-09
7: 2020-07-18   NA     5       <NA>
8: 2020-08-23   NA     9       <NA>
9: 2020-09-09    1    NA 2020-08-23

CodePudding user response:

You could do:

dtihave[, idx := cummax((!is.na(value)) * .I) * NA^!is.na(value)][,
          c('want_date', 'want_value') := lapply(.SD, '[', idx),
         .SDcols = c('date', 'value')][, idx:=NULL]

dtihave
         date type value  want_date want_value
1: 2020-01-01    1     7       <NA>         NA
2: 2020-02-01    1    NA 2020-01-01          7
3: 2020-03-03   NA     6       <NA>         NA
4: 2020-04-02   NA     8       <NA>         NA
5: 2020-05-09    1    NA 2020-04-02          8
6: 2020-06-10    1    NA 2020-04-02          8
7: 2020-07-18   NA     5       <NA>         NA
8: 2020-08-23   NA     9       <NA>         NA
9: 2020-09-09    1    NA 2020-08-23          9

with tidyverse. Hopefully this solves the grouping. ie just add %>%group_by(...) before mutate and you are good to go

dtihave %>%
  mutate(val_na = !is.na(value),
         idx = nafill(na_if(row_number() * val_na, 0), "locf"),
         idx = idx * NA ^ val_na,
         date1 = date[idx], value1 = value[idx],
         val_na = NULL, idx = NULL)

CodePudding user response:

You can use lag to get previous values, e.g.

library(dplyr)
dtihave %>% 
  mutate(iwantdate = ifelse(is.na(value), lag(date), NA) %>% as.Date(., origin = "1970-01-01"), 
         iwantvalue = ifelse(is.na(value), lag(value), NA)) 

         date type value  iwantdate iwantvalue
1: 2020-01-01    1     7       <NA>         NA
2: 2020-02-01    1    NA 2020-01-01          7
3: 2020-03-03   NA     6       <NA>         NA
4: 2020-04-02   NA     8       <NA>         NA
5: 2020-05-09    1    NA 2020-04-02          8
6: 2020-06-10    1    NA 2020-05-09         NA
7: 2020-07-18   NA     5       <NA>         NA
8: 2020-08-23   NA     9       <NA>         NA
9: 2020-09-09    1    NA 2020-08-23          9
  •  Tags:  
  • r
  • Related