Home > Net >  Fill both numerical and character columns, conditional on the previous and next value being equal
Fill both numerical and character columns, conditional on the previous and next value being equal

Time:04-21

Please note that my question is different from this one:

In the following data, I would like to fill both numerical and character columns which are NA, based on the conditions that areacode and type of the zipcode before the NA is the same areacode and type of the zipcode after the NA.

In words: "Because zipcode 1002 had clay and zipcode 1004 has clay, we assume that zipcode 1003 has clay."

I wanted to use this approach, but na.fill only fills numerical values.

dat <- structure(list(zipcode = c(1001, 1002, 1003, 1004), areacode = c(4, 
4, NA, 4), type = structure(c(3L, 3L, NA, 3L), .Label = c("", 
"sand", "clay", "na2"), class = "factor"), region = c(3, 3, 
NA, 3)), class = c("data.table", "data.frame"), row.names = c(NA, 
-4L))

   zipcode areacode type region
1:    1001        4 clay      3
2:    1002        4 clay      3
3:    1003       NA <NA>     NA
4:    1004        4 clay      3

dat2 <- structure(list(zipcode = c(1001, 1002, 1003, 1004), areacode = c(4, 
4, NA, 1), type = structure(c(3L, 3L, NA, 2L), .Label = c("", 
"sand", "clay", "na2"), class = "factor"), region = c(3, 3, NA, 
3)), class = c("data.table", "data.frame"), row.names = c(NA, 
-4L))

   zipcode areacode type region
1:    1001        4 clay      3
2:    1002        4 clay      3
3:    1003       NA <NA>     NA
4:    1004        1 sand      3

What would be the best approach for this?

Desired output dat:

   zipcode areacode type region
1:    1001        4 clay      3
2:    1002        4 clay      3
3:    1003        4 clay      3
4:    1004        4 clay      3

Desired output dat2:

   zipcode areacode type region
1:    1001        4 clay      3
2:    1002        4 clay      3
3:    1003       NA <NA>     NA
4:    1004        1 sand      3

EDIT:

The following would not be enough, because it would fill in clay even if the fourth row says sand.

dat2 %>%
  fill(areacode, type, region)

   zipcode areacode type region
1:    1001        4 clay      3
2:    1002        4 clay      3
3:    1003        4 clay      3
4:    1004        1 sand      3

dat2[, lapply(.SD, zoo::na.locf)]

   zipcode areacode type region
1:    1001        4 clay      3
2:    1002        4 clay      3
3:    1003        4 clay      3
4:    1004        1 sand      3

CodePudding user response:

Using dplyr:

library(dplyr)
dat2 |> 
  mutate(type = as.character(type)) |> 
  mutate(across(2:4,
                ~ ifelse(is.na(.) & lag(areacode) == lead(areacode) & lag(type) == lead(type),
                         lag(.),
                         .)))

  zipcode areacode type region
1    1001        4 clay      3
2    1002        4 clay      3
3    1003       NA <NA>     NA
4    1004        1 sand      3

dat |> 
  mutate(type = as.character(type)) |> 
  mutate(across(2:4,
                ~ ifelse(is.na(.) & lag(areacode) == lead(areacode) & lag(type) == lead(type),
                         lag(.),
                         .)))

  zipcode areacode type region
1    1001        4 clay      3
2    1002        4 clay      3
3    1003        4 clay      3
4    1004        4 clay      3
  • Related