Please note that my question is different from this one:
In the following data, I would like to fill both numerical and character columns which are NA
, based on the conditions that areacode
and type
of the zipcode before the NA
is the same areacode
and type
of the zipcode after the NA
.
In words: "Because zipcode 1002 had clay and zipcode 1004 has clay, we assume that zipcode 1003 has clay."
I wanted to use this approach, but na.fill
only fills numerical values.
dat <- structure(list(zipcode = c(1001, 1002, 1003, 1004), areacode = c(4,
4, NA, 4), type = structure(c(3L, 3L, NA, 3L), .Label = c("",
"sand", "clay", "na2"), class = "factor"), region = c(3, 3,
NA, 3)), class = c("data.table", "data.frame"), row.names = c(NA,
-4L))
zipcode areacode type region
1: 1001 4 clay 3
2: 1002 4 clay 3
3: 1003 NA <NA> NA
4: 1004 4 clay 3
dat2 <- structure(list(zipcode = c(1001, 1002, 1003, 1004), areacode = c(4,
4, NA, 1), type = structure(c(3L, 3L, NA, 2L), .Label = c("",
"sand", "clay", "na2"), class = "factor"), region = c(3, 3, NA,
3)), class = c("data.table", "data.frame"), row.names = c(NA,
-4L))
zipcode areacode type region
1: 1001 4 clay 3
2: 1002 4 clay 3
3: 1003 NA <NA> NA
4: 1004 1 sand 3
What would be the best approach for this?
Desired output dat
:
zipcode areacode type region
1: 1001 4 clay 3
2: 1002 4 clay 3
3: 1003 4 clay 3
4: 1004 4 clay 3
Desired output dat2
:
zipcode areacode type region
1: 1001 4 clay 3
2: 1002 4 clay 3
3: 1003 NA <NA> NA
4: 1004 1 sand 3
EDIT:
The following would not be enough, because it would fill in clay
even if the fourth row says sand
.
dat2 %>%
fill(areacode, type, region)
zipcode areacode type region
1: 1001 4 clay 3
2: 1002 4 clay 3
3: 1003 4 clay 3
4: 1004 1 sand 3
dat2[, lapply(.SD, zoo::na.locf)]
zipcode areacode type region
1: 1001 4 clay 3
2: 1002 4 clay 3
3: 1003 4 clay 3
4: 1004 1 sand 3
CodePudding user response:
Using dplyr
:
library(dplyr)
dat2 |>
mutate(type = as.character(type)) |>
mutate(across(2:4,
~ ifelse(is.na(.) & lag(areacode) == lead(areacode) & lag(type) == lead(type),
lag(.),
.)))
zipcode areacode type region
1 1001 4 clay 3
2 1002 4 clay 3
3 1003 NA <NA> NA
4 1004 1 sand 3
dat |>
mutate(type = as.character(type)) |>
mutate(across(2:4,
~ ifelse(is.na(.) & lag(areacode) == lead(areacode) & lag(type) == lead(type),
lag(.),
.)))
zipcode areacode type region
1 1001 4 clay 3
2 1002 4 clay 3
3 1003 4 clay 3
4 1004 4 clay 3