Home > Net >  Filling rows of multiple columns based on multiple conditions
Filling rows of multiple columns based on multiple conditions

Time:04-21

I have example data as follows:

dat <- structure(list(
zipcode = c(1001, 1002, 1003, 1004, 1101, 1102, 1103, 1104, 1201, 1202, 1203, 1302), 
areacode = c(4, 4, NA, 4, 4, 4, NA, 1, 4, 4, NA, 4), 
type = structure(c(1L, 1L, NA, 1L, 2L, 2L, NA, 1L, 1L, 1L, NA, 1L), 
.Label = c("clay", "sand"), class = "factor"), 
region = c(3, 3, NA, 3, 3, 3, NA, 3, 3, 3, NA, 3), 
do_not_fill = c(1, NA, NA, 1, 1, NA, NA, 1, NA, NA, NA, 1)), 
class = c("data.table", "data.frame"), row.names = c(NA, -4L))

    zipcode areacode type region do_not_fill
 1:    1001        4 clay      3           1
 2:    1002        4 clay      3          NA
 3:    1003       NA <NA>     NA          NA
 4:    1004        4 clay      3           1
 5:    1101        4 sand      3           1
 6:    1102        4 sand      3          NA
 7:    1103       NA <NA>     NA          NA
 8:    1104        1 clay      3           1
 9:    1201        4 clay      3          NA
10:    1202        4 clay      3          NA
11:    1203       NA <NA>     NA          NA
12:    1302        4 clay      3           1

I want to fill ONLY the columns areacode, type and region based on two conditions.

  1. The areacode has to be the same before and after the NA.
  2. The first two digits of the zipcode have to be the same before and after the NA.

Based on this solution, and this solution, I attempted following (however data.table solutions are welcomed and even preferred):

library(dplyr)
dat |> 
  mutate(type = as.character(type)) |> 
  mutate(across(1:4,
                ~ ifelse(is.na(.) & lag(areacode) == lead(areacode) & 
                         lag(as.numeric(substr(zipcode, 1, 2))) == lead(as.numeric(substr(zipcode, 1, 2))),
                         lag(.),
                         .)))

But somewhere I am doing something wrong, because I get:

Error:
! `n` and `row.names` must be consistent.
Run `rlang::last_error()` to see where the error occurred.

Desired output:

    zipcode areacode type region do_not_fill
 1:    1001        4 clay      3           1
 2:    1002        4 clay      3          NA
 3:    1003        4 clay      3          NA
 4:    1004        4 clay      3           1
 5:    1101        4 sand      3           1
 6:    1102        4 sand      3          NA
 7:    1103       NA <NA>     NA          NA
 8:    1104        1 clay      3           1
 9:    1201        4 clay      3          NA
10:    1202        4 clay      3          NA
11:    1203       NA <NA>     NA          NA
12:    1302        4 clay      3           1

EDIT

as_tibble(dat) |>
  mutate(type = as.character(areacode)) |> 
  mutate(across(1:4,
                ~ ifelse(is.na(.) & lag(areacode) == lead(areacode) & 
                           lag(as.numeric(substr(zipcode, 1, 2))) == lead(as.numeric(substr(zipcode, 1, 2))),
                         lag(.),
                         .)))


# A tibble: 12 x 5
   zipcode areacode type  region do_not_fill
     <dbl>    <dbl> <chr>  <dbl>       <dbl>
 1    1001        4 4          3           1
 2    1002        4 4          3          NA
 3    1003        4 4          3          NA
 4    1004        4 4          3           1
 5    1101        4 4          3           1
 6    1102        4 4          3          NA
 7    1103       NA NA        NA          NA
 8    1104        1 1          3           1
 9    1201        4 4          3          NA
10    1202        4 4          3          NA
11    1203       NA NA        NA          NA
12    1302        4 4          3           1

CodePudding user response:

You need to convert it to a tibble first. I think this is because data.table has extra attributes

Have a look at the rownames,

rownames(as_tibble(dat))
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12"
rownames(dat)
 [1] "1" "2" "3" "4"

as_tibble(dat) |>
  mutate(type = as.character(type)) |> 
  mutate(across(1:4,
                ~ ifelse(is.na(.) & lag(areacode) == lead(areacode) & 
                           lag(as.numeric(substr(zipcode, 1, 2))) == lead(as.numeric(substr(zipcode, 1, 2))),
                         lag(.),
                         .)))

# A tibble: 12 x 5
   zipcode areacode type  region do_not_fill
     <dbl>    <dbl> <chr>  <dbl>       <dbl>
 1    1001        4 clay       3           1
 2    1002        4 clay       3          NA
 3    1003        4 clay       3          NA
 4    1004        4 clay       3           1
 5    1101        4 sand       3           1
 6    1102        4 sand       3          NA
 7    1103       NA NA        NA          NA
 8    1104        1 clay       3           1
 9    1201        4 clay       3          NA
10    1202        4 clay       3          NA
11    1203       NA NA        NA          NA
12    1302        4 clay       3           1
  • Related