Return values after iterating a loop R-CodePudding

I have two dataframes, with the same variables (but from another source). I want to combine the two dataframes in such a manner that if there is a value in data, this value is kept, and else the value from dospert is filled in.

The following function works:

ifelse(is.na(data$DOSPERT_01),dospert$DOSPERT_01_1,data$DOSPERT_01)

However, as I have a large number of columns I want to do this in a loop.

To do this, I wrote the following loop:

out <- list() 

for (i in 1:8){
  x <-  paste("data$DOSPERT_0",i, sep="")
  y <- paste("dospert$DOSPERT_0",i, "_01", sep="")
  z <- ifelse(is.na(x),y,x)
  out <- c(out,z)
}

This loop prints the names of the columns (dat$DOSPERT_030), but not the corresponding values. How can I get the values?

The dataframes look like this:

str(dospert)
 $ DOSPERT_01_1             : int  NA 5 NA NA NA 3 6 7 7 7 ...
 $ DOSPERT_02_1             : int  NA 5 NA NA NA 7 6 2 7 6 ...
 $ DOSPERT_03_1             : int  NA 1 NA NA NA 1 1 1 2 1 ...
 $ DOSPERT_04_1             : int  NA 1 NA NA NA 5 3 4 1 4 ...
 $ DOSPERT_05_1             : int  NA 1 NA NA NA 1 3 6 1 1 ...
 $ DOSPERT_06_1             : int  NA 1 NA NA NA 1 2 6 1 1 ...
 $ DOSPERT_07_1             : int  NA 6 NA NA NA 6 5 7 6 5 ...
 $ DOSPERT_08_1             : int  NA 1 NA NA NA 1 1 3 5 1 ...

and

str(data)
 $ DOSPERT_01: int  NA NA NA 7 NA 7 NA 7 NA NA ...
 $ DOSPERT_02: int  NA NA NA 5 NA 7 NA 7 NA NA ...
 $ DOSPERT_03: int  NA NA NA 1 NA 1 NA 1 NA NA ...
 $ DOSPERT_04: int  NA NA NA 6 NA 5 NA 1 NA NA ...
 $ DOSPERT_05: int  NA NA NA 1 NA 5 NA 4 NA NA ...
 $ DOSPERT_06: int  NA NA NA 1 NA 3 NA 3 NA NA ...
 $ DOSPERT_07: int  NA NA NA 7 NA 5 NA 6 NA NA ...
 $ DOSPERT_08: int  NA NA NA 1 NA 1 NA 1 NA NA ...

CodePudding user response：

Sample data in a more consumable format:

data <- structure(list(DOSPERT_02_1 = c(NA, 5L, NA, NA, NA, 7L, 6L, 2L, 7L, 6L), DOSPERT_03_1 = c(NA, 1L, NA, NA, NA, 1L, 1L, 1L, 2L, 1L), DOSPERT_04_1 = c(NA, 1L, NA, NA, NA, 5L, 3L, 4L, 1L, 4L), DOSPERT_05_1 = c(NA, 1L, NA, NA, NA, 1L, 3L, 6L, 1L, 1L), DOSPERT_06_1 = c(NA, 1L, NA, NA, NA, 1L, 2L, 6L, 1L, 1L), DOSPERT_07_1 = c(NA, 6L, NA, NA, NA, 6L, 5L, 7L, 6L, 5L), DOSPERT_08_1 = c(NA, 1L, NA, NA, NA, 1L, 1L, 3L, 5L, 1L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",  "10"))
dospert <- structure(list(DOSPERT_02 = c(NA, NA, NA, 5L, NA, 7L, NA, 7L, NA, NA), DOSPERT_03 = c(NA, NA, NA, 1L, NA, 1L, NA, 1L, NA, NA), DOSPERT_04 = c(NA, NA, NA, 6L, NA, 5L, NA, 1L, NA, NA), DOSPERT_05 = c(NA, NA, NA, 1L, NA, 5L, NA, 4L, NA, NA), DOSPERT_06 = c(NA, NA, NA, 1L, NA, 3L, NA, 3L, NA, NA), DOSPERT_07 = c(NA, NA, NA, 7L, NA, 5L, NA, 6L, NA, NA), DOSPERT_08 = c(NA, NA, NA, 1L, NA, 1L, NA, 1L, NA, NA)), class = "data.frame", row.names = c(NA, -10L))

Since the names of dospert perfectly match (adding _1) the names of data, this is rather straight-forward:

all(ncol(data) == ncol(dospert) && names(data) == paste0(names(dospert), "_1"))
# [1] TRUE
data[] <- Map(data.table::fcoalesce, data, dospert)
data
#    DOSPERT_02_1 DOSPERT_03_1 DOSPERT_04_1 DOSPERT_05_1 DOSPERT_06_1 DOSPERT_07_1 DOSPERT_08_1
# 1            NA           NA           NA           NA           NA           NA           NA
# 2             5            1            1            1            1            6            1
# 3            NA           NA           NA           NA           NA           NA           NA
# 4             5            1            6            1            1            7            1
# 5            NA           NA           NA           NA           NA           NA           NA
# 6             7            1            5            1            1            6            1
# 7             6            1            3            3            2            5            1
# 8             2            1            4            6            6            7            3
# 9             7            2            1            1            1            6            5
# 10            6            1            4            1            1            5            1

You can replace data.table::fcoalesce with dplyr::coalesce if more convenient, or write your own coalesce-like function.

If you're curious, the data[] <- ... is because Map returns a list, which would mean data would be a list and not a data.frame. While we could wrap Map is data.frame to get the same effect, using data[] <- replaces the contents of the columns but not the class of data itself, preserving it being a frame. The example below better shows why using data[...] <- may be a good idea anyway.

If the assumption of same-names and same-order is not true, then we need to be a bit more careful about which columns are compared and assigned. To demonstrate why this is true, I'll change the order of columns, omit one, and rename another, so that we have: different columns; at least one column in data not in dospert; and vice versa, at least one column in dospert that is not in data.

set.seed(42)
dospert2 <- sample(dospert[,-2])
names(dospert2)[3] <- "QUUX"
dospert2
#    DOSPERT_02 DOSPERT_07 QUUX DOSPERT_06 DOSPERT_04 DOSPERT_05
# 1          NA         NA   NA         NA         NA         NA
# 2          NA         NA   NA         NA         NA         NA
# 3          NA         NA   NA         NA         NA         NA
# 4           5          7    1          1          6          1
# 5          NA         NA   NA         NA         NA         NA
# 6           7          5    1          3          5          5
# 7          NA         NA   NA         NA         NA         NA
# 8           7          6    1          3          1          4
# 9          NA         NA   NA         NA         NA         NA
# 10         NA         NA   NA         NA         NA         NA

From here, we need to find where names match names.

dospert2_cols <- which(paste0(names(dospert2), "_1") %in% names(data))
data_cols <- na.omit(match(paste0(names(dospert2), "_1"), names(data)))
dospert2_cols
# [1] 1 2 4 5 6
data_cols
# [1] 1 6 5 3 4
# attr(,"na.action")
# [1] 3
# attr(,"class")
# [1] "omit"

From here, it's a matter of subsetting the columns:

data[data_cols] <- Map(data.table::fcoalesce, data[data_cols], dospert2[dospert2_cols])
data
#    DOSPERT_02_1 DOSPERT_03_1 DOSPERT_04_1 DOSPERT_05_1 DOSPERT_06_1 DOSPERT_07_1 DOSPERT_08_1
# 1            NA           NA           NA           NA           NA           NA           NA
# 2             5            1            1            1            1            6            1
# 3            NA           NA           NA           NA           NA           NA           NA
# 4             5           NA            6            1            1            7           NA
# 5            NA           NA           NA           NA           NA           NA           NA
# 6             7            1            5            1            1            6            1
# 7             6            1            3            3            2            5            1
# 8             2            1            4            6            6            7            3
# 9             7            2            1            1            1            6            5
# 10            6            1            4            1            1            5            1

CodePudding user response：

Since you tagged data.table, you can make use of fcoalesce(), which returns that first non-missing value, and avoid the verbose loop by using Map()

library(data.table)

dt1 <- data.table(x = c(1, 2, NA, 3),
                  y = c(3, NA, NA, 5))
dt2 <- data.table(k = c(10, 11, 12, 13),
                  s = c(20, 21, 22, 23))

as.data.table(Map(dt1, dt2, f = data.table::fcoalesce))
#>     x  y
#> 1:  1  3
#> 2:  2 21
#> 3: 12 22
#> 4:  3  5