I have two dataframes, with the same variables (but from another source). I want to combine the two dataframes in such a manner that if there is a value in data
, this value is kept, and else the value from dospert
is filled in.
The following function works:
ifelse(is.na(data$DOSPERT_01),dospert$DOSPERT_01_1,data$DOSPERT_01)
However, as I have a large number of columns I want to do this in a loop.
To do this, I wrote the following loop:
out <- list()
for (i in 1:8){
x <- paste("data$DOSPERT_0",i, sep="")
y <- paste("dospert$DOSPERT_0",i, "_01", sep="")
z <- ifelse(is.na(x),y,x)
out <- c(out,z)
}
This loop prints the names of the columns (dat$DOSPERT_030
), but not the corresponding values. How can I get the values?
The dataframes look like this:
str(dospert)
$ DOSPERT_01_1 : int NA 5 NA NA NA 3 6 7 7 7 ...
$ DOSPERT_02_1 : int NA 5 NA NA NA 7 6 2 7 6 ...
$ DOSPERT_03_1 : int NA 1 NA NA NA 1 1 1 2 1 ...
$ DOSPERT_04_1 : int NA 1 NA NA NA 5 3 4 1 4 ...
$ DOSPERT_05_1 : int NA 1 NA NA NA 1 3 6 1 1 ...
$ DOSPERT_06_1 : int NA 1 NA NA NA 1 2 6 1 1 ...
$ DOSPERT_07_1 : int NA 6 NA NA NA 6 5 7 6 5 ...
$ DOSPERT_08_1 : int NA 1 NA NA NA 1 1 3 5 1 ...
and
str(data)
$ DOSPERT_01: int NA NA NA 7 NA 7 NA 7 NA NA ...
$ DOSPERT_02: int NA NA NA 5 NA 7 NA 7 NA NA ...
$ DOSPERT_03: int NA NA NA 1 NA 1 NA 1 NA NA ...
$ DOSPERT_04: int NA NA NA 6 NA 5 NA 1 NA NA ...
$ DOSPERT_05: int NA NA NA 1 NA 5 NA 4 NA NA ...
$ DOSPERT_06: int NA NA NA 1 NA 3 NA 3 NA NA ...
$ DOSPERT_07: int NA NA NA 7 NA 5 NA 6 NA NA ...
$ DOSPERT_08: int NA NA NA 1 NA 1 NA 1 NA NA ...
CodePudding user response:
Sample data in a more consumable format:
data <- structure(list(DOSPERT_02_1 = c(NA, 5L, NA, NA, NA, 7L, 6L, 2L, 7L, 6L), DOSPERT_03_1 = c(NA, 1L, NA, NA, NA, 1L, 1L, 1L, 2L, 1L), DOSPERT_04_1 = c(NA, 1L, NA, NA, NA, 5L, 3L, 4L, 1L, 4L), DOSPERT_05_1 = c(NA, 1L, NA, NA, NA, 1L, 3L, 6L, 1L, 1L), DOSPERT_06_1 = c(NA, 1L, NA, NA, NA, 1L, 2L, 6L, 1L, 1L), DOSPERT_07_1 = c(NA, 6L, NA, NA, NA, 6L, 5L, 7L, 6L, 5L), DOSPERT_08_1 = c(NA, 1L, NA, NA, NA, 1L, 1L, 3L, 5L, 1L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
dospert <- structure(list(DOSPERT_02 = c(NA, NA, NA, 5L, NA, 7L, NA, 7L, NA, NA), DOSPERT_03 = c(NA, NA, NA, 1L, NA, 1L, NA, 1L, NA, NA), DOSPERT_04 = c(NA, NA, NA, 6L, NA, 5L, NA, 1L, NA, NA), DOSPERT_05 = c(NA, NA, NA, 1L, NA, 5L, NA, 4L, NA, NA), DOSPERT_06 = c(NA, NA, NA, 1L, NA, 3L, NA, 3L, NA, NA), DOSPERT_07 = c(NA, NA, NA, 7L, NA, 5L, NA, 6L, NA, NA), DOSPERT_08 = c(NA, NA, NA, 1L, NA, 1L, NA, 1L, NA, NA)), class = "data.frame", row.names = c(NA, -10L))
Since the names of dospert
perfectly match (adding _1
) the names of data
, this is rather straight-forward:
all(ncol(data) == ncol(dospert) && names(data) == paste0(names(dospert), "_1"))
# [1] TRUE
data[] <- Map(data.table::fcoalesce, data, dospert)
data
# DOSPERT_02_1 DOSPERT_03_1 DOSPERT_04_1 DOSPERT_05_1 DOSPERT_06_1 DOSPERT_07_1 DOSPERT_08_1
# 1 NA NA NA NA NA NA NA
# 2 5 1 1 1 1 6 1
# 3 NA NA NA NA NA NA NA
# 4 5 1 6 1 1 7 1
# 5 NA NA NA NA NA NA NA
# 6 7 1 5 1 1 6 1
# 7 6 1 3 3 2 5 1
# 8 2 1 4 6 6 7 3
# 9 7 2 1 1 1 6 5
# 10 6 1 4 1 1 5 1
You can replace data.table::fcoalesce
with dplyr::coalesce
if more convenient, or write your own coalesce-like function.
If you're curious, the data[] <- ...
is because Map
returns a list
, which would mean data
would be a list
and not a data.frame
. While we could wrap Map
is data.frame
to get the same effect, using data[] <-
replaces the contents of the columns but not the class of data
itself, preserving it being a frame. The example below better shows why using data[...] <-
may be a good idea anyway.
If the assumption of same-names and same-order is not true, then we need to be a bit more careful about which columns are compared and assigned. To demonstrate why this is true, I'll change the order of columns, omit one, and rename another, so that we have: different columns; at least one column in data
not in dospert
; and vice versa, at least one column in dospert
that is not in data
.
set.seed(42)
dospert2 <- sample(dospert[,-2])
names(dospert2)[3] <- "QUUX"
dospert2
# DOSPERT_02 DOSPERT_07 QUUX DOSPERT_06 DOSPERT_04 DOSPERT_05
# 1 NA NA NA NA NA NA
# 2 NA NA NA NA NA NA
# 3 NA NA NA NA NA NA
# 4 5 7 1 1 6 1
# 5 NA NA NA NA NA NA
# 6 7 5 1 3 5 5
# 7 NA NA NA NA NA NA
# 8 7 6 1 3 1 4
# 9 NA NA NA NA NA NA
# 10 NA NA NA NA NA NA
From here, we need to find where names match names.
dospert2_cols <- which(paste0(names(dospert2), "_1") %in% names(data))
data_cols <- na.omit(match(paste0(names(dospert2), "_1"), names(data)))
dospert2_cols
# [1] 1 2 4 5 6
data_cols
# [1] 1 6 5 3 4
# attr(,"na.action")
# [1] 3
# attr(,"class")
# [1] "omit"
From here, it's a matter of subsetting the columns:
data[data_cols] <- Map(data.table::fcoalesce, data[data_cols], dospert2[dospert2_cols])
data
# DOSPERT_02_1 DOSPERT_03_1 DOSPERT_04_1 DOSPERT_05_1 DOSPERT_06_1 DOSPERT_07_1 DOSPERT_08_1
# 1 NA NA NA NA NA NA NA
# 2 5 1 1 1 1 6 1
# 3 NA NA NA NA NA NA NA
# 4 5 NA 6 1 1 7 NA
# 5 NA NA NA NA NA NA NA
# 6 7 1 5 1 1 6 1
# 7 6 1 3 3 2 5 1
# 8 2 1 4 6 6 7 3
# 9 7 2 1 1 1 6 5
# 10 6 1 4 1 1 5 1
CodePudding user response:
Since you tagged data.table
, you can make use of fcoalesce()
, which returns that first non-missing value, and avoid the verbose loop by using Map()
library(data.table)
dt1 <- data.table(x = c(1, 2, NA, 3),
y = c(3, NA, NA, 5))
dt2 <- data.table(k = c(10, 11, 12, 13),
s = c(20, 21, 22, 23))
as.data.table(Map(dt1, dt2, f = data.table::fcoalesce))
#> x y
#> 1: 1 3
#> 2: 2 21
#> 3: 12 22
#> 4: 3 5