I want to used values of two columns in one dataframe and use these values as column * row combination in another dataframe. Sounds terrible, so I rather explain with example below.(simplified version, the actual dataset is much larger)
Data1
# ID Date
# 1 A 2022-02-01
# 2 B 2022-02-02
# 3 C 2022-02-03
# 4 D 2022-02-04
# 5 E 2022-02-05
# 6 F 2022-02-06
# 7 G 2022-02-07
# 8 H 2022-02-08
Data2
# ID X2022.02.01 X2022.02.02 X2022.02.03 X2022.02.04 X2022.02.05 X2022.02.06 X2022.02.07 X2022.02.08
# 1 A 1 9 17 25 33 41 49 57
# 2 B 2 10 18 26 34 42 50 58
# 3 C 3 11 19 27 35 43 51 59
# 4 D 4 12 20 28 36 44 52 60
# 5 E 5 13 21 29 37 45 53 61
# 6 F 6 14 22 30 38 46 54 62
# 7 G 7 15 23 31 39 47 55 63
# 8 H 8 16 24 32 40 48 56 64
and I would like to use ID and Date combination in Data1 to find value in Data2 So I would like to have the following outcome:
# ID Date Value
# 1 A 2022-02-01 1
# 2 B 2022-02-02 10
# 3 C 2022-02-03 19
# 4 D 2022-02-04 28
# 5 E 2022-02-05 37
# 6 F 2022-02-06 46
# 7 G 2022-02-07 55
# 8 H 2022-02-08 64
so far, I used the following code, but it took too many time as the original dataset (both Data1
and Data2
) is huge.
for (i in 1:nrow(Data1)) {
a <- Data1[[1]][[i]]
b <- Data1[[2]][[i]]
c <- Data2[b, a]
Data1$Value[i] <- c
}
Could someone kindly help my code?? :)
Data
Data1 <- data.frame(ID=c("A", "B", "C", "D", "E", "F", "G", "H"),
Date=c("2022-02-01", "2022-02-02", "2022-02-03", "2022-02-04",
"2022-02-05", "2022-02-06","2022-02-07", "2022-02-08"))
Data2 <- data.frame(ID=c("A", "B", "C", "D", "E", "F", "G", "H"),
"2022-02-01"=c(1:8),
'2022-02-02'=c(9:16),
'2022-02-03'=c(17:24),
'2022-02-04'=c(25:32),
'2022-02-05'=c(33:40),
'2022-02-06'=c(41:48),
'2022-02-07'=c(49:56),
'2022-02-08'=c(57:64))
CodePudding user response:
Consider to match
IDs together as well as dates with column names. To let the dates look like the (valid) column names we may use make.names
.
transform(Data1, Value=mapply(\(i, j) Data2[i, j], match(Data1$ID, Data2$ID),
match(make.names(Data1$Date), names(Data2))))
# ID Date Value
# 1 A 2022-02-01 1
# 2 B 2022-02-02 10
# 3 C 2022-02-03 19
# 4 D 2022-02-04 28
# 5 E 2022-02-05 37
# 6 F 2022-02-06 46
# 7 G 2022-02-07 55
# 8 H 2022-02-08 64
Column names should not start with a number, actually they are no valid names, so R internally changes them (using make.names
). This is also why you need to use quotes to get those "wrong" names. So I wonder if your for
loop actually works. I only brought it to work with your data like so:
for (i in 1:nrow(Data1)) {
a <- Data1[[1]][[i]]
b <- make.names(Data1[[2]][[i]])
c <- Data2[Data2$ID == a, b]
Data1$Value[i] <- c
}
If the column names of your real data start with numbers, please convert them using
names(data) <- make.names(names(data))
Data:
Data1 <- structure(list(ID = c("A", "B", "C", "D", "E", "F", "G", "H"),
Date = c("2022-02-01", "2022-02-02", "2022-02-03", "2022-02-04",
"2022-02-05", "2022-02-06", "2022-02-07", "2022-02-08")), class = "data.frame", row.names = c(NA,
-8L))
Data2 <- structure(list(ID = c("A", "B", "C", "D", "E", "F", "G", "H"),
X2022.02.01 = 1:8, X2022.02.02 = 9:16, X2022.02.03 = 17:24,
X2022.02.04 = 25:32, X2022.02.05 = 33:40, X2022.02.06 = 41:48,
X2022.02.07 = 49:56, X2022.02.08 = 57:64), class = "data.frame", row.names = c(NA,
-8L))