Home > Enterprise >  How to find value in dataframe with column and row combination of another data frame?
How to find value in dataframe with column and row combination of another data frame?

Time:09-23

I want to used values of two columns in one dataframe and use these values as column * row combination in another dataframe. Sounds terrible, so I rather explain with example below.(simplified version, the actual dataset is much larger)

Data1
#   ID       Date
# 1  A 2022-02-01
# 2  B 2022-02-02
# 3  C 2022-02-03
# 4  D 2022-02-04
# 5  E 2022-02-05
# 6  F 2022-02-06
# 7  G 2022-02-07
# 8  H 2022-02-08

Data2
#   ID X2022.02.01 X2022.02.02 X2022.02.03 X2022.02.04 X2022.02.05 X2022.02.06 X2022.02.07 X2022.02.08
# 1  A           1           9          17          25          33          41          49          57
# 2  B           2          10          18          26          34          42          50          58
# 3  C           3          11          19          27          35          43          51          59
# 4  D           4          12          20          28          36          44          52          60
# 5  E           5          13          21          29          37          45          53          61
# 6  F           6          14          22          30          38          46          54          62
# 7  G           7          15          23          31          39          47          55          63
# 8  H           8          16          24          32          40          48          56          64

and I would like to use ID and Date combination in Data1 to find value in Data2 So I would like to have the following outcome:

#   ID       Date Value
# 1  A 2022-02-01     1
# 2  B 2022-02-02    10
# 3  C 2022-02-03    19
# 4  D 2022-02-04    28
# 5  E 2022-02-05    37
# 6  F 2022-02-06    46
# 7  G 2022-02-07    55
# 8  H 2022-02-08    64

so far, I used the following code, but it took too many time as the original dataset (both Data1 and Data2) is huge.

for (i in 1:nrow(Data1)) {
  a <- Data1[[1]][[i]]
  b <- Data1[[2]][[i]]
  c <- Data2[b, a]
  Data1$Value[i] <- c
}

Could someone kindly help my code?? :)

Data

Data1 <- data.frame(ID=c("A", "B", "C", "D", "E", "F", "G", "H"), 
                    Date=c("2022-02-01", "2022-02-02", "2022-02-03", "2022-02-04",
                           "2022-02-05", "2022-02-06","2022-02-07", "2022-02-08"))
Data2 <- data.frame(ID=c("A", "B", "C", "D", "E", "F", "G", "H"), 
                    "2022-02-01"=c(1:8),
                    '2022-02-02'=c(9:16),
                    '2022-02-03'=c(17:24),
                    '2022-02-04'=c(25:32),
                    '2022-02-05'=c(33:40),
                    '2022-02-06'=c(41:48),
                    '2022-02-07'=c(49:56),
                    '2022-02-08'=c(57:64))

CodePudding user response:

Consider to match IDs together as well as dates with column names. To let the dates look like the (valid) column names we may use make.names.

transform(Data1, Value=mapply(\(i, j) Data2[i, j], match(Data1$ID, Data2$ID), 
                              match(make.names(Data1$Date), names(Data2))))
#   ID       Date Value
# 1  A 2022-02-01     1
# 2  B 2022-02-02    10
# 3  C 2022-02-03    19
# 4  D 2022-02-04    28
# 5  E 2022-02-05    37
# 6  F 2022-02-06    46
# 7  G 2022-02-07    55
# 8  H 2022-02-08    64

Column names should not start with a number, actually they are no valid names, so R internally changes them (using make.names). This is also why you need to use quotes to get those "wrong" names. So I wonder if your for loop actually works. I only brought it to work with your data like so:

for (i in 1:nrow(Data1)) {
  a <- Data1[[1]][[i]]
  b <- make.names(Data1[[2]][[i]])
  c <- Data2[Data2$ID == a, b]
  Data1$Value[i] <- c
}

If the column names of your real data start with numbers, please convert them using

names(data) <- make.names(names(data))

Data:

Data1 <- structure(list(ID = c("A", "B", "C", "D", "E", "F", "G", "H"), 
    Date = c("2022-02-01", "2022-02-02", "2022-02-03", "2022-02-04", 
    "2022-02-05", "2022-02-06", "2022-02-07", "2022-02-08")), class = "data.frame", row.names = c(NA, 
-8L))

Data2 <- structure(list(ID = c("A", "B", "C", "D", "E", "F", "G", "H"), 
    X2022.02.01 = 1:8, X2022.02.02 = 9:16, X2022.02.03 = 17:24, 
    X2022.02.04 = 25:32, X2022.02.05 = 33:40, X2022.02.06 = 41:48, 
    X2022.02.07 = 49:56, X2022.02.08 = 57:64), class = "data.frame", row.names = c(NA, 
-8L))
  • Related