Add value to a Row by matching column name in one table to column value in another in R-CodePudding

Df1:

variant ID1 ID2 ID3 ID4 .... ID80000
123     0    1   2   1         0
321     1    2   1   1         1
543     1    1   2   1         1
6542    1    0   0   1         0  
243     1    0   2   1         1
654     0    1   1   2         1 
342     1    2   1   2         1
present 0    1   0   1         0

Df2:

ID  sex    yob         disease
ID1  M    10/10/1910    cancer
ID2  F     05/02/2000   CML
ID3  F     01/01/1983   gout

I would like to add the columns from DF2 as rows onto the DF1 putting the column names into the variant column of DF1 by matching onto IDs

Desired outcome

variant ID1            ID2       ID3     ID4 .... ID80000
123     0               1         2       1         0
321     1               2         1       1         1
543     1               1         2       1         1
6542    1               0         0       1         0  
243     1               0         2       1         1
654     0               1         1       2         1 
342     1               2         1       2         1
present 0               1         0       1         0
sex     M               F         F       NA        NA
yob     10/10/1910  05/02/2000 01/01/1983 NA        NA
disease cancer         CML       gout     NA        NA

I have tried:

df1["sex",] <- df2$sex[match(df2$ID, colnames(df1),]

Which doesn't work.

I have got this to work:

df1["sex",] <- ifelse(colnames(df1) %in% df2$ID, df2$sex, NA)

I don't even know how to approach doing this with more than one column at a time.

Any help would be much appreciated

CodePudding user response：

Using data.table:

Although this works for this example, you can't use it as-is for "any" other data set. It requires some knowledge of the data, which can be adjusted easily when following the preparation steps (see explanation).

library(data.table)

rbindlist(list(df1, cbind( variant=names(df2)[2:ncol(df2)],
  setnames( data.frame( t(df2[,2:ncol(df2)]) ), df2[,1] ))), fill=T)

    variant        ID1        ID2        ID3 ID4
 1:     123          0          1          2   1
 2:     321          1          2          1   1
 3:     543          1          1          2   1
 4:    6542          1          0          0   1
 5:     243          1          0          2   1
 6:     654          0          1          1   2
 7:     342          1          2          1   2
 8: present          0          1          0   1
 9:     sex          M          F          F  NA
10:     yob 10/10/1910 05/02/2000 01/01/1983  NA
11: disease     cancer        CML       gout  NA

Explanation

df1 is fine but df2 needs some care, since we do not have a variant column.

# first part of df2, all "ID" columns [2->end]
setnames( data.frame( t(df2[,2:ncol(df2)]) ), df2[,1] )
#               ID1        ID2        ID3
#sex              M          F          F
#yob     10/10/1910 05/02/2000 01/01/1983
#disease     cancer        CML       gout

# second part of df2, prepare first column
names(df2)[2:ncol(df2)]
#[1] "sex"     "yob"     "disease"

# put together with name variant
cbind( variant=names(df2)[2:ncol(df2)], 
  setnames( data.frame( t(df2[,2:ncol(df2)]) ), df2[,1] ))
#        variant        ID1        ID2        ID3
#sex         sex          M          F          F
#yob         yob 10/10/1910 05/02/2000 01/01/1983
#disease disease     cancer        CML       gout

# now df2 is ready to be matched with df1s column names using rbindlist like above

Data

df1 <- structure(list(variant = c("123", "321", "543", "6542", "243", 
"654", "342", "present"), ID1 = c(0L, 1L, 1L, 1L, 1L, 0L, 1L, 
0L), ID2 = c(1L, 2L, 1L, 0L, 0L, 1L, 2L, 1L), ID3 = c(2L, 1L, 
2L, 0L, 2L, 1L, 1L, 0L), ID4 = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
1L)), class = "data.frame", row.names = c(NA, -8L))

df2 <- structure(list(ID = c("ID1", "ID2", "ID3"), sex = c("M", "F", 
"F"), yob = c("10/10/1910", "05/02/2000", "01/01/1983"), disease = c("cancer", 
"CML", "gout")), class = "data.frame", row.names = c(NA, -3L))

CodePudding user response：

Another way, using dplyr for adjust df2, magrittr for pipe operator and data.table for join both df's

library(dplyr)
library(magrittr)

df2 <- as_tibble(t(df2[, -1])) %>% 
  `colnames<-` (df2[["ID"]]) %>% 
  mutate(variant = rownames(t(df2[, -1]))) %>% 
  relocate(variant)

library(data.table)
rbindlist(list(df1, df2), fill = TRUE)