Df1:
variant ID1 ID2 ID3 ID4 .... ID80000
123 0 1 2 1 0
321 1 2 1 1 1
543 1 1 2 1 1
6542 1 0 0 1 0
243 1 0 2 1 1
654 0 1 1 2 1
342 1 2 1 2 1
present 0 1 0 1 0
Df2:
ID sex yob disease
ID1 M 10/10/1910 cancer
ID2 F 05/02/2000 CML
ID3 F 01/01/1983 gout
I would like to add the columns from DF2 as rows onto the DF1 putting the column names into the variant column of DF1 by matching onto IDs
Desired outcome
variant ID1 ID2 ID3 ID4 .... ID80000
123 0 1 2 1 0
321 1 2 1 1 1
543 1 1 2 1 1
6542 1 0 0 1 0
243 1 0 2 1 1
654 0 1 1 2 1
342 1 2 1 2 1
present 0 1 0 1 0
sex M F F NA NA
yob 10/10/1910 05/02/2000 01/01/1983 NA NA
disease cancer CML gout NA NA
I have tried:
df1["sex",] <- df2$sex[match(df2$ID, colnames(df1),]
Which doesn't work.
I have got this to work:
df1["sex",] <- ifelse(colnames(df1) %in% df2$ID, df2$sex, NA)
I don't even know how to approach doing this with more than one column at a time.
Any help would be much appreciated
CodePudding user response:
Using data.table
:
Although this works for this example, you can't use it as-is for "any" other data set. It requires some knowledge of the data, which can be adjusted easily when following the preparation steps (see explanation).
library(data.table)
rbindlist(list(df1, cbind( variant=names(df2)[2:ncol(df2)],
setnames( data.frame( t(df2[,2:ncol(df2)]) ), df2[,1] ))), fill=T)
variant ID1 ID2 ID3 ID4
1: 123 0 1 2 1
2: 321 1 2 1 1
3: 543 1 1 2 1
4: 6542 1 0 0 1
5: 243 1 0 2 1
6: 654 0 1 1 2
7: 342 1 2 1 2
8: present 0 1 0 1
9: sex M F F NA
10: yob 10/10/1910 05/02/2000 01/01/1983 NA
11: disease cancer CML gout NA
Explanation
df1
is fine but df2
needs some care, since we do not have a variant column.
# first part of df2, all "ID" columns [2->end]
setnames( data.frame( t(df2[,2:ncol(df2)]) ), df2[,1] )
# ID1 ID2 ID3
#sex M F F
#yob 10/10/1910 05/02/2000 01/01/1983
#disease cancer CML gout
# second part of df2, prepare first column
names(df2)[2:ncol(df2)]
#[1] "sex" "yob" "disease"
# put together with name variant
cbind( variant=names(df2)[2:ncol(df2)],
setnames( data.frame( t(df2[,2:ncol(df2)]) ), df2[,1] ))
# variant ID1 ID2 ID3
#sex sex M F F
#yob yob 10/10/1910 05/02/2000 01/01/1983
#disease disease cancer CML gout
# now df2 is ready to be matched with df1s column names using rbindlist like above
Data
df1 <- structure(list(variant = c("123", "321", "543", "6542", "243",
"654", "342", "present"), ID1 = c(0L, 1L, 1L, 1L, 1L, 0L, 1L,
0L), ID2 = c(1L, 2L, 1L, 0L, 0L, 1L, 2L, 1L), ID3 = c(2L, 1L,
2L, 0L, 2L, 1L, 1L, 0L), ID4 = c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
1L)), class = "data.frame", row.names = c(NA, -8L))
df2 <- structure(list(ID = c("ID1", "ID2", "ID3"), sex = c("M", "F",
"F"), yob = c("10/10/1910", "05/02/2000", "01/01/1983"), disease = c("cancer",
"CML", "gout")), class = "data.frame", row.names = c(NA, -3L))
CodePudding user response:
Another way, using dplyr for adjust df2, magrittr for pipe operator and data.table for join both df's
library(dplyr)
library(magrittr)
df2 <- as_tibble(t(df2[, -1])) %>%
`colnames<-` (df2[["ID"]]) %>%
mutate(variant = rownames(t(df2[, -1]))) %>%
relocate(variant)
library(data.table)
rbindlist(list(df1, df2), fill = TRUE)