I have 2 dataframes.
The first dataframe, df1, has 1246 rows and looks like this:
gene1 gene2 gene3
AAAB.P1 1.23 2.28 -2.85
AABC.P1 4.06 -0.59 -2.42
ABCD.P1 3.26 2.19 -3.01
ABCD.R1 1.23 0.15 -2.30
DBCA.P1 1.67 -0.51 -3.24
The second dataframe, df2, also has 1246 rows and looks like this:
id primary_diagnosis
1 ABCD carcinoma
2 ABCD carcinoma
3 AAAB AS
4 DBCA carcinoma
5 EFGH other
6 LMNO AS
I need to rearrange the second df so that it is in the same order as the first df, based on the id column. I tried to use
df2<-df2[order(match(df2$id, substr(rownames(df1), 1, 4))), ]
But it only gave me 21 rows that were in the correct position. I think the problem is that df2 contains some duplicates. For example, ABCD has both ABCD.P1 and ABCD.R1 in df1 but only one entry in df2. It doesn't matter which order they go in because both P1 and R1 will match up with carcinoma. How can I reorder df2 to match whatever order df1 is already in?
CodePudding user response:
We could use simple arrange
in this way:
library(dplyr)
df2 %>%
arrange(id, df1$id)
id primary_diagnosis
3 AAAB AS
1 ABCD carcinoma
2 ABCD carcinoma
4 DBCA carcinoma
5 EFGH other
6 LMNO AS
#df1
df1 <- structure(list(gene1 = c(1.23, 4.06, 3.26, 1.23, 1.67), gene2 = c(2.28,
-0.59, 2.19, 0.15, -0.51), gene3 = c(-2.85, -2.42, -3.01, -2.3,
-3.24)), class = "data.frame", row.names = c("AAAB.P1", "AABC.P1",
"ABCD.P1", "ABCD.R1", "DBCA.P1"))
#df2
df2 <- structure(list(id = c("ABCD", "ABCD", "AAAB", "DBCA", "EFGH",
"LMNO"), primary_diagnosis = c("carcinoma", "carcinoma", "AS",
"carcinoma", "other", "AS")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
CodePudding user response:
How about this:
df1 <- read.table(text = "id gene1 gene2 gene3
AAAB.P1 1.23 2.28 -2.85
AABC.P1 4.06 -0.59 -2.42
ABCD.P1 3.26 2.19 -3.01
ABCD.R1 1.23 0.15 -2.30
DBCA.P1 1.67 -0.51 -3.24", sep=" ", header=TRUE)
rownames(df1) <- df1$id
df1 <- df1[,-1]
df2 <- read.table(text="id primary_diagnosis
ABCD carcinoma
ABCD carcinoma
AAAB AS
DBCA carcinoma
EFGH other
LMNO AS", sep=" ", header=TRUE)
df2[match(substr(rownames(df1), 1, 4), df2$id), ]
#> id primary_diagnosis
#> 3 AAAB AS
#> NA <NA> <NA>
#> 1 ABCD carcinoma
#> 1.1 ABCD carcinoma
#> 4 DBCA carcinoma
Created on 2022-05-23 by the reprex package (v2.0.1)