I am working in R to clean some data in order to analyze it. I have a dataframe (df) that looks like this:
Strain1 Strain2
p1 p2
p2 p3
p3 p4
p4 p5
p5 p1
and a matrix (distmat) that looks like this:
p1 p2 p3 p4 p5
p1 0 0.1 0.3 0.4 0.9
p2 0.1 0 0.5 0.1 0.6
p3 0.3 0.5 0 0.8 0.3
p4 0.4 0.1 0.8 0 0.2
p5 0.9 0.6 0.3 0.2 0
I want to add a column to my data frame: for each row, take Strain1 and Strain2, look up the corresponding value in the matrix (row Strain1, column Strain2), and put that value in a new column in the same row. I need to do this for over 1000 data points.
For reference, this is the data frame I would expect from this example:
Strain1 Strain2 dist
p1 p2 0.1
p2 p3 0.5
p3 p4 0.8
p4 p5 0.2
p5 p1 0.9
CodePudding user response:
An option in base R
df$dist <- mat1[as.matrix(df)]
-output
> df
Strain1 Strain2 dist
1 p1 p2 0.1
2 p2 p3 0.5
3 p3 p4 0.8
4 p4 p5 0.2
5 p5 p1 0.9
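The one-liner above works because indexing a matrix with a two-column character matrix treats each row as a (row name, column name) pair. Here is a minimal sketch of the same idea that also guards against strains missing from the matrix, which would otherwise raise a "subscript out of bounds" error. The names `strains`, `d`, `pairs`, and `known`, and the strain `p9`, are made up for illustration and are not part of the original post.

```r
# Illustrative data (hypothetical names; values copied from the question)
strains <- paste0("p", 1:5)
d <- matrix(c(0,   0.1, 0.3, 0.4, 0.9,
              0.1, 0,   0.5, 0.1, 0.6,
              0.3, 0.5, 0,   0.8, 0.3,
              0.4, 0.1, 0.8, 0,   0.2,
              0.9, 0.6, 0.3, 0.2, 0),
            nrow = 5, dimnames = list(strains, strains))

# "p9" is deliberately absent from the matrix
pairs <- data.frame(Strain1 = c("p1", "p2", "p9"),
                    Strain2 = c("p2", "p3", "p1"))

# TRUE for rows whose two strains both appear in the matrix dimnames
known <- pairs$Strain1 %in% rownames(d) & pairs$Strain2 %in% colnames(d)

# Look up only the known pairs; unknown pairs stay NA
pairs$dist <- NA_real_
pairs$dist[known] <- d[as.matrix(pairs[known, c("Strain1", "Strain2")])]
pairs
```

Selecting the two key columns explicitly before `as.matrix()` also keeps the lookup working if the data frame has other columns.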
data
df <- structure(list(Strain1 = c("p1", "p2", "p3", "p4", "p5"), Strain2 = c("p2",
"p3", "p4", "p5", "p1")), class = "data.frame", row.names = c(NA,
-5L))
mat1 <- structure(c(0, 0.1, 0.3, 0.4, 0.9, 0.1, 0, 0.5, 0.1, 0.6, 0.3,
0.5, 0, 0.8, 0.3, 0.4, 0.1, 0.8, 0, 0.2, 0.9, 0.6, 0.3, 0.2,
0), dim = c(5L, 5L), dimnames = list(c("p1", "p2", "p3", "p4",
"p5"), c("p1", "p2", "p3", "p4", "p5")))
CodePudding user response:
A possible solution in base R:
# distmat is the distance matrix from the question
df$dist <- apply(df, 1, \(x) distmat[x[1], x[2]])
df
#> Strain1 Strain2 dist
#> 1 p1 p2 0.1
#> 2 p2 p3 0.5
#> 3 p3 p4 0.8
#> 4 p4 p5 0.2
#> 5 p5 p1 0.9
CodePudding user response:
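library(dplyr)  # provides tribble(), %>%, rowwise() and mutate()

df <- tribble(~Strain1, ~Strain2,
              "p1", "p2",
              "p2", "p3",
              "p3", "p4",
              "p4", "p5",
              "p5", "p1")
# Random example matrix; rows and columns are named after the strains
distmat <- matrix(runif(25), nrow = 5, ncol = 5,
                  dimnames = list(paste0("p", 1:5),
                                  paste0("p", 1:5)))
# Row = Strain1, column = Strain2 (the order matters if the matrix is not symmetric)
df <- df %>%
  rowwise() %>%
  mutate(dist = distmat[Strain1, Strain2]) %>%
  ungroup()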
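With over 1000 rows, the row-by-row lookup can probably be replaced by a single vectorized index. This is a sketch, not a benchmark: it assumes dplyr is installed, and the names `strains`, `m`, and `pairs` are hypothetical stand-ins for the data above.

```r
library(dplyr)

strains <- paste0("p", 1:5)
set.seed(1)  # reproducible illustrative values
m <- matrix(runif(25), nrow = 5, dimnames = list(strains, strains))

# Each strain paired with the next one, wrapping around at the end
pairs <- tibble(Strain1 = strains,
                Strain2 = c(strains[-1], strains[1]))

# cbind() builds a two-column character matrix of (row, column) names,
# so all lookups happen in one indexing call without rowwise()
pairs <- pairs %>%
  mutate(dist = m[cbind(Strain1, Strain2)])
pairs
```

Since no grouping is introduced, there is nothing to `ungroup()` afterwards.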
CodePudding user response:
Taking a wild guess here, but since you called it distmat, the convenience functions shave() and stretch() from the corrr package may be useful: they reduce the matrix to one triangle and bring it to long format.
library(dplyr)  # for %>%
corrr::shave(corrr::as_cordf(mat1)) %>%
  corrr::stretch(na.rm = TRUE)
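The long-format idea can also be sketched in base R and joined back onto the pairs. This is an illustration under assumptions: `strains`, `d`, `long`, `pairs`, and `result` are hypothetical names. It keeps the full matrix rather than one triangle, so a pair such as (p5, p1) still matches whichever way round it is written.

```r
strains <- paste0("p", 1:5)
d <- matrix(c(0,   0.1, 0.3, 0.4, 0.9,
              0.1, 0,   0.5, 0.1, 0.6,
              0.3, 0.5, 0,   0.8, 0.3,
              0.4, 0.1, 0.8, 0,   0.2,
              0.9, 0.6, 0.3, 0.2, 0),
            nrow = 5, dimnames = list(strains, strains))

# Long format: one row per (row name, column name) combination
long <- as.data.frame(as.table(d), stringsAsFactors = FALSE)
names(long) <- c("Strain1", "Strain2", "dist")

pairs <- data.frame(Strain1 = strains,
                    Strain2 = c(strains[-1], strains[1]))

# merge() sorts by the key columns, so row order may differ from `pairs`
result <- merge(pairs, long, by = c("Strain1", "Strain2"))
result
```

A join like this is handy when the distances also need to be attached to other tables, though for a single lookup the direct matrix index is shorter.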