Loop through the rows of two columns in a dataframe to obtain values from a matrix


I am working in R to clean some data before analyzing it. I have a data frame (df) that looks like this:

Strain1 Strain2
p1      p2
p2      p3
p3      p4
p4      p5
p5      p1

and a matrix (distmat) that looks like this:

       p1     p2     p3     p4     p5
p1     0      0.1    0.3    0.4    0.9
p2     0.1    0      0.5    0.1    0.6
p3     0.3    0.5    0      0.8    0.3
p4     0.4    0.1    0.8    0      0.2
p5     0.9    0.6    0.3    0.2    0

I want to add a column to my data frame that takes the Strain1/Strain2 pair on each row, looks up the corresponding value in the distance matrix (row Strain1, column Strain2), and stores it in a new column on the same row. I need to do this for more than 1,000 rows.

For reference, this is the data frame I expect from this example:

Strain1 Strain2 dist
p1      p2       0.1
p2      p3       0.5
p3      p4       0.8
p4      p5       0.2
p5      p1       0.9

Answer:

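An option in base R, using matrix indexing: indexing a matrix that has dimnames with a two-column character matrix selects one element per row, matching the first column against the row names and the second against the column names.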

df$dist <- mat1[as.matrix(df)]


> df
  Strain1 Strain2 dist
1      p1      p2  0.1
2      p2      p3  0.5
3      p3      p4  0.8
4      p4      p5  0.2
5      p5      p1  0.9


Data:

df <- structure(list(Strain1 = c("p1", "p2", "p3", "p4", "p5"), Strain2 = c("p2", 
"p3", "p4", "p5", "p1")), class = "data.frame", row.names = c(NA, -5L))

mat1 <- structure(c(0, 0.1, 0.3, 0.4, 0.9, 0.1, 0, 0.5, 0.1, 0.6, 0.3, 
0.5, 0, 0.8, 0.3, 0.4, 0.1, 0.8, 0, 0.2, 0.9, 0.6, 0.3, 0.2, 
0), dim = c(5L, 5L), dimnames = list(c("p1", "p2", "p3", "p4", 
"p5"), c("p1", "p2", "p3", "p4", "p5")))
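Character-matrix indexing stops with an error if any strain name in df is missing from the matrix's dimnames. A minimal sketch of a more forgiving variant, assuming the question's `distmat` and a hypothetical extra strain `p9` that is not in the matrix: convert the names to positions with match() first, because a row of a numeric index matrix that contains NA returns NA instead of failing.

```r
# Distance matrix from the question (symmetric, with row/column names)
distmat <- matrix(c(0,   0.1, 0.3, 0.4, 0.9,
                    0.1, 0,   0.5, 0.1, 0.6,
                    0.3, 0.5, 0,   0.8, 0.3,
                    0.4, 0.1, 0.8, 0,   0.2,
                    0.9, 0.6, 0.3, 0.2, 0),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(paste0("p", 1:5), paste0("p", 1:5)))

# "p9" is a hypothetical strain that does not appear in distmat
df <- data.frame(Strain1 = c("p1", "p2", "p3", "p4", "p5", "p9"),
                 Strain2 = c("p2", "p3", "p4", "p5", "p1", "p1"),
                 stringsAsFactors = FALSE)

# Translate names into row/column positions; unknown names become NA
idx <- cbind(match(df$Strain1, rownames(distmat)),
             match(df$Strain2, colnames(distmat)))

# Each row of idx picks one element; a row containing NA yields NA
df$dist <- distmat[idx]
df
```

The first five rows get the same values as in the expected output, and the `p9` row gets NA, which makes unmatched strains easy to find with `is.na(df$dist)`.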

Answer:

A possible solution in base R, looping over the rows with apply() (the `\(x)` lambda shorthand requires R 4.1.0 or later):

df$dist <- apply(df, 1, \(x) distmat[x[1], x[2]])

#>   Strain1 Strain2 dist
#> 1      p1      p2  0.1
#> 2      p2      p3  0.5
#> 3      p3      p4  0.8
#> 4      p4      p5  0.2
#> 5      p5      p1  0.9
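apply() first converts the whole data frame to a character matrix, so the lambda relies on Strain1 and Strain2 being the first two columns. A sketch that pairs the two named columns directly with mapply(), assuming `distmat` is the question's matrix and the strain columns are character vectors (factor columns would be used as integer codes when indexing, so convert them with as.character() first):

```r
# Distance matrix from the question
distmat <- matrix(c(0,   0.1, 0.3, 0.4, 0.9,
                    0.1, 0,   0.5, 0.1, 0.6,
                    0.3, 0.5, 0,   0.8, 0.3,
                    0.4, 0.1, 0.8, 0,   0.2,
                    0.9, 0.6, 0.3, 0.2, 0),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(paste0("p", 1:5), paste0("p", 1:5)))

df <- data.frame(Strain1 = c("p1", "p2", "p3", "p4", "p5"),
                 Strain2 = c("p2", "p3", "p4", "p5", "p1"),
                 stringsAsFactors = FALSE)

# Walk the two columns in parallel: one (Strain1, Strain2) pair per call
df$dist <- mapply(function(s1, s2) distmat[s1, s2],
                  df$Strain1, df$Strain2,
                  USE.NAMES = FALSE)  # drop the names mapply would add
df
```

This still makes one lookup per row; the matrix-indexing approach does all rows in a single call and will usually be faster on large inputs.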

Answer:

A tidyverse option, using dplyr's rowwise() so that each row is looked up separately:

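library(tibble)  # tribble()
library(dplyr)   # %>%, rowwise(), mutate(), ungroup()

df <- tribble(~Strain1, ~Strain2,
              "p1",     "p2",
              "p2",     "p3",
              "p3",     "p4",
              "p4",     "p5",
              "p5",     "p1")

# Random placeholder values; substitute your real distance matrix
distmat <- matrix(runif(25), nrow = 5, ncol = 5,
                  dimnames = list(paste0("p", 1:5),
                                  paste0("p", 1:5)))

# Look up distmat[row = Strain1, column = Strain2] one row at a time,
# then drop the row-wise grouping again
df <- df %>% 
  rowwise() %>% 
  mutate(dist = distmat[Strain1, Strain2]) %>% 
  ungroup()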

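rowwise() evaluates the lookup once per row, which gets slow as the data grow. Because matrix indexing is already vectorized, all rows can be looked up in one call. A base R sketch with transform(), assuming character (not factor) columns and the question's `distmat`; the same `distmat[cbind(Strain1, Strain2)]` expression should also work inside dplyr's mutate() without rowwise():

```r
# Distance matrix from the question
distmat <- matrix(c(0,   0.1, 0.3, 0.4, 0.9,
                    0.1, 0,   0.5, 0.1, 0.6,
                    0.3, 0.5, 0,   0.8, 0.3,
                    0.4, 0.1, 0.8, 0,   0.2,
                    0.9, 0.6, 0.3, 0.2, 0),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(paste0("p", 1:5), paste0("p", 1:5)))

df <- data.frame(Strain1 = c("p1", "p2", "p3", "p4", "p5"),
                 Strain2 = c("p2", "p3", "p4", "p5", "p1"),
                 stringsAsFactors = FALSE)

# cbind() of the two character columns builds an n x 2 character matrix;
# each of its rows selects one element by row name and column name
df <- transform(df, dist = distmat[cbind(Strain1, Strain2)])
df
```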
Answer:

Taking a wild guess here: since you called it distmat, it may be a distance matrix, in which case the convenience functions shave() and stretch() from the corrr package could be useful. shave() blanks out one triangle of the matrix (the upper one by default), and stretch() converts the result to long format.

library(magrittr)  # provides %>%

corrr::shave(corrr::as_cordf(mat1)) %>% 
  corrr::stretch(na.rm = TRUE)
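If a long table is the goal, base R can also build it without corrr, which makes the lookup a join. A sketch, assuming the question's `distmat` and `df`: as.table() plus as.data.frame() lists every (row, column) pair with its value, and merge() attaches the matching value to each row of df. Unlike shave(), this keeps both triangles, so a pair like p5/p1 is found whichever way round it is stored.

```r
# Distance matrix from the question
distmat <- matrix(c(0,   0.1, 0.3, 0.4, 0.9,
                    0.1, 0,   0.5, 0.1, 0.6,
                    0.3, 0.5, 0,   0.8, 0.3,
                    0.4, 0.1, 0.8, 0,   0.2,
                    0.9, 0.6, 0.3, 0.2, 0),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(paste0("p", 1:5), paste0("p", 1:5)))

df <- data.frame(Strain1 = c("p1", "p2", "p3", "p4", "p5"),
                 Strain2 = c("p2", "p3", "p4", "p5", "p1"),
                 stringsAsFactors = FALSE)

# Long format: one row per (row name, column name) pair, value in "dist"
long <- as.data.frame(as.table(distmat), responseName = "dist",
                      stringsAsFactors = FALSE)
names(long)[1:2] <- c("Strain1", "Strain2")

# Left join: every row of df is kept; pairs absent from long get NA
res <- merge(df, long, by = c("Strain1", "Strain2"), all.x = TRUE)
res
```

Note that merge() sorts the result by the key columns by default, so reorder it if the original row order of df matters.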