Apply function to list of data frames in R

Time:08-18

I have a list of data frames, each with 3 columns, so every row is a 3-dimensional vector. I would like to compute the cosine similarity (lsa::cosine) of each pair of consecutive rows in each data frame (rows 1 and 2, 2 and 3, 3 and 4, etc.). How can I loop through the data frames in the list to calculate these similarities while keeping the results separate for each data frame?
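For reference, this is the quantity being asked for. Writing r_i for row i of a data frame with n rows, each data frame yields n - 1 similarities:

```latex
s_i = \cos(\mathbf{r}_i, \mathbf{r}_{i+1})
    = \frac{\mathbf{r}_i \cdot \mathbf{r}_{i+1}}
           {\lVert \mathbf{r}_i \rVert \, \lVert \mathbf{r}_{i+1} \rVert},
\qquad i = 1, \dots, n - 1
```

For example, rows 1 and 2 of df1 below are (1, 2, 5) and (2, 3, 4), so s_1 = 28 / (sqrt(30) * sqrt(29)) = 28 / sqrt(870), which is about 0.9493.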

Here is some easy fake data for reproducibility purposes:

df1 = data.frame(y1 = c(1,2,3,4,5), y2 = c(2,3,4,5,6), y3 = c(5,4,3,2,1))
df2 = data.frame(y1 = c(6,7,8,9,10), y2 = c(6,5,4,3,2), y3 = c(1,3,5,7,9))
dflist = list(df1, df2)

Thanks in advance!

CodePudding user response:

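We may use sapply over the list, and within each data frame mapply to pair every row except the last with the row that follows it: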

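library(lsa)
# for each data frame, pair rows 1..n-1 with rows 2..n and take their cosine
sapply(dflist, function(x) mapply(function(u, v)
   # as.vector() strips the array attributes asplit() leaves on each row;
   # c() turns the 1x1 matrix returned by cosine() into a plain number
   c(cosine(as.vector(u), as.vector(v))), 
   asplit(x[-nrow(x), ], 1), asplit(x[-1, ], 1)))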
       [,1]      [,2]
1 0.9492889 0.9635201
2 0.9553946 0.9747824
3 0.9714890 0.9850197
4 0.9844672 0.9915254
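A dependency-free sketch, assuming lsa is not available or not wanted: instead of looping over row pairs with mapply, take rows 1..n-1 and rows 2..n as two aligned matrices and compute every consecutive cosine at once with rowSums(). The helper name consecutive_cosine is hypothetical, and the textbook formula stands in for lsa::cosine, so results should agree with the output above up to floating-point rounding.

```r
# Example data from the question
df1 <- data.frame(y1 = c(1,2,3,4,5), y2 = c(2,3,4,5,6), y3 = c(5,4,3,2,1))
df2 <- data.frame(y1 = c(6,7,8,9,10), y2 = c(6,5,4,3,2), y3 = c(1,3,5,7,9))
dflist <- list(df1, df2)

# Hypothetical helper (not from any package): cosine of each row with the next
consecutive_cosine <- function(x) {
  m <- as.matrix(x)
  if (nrow(m) < 2) return(numeric(0))   # no consecutive pairs to compare
  a <- m[-nrow(m), , drop = FALSE]      # rows 1 .. n-1
  b <- m[-1, , drop = FALSE]            # rows 2 .. n
  # dot product of each pair divided by the product of their Euclidean norms
  rowSums(a * b) / sqrt(rowSums(a^2) * rowSums(b^2))
}

res <- lapply(dflist, consecutive_cosine)
print(res)
```

lapply always returns one vector per data frame, so this also copes with data frames of different lengths, where the sapply call above would return a list instead of a matrix. A row of all zeros has no defined cosine and yields NaN here.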

CodePudding user response:

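If your data frames/matrices aren't big, you could transpose each one (lsa::cosine compares columns, so the rows become columns), compute the similarity between every pair of rows, and then keep only the first off-diagonal of the returned matrix so that only consecutive rows are compared. Because this builds an n × n matrix for each data frame, it only suits small inputs: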

library(lsa)
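lapply(dflist, \(x) {                # \(x) is R >= 4.1 shorthand for function(x)
  m <- cosine(as.matrix(t(x)))       # n x n cosine similarities between rows of x
  m[(col(m)-row(m)) == 1]            # first superdiagonal: m[1,2], m[2,3], ...
})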
#[[1]]
#[1] 0.9492889 0.9553946 0.9714890 0.9844672
#
#[[2]]
#[1] 0.9635201 0.9747824 0.9850197 0.9915254
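The all-pairs idea above can also be sketched in base R, again assuming lsa is not wanted: scaling each row to unit length and taking tcrossprod() gives the same n × n matrix of row-versus-row cosines, and diag(s[-n, -1]) is an alternative way to read off the first superdiagonal. The helper name offdiag_cosine is hypothetical, and naming the list elements df1 and df2 is an addition that keeps each result labelled with its data frame.

```r
# Example data from the question
df1 <- data.frame(y1 = c(1,2,3,4,5), y2 = c(2,3,4,5,6), y3 = c(5,4,3,2,1))
df2 <- data.frame(y1 = c(6,7,8,9,10), y2 = c(6,5,4,3,2), y3 = c(1,3,5,7,9))
dflist <- list(df1 = df1, df2 = df2)   # names label each result

# Hypothetical helper (not from any package)
offdiag_cosine <- function(x) {
  m <- as.matrix(x)
  n <- nrow(m)
  if (n < 2) return(numeric(0))      # no consecutive pairs to compare
  u <- m / sqrt(rowSums(m^2))        # scale every row to unit length
  s <- tcrossprod(u)                 # n x n matrix: s[i, j] = cos(row i, row j)
  # element [i, i] of s[-n, -1] is s[i, i + 1], the first superdiagonal
  diag(s[-n, -1, drop = FALSE])
}

res <- lapply(dflist, offdiag_cosine)
print(res)
```

drop = FALSE matters for two-row data frames: without it s[-n, -1] collapses to a single number, and diag() would interpret that number as the order of an identity matrix instead of returning it.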