I have a list of data frames, each having rows with a 3-dimensional vector (3 columns). I would like to compute the cosine similarity (lsa::cosine) on each subsequent pair of rows in each data frame (e.g., rows 1 and 2, 2 and 3, 3 and 4, etc.). How can I loop through each data frame in the list to calculate the cosine similarities of subsequent rows, keeping the cosine values separate for each data frame?
Here is some easy fake data for reproducibility purposes:
df1 = data.frame(y1 = c(1,2,3,4,5), y2 = c(2,3,4,5,6), y3 = c(5,4,3,2,1))
df2 = data.frame(y1 = c(6,7,8,9,10), y2 = c(6,5,4,3,2), y3 = c(1,3,5,7,9))
dflist = list(df1, df2)
Thanks in advance!
CodePudding user response:
We may use sapply over the list, with mapply pairing each row with the one after it:
library(lsa)
# for each data frame, pair rows 1..(n-1) with rows 2..n and take the cosine of each pair
sapply(dflist, function(x) mapply(function(u, v)
    c(cosine(as.vector(u), as.vector(v))),
  asplit(x[-nrow(x), ], 1), asplit(x[-1, ], 1)))
       [,1]      [,2]
1 0.9492889 0.9635201
2 0.9553946 0.9747824
3 0.9714890 0.9850197
4 0.9844672 0.9915254
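If you would rather not depend on lsa, the same numbers can be computed in base R straight from the definition of cosine similarity (dot product divided by the product of the norms). This is only a sketch: `consecutive_cosine` is a made-up helper name, and the list names `df1`/`df2` are added here so that each result stays labelled by its data frame.

```r
# Same toy data as in the question
df1 <- data.frame(y1 = c(1,2,3,4,5), y2 = c(2,3,4,5,6), y3 = c(5,4,3,2,1))
df2 <- data.frame(y1 = c(6,7,8,9,10), y2 = c(6,5,4,3,2), y3 = c(1,3,5,7,9))
dflist <- list(df1 = df1, df2 = df2)  # names are an addition, not in the question

# Hypothetical helper: cosine similarity of each row with the next row
consecutive_cosine <- function(x) {
  m <- as.matrix(x)
  a <- m[-nrow(m), , drop = FALSE]  # rows 1..(n-1)
  b <- m[-1, , drop = FALSE]        # rows 2..n
  # row-wise dot products divided by the product of the row norms
  unname(rowSums(a * b) / sqrt(rowSums(a^2) * rowSums(b^2)))
}

# One numeric vector per data frame, kept separate in a named list
res <- lapply(dflist, consecutive_cosine)
res
```

Because everything is vectorised over rows, there is no per-pair function call, which may help if the data frames have many rows.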
CodePudding user response:
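If your data.frames/matrices aren't big, you could transpose each one (cosine() compares the columns of a matrix, so the rows have to become columns), calculate the similarity between every pair of rows, and then subset the first off-diagonal of the returned matrix to keep only the comparisons between subsequent rows: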
library(lsa)
lapply(dflist, \(x) {
  m <- cosine(as.matrix(t(x)))   # full row-by-row similarity matrix
  m[(col(m) - row(m)) == 1]      # keep the entries just above the diagonal
})
#[[1]]
#[1] 0.9492889 0.9553946 0.9714890 0.9844672
#
#[[2]]
#[1] 0.9635201 0.9747824 0.9850197 0.9915254
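To see what the `(col(m) - row(m)) == 1` subset picks out, here is a small base R illustration on a made-up 4 x 4 matrix (not the cosine output itself, just numbers that are easy to trace):

```r
# Toy matrix, filled column by column with 1..16
m <- matrix(1:16, nrow = 4)

# TRUE exactly where the column index is one more than the row index,
# i.e. positions (1,2), (2,3), (3,4): the first off-diagonal above the main one
mask <- (col(m) - row(m)) == 1
which(mask, arr.ind = TRUE)

# Subsetting with the mask returns those entries in column order
m[mask]
#[1]  5 10 15
```

In the similarity matrix, position (i, i+1) holds the cosine between row i and row i+1 of the original data frame, which is why this mask returns exactly the subsequent-row comparisons.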