I have a large data frame, and I need to check whether a value in, say, row 1 column 2 is within 25 percent of row 1 column 1, and then repeat that for each column and each row.
Edit: Not all cells are compared to (1,1). Each is compared to the one before it in the same row, i.e. (1,2) is compared to (1,1), (1,3) is compared to (1,2), (2,2) is compared to (2,1), and (2,3) is compared to (2,2).
Quick example:
      1   2   3
1    40  50  90
2    25  60  43
In this case I would need to return something like (1,3), (2,2), (2,3), i.e. the cells that are not within 25 percent of the cell before them.
Here's what I coded, but it's incredibly slow for large data frames (as I expected). While I know how to speed this up in C, C++, Python, etc., I am newer to R and not sure what to do.
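# Collect the (row, col) positions of the out-of-tolerance cells.
off = data.frame(matrix(ncol=2, nrow=0))
colnames(off) = c("Row", "Col")
for (row in 1:nrow(data)) {
  for (col in 2:ncol(data)) {
    orig = data[[row, col]]
    comp = data[[row, col-1]]  # cell immediately to the left
    # 1.1 / 0.9 give a 10% band; 1.25 / 0.75 would match the 25% described above
    if ((orig > comp && orig > 1.1*comp) ||
        (orig < comp && orig < 0.9*comp)) {
      off[nrow(off) + 1,] = c(row, col)
    }
  }
}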
Thank you for any help in advance and please ask any clarifying questions.
Answer:
Let's do this column-wise (no for loops required):
mtx <- structure(c(40L, 25L, 50L, 60L, 90L, 43L), dim = 2:3)
which(cbind(FALSE,
            mapply(function(a, b) abs(mtx[,b] / mtx[,a] - 1) <= 0.25,
                   1:(ncol(mtx)-1), 2:ncol(mtx))),
      arr.ind = TRUE)
#      row col
# [1,]   1   2
Breakdown:
mapply(...) iterates the function over two vectors/lists. In this case, we iterate over 1:(ncol(mtx)-1) joined with 2:ncol(mtx), so the anon-function is called with (1,2), (2,3) (and more if the matrix had more columns).
In the internal anon-function, mtx[,b] / mtx[,a] computes the ratio for a whole column at a time, so in the first call it's mtx[,2] / mtx[,1]. Since this is a ratio, we can reduce to %-change by subtracting 1. Since we need to find those with 25% or less change, we end up with abs(mtx[,b] / mtx[,a] - 1) <= 0.25.
That step is for each pair of consecutive columns.
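To make the per-pair step concrete, here is a small sketch that runs only the mapply part on the same example matrix and prints the intermediate logical matrix. The name close_enough is just an illustrative choice, not something from the answer.

```r
# Same example matrix as in the answer.
mtx <- structure(c(40L, 25L, 50L, 60L, 90L, 43L), dim = 2:3)

# One logical column per pair of consecutive columns: (1,2) and (2,3).
# close_enough is a hypothetical name used only for this illustration.
close_enough <- mapply(function(a, b) abs(mtx[, b] / mtx[, a] - 1) <= 0.25,
                       1:(ncol(mtx) - 1), 2:ncol(mtx))

print(close_enough)
#       [,1]  [,2]
# [1,]  TRUE FALSE
# [2,] FALSE FALSE
```

Column 1 of this result describes column 2 of mtx (compared with column 1), and column 2 describes column 3, which is why the result has one column fewer than mtx.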
The which(..., arr.ind=TRUE) returns a two-column matrix with column names row and col, indicating where in the provided matrix the TRUE cells are found.
The mapply(..) is reductive in that it returns ncol(mtx)-1 columns. Since the col column from arr.ind= will therefore be off by one, we can either add 1 to col afterwards or simply add a column of FALSE to the left of the matrix returned from mapply. I opted for the latter.
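For comparison, here is a rough sketch of the other option mentioned above (adding 1 to col afterwards). It also flips the comparison to > 0.25, on the assumption that the cells wanted are the ones outside the 25% band, as in the question's expected result. The names off and idx are illustrative only.

```r
# Same example matrix as in the answer.
mtx <- structure(c(40L, 25L, 50L, 60L, 90L, 43L), dim = 2:3)

# Assumed variant: TRUE where the change from the previous column exceeds 25%.
off <- mapply(function(a, b) abs(mtx[, b] / mtx[, a] - 1) > 0.25,
              1:(ncol(mtx) - 1), 2:ncol(mtx))

# Instead of cbind(FALSE, ...), shift the col index by one afterwards.
idx <- which(off, arr.ind = TRUE)
idx[, "col"] <- idx[, "col"] + 1L
print(idx)
#      row col
# [1,]   2   2
# [2,]   1   3
# [3,]   2   3
```

The rows come out in column-major order, so the set matches (1,3), (2,2), (2,3) from the question but not in that sequence. If the input is an all-numeric data frame rather than a matrix, as.matrix(data) is one way to get a matrix to work with first.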