Example data to copy
df <- data.frame(
  AA = c(100, 200, 300, 400),
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)
Based on the index of AA and its values, I would like to calculate, for every row, the sum of indicators defined by the condition df$AA[i] > df[df$X1[i], c('AA')] (here for X1) over a varying number of variables.
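To make the condition concrete, here is a minimal check for a single row. The names `i`, `indicator_x1` and `row_index` are illustrative only; the values follow directly from the example data, where each X column holds a row number into AA.

```r
# Example data from the question
df <- data.frame(
  AA = c(100, 200, 300, 400),
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)

# Indicator for row i = 2 and variable X1:
# df$X1[2] is 1, so AA[2] = 200 is compared with AA[1] = 100
i <- 2
indicator_x1 <- df$AA[i] > df[df$X1[i], "AA"]
indicator_x1
#> [1] TRUE

# Summing the indicators over X1 and X2 gives the value for row 2
row_index <- sum(df$AA[i] > df$AA[c(df$X1[i], df$X2[i])])
row_index
#> [1] 1
```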
My probably naive approach is a for loop, which works fine for a fixed number of variables (columns), in the given example X1 and X2. My problem is that I do not know the number of variables beforehand; in principle, any number 1, 2, 3, ... is possible.
for (i in 1:nrow(df)) {
  df$index[i] <- sum(df$AA[i] > df[df$X1[i], c('AA')],
                     df$AA[i] > df[df$X2[i], c('AA')])
}
This gives the desired output for the fixed set of variables X1, X2:
df
#> AA X1 X2 index
#> 1 100 2 1 0
#> 2 200 1 3 1
#> 3 300 3 4 0
#> 4 400 1 1 2
Is there a smooth base R approach that generalizes this to a flexible number of variables X1, ..., Xn?
Note: I am interested in a base R approach because I aim to extend an existing package that is written entirely in base R, and I would like to keep it that way.
Loops and *apply-family approaches are both very welcome.
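One loop-based way to remove the fixed column list is to loop over the index columns as well as the rows. The sketch below assumes, as a labeled assumption, that the index columns can be found by the name pattern `X1`, `X2`, ..., `Xn`; `x_cols` and `index` are hypothetical names.

```r
df <- data.frame(
  AA = c(100, 200, 300, 400),
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)

# Assumption: every index column is named X followed by digits
x_cols <- grep("^X[0-9]+$", names(df), value = TRUE)

index <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  for (col in x_cols) {
    # Add 1 when AA in row i exceeds AA in the row that column `col` points to
    index[i] <- index[i] + (df$AA[i] > df$AA[df[[col]][i]])
  }
}
df$index <- index
df$index
#> [1] 0 1 0 2
```

Because the inner loop runs over whatever `x_cols` contains, adding an X3 column needs no code change.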
I am aware that operations on data frames are often considered slow. Since all variables AA, X1, ... have the same length, a solution that does not rely on a data frame structure would also be great!
Created on 2022-04-06 by the reprex package (v2.0.1)
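Since the variables all have the same length, the data frame can be dropped entirely. A minimal sketch, assuming the index variables are stored as columns of a numeric matrix (the names `idx` and `ref` are hypothetical), looks up all referenced AA values in one vectorized step:

```r
# Plain vectors, no data frame
AA <- c(100, 200, 300, 400)
# Hypothetical layout: one matrix column per index variable X1, ..., Xn
idx <- cbind(X1 = c(2, 1, 3, 1),
             X2 = c(1, 3, 4, 1))

# Look up the referenced AA value for every cell of idx at once,
# then restore the matrix shape (one row per observation)
ref <- matrix(AA[as.vector(idx)], nrow = nrow(idx))

# AA is recycled down each column, so row i is compared with its own AA[i]
index <- rowSums(AA > ref)
index
#> [1] 0 1 0 2
```

`as.vector(idx)` makes the lookup an ordinary vector subset, so the number of columns in `idx` can be anything.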
Answer:
You don't need to loop through rows. You can use Reduce:
Reduce(`+`, lapply(df[-1], function(x) df$AA > df$AA[x]))
#> [1] 0 1 0 2
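The same idea can be wrapped in a small function so the value column and the index columns are named explicitly rather than taken by position. `count_exceeding`, `ref` and `cols` are hypothetical names; the extra `numeric(length(aa))` argument is Reduce's initial value, which keeps the result one value per row even when no index columns are given. Pass `cols` explicitly once other columns (such as a stored `index`) are present.

```r
df <- data.frame(
  AA = c(100, 200, 300, 400),
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)

# Hypothetical helper: `ref` names the value column, `cols` the index columns
count_exceeding <- function(data, ref = "AA",
                            cols = setdiff(names(data), ref)) {
  aa <- data[[ref]]
  # One logical vector per index column: is aa[i] larger than aa[col[i]]?
  hits <- lapply(data[cols], function(x) aa > aa[x])
  # Start from a zero vector so an empty `cols` still yields one value per row
  Reduce(`+`, hits, numeric(length(aa)))
}

df$index <- count_exceeding(df)
df$index
#> [1] 0 1 0 2
```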
Answer:
Does this correspond to what you're looking for?
df$index <- apply(df, 1, function(x){sum(x[1] > df$AA[x[-1]])})
This assumes that AA is column 1 and that all the other columns are your Xi.
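apply() first converts the data frame to a matrix with as.matrix(), which is harmless here because every column is numeric; with a character column present, all values would become strings. A variant that selects the index columns by name and converts only those could look like this (`x_cols` and `x_mat` are hypothetical names):

```r
df <- data.frame(
  AA = c(100, 200, 300, 400),
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)

# Select the index columns by name instead of relying on AA being column 1
x_cols <- setdiff(names(df), "AA")
x_mat <- as.matrix(df[x_cols])  # only numeric index columns are converted

# One integer per row: how many referenced AA values are smaller than AA[i]
df$index <- vapply(seq_len(nrow(df)), function(i) {
  sum(df$AA[i] > df$AA[x_mat[i, ]])
}, integer(1))
df$index
#> [1] 0 1 0 2
```

vapply() also checks that each row returns exactly one integer, which apply() does not enforce.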
Answer:
The following one-liner works because df is a data frame, so mapply() iterates over its columns (the \(x) lambda syntax requires R >= 4.1.0):
rowSums(                                               # Sum over any number of columns
  mapply(
    df[, -which(names(df) == "AA"), drop = FALSE],     # Everything except AA, kept as a data frame
    df[, "AA", drop = FALSE],                          # Only AA, but in a data frame
    FUN = \(index, aa) aa[index] < aa))                # Compare
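The drop = FALSE on the index columns matters when there is only one X column: without it, df[, -which(...)] collapses to a plain vector and mapply() would iterate over its elements instead of over columns. A minimal check with a hypothetical one-column data frame `df1`, written with function() so it also runs before R 4.1.0:

```r
# Hypothetical data with a single index column
df1 <- data.frame(
  AA = c(100, 200, 300, 400),
  X1 = c(2, 1, 3, 1)
)

idx_cols <- df1[, names(df1) != "AA", drop = FALSE]  # stays a data frame
index <- rowSums(
  mapply(
    function(index, aa) aa[index] < aa,  # same comparison as the one-liner
    idx_cols,
    df1[, "AA", drop = FALSE]
  )
)
index
#> [1] 0 1 0 1
```

With a single column, mapply() still returns a 4 x 1 matrix, so rowSums() works unchanged.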