Smooth way to calculate index based on several variable comparisons in base R

Time:04-07

Example data to copy

df <- data.frame(
  AA = c(100, 200, 300, 400), 
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)

Based on the index of AA and its values, I would like to calculate, for every row, the sum of indicators based on the condition df$AA[i] > df[df$X1[i], c('AA')] (shown here for X1), over a fluctuating number of variables.
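For a single row, the condition is a lookup followed by a comparison. The sketch below evaluates it for row 4 of the example data; the row `i <- 4` and the names `ind_x1` and `ind_x2` are chosen only for illustration. Both X1[4] and X2[4] point to row 1, so AA[4] = 400 is compared with AA[1] = 100 twice, which is why row 4 should end up with an index of 2.

```r
df <- data.frame(
  AA = c(100, 200, 300, 400),
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)

i <- 4  # row chosen for illustration
# X1[4] and X2[4] both equal 1, so AA[4] (400) is compared with AA[1] (100)
ind_x1 <- df$AA[i] > df[df$X1[i], c('AA')]
ind_x2 <- df$AA[i] > df[df$X2[i], c('AA')]
sum(ind_x1, ind_x2)
#> [1] 2
```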

My probably naive approach is to use a for-loop, which works perfectly for a fixed number of variables (columns), in the given example X1 and X2. My problem is that I do not know the number of variables beforehand. In theory, any number 1, 2, 3, ... is possible.

for (i in 1:nrow(df)) {
  df$index[i] <- sum(df$AA[i] > df[df$X1[i], c('AA')],
                     df$AA[i] > df[df$X2[i], c('AA')])
}

Which gives the desired output for a fixed number of variables X1, X2:

df
#>    AA X1 X2 index
#> 1 100  2  1     0
#> 2 200  1  3     1
#> 3 300  3  4     0
#> 4 400  1  1     2

Is there a smooth base R approach which translates my approach to a flexible number of variables X1, ..., Xn?

Note: I am interested in a base R approach because I want to extend an existing package that is written entirely in base R, so I would like to keep it that way. Loops and *apply-family approaches are both very welcome. I am aware that operations on data frames are often considered to be slower. Since all variables AA, X1, ... have the same length, a solution that does not rely on a data frame structure would also be great!

Created on 2022-04-06 by the reprex package (v2.0.1)

Answer:

You don't need to loop through rows. You can use Reduce.

Reduce(`+`, lapply(df[-1], function(x) df$AA > df$AA[x]))
#> [1] 0 1 0 2
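Because Reduce() only needs a list of index vectors, the same idea should also work without a data frame, which addresses the question's note about plain vectors. A minimal sketch, assuming the index variables are collected in a list; the names `aa`, `idx_list` and `comparisons` are illustrative, not from the answer above.

```r
aa <- c(100, 200, 300, 400)
# Any number of index vectors, each the same length as aa
idx_list <- list(X1 = c(2, 1, 3, 1), X2 = c(1, 3, 4, 1))

# One logical vector per index variable: is aa[i] greater than aa[x[i]]?
comparisons <- lapply(idx_list, function(x) aa > aa[x])
# Add the logical vectors element-wise; the initial value 0 keeps the
# result numeric even when there is only one index variable
index <- Reduce(`+`, comparisons, 0)
index
#> [1] 0 1 0 2
```

Without the initial value 0, Reduce() applied to a single-element list returns that element unchanged, so you would get TRUE/FALSE instead of counts.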

Answer:

Does this correspond to what you're looking for?

df$index <- apply(df, 1, function(x){sum(x[1] > df$AA[x[-1]])})

This assumes that AA is column 1 and that all the other columns are your Xi variables.
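If an explicit loop is preferred, which the question allows, the original for-loop can be turned around: loop over whichever index columns are present and compare all rows at once. A sketch, assuming the index columns can be identified by a name pattern such as "^X[0-9]+$"; the pattern and the name `x_cols` are illustrative assumptions.

```r
df <- data.frame(
  AA = c(100, 200, 300, 400),
  X1 = c(2, 1, 3, 1),
  X2 = c(1, 3, 4, 1)
)

# Assumption: index columns are named X1, X2, ..., Xn
x_cols <- grep("^X[0-9]+$", names(df), value = TRUE)

df$index <- 0
for (col in x_cols) {
  # Add 1 in every row where AA exceeds AA of the referenced row
  df$index <- df$index + (df$AA > df$AA[df[[col]]])
}
df$index
#> [1] 0 1 0 2
```

The loop runs once per index column rather than once per row, so the number of iterations grows with the number of variables, not with the number of rows.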

Answer:

The following one-liner works because df is a data frame, so mapply() iterates over its columns:

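rowSums( # To sum over a non-specified number of columns
  mapply(
    df[, -which(names(df) == "AA"), drop = FALSE], # Everything except AA; drop = FALSE keeps a data frame even with a single X column
    df[, "AA", drop = FALSE],                      # Only AA, but in a data frame
    FUN = \(index, aa) aa[index] < aa))            # Compare (the \(x) lambda syntax needs R >= 4.1)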
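A fully vectorized variant without mapply() stacks the index columns into a matrix, looks up all referenced AA values in one step and compares them with AA. It accepts plain vectors and a matrix, which matches the question's wish to avoid a data frame structure. A sketch; count_greater() is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: for each position i, count how many of the
# positions listed in idx[i, ] hold a smaller value than aa[i]
count_greater <- function(aa, idx) {
  idx <- as.matrix(idx)  # accepts a matrix or a data frame of index columns
  # Look up all referenced values at once and restore the matrix shape
  ref <- matrix(aa[as.vector(idx)], nrow = nrow(idx))
  # aa is recycled down each column, so row i is compared with aa[i]
  rowSums(aa > ref)
}

aa  <- c(100, 200, 300, 400)
idx <- cbind(X1 = c(2, 1, 3, 1), X2 = c(1, 3, 4, 1))
count_greater(aa, idx)
#> [1] 0 1 0 2
```

With the example data frame, the equivalent call would be count_greater(df$AA, df[-1]), as long as df contains only AA and the Xi columns.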