Counting the number of observations with values of multiple columns that are higher/lower in R-CodePudding

To illustrate, an example data is:

structure(list(V1 = structure(c(100, 50, -110, 120, 150, 230), format.spss = "F8.0"), 
    V2 = structure(c(150, 70, 160, 160, 155, 71), format.spss = "F8.0")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -6L))

The outcome I'm trying to get at are columns S and I below:

 ------ ----- --- --- 
| V1   | V2  | S | I |
 ------ ----- --- --- 
| 100  | 150 | 2 | 1 |
 ------ ----- --- --- 
| 50   | 70  | 4 | 0 |
 ------ ----- --- --- 
| -110 | 160 | 0 | 0 |
 ------ ----- --- --- 
| 120  | 160 | 0 | 1 |
 ------ ----- --- --- 
| 150  | 155 | 0 | 2 |
 ------ ----- --- --- 
| 230  | 71  | 0 | 1 |
 ------ ----- --- ---

Column S indicates the number of observations that have V1 and V2 higher than a given row/observation of V1 and V2. For example, for the first observation (V1: 100, V2: 150), there are 2 other observations that have V1 and V2 higher (120/160 and 150/155).

On the other hand, column I indicates the opposite--the number of observations that have V1 and V2 that are lower than a given row/observation.

CodePudding user response：

You may try

library(dplyr)
df %>%
  rowwise %>%
  mutate(S = sum((V1 < df$V1) & (V2 < df$V2)),
         I = sum((V1 > df$V1) & (V2 > df$V2)))

     V1    V2     S     I
  <dbl> <dbl> <int> <int>
1   100   150     2     1
2    50    70     4     0
3  -110   160     0     0
4   120   160     0     2
5   150   155     0     2
6   230    71     0     1

CodePudding user response：

data.table answer for fun, which will allow for all the names of the comparisons to be generated automatically in case you had many V1-n variables:

library(data.table)
setDT(dat)

dat[,
    c("S","I") := lapply(c(">", "<"),
      \(x) dat[dat, on=sprintf("%1$s%2$s%1$s", names(dat), x), .N, by=.EACHI]$N)
]
##     V1  V2 S I
##1:  100 150 2 1
##2:   50  70 4 0
##3: -110 160 0 0
##4:  120 160 0 2
##5:  150 155 0 2
##6:  230  71 0 1