To illustrate, an example data is:
structure(list(V1 = structure(c(100, 50, -110, 120, 150, 230), format.spss = "F8.0"),
V2 = structure(c(150, 70, 160, 160, 155, 71), format.spss = "F8.0")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L))
The outcome I'm trying to get at are columns S and I below:
------ ----- --- ---
| V1 | V2 | S | I |
------ ----- --- ---
| 100 | 150 | 2 | 1 |
------ ----- --- ---
| 50 | 70 | 4 | 0 |
------ ----- --- ---
| -110 | 160 | 0 | 0 |
------ ----- --- ---
| 120 | 160 | 0 | 1 |
------ ----- --- ---
| 150 | 155 | 0 | 2 |
------ ----- --- ---
| 230 | 71 | 0 | 1 |
------ ----- --- ---
Column S indicates the number of observations that have V1 and V2 higher than a given row/observation of V1 and V2. For example, for the first observation (V1: 100, V2: 150), there are 2 other observations that have V1 and V2 higher (120/160 and 150/155).
On the other hand, column I indicates the opposite--the number of observations that have V1 and V2 that are lower than a given row/observation.
CodePudding user response:
You may try
library(dplyr)
df %>%
rowwise %>%
mutate(S = sum((V1 < df$V1) & (V2 < df$V2)),
I = sum((V1 > df$V1) & (V2 > df$V2)))
V1 V2 S I
<dbl> <dbl> <int> <int>
1 100 150 2 1
2 50 70 4 0
3 -110 160 0 0
4 120 160 0 2
5 150 155 0 2
6 230 71 0 1
CodePudding user response:
data.table answer for fun, which will allow for all the names of the comparisons to be generated automatically in case you had many V1-n
variables:
library(data.table)
setDT(dat)
dat[,
c("S","I") := lapply(c(">", "<"),
\(x) dat[dat, on=sprintf("%1$s%2$s%1$s", names(dat), x), .N, by=.EACHI]$N)
]
## V1 V2 S I
##1: 100 150 2 1
##2: 50 70 4 0
##3: -110 160 0 0
##4: 120 160 0 2
##5: 150 155 0 2
##6: 230 71 0 1