Calculate the percentage of rows in which each column holds the row's highest value

Time:08-04

I have a data frame in R as follows:

set.seed(123)
A <- as.data.frame(matrix(rnorm(20 * 5, mean = 0, sd = 1), 20, 5))

which results in:

> A
            V1          V2          V3          V4           V5
1  -0.56047565 -1.06782371 -0.69470698  0.37963948  0.005764186
2  -0.23017749 -0.21797491 -0.20791728 -0.50232345  0.385280401
3   1.55870831 -1.02600445 -1.26539635 -0.33320738 -0.370660032
4   0.07050839 -0.72889123  2.16895597 -1.01857538  0.644376549
5   0.12928774 -0.62503927  1.20796200 -1.07179123 -0.220486562
6   1.71506499 -1.68669331 -1.12310858  0.30352864  0.331781964
7   0.46091621  0.83778704 -0.40288484  0.44820978  1.096839013
8  -1.26506123  0.15337312 -0.46665535  0.05300423  0.435181491
9  -0.68685285 -1.13813694  0.77996512  0.92226747 -0.325931586
10 -0.44566197  1.25381492 -0.08336907  2.05008469  1.148807618
11  1.22408180  0.42646422  0.25331851 -0.49103117  0.993503856
12  0.35981383 -0.29507148 -0.02854676 -2.30916888  0.548396960
13  0.40077145  0.89512566 -0.04287046  1.00573852  0.238731735
14  0.11068272  0.87813349  1.36860228 -0.70920076 -0.627906076
15 -0.55584113  0.82158108 -0.22577099 -0.68800862  1.360652449
16  1.78691314  0.68864025  1.51647060  1.02557137 -0.600259587
17  0.49785048  0.55391765 -1.54875280 -0.28477301  2.187332993
18 -1.96661716 -0.06191171  0.58461375 -1.22071771  1.532610626
19  0.70135590 -0.30596266  0.12385424  0.18130348 -0.235700359
20 -0.47279141 -0.38047100  0.21594157 -0.13889136 -1.026420900

For each row, I want to find which column holds the highest value, and then display the percentage of rows in which each column held that maximum, e.g.:

V1  V2  V3  V4  V5
2%  25% 40% 30% 3%

How can I calculate this in R?

CodePudding user response:

Use max.col to get the column index of each row's maximum, then table to count them:

max.col(A)
#  [1] 4 5 1 3 3 1 5 5 4 4 1 5 4 3 5 1 5 5 1 3
table(max.col(A))
# 1 3 4 5 
# 5 4 4 7 
table(names(A)[max.col(A)])/nrow(A)
#   V1   V3   V4   V5 
# 0.25 0.20 0.20 0.35 

This doesn't match your expected output, but I suspect that's because the numbers in your example were only meant to illustrate the format.
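
One caveat worth noting: max.col defaults to ties.method = "random", so if a row contains tied maxima the result can change between runs. The rnorm data here has no ties, but for real data it may be safer to pick a deterministic rule. A minimal sketch, assuming the data from the question and assuming that "first column wins" is an acceptable tie rule (the names idx, idx_check and share are just illustrative):

```r
# Rebuild the question's data so the example is self-contained
set.seed(123)
A <- as.data.frame(matrix(rnorm(20 * 5, mean = 0, sd = 1), 20, 5))

# ties.method = "first" makes the result reproducible when a row has ties
# (assumption: taking the first tied column is acceptable for your data)
idx <- max.col(A, ties.method = "first")

# Cross-check with base R: which.max also returns the first maximum per row
idx_check <- unname(apply(A, 1, which.max))
stopifnot(identical(as.integer(idx), as.integer(idx_check)))

# Share of rows in which each column holds the maximum
share <- table(names(A)[idx]) / nrow(A)
share
```

On this data the shares should be the same as in the output above, since no row has a tie.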

CodePudding user response:

Similar to r2evans's solution, but columns that never hold the maximum are kept and shown as 0:

max.col(A) |> 
  factor(levels = seq_along(A), labels = names(A)) |>
  table() |> 
  prop.table()
#   V1   V2   V3   V4   V5 
# 0.25 0.00 0.20 0.20 0.35 
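
If you want the result printed as percentage strings, as in the layout shown in the question, the proportions can be formatted afterwards. The following is only a sketch under a few assumptions: it rebuilds the question's data, rounds to whole percentages, and uses nested calls rather than the native pipe (|> needs R 4.1 or later). The names winners, props and pct are illustrative.

```r
# Rebuild the question's data so the example is self-contained
set.seed(123)
A <- as.data.frame(matrix(rnorm(20 * 5, mean = 0, sd = 1), 20, 5))

# Column index of each row's maximum, as a factor with one level per column
# so that columns which never "win" still appear with a count of 0
winners <- factor(max.col(A, ties.method = "first"),
                  levels = seq_along(A), labels = names(A))
props <- prop.table(table(winners))

# Format as whole-number percentage strings, keeping the column labels
pct <- setNames(paste0(round(100 * as.numeric(props)), "%"), names(props))
print(pct, quote = FALSE)
```

Rounding to whole percentages is a presentation choice; use round(100 * as.numeric(props), 1) or similar if you need decimals.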