Figuring out the max value from a series of columns in R-CodePudding

For each observation, there are three volume values, volume_1, volume_2, and volume_3. Sometimes volume_1 is the biggest while sometimes volume_3 is the biggest. Although I feel like this code should be easy, how do I calculate the overall percentage of the time when volume_1 is the biggest, volume_2 is the biggest, and then volume_3 is the biggest.

Here is a little bit of my data

structure(list(PVC = c("29A", "2D5", "2HX", "38A", "3CN", "6021051"
), Age = c(6, 5, 6, 10, 5, 7), volume_1 = c(59.44244884, 51.69518257, 
63.17950819, 56.4269955, 64.05189184, 61.82983473), volume_2 = c(54.74897726, 
56.64778447, 51.86880673, 59.71146472, 58.96633234, 63.39471043
), volume_3 = c(58.97596791, 51.75711362, 61.36142512, 57.09629745, 
64.25164825, 63.19407463), Year = c("2003", "2002", "2003", "2008", 
"2003", "1994"), averageAB = c(57.09571305, 54.17148352, 57.52415746, 
58.06923011, 61.50911209, 62.61227258), diff_AB_C = c(-1.88025486, 
2.4143699, -3.83726766, 0.972932659999998, -2.74253616, -0.58180205
), sumAB = c(114.1914261, 108.34296704, 115.04831492, 116.13846022, 
123.01822418, 125.22454516), ratioAB_C = c(1.93623657477333, 
2.09329615703558, 1.87492899154491, 2.03408041163622, 1.91463141461122, 
1.98158681637776)), row.names = c(NA, 6L), class = "data.frame")

Thank you as always for your help!!

CodePudding user response：

This is pretty easily done using the max.col() function:

df$highest_col <- max.col(df[,3:5])

head(df)
      PVC Age volume_1 volume_2 volume_3 Year averageAB
1     29A   6 59.44245 54.74898 58.97597 2003  57.09571
2     2D5   5 51.69518 56.64778 51.75711 2002  54.17148
3     2HX   6 63.17951 51.86881 61.36143 2003  57.52416
4     38A  10 56.42700 59.71146 57.09630 2008  58.06923
5     3CN   5 64.05189 58.96633 64.25165 2003  61.50911
6 6021051   7 61.82983 63.39471 63.19407 1994  62.61227
   diff_AB_C    sumAB ratioAB_C highest_col
1 -1.8802549 114.1914  1.936237           1
2  2.4143699 108.3430  2.093296           2
3 -3.8372677 115.0483  1.874929           1
4  0.9729327 116.1385  2.034080           2
5 -2.7425362 123.0182  1.914631           3
6 -0.5818020 125.2245  1.981587           2

Then you can just run table() on that new column to get the counts.

CodePudding user response：

There may be an easier way but using tidyverse I'd go

df %>% 
  rowwise() %>%
  mutate(max = case_when(max(volume_1, volume_2, volume_3) == volume_1 ~ "volume_1",
                         max(volume_1, volume_2, volume_3) == volume_2 ~ "volume_2",
                         max(volume_1, volume_2, volume_3) == volume_3 ~ "volume_3") %>% 
  ungroup() %>%
  count(max)

CodePudding user response：

data.table solution.

It finds the name of the column (it's name starting with "volums") thet contains the maximum value of each row. This is stord in te maxCol column. After that,m using table and prop.table to get the percentages.

library(data.table)
setDT(mydata)
mydata[, maxCol := names(.SD)[max.col(.SD)], .SDcols = patterns("^volume")][]
prop.table(table(mydata$maxCol))
# volume_1  volume_2  volume_3 
# 0.3333333 0.5000000 0.1666667

of in a oneliner

prop.table(table(setDT(mydata)[, .(col = names(.SD)[max.col(.SD)]), .SDcols = patterns("^volume")]))

CodePudding user response：

vol <- df %>% select(starts_with("volume"))
vol$max <- colnames(vol)[max.col(vol)]
vol <- 
  vol%>%group_by(max)%>%
  mutate(rep = n(),
  prop = 100* rep/nrow(df)) %>%
  ungroup()

left_join(df,vol)  

Joining, by = c("volume_1", "volume_2", "volume_3")
      PVC Age volume_1 volume_2 volume_3 Year averageAB  diff_AB_C    sumAB ratioAB_C
1     29A   6 59.44245 54.74898 58.97597 2003  57.09571 -1.8802549 114.1914  1.936237
2     2D5   5 51.69518 56.64778 51.75711 2002  54.17148  2.4143699 108.3430  2.093296
3     2HX   6 63.17951 51.86881 61.36143 2003  57.52416 -3.8372677 115.0483  1.874929
4     38A  10 56.42700 59.71146 57.09630 2008  58.06923  0.9729327 116.1385  2.034080
5     3CN   5 64.05189 58.96633 64.25165 2003  61.50911 -2.7425362 123.0182  1.914631
6 6021051   7 61.82983 63.39471 63.19407 1994  62.61227 -0.5818020 125.2245  1.981587
       max rep     prop
1 volume_1   2 33.33333
2 volume_2   3 50.00000
3 volume_1   2 33.33333
4 volume_2   3 50.00000
5 volume_3   1 16.66667
6 volume_2   3 50.00000
>

CodePudding user response：

Extract the volume columns, rank the columns in each row and then find the fraction of rank 3 for each. Multiply that by 100 if percentages are wanted.

How to handle ties was not defined so we used ties="first". See ?rank for other possibilities.

rowMeans(apply(DF[grep("volume", names(DF))], 1, rank, ties = "first") == 3)
##  volume_1  volume_2  volume_3 
## 0.3333333 0.5000000 0.1666667

Here is a variation that gives the same result if there are no ties.

vol <- DF[grep("volume", names(DF))]
colMeans(vol == apply(vol, 1, max))
##  volume_1  volume_2  volume_3 
## 0.3333333 0.5000000 0.1666667