Home > OS >  Append one row with average values for selected columns and counting percent for another based on co
Append one row with average values for selected columns and counting percent for another based on co

Time:04-14

I have a df below, I need to calculate Pass percent after excluding - rows and the average values of pred1 and pred2:

df <- data.frame(
  name = c('A', 'B', 'C', 'D', 'E'), 
  status = c('Pass', 'Fail', '-', 'Pass', 'Pass'), 
  real = c(10, NA, 8, 9, 4), 
  pred1 = c(50, 20, NA, 14, 11),
  pred2 = c(12, 12, 8, NA, 6)
)

df:

  name status real pred1 pred2
1    A   Pass   10    50    12
2    B   Fail   NA    20    12
3    C      -    8    NA     8
4    D   Pass    9    14    NA
5    E   Pass    4    11     6

The expected result:

  name status real pred1 pred2
1    A   Pass   10    50    12
2    B   Fail   NA    20    12
3    C      -    8    NA     8
4    D   Pass    9    14    NA
5    E   Pass    4    11     6
6 total  0.75   NA 23.75   9.5

I thought to bind to the result below to df, but it's not concise and beautiful solutions:

pass_percent <- nrow(df %>% filter(status == 'Pass')) / nrow(df %>% filter(status != '-'))
avg_pred1 <- mean(df$pred1, na.rm = T)
avg_pred2 <- mean(df$pred2, na.rm = T)

How could I acheive that in a more concise way with R's pipe?

CodePudding user response:

What about tibble::add_row:

df %>% 
  add_row(name = "total",
          status = as.character(mean(df$status[df$status != "-"] == "Pass")),
          real = mean(df$real),
          pred1 = mean(df$pred1, na.rm = T),
          pred2 = mean(df$pred2, na.rm = T))

   name status real pred1 pred2
1     A   Pass   10 50.00  12.0
2     B   Fail   NA 20.00  12.0
3     C      -    8    NA   8.0
4     D   Pass    9 14.00    NA
5     E   Pass    4 11.00   6.0
6 total   0.75   NA 23.75   9.5

Explanation of as.character(mean(df$status[df$status != "-"] == "Pass")):

  • df$status[df$status != "-"] is the vector of df$status without the element equal to "-" (so only Pass and Fail).
  • df$status[df$status != "-"] == "Pass" is TRUE if df$status is "Pass", FALSE otherwise.
  • mean(...) is possible because TRUE and FALSE values are coerced to numeric when the mean is computed.
  • as.character(...) is needed because df$status is a character variable.
  • Related