Append one row with average values for selected columns and counting percent for another based on co-CodePudding

I have a df below, I need to calculate Pass percent after excluding - rows and the average values of pred1 and pred2:

df <- data.frame(
  name = c('A', 'B', 'C', 'D', 'E'), 
  status = c('Pass', 'Fail', '-', 'Pass', 'Pass'), 
  real = c(10, NA, 8, 9, 4), 
  pred1 = c(50, 20, NA, 14, 11),
  pred2 = c(12, 12, 8, NA, 6)
)

df:

  name status real pred1 pred2
1    A   Pass   10    50    12
2    B   Fail   NA    20    12
3    C      -    8    NA     8
4    D   Pass    9    14    NA
5    E   Pass    4    11     6

The expected result:

  name status real pred1 pred2
1    A   Pass   10    50    12
2    B   Fail   NA    20    12
3    C      -    8    NA     8
4    D   Pass    9    14    NA
5    E   Pass    4    11     6
6 total  0.75   NA 23.75   9.5

I thought to bind to the result below to df, but it's not concise and beautiful solutions:

pass_percent <- nrow(df %>% filter(status == 'Pass')) / nrow(df %>% filter(status != '-'))
avg_pred1 <- mean(df$pred1, na.rm = T)
avg_pred2 <- mean(df$pred2, na.rm = T)

How could I acheive that in a more concise way with R's pipe?

CodePudding user response：

What about tibble::add_row:

df %>% 
  add_row(name = "total",
          status = as.character(mean(df$status[df$status != "-"] == "Pass")),
          real = mean(df$real),
          pred1 = mean(df$pred1, na.rm = T),
          pred2 = mean(df$pred2, na.rm = T))

   name status real pred1 pred2
1     A   Pass   10 50.00  12.0
2     B   Fail   NA 20.00  12.0
3     C      -    8    NA   8.0
4     D   Pass    9 14.00    NA
5     E   Pass    4 11.00   6.0
6 total   0.75   NA 23.75   9.5

Explanation of as.character(mean(df$status[df$status != "-"] == "Pass")):

df$status[df$status != "-"] is the vector of df$status without the element equal to "-" (so only Pass and Fail).
df$status[df$status != "-"] == "Pass" is TRUE if df$status is "Pass", FALSE otherwise.
mean(...) is possible because TRUE and FALSE values are coerced to numeric when the mean is computed.
as.character(...) is needed because df$status is a character variable.