I have a list of questions, and I want to know how many rows have non-NA values using summarize. I want to use summarize because I'm already using that to calculate the average, which works in the below code. Why does the below code not work and how can I fix it?
library(dplyr)
test <- tibble(student = c("j", "c", "s"),
q1 = c(1, 2, 3),
q2 = c(NA_real_, NA_real_, 4),
q3 = c(43, NA_real_, 232))
test %>%
dplyr::summarise(n = across(starts_with("q"), ~n(.x)),
avg = across(contains("q"), ~ round(mean(.x, na.rm = T), 2)))
expected_outcome <- tibble(n_q1 = 3,
n_q2 = 1,
n_q3 = 2,
avg_q1 = 2,
avg_q2 = 4,
avg_q3 = 138)
CodePudding user response:
library(dplyr)
test %>%
summarize(across(starts_with("q"), list(n = ~sum(!is.na(.)),
avg = ~mean(., na.rm = T)),
.names = "{.fn}_{.col}"))
From the ?across
documentation, you can pass a list to the .fns
argument:
A list of functions/lambdas, e.g. list(mean = mean, n_miss = ~ sum(is.na(.x))
This will apply every function in that list to the columns you have specified. You can then use the .names
argument of across
to set the column names how you desire.
Output
n_q1 avg_q1 n_q2 avg_q2 n_q3 avg_q3
<int> <dbl> <int> <dbl> <int> <dbl>
1 3 2 1 4 2 138.
CodePudding user response:
Update: Upps I missed the whole question. sorry: But here is an alternative just for fun: The preferred answer is already given by @LMc:
library(dplyr)
test %>%
summarise(across(starts_with("q"), list(avg = ~mean(., na.rm = T)),
.names = "{.fn}_{.col}")) %>%
bind_cols(test %>% purrr::map_df(~sum(!is.na(.))))
avg_q1 avg_q2 avg_q3 student q1 q2 q3
<dbl> <dbl> <dbl> <int> <int> <int> <int>
1 2 4 138. 3 3 1 2
test %>%
summarise(across(starts_with("q"), list(avg = ~mean(., na.rm = T)),
.names = "{.fn}_{.col}")) %>%
bind_cols(test %>% purrr::map_df(~sum(!is.na(.))))
First not full answer: To get the non-nas of the whole dataset, we could do this:
library(dplyr)
test %>%
purrr::map_df(~sum(!is.na(.)))
student q1 q2 q3
<int> <int> <int> <int>
1 3 3 1 2