I would like to calculate the number of observations within each group and also the number of observations (by group) that are zero (or NA) for several columns.
I can get the number of observations per group and can find a way to get the number of zero's for several columns. But when I do this, the number of observations per group shows how many non-zeros there are (see code below). I imagine I could do this creating two separate df and then combining them, but I imagine there has to be a more efficient way.
mtcars %>%
group_by(cyl) %>%
mutate (count = n()) %>%
summarise_each(funs(sum(0))
Thanks!
CodePudding user response:
I think you are looking for this:
library(tidyverse)
mtcars |>
group_by(cyl) |>
summarize(n = n(),
across(-n, ~sum(is.na(.) | . == 0)))
# A tibble: 3 × 12
cyl n mpg disp hp drat wt qsec vs am gear carb
<dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 4 11 0 0 0 0 0 0 1 3 0 0
2 6 7 0 0 0 0 0 0 3 4 0 0
3 8 14 0 0 0 0 0 0 14 12 0 0
First column gives you the grouping var, second one the number of observations per group, all other columns count how many zeros or NAs we have per column. You can adjust the columns for the latter part in the across
function.
CodePudding user response:
Number of observations by group (cyl)
Note that if there are missing values in the group variable, they will be counted as their own group.
library(dplyr)
mtcars %>% count(cyl)
#> cyl n
#> 1 4 11
#> 2 6 7
#> 3 8 14
# Add missing values
df <- mtcars
df$cyl[1:2] <- NA
df %>% count(cyl)
#> cyl n
#> 1 4 11
#> 2 6 5
#> 3 8 14
#> 4 NA 2
Number of 0 or NA values for each column
df %>% summarise(across(everything(),
~ sum(is.na(.x) | .x == 0)))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 0 2 0 0 0 0 0 18 19 0 0