I would like to calculate the number of observations within each group and also the number of observations (by group) that are zero (or NA) for several columns.

I can get the number of observations per group and can find a way to get the number of zero's for several columns. But when I do this, the number of observations per group shows how many non-zeros there are (see code below). I imagine I could do this creating two separate df and then combining them, but I imagine there has to be a more efficient way.

mtcars %>% 
group_by(cyl) %>% 
mutate (count = n()) %>%
summarise_each(funs(sum(0))

Thanks!

CodePudding user response：

I think you are looking for this:

library(tidyverse)

mtcars |> 
  group_by(cyl) |> 
  summarize(n = n(),
            across(-n, ~sum(is.na(.) | . == 0)))

# A tibble: 3 × 12
    cyl     n   mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     4    11     0     0     0     0     0     0     1     3     0     0
2     6     7     0     0     0     0     0     0     3     4     0     0
3     8    14     0     0     0     0     0     0    14    12     0     0

First column gives you the grouping var, second one the number of observations per group, all other columns count how many zeros or NAs we have per column. You can adjust the columns for the latter part in the across function.

CodePudding user response：

Number of observations by group (cyl)

Note that if there are missing values in the group variable, they will be counted as their own group.

library(dplyr)

mtcars %>% count(cyl)
#>   cyl  n
#> 1   4 11
#> 2   6  7
#> 3   8 14

# Add missing values
df <- mtcars
df$cyl[1:2] <- NA

df %>% count(cyl)
#>   cyl  n
#> 1   4 11
#> 2   6  5
#> 3   8 14
#> 4  NA  2

Number of 0 or NA values for each column

df %>% summarise(across(everything(),
                        ~ sum(is.na(.x) | .x == 0)))
#>   mpg cyl disp hp drat wt qsec vs am gear carb
#> 1   0   2    0  0    0  0    0 18 19    0    0