Is there a way to get a COUNTIF like summary in R that also shows proportions?-CodePudding

I am trying to summarise between my variables in R, and my data looks like this:

id  Function   V  t
1   adverb     0  1
2   discourse  1  1
3   filler     1  0
4   discourse  1  1
5   filler     0  0
6   adverb     1  1
7   adverb     1  1

What I need is a table which lists each function by the count and proportion of each variable (present/non-present).

I Excel I would use something like: COUNTIFS($B:$B, "adverb", $C:$C, ">"&0)/SUMIFS($B:$B, "adverb")

Ideally it looks like this:

Function  V  V_prop  t  t_prop
adverb    2  0.67    3  1
discourse 2  1       2  1
filler    1  0.5     0  0

I know I can use dplyr like so:

df %>% 
  group_by(Function) %>%
  dplyr::summarise_at(vars(V,t), function(x) (sum(x !=0)/n(x)))

But, this only gives me the raw counts, I need the proportions too.

CodePudding user response：

You were actually very close. TO get the proportion, just divide by the number of items in the group (with n() not n(x)). If you provide a list of functions to a _at function, it will apply them all to each variable selected:

df %>% 
    group_by(Function) %>%
    summarise_at(vars(V,t),
                 list('n' = ~ sum(. !=0),
                      'prop' = ~ (sum(. !=0)/n())))

  Function    V_n   t_n V_prop t_prop
  <chr>     <int> <int>  <dbl>  <dbl>
1 adverb        2     3  0.667      1
2 discourse     2     2  1          1
3 filler        1     0  0.5        0

Because of the way dplyr works, the new variables must have an added name (so you can't get V and t right away, they must be V_n and t_n. If you really want the same format, we can just rename them, either manually or with rename_at:

 ... %>%
    rename_at(vars(ends_with('_n')), ~ gsub('_n$', '', .))