Home > other >  Is there a way to get a COUNTIF like summary in R that also shows proportions?
Is there a way to get a COUNTIF like summary in R that also shows proportions?

Time:01-26

I am trying to summarise between my variables in R, and my data looks like this:

id  Function   V  t
1   adverb     0  1
2   discourse  1  1
3   filler     1  0
4   discourse  1  1
5   filler     0  0
6   adverb     1  1
7   adverb     1  1

What I need is a table which lists each function by the count and proportion of each variable (present/non-present).

I Excel I would use something like: COUNTIFS($B:$B, "adverb", $C:$C, ">"&0)/SUMIFS($B:$B, "adverb")

Ideally it looks like this:

Function  V  V_prop  t  t_prop
adverb    2  0.67    3  1
discourse 2  1       2  1
filler    1  0.5     0  0

I know I can use dplyr like so:

df %>% 
  group_by(Function) %>%
  dplyr::summarise_at(vars(V,t), function(x) (sum(x !=0)/n(x)))

But, this only gives me the raw counts, I need the proportions too.

CodePudding user response:

You were actually very close. TO get the proportion, just divide by the number of items in the group (with n() not n(x)). If you provide a list of functions to a _at function, it will apply them all to each variable selected:

df %>% 
    group_by(Function) %>%
    summarise_at(vars(V,t),
                 list('n' = ~ sum(. !=0),
                      'prop' = ~ (sum(. !=0)/n())))

  Function    V_n   t_n V_prop t_prop
  <chr>     <int> <int>  <dbl>  <dbl>
1 adverb        2     3  0.667      1
2 discourse     2     2  1          1
3 filler        1     0  0.5        0

Because of the way dplyr works, the new variables must have an added name (so you can't get V and t right away, they must be V_n and t_n. If you really want the same format, we can just rename them, either manually or with rename_at:

 ... %>%
    rename_at(vars(ends_with('_n')), ~ gsub('_n$', '', .))
  •  Tags:  
  • Related