I am trying to summarise between my variables in R, and my data looks like this:
id Function V t
1 adverb 0 1
2 discourse 1 1
3 filler 1 0
4 discourse 1 1
5 filler 0 0
6 adverb 1 1
7 adverb 1 1
What I need is a table which lists each function by the count and proportion of each variable (present/non-present).
I Excel I would use something like: COUNTIFS($B:$B, "adverb", $C:$C, ">"&0)/SUMIFS($B:$B, "adverb")
Ideally it looks like this:
Function V V_prop t t_prop
adverb 2 0.67 3 1
discourse 2 1 2 1
filler 1 0.5 0 0
I know I can use dplyr like so:
df %>%
group_by(Function) %>%
dplyr::summarise_at(vars(V,t), function(x) (sum(x !=0)/n(x)))
But, this only gives me the raw counts, I need the proportions too.
CodePudding user response:
You were actually very close. TO get the proportion, just divide by the number of items in the group (with n()
not n(x)
). If you provide a list of functions to a _at
function, it will apply them all to each variable selected:
df %>%
group_by(Function) %>%
summarise_at(vars(V,t),
list('n' = ~ sum(. !=0),
'prop' = ~ (sum(. !=0)/n())))
Function V_n t_n V_prop t_prop
<chr> <int> <int> <dbl> <dbl>
1 adverb 2 3 0.667 1
2 discourse 2 2 1 1
3 filler 1 0 0.5 0
Because of the way dplyr works, the new variables must have an added name (so you can't get V
and t
right away, they must be V_n
and t_n
. If you really want the same format, we can just rename them, either manually or with rename_at
:
... %>%
rename_at(vars(ends_with('_n')), ~ gsub('_n$', '', .))