Obtaining counts of factor variables in R

Time:01-19

I have a dataset of mostly factor variables whose counts I'd like to summarize in R using the summarise function from dplyr. The data come from a pre- and post-treatment scenario, so some levels might be missing in the post column, depending on the responses.

I can get both counts individually like so:

library(dplyr)

bla = data.frame(pre = c("a", "b", "c", "d", "e"), post = c("b", "d", "a", "a", "e"))
bla$pre = as.factor(bla$pre)
bla$post = as.factor(bla$post)
bla %>% group_by(pre) %>% summarise(Count = n())   # counts per level of pre
bla %>% group_by(post) %>% summarise(Count = n())  # counts per level of post

which yields:

pre  Count
a    1
b    1
c    1
d    1
e    1

post Count
a    2
b    1
d    1
e    1

But what I am after is:

Level  pre Count  post Count
a      1          2
b      1          1
c      1          0
d      1          1
e      1          1

CodePudding user response:

library(dplyr)
library(tidyr)

bla %>% 
  pivot_longer(everything(), names_to = "name", values_to = "Level") %>% 
  mutate(name = paste(name, "Count")) %>% 
  count(name, Level) %>% 
  pivot_wider(names_from=name, values_from = n, values_fill = 0) %>% 
  arrange(Level)

    Level `post Count` `pre Count`
  <fct>        <int>       <int>
1 a                2           1
2 b                1           1
3 c                0           1
4 d                1           1
5 e                1           1
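
The pipeline above returns `post Count` before `pre Count`, because count() sorts its output by the name column. A minimal sketch of one way to get the column order asked for in the question is to select the columns explicitly at the end. The object name `result` and the use of stringsAsFactors = TRUE to build the example data are my own choices for illustration, not part of the original answer:

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, built with factor columns directly
bla <- data.frame(pre  = c("a", "b", "c", "d", "e"),
                  post = c("b", "d", "a", "a", "e"),
                  stringsAsFactors = TRUE)

result <- bla %>%
  pivot_longer(everything(), names_to = "name", values_to = "Level") %>%
  mutate(name = paste(name, "Count")) %>%
  count(name, Level) %>%
  pivot_wider(names_from = name, values_from = n, values_fill = 0) %>%
  arrange(Level) %>%
  # put the columns in the order requested in the question
  select(Level, `pre Count`, `post Count`)

result
```

With more than two columns, the same select() call would need every count column listed in the desired order.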

CodePudding user response:

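In base R (stack() ignores non-vector columns such as factors, so convert them to character first):

xtabs(~., stack(lapply(bla, as.character)))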

      ind
values pre post
     a   1    2
     b   1    1
     c   1    0
     d   1    1
     e   1    1

Or even, using the \(x) lambda shorthand (R 4.1 or later):

sapply(bla, \(x) table(factor(x, levels = unique(unlist(bla)))))

  pre post
a   1    2
b   1    1
c   1    0
d   1    1
e   1    1

If you need a data frame:

as.data.frame.matrix(xtabs(~., stack(lapply(bla, as.character))))

  pre post
a   1    2
b   1    1
c   1    0
d   1    1
e   1    1
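
In the data frame above the levels end up as row names rather than as a column. If a separate Level column is wanted, as in the layout the question asks for, something like the following sketch may work. It counts each column against a shared set of levels, so a level missing from one column shows up as 0. The names `all_levels`, `counts` and `result` are hypothetical and only used for this illustration:

```r
# Same example data as in the question, built with factor columns directly
bla <- data.frame(pre  = c("a", "b", "c", "d", "e"),
                  post = c("b", "d", "a", "a", "e"),
                  stringsAsFactors = TRUE)

# Every level that occurs in any column, in sorted order
all_levels <- sort(unique(unlist(lapply(bla, as.character))))

# Count each column against the shared levels; the result is a matrix
# with one row per level and one column per variable
counts <- sapply(bla, function(x) table(factor(x, levels = all_levels)))

# Move the row names into an ordinary column
result <- data.frame(Level = rownames(counts), counts, row.names = NULL)

result
```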