Obtaining counts of factor variables in R

I have a dataset of mostly factor variables, and I'd like to summarize the counts of each level in R using summarise() from dplyr. The data come from a pre/post-treatment scenario, so some levels may be missing from the post column, depending on the responses.

I can get both counts individually like so:

library(dplyr)

bla = data.frame(pre = c("a", "b", "c", "d", "e"), post = c("b", "d", "a", "a", "e"))
bla$pre = as.factor(bla$pre)
bla$post = as.factor(bla$post)
bla %>% group_by(pre) %>% summarise(Count = n())
bla %>% group_by(post) %>% summarise(Count = n())

which yields:

pre	Count
a	1
b	1
c	1
d	1
e	1

post	Count
a	2
b	1
d	1
e	1

But what I am after is:

Level	pre Count	post Count
a	1	2
b	1	1
c	1	0
d	1	1
e	1	1
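The post counts above have no row for c because as.factor() only creates levels for values that actually occur in that column. As a minimal sketch (assuming a dplyr version whose count() accepts .drop), giving both columns the same level set and passing .drop = FALSE keeps the empty level as a row with a count of 0:

```r
library(dplyr)

bla <- data.frame(pre = c("a", "b", "c", "d", "e"),
                  post = c("b", "d", "a", "a", "e"),
                  stringsAsFactors = FALSE)

# Shared level set: every response seen in either column
lvls <- sort(unique(c(bla$pre, bla$post)))
bla$pre <- factor(bla$pre, levels = lvls)
bla$post <- factor(bla$post, levels = lvls)

# .drop = FALSE keeps levels with no observations as rows with a count of 0
post_counts <- bla %>% count(post, name = "Count", .drop = FALSE)
post_counts
```

The same call with pre in place of post gives the other column; the answers combine both counts into a single table.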

Answer:

library(dplyr)
library(tidyr)

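bla %>% 
  # one row per (original column, response) pair
  pivot_longer(everything(), names_to = "name", values_to = "Level") %>% 
  mutate(name = paste(name, "Count")) %>% 
  # tally each response within each original column
  count(name, Level) %>% 
  # back to wide; a level absent from a column gets 0 instead of NA
  pivot_wider(names_from=name, values_from = n, values_fill = 0) %>% 
  arrange(Level)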

    Level `post Count` `pre Count`
  <fct>        <int>       <int>
1 a                2           1
2 b                1           1
3 c                0           1
4 d                1           1
5 e                1           1
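count() orders its groups alphabetically, so post Count ends up before pre Count. A sketch of the same pipeline with a final select() that puts the columns in the order asked for in the question (the names need backticks because they contain a space):

```r
library(dplyr)
library(tidyr)

bla <- data.frame(pre = c("a", "b", "c", "d", "e"),
                  post = c("b", "d", "a", "a", "e"),
                  stringsAsFactors = TRUE)

counts <- bla %>%
  pivot_longer(everything(), names_to = "name", values_to = "Level") %>%
  mutate(name = paste(name, "Count")) %>%
  count(name, Level) %>%
  pivot_wider(names_from = name, values_from = n, values_fill = 0) %>%
  arrange(Level) %>%
  # count() sorted "post Count" first; reorder to match the requested layout
  select(Level, `pre Count`, `post Count`)
counts
```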

Answer:

In base R:

xtabs(~., stack(bla))

      ind
values pre post
     a   1    2
     b   1    1
     c   1    0
     d   1    1
     e   1    1
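Spelled out, stack() turns the two columns into one long data frame with a values column (the responses) and an ind column (which original column each response came from); xtabs() then cross-tabulates the two. Because values is tabulated over every response seen in either column, c gets a 0 under post. A sketch with character columns and the formula written explicitly:

```r
bla <- data.frame(pre = c("a", "b", "c", "d", "e"),
                  post = c("b", "d", "a", "a", "e"),
                  stringsAsFactors = FALSE)

# Long format: values = response, ind = factor naming the source column
long <- stack(bla)

# Equivalent to xtabs(~., long): cross-classify values by ind
tab <- xtabs(~ values + ind, data = long)
tab
```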

or, using sapply() with a shared set of levels (the \(x) lambda shorthand needs R 4.1.0 or later):

sapply(bla, \(x) table(factor(x, unique(unlist(bla)))))

  pre post
a   1    2
b   1    1
c   1    0
d   1    1
e   1    1

If you need a data frame:

as.data.frame.matrix(xtabs(~., stack(bla)))

  pre post
a   1    2
b   1    1
c   1    0
d   1    1
e   1    1
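as.data.frame.matrix() keeps the levels as row names rather than as a column. To get the exact layout from the question (a Level column followed by pre Count and post Count), one option is a sketch like this; it uses function(x) instead of \(x) so it also runs on R versions before 4.1.0:

```r
bla <- data.frame(pre = c("a", "b", "c", "d", "e"),
                  post = c("b", "d", "a", "a", "e"),
                  stringsAsFactors = FALSE)

# Shared level set so every column reports the same rows
lvls <- sort(unique(unlist(bla)))

# One column of counts per original column; rows are named by level
tab <- sapply(bla, function(x) table(factor(x, levels = lvls)))

# Move the row names into a Level column and apply the requested headers
result <- data.frame(Level = rownames(tab), tab, row.names = NULL)
names(result) <- c("Level", "pre Count", "post Count")
result
```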