I have a dataset of mostly factor variables, and I'd like to summarize the counts of each level in R using the `summarise()` function from dplyr. The data come from a pre- and post-treatment scenario, so depending on the responses, some levels may be missing from the post column.
I can get both counts individually like so:
```r
library(dplyr)  # provides %>%, group_by(), summarise(), n()

bla = data.frame(pre = c("a", "b", "c", "d", "e"), post = c("b", "d", "a", "a", "e"))
bla$pre = as.factor(bla$pre)
bla$post = as.factor(bla$post)
bla %>% group_by(pre) %>% summarise(Count = n())
bla %>% group_by(post) %>% summarise(Count = n())
```
which yields:

pre | Count |
---|---|
a | 1 |
b | 1 |
c | 1 |
d | 1 |
e | 1 |

post | Count |
---|---|
a | 2 |
b | 1 |
d | 1 |
e | 1 |

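
The missing `c` row in the `post` count comes from `as.factor()`, which only creates levels for values that actually occur in that column. A minimal sketch, assuming both columns should share one level set (`all_levels` and `post_counts` are illustrative names, not from the question): give `post` the full set of levels and ask dplyr's `count()` to keep empty groups with `.drop = FALSE`.

```r
library(dplyr)

bla <- data.frame(pre = c("a", "b", "c", "d", "e"),
                  post = c("b", "d", "a", "a", "e"))

# as.factor() builds levels only from the values present, so "c" is absent here
levels(as.factor(bla$post))

# Hypothetical shared level set: the union of values in both columns
all_levels <- sort(unique(c(as.character(bla$pre), as.character(bla$post))))
bla$post <- factor(bla$post, levels = all_levels)

# .drop = FALSE keeps factor levels that have no rows and reports them with n = 0
post_counts <- bla %>% count(post, .drop = FALSE)
post_counts
```

The same idea applies to the `group_by(post) %>% summarise(Count = n())` form: `group_by()` also accepts `.drop = FALSE`.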
But what I'm after is a single table with both counts side by side, showing 0 where a level is absent:

Level | pre Count | post Count |
---|---|---|
a | 1 | 2 |
b | 1 | 1 |
c | 1 | 0 |
d | 1 | 1 |
e | 1 | 1 |
Answer:
```r
library(dplyr)
library(tidyr)

bla %>%
  pivot_longer(everything(), names_to = "name", values_to = "Level") %>%
  mutate(name = paste(name, "Count")) %>%
  count(name, Level) %>%
  pivot_wider(names_from = name, values_from = n, values_fill = 0) %>%
  arrange(Level)
```

```
  Level `post Count` `pre Count`
  <fct>        <int>       <int>
1 a                2           1
2 b                1           1
3 c                0           1
4 d                1           1
5 e                1           1
```
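Because `count()` sorts its output by `name` and `pivot_wider()` creates the new columns in order of first appearance, `post Count` ends up before `pre Count`. If the column order from the question matters, a small variant (a sketch, assuming dplyr 1.0.0 or later for `relocate()`; `counts` is an illustrative name) moves the column and returns a plain data frame:

```r
library(dplyr)
library(tidyr)

bla <- data.frame(pre = factor(c("a", "b", "c", "d", "e")),
                  post = factor(c("b", "d", "a", "a", "e")))

counts <- bla %>%
  pivot_longer(everything(), names_to = "name", values_to = "Level") %>%
  mutate(name = paste(name, "Count")) %>%
  count(name, Level) %>%
  pivot_wider(names_from = name, values_from = n, values_fill = 0) %>%
  arrange(Level) %>%
  # put "pre Count" ahead of "post Count", matching the requested layout
  relocate(`pre Count`, .before = `post Count`) %>%
  # drop the tibble class if a base data.frame is preferred
  as.data.frame()

counts
```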
Answer:
In base R:

```r
xtabs(~., stack(bla))
```

```
      ind
values pre post
     a   1    2
     b   1    1
     c   1    0
     d   1    1
     e   1    1
```
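Here `stack()` turns the columns into one long data frame with a `values` column and an `ind` column naming the source column, and `xtabs(~ .)` cross-tabulates all columns of that data frame. A sketch of the same idea that builds the long table by hand instead of calling `stack()`; the names `long`, `Level`, `time` and `tab` are illustrative only:

```r
bla <- data.frame(pre = c("a", "b", "c", "d", "e"),
                  post = c("b", "d", "a", "a", "e"))

# One row per observation: the value, and the column it came from
long <- data.frame(
  Level = c(as.character(bla$pre), as.character(bla$post)),
  time = factor(rep(c("pre", "post"), each = nrow(bla)),
                levels = c("pre", "post"))
)

# Cross-tabulate value by source column; combinations that never occur count as 0
tab <- xtabs(~ Level + time, data = long)
tab
```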
Or even (the `\(x)` shorthand for `function(x)` needs R 4.1.0 or later):

```r
sapply(bla, \(x) table(factor(x, unique(unlist(bla)))))
```

```
  pre post
a   1    2
b   1    1
c   1    0
d   1    1
e   1    1
```
If you need a data frame:

```r
as.data.frame.matrix(xtabs(~., stack(bla)))
```

```
  pre post
a   1    2
b   1    1
c   1    0
d   1    1
e   1    1
```
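These results keep the levels as row names. If a regular `Level` column is wanted, as in the question's target table, one option is a small helper. `count_levels()` is a hypothetical name, and the sketch assumes every column should be counted over the same combined set of levels:

```r
# Hypothetical helper: count each column's values over one shared level set
count_levels <- function(df) {
  # Union of values across all columns (works for factor or character columns)
  lv <- sort(unique(unlist(lapply(df, as.character))))
  # table() on a factor with fixed levels reports 0 for levels that never occur
  counts <- lapply(df, function(x) as.integer(table(factor(x, levels = lv))))
  names(counts) <- paste(names(counts), "Count")
  # row.names = NULL keeps plain integer row names; check.names = FALSE keeps the spaces
  data.frame(Level = lv, counts, row.names = NULL, check.names = FALSE)
}

bla <- data.frame(pre = factor(c("a", "b", "c", "d", "e")),
                  post = factor(c("b", "d", "a", "a", "e")))

res <- count_levels(bla)
res
```

Because the level set is computed once and reused for every column, a level that appears only in `pre` (here `c`) still gets a row, with 0 in `post Count`.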