I have this example data frame in RStudio:
mode sex age_group
1 neutral female middle
2 happy male senior
3 grumpy female middle
4 neutral female middle
5 grumpy female middle
6 neutral female middle
7 grumpy female middle
8 neutral female middle
9 neutral female middle
10 grumpy female middle
11 neutral female middle
12 neutral female middle
13 grumpy female middle
14 grumpy female middle
15 grumpy female middle
16 neutral female middle
17 grumpy female middle
18 happy male young
19 grumpy female middle
20 neutral male senior
21 neutral female middle
22 grumpy female middle
23 grumpy female middle
24 grumpy female middle
25 happy male young
26 grumpy female middle
27 neutral male senior
28 grumpy female middle
29 happy male senior
30 neutral female middle
31 grumpy female middle
32 neutral female middle
33 neutral female middle
34 neutral female middle
35 grumpy female middle
36 happy male senior
37 grumpy female middle
38 happy male senior
39 neutral male senior
40 happy male young
41 neutral male senior
42 grumpy female middle
43 neutral male senior
44 happy male young
45 neutral female middle
46 grumpy female middle
47 neutral female middle
48 happy male young
49 neutral male senior
50 happy male senior
Using tidyr::expand(), I was able to create another data frame with all possible combinations of the variables:
mode sex age_group
1 grumpy female middle
2 grumpy female senior
3 grumpy female young
4 grumpy male middle
5 grumpy male senior
6 grumpy male young
7 happy female middle
8 happy female senior
9 happy female young
10 happy male middle
11 happy male senior
12 happy male young
13 neutral female middle
14 neutral female senior
15 neutral female young
16 neutral male middle
17 neutral male senior
18 neutral male young
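For reference, a combinations table like the one above can presumably be produced with a call along these lines. The question does not show the exact expand() call, so the one below is an assumption. It recreates status from the code further down so that it runs on its own.

```r
library(tidyr)

# Recreate `status` as in the question
set.seed(111)
mode <- sample(c("happy", "neutral", "grumpy"), 50, replace = TRUE, prob = c(0.3, 0.3, 0.4))
set.seed(111)
sex <- sample(c("female", "male"), 50, replace = TRUE, prob = c(0.6, 0.4))
set.seed(111)
age_group <- sample(c("young", "middle", "senior"), 50, replace = TRUE, prob = c(0.2, 0.6, 0.2))
status <- data.frame(mode = mode, sex = sex, age_group = age_group)

# Cross the distinct values of each column: 3 x 2 x 3 = 18 rows,
# sorted by mode, then sex, then age_group
combos <- expand(status, mode, sex, age_group)
combos
```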
However, I would like to add a column named "Frequencies" to the combinations data frame, giving the number of rows in the original data that have each combination (i.e. 18 frequencies, one per combination, with 0 for combinations that never occur).
Can someone help me do that with a simple function?
Thanks
# the data frame is created as follows
set.seed(111)
mode <- sample(c("happy", "neutral", "grumpy"),
               size = 50,
               replace = TRUE,
               prob = c(0.3, 0.3, 0.4))
set.seed(111)
sex <- sample(c("female", "male"),
              size = 50,
              replace = TRUE,
              prob = c(0.6, 0.4))
set.seed(111)
age_group <- sample(c("young", "middle", "senior"),
                    size = 50,
                    replace = TRUE,
                    prob = c(0.2, 0.6, 0.2))
status <- data.frame(mode = mode,
                     sex = sex,
                     age_group = age_group)
Answer:
With count(), you can set .drop = FALSE to include all possible combinations (even those with a count of 0). .drop only affects factor columns, so convert the columns to factors first:
library(dplyr)
status %>%
  mutate(across(everything(), factor)) %>%
  count(mode, sex, age_group, .drop = FALSE)
mode sex age_group n
1 grumpy female middle 19
2 grumpy female senior 0
3 grumpy female young 0
4 grumpy male middle 0
5 grumpy male senior 0
6 grumpy male young 0
7 happy female middle 0
8 happy female senior 0
9 happy female young 0
10 happy male middle 0
11 happy male senior 5
12 happy male young 5
13 neutral female middle 15
14 neutral female senior 0
15 neutral female young 0
16 neutral male middle 0
17 neutral male senior 6
18 neutral male young 0
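If the count column should be called "Frequencies" as the question asks, count() also takes a name argument for it. A minimal sketch, assuming dplyr 1.0 or later (for across()) and recreating status from the question:

```r
library(dplyr)

# Recreate `status` as in the question
set.seed(111)
mode <- sample(c("happy", "neutral", "grumpy"), 50, replace = TRUE, prob = c(0.3, 0.3, 0.4))
set.seed(111)
sex <- sample(c("female", "male"), 50, replace = TRUE, prob = c(0.6, 0.4))
set.seed(111)
age_group <- sample(c("young", "middle", "senior"), 50, replace = TRUE, prob = c(0.2, 0.6, 0.2))
status <- data.frame(mode = mode, sex = sex, age_group = age_group)

freq_table <- status %>%
  mutate(across(everything(), factor)) %>%   # .drop = FALSE needs factor columns
  count(mode, sex, age_group, .drop = FALSE, name = "Frequencies")
freq_table
```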
Answer:
In base R:
data.frame(table(status))
mode sex age_group Freq
1 grumpy female middle 19
2 happy female middle 0
3 neutral female middle 15
4 grumpy male middle 0
5 happy male middle 0
6 neutral male middle 0
7 grumpy female senior 0
8 happy female senior 0
9 neutral female senior 0
10 grumpy male senior 0
11 happy male senior 5
12 neutral male senior 6
13 grumpy female young 0
14 happy female young 0
15 neutral female young 0
16 grumpy male young 0
17 happy male young 5
18 neutral male young 0
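The base R approach can also name the column directly: as.data.frame() on a table accepts a responseName argument (default "Freq"). Because table() varies the first variable fastest, the rows can then be reordered to match the expand() order shown in the question. A sketch, with status recreated from the question:

```r
# Recreate `status` as in the question
set.seed(111)
mode <- sample(c("happy", "neutral", "grumpy"), 50, replace = TRUE, prob = c(0.3, 0.3, 0.4))
set.seed(111)
sex <- sample(c("female", "male"), 50, replace = TRUE, prob = c(0.6, 0.4))
set.seed(111)
age_group <- sample(c("young", "middle", "senior"), 50, replace = TRUE, prob = c(0.2, 0.6, 0.2))
status <- data.frame(mode = mode, sex = sex, age_group = age_group)

# Cross-tabulate all columns, then flatten; the count column is named via responseName
freq_df <- as.data.frame(table(status), responseName = "Frequencies")

# Sort by mode, then sex, then age_group (factor levels are alphabetical)
freq_df <- freq_df[order(freq_df$mode, freq_df$sex, freq_df$age_group), ]
rownames(freq_df) <- NULL
freq_df
```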
In the tidyverse:
status %>%
  mutate_all(factor) %>%
  table() %>%
  data.frame()
mode sex age_group Freq
1 grumpy female middle 19
2 happy female middle 0
3 neutral female middle 15
4 grumpy male middle 0
5 happy male middle 0
6 neutral male middle 0
7 grumpy female senior 0
8 happy female senior 0
9 neutral female senior 0
10 grumpy male senior 0
11 happy male senior 5
12 neutral male senior 6
13 grumpy female young 0
14 happy female young 0
15 neutral female young 0
16 grumpy male young 0
17 happy male young 5
18 neutral male young 0
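Another option, not used in the answers above, is tidyr::complete(): count only the observed combinations, then let complete() add the missing ones with a fill value of 0. It works on character columns without converting them to factors, because it crosses the values present in each column (a value that never appears in a column at all would still be missing). A sketch, assuming dplyr and tidyr are installed:

```r
library(dplyr)
library(tidyr)

# Recreate `status` as in the question
set.seed(111)
mode <- sample(c("happy", "neutral", "grumpy"), 50, replace = TRUE, prob = c(0.3, 0.3, 0.4))
set.seed(111)
sex <- sample(c("female", "male"), 50, replace = TRUE, prob = c(0.6, 0.4))
set.seed(111)
age_group <- sample(c("young", "middle", "senior"), 50, replace = TRUE, prob = c(0.2, 0.6, 0.2))
status <- data.frame(mode = mode, sex = sex, age_group = age_group)

freq_table <- status %>%
  count(mode, sex, age_group, name = "Frequencies") %>%          # observed combinations only
  complete(mode, sex, age_group, fill = list(Frequencies = 0L))  # add the rest with 0
freq_table
```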
Answer:
We could also count across all columns, without listing them by name:
library(dplyr)
status %>%
  mutate(across(everything(), factor)) %>%
  count(across(everything()), .drop = FALSE)
Output:
mode sex age_group n
1 grumpy female middle 19
2 grumpy female senior 0
3 grumpy female young 0
4 grumpy male middle 0
5 grumpy male senior 0
6 grumpy male young 0
7 happy female middle 0
8 happy female senior 0
9 happy female young 0
10 happy male middle 0
11 happy male senior 5
12 happy male young 5
13 neutral female middle 15
14 neutral female senior 0
15 neutral female young 0
16 neutral male middle 0
17 neutral male senior 6
18 neutral male young 0
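To add the counts to the question's own combinations data frame, as asked, one option is to join the observed counts onto the expand() result and replace the unmatched NAs with 0. A sketch, where combos is a hypothetical name for that combinations data frame and status is recreated from the question:

```r
library(dplyr)
library(tidyr)

# Recreate `status` as in the question
set.seed(111)
mode <- sample(c("happy", "neutral", "grumpy"), 50, replace = TRUE, prob = c(0.3, 0.3, 0.4))
set.seed(111)
sex <- sample(c("female", "male"), 50, replace = TRUE, prob = c(0.6, 0.4))
set.seed(111)
age_group <- sample(c("young", "middle", "senior"), 50, replace = TRUE, prob = c(0.2, 0.6, 0.2))
status <- data.frame(mode = mode, sex = sex, age_group = age_group)

# `combos` is a hypothetical name for the 18-row combinations table
combos <- expand(status, mode, sex, age_group)

combos <- combos %>%
  left_join(count(status, mode, sex, age_group, name = "Frequencies"),
            by = c("mode", "sex", "age_group")) %>%
  mutate(Frequencies = coalesce(Frequencies, 0L))  # combinations never observed get 0
combos
```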