I need to group my data by three columns - gender, year and employment status.
Here is my data:
ID <- c(1000, 1000, 1000, 1001, 1001, 1001, 1001, 1001, 1002, 1002, 1002, 1002, 1002)
Gender <- as.factor(c("M","M","M","M","M","M","M","M","F","F","F","F","F"))
Employment_status <- as.factor(c("Other","Other","Other","Employed","Employed","Employed","Employed","Employed","Employed","Employed","Employed","Employed","Unemployed"))
Year <- c(2016, 2017, 2018, 2016, 2017, 2018, 2019, 2020, 2016, 2017, 2018, 2019, 2020)
my_data <- data.frame(ID, Gender, Employment_status, Year, stringsAsFactors=F)
I wish that my end result would entail a data table about the employment rate by gender and by year. How could I achieve this in R?
Expected output would be something like this:
Thank you!
CodePudding user response:
In base R you could do:
ftable(prop.table(table(my_data[-1]), c(1, 3)), col.vars = c("Gender", "Employment_status"))
Gender F M
Employment_status Employed Other Unemployed Employed Other Unemployed
Year
2016 1.0 0.0 0.0 0.5 0.5 0.0
2017 1.0 0.0 0.0 0.5 0.5 0.0
2018 1.0 0.0 0.0 0.5 0.5 0.0
2019 1.0 0.0 0.0 1.0 0.0 0.0
2020 0.0 0.0 1.0 1.0 0.0 0.0
CodePudding user response:
Can you provide the format you want the data table to be in?
Is this roughly what you are going for?
library(dplyr)
my_data %>%
group_by(Gender, Year) %>%
count(Employment_status) %>%
summarise(sum(n)) %>%
arrange(Year)
Output:
Gender Year `sum(n)`
<fct> <dbl> <int>
1 F 2016 1
2 M 2016 2
3 F 2017 1
4 M 2017 2
5 F 2018 1
6 M 2018 2
7 F 2019 1
8 M 2019 1
9 F 2020 1
10 M 2020 1