I have a table like this:
genderTime | Salary | Grade |
---|---|---|
ftMale | 32000 | G |
ptMale | 15500 | DP |
ptMale | 37500 | H |
ptFemale | 31500 | G |
ftFemale | 37400 | H |
ftMale | 36000 | G |
ptFemale | 31000 | G |
ptFemale | 16000 | DP |
ptMale | 37000 | H |
I have the Grades as a factor where grades = DP, G, H
I need to create a table that gives me the mean salary by grade for full-time males (ftMale) and part-time males (ptMale) like so:
Grade | Full-time Male | Part-time male |
---|---|---|
DP | 0 | 15500 |
G | 34000 | 0 |
H | 0 | 37250 |
My dataset is a lot larger than this and there would be means for part time and full time in each grade. Any help would be much appreciated! Thanks
CodePudding user response:
Another dplyr approach:
df %>%
group_by(genderTime, Grade) %>%
summarise(Salary = mean(Salary, na.rm=TRUE)) %>%
ungroup() %>%
pivot_wider(names_from = genderTime, values_from = Salary, values_fill = 0) %>%
select("Grade", "ftMale", "ptMale")
# A tibble: 3 × 3
Grade ftMale ptMale
<chr> <dbl> <dbl>
1 H 0 37250
2 G 34000 0
3 DP 0 15500
Just another (slightly different) dplyr solution
df %>%
filter(genderTime %in% c("ftMale", "ptMale")) %>%
group_by(genderTime, Grade) %>%
summarise(Salary = mean(Salary, na.rm=TRUE)) %>%
pivot_wider(names_from = genderTime, values_from = Salary, values_fill = 0)
The same using full R base:
reshape(aggregate(Salary ~ genderTime Grade,
subset = genderTime %in% c("ftMale", "ptMale"),
FUN=mean,
data=df),
direction = "wide",
idvar = "Grade",
timevar = "genderTime")
CodePudding user response:
Group by Grade
and then summarize your data by getting the mean of Salary
while making sure that you are subsetting to the relevant ftMale/ptMale
values.
library(tidyverse)
df |>
group_by(Grade) |>
summarize(full_time_male = mean(Salary[genderTime == "ftMale"]),
part_time_male = mean(Salary[genderTime == "ptMale"]))