Grouping and finding mean in R dataframe-CodePudding

I have a table like this:

genderTime	Salary	Grade
ftMale	32000	G
ptMale	15500	DP
ptMale	37500	H
ptFemale	31500	G
ftFemale	37400	H
ftMale	36000	G
ptFemale	31000	G
ptFemale	16000	DP
ptMale	37000	H

I have the Grades as a factor where grades = DP, G, H

I need to create a table that gives me the mean salary by grade for full-time males (ftMale) and part-time males (ptMale) like so:

Grade	Full-time Male	Part-time male
DP	0	15500
G	34000	0
H	0	37250

My dataset is a lot larger than this and there would be means for part time and full time in each grade. Any help would be much appreciated! Thanks

CodePudding user response：

Another dplyr approach:

df %>% 
  group_by(genderTime, Grade) %>% 
  summarise(Salary = mean(Salary, na.rm=TRUE)) %>% 
  ungroup() %>% 
  pivot_wider(names_from = genderTime, values_from = Salary, values_fill = 0) %>% 
  select("Grade", "ftMale", "ptMale")

# A tibble: 3 × 3
  Grade ftMale ptMale
  <chr>  <dbl>  <dbl>
1 H          0  37250
2 G      34000      0
3 DP         0  15500

Just another (slightly different) dplyr solution

df %>% 
  filter(genderTime %in% c("ftMale", "ptMale")) %>% 
  group_by(genderTime, Grade) %>% 
  summarise(Salary = mean(Salary, na.rm=TRUE)) %>% 
  pivot_wider(names_from = genderTime, values_from = Salary, values_fill = 0)

The same using full R base:

reshape(aggregate(Salary ~ genderTime   Grade, 
                  subset = genderTime %in% c("ftMale", "ptMale"), 
                  FUN=mean, 
                  data=df), 
        direction = "wide", 
        idvar = "Grade", 
        timevar = "genderTime")

CodePudding user response：

Group by Grade and then summarize your data by getting the mean of Salary while making sure that you are subsetting to the relevant ftMale/ptMale values.

library(tidyverse)

df |>
  group_by(Grade) |>
  summarize(full_time_male = mean(Salary[genderTime == "ftMale"]),
            part_time_male = mean(Salary[genderTime == "ptMale"]))