I got a DF with peoples salary data and their job. One row is one person. I need to calculate the average salary of 3 people on the same job and make a new DF out of it. The 3 people need to be on the same job and their wages need to be adjacent if the DF is sorted from highest to lowest salary. The average salary of the person themselves and the ones above and below them in the DF if they have the same job. The people with the highest and lowest salary in a job are excluded as they have nobody above or below them.
This is a sample of the data i have
Job salary IT 5000 IT 4500 IT 4000 IT 4000 Sales 4500 Sales 4500 Sales 4000 Sales 3000 Sales 2500 HR 3000 HR 2500 HR 2300
This is what i would like to get (if the average salary went to decimal places i rounded it. But in the R DF there is no need to do it. Decimal places are ok.
Job salary IT 4500 IT 4167 Sales 4333 Sales 3833 Sales 3167 HR 2600
I'm stuck as i can't figure out how to calculate the average of the 3 people on the same job and exclude the top and bottom. Hope you can help.
Thank you
CodePudding user response:
You want a rolling average by group. This can be done with zoo::rollmean
coupled with dplyr::group_by
.
library(dplyr)
library(zoo)
dat %>%
group_by(Job) %>%
summarise(mean = rollmean(salary, 3, align = "right"))
Job mean
<fct> <dbl>
1 IT 4500
2 IT 4167.
3 Sales 4333.
4 Sales 3833.
5 Sales 3167.
6 HR 2600
CodePudding user response:
Here are some base R options
> with(df,stack(tapply(salary, Job, function(x) rowMeans(embed(x, 3)))))
values ind
1 2600.000 HR
2 4500.000 IT
3 4166.667 IT
4 4333.333 Sales
5 3833.333 Sales
6 3166.667 Sales
> aggregate(salary ~ ., df, function(x) rowMeans(embed(x, 3)))
Job salary
1 HR 2600
2 IT 4500.000, 4166.667
3 Sales 4333.333, 3833.333, 3166.667