I'm trying to figure out how I can compare the values in a dataframe row-by-row that correspond to those given by the aggregate()
function.
For example:
#create data frame
df <- data.frame(team=c('a', 'a', 'b', 'b', 'b', 'c', 'c'),
pts=c(5, 8, 14, 18, 5, 7, 7),
rebs=c(8, 8, 9, 3, 8, 7, 4))
#view data frame
df
team pts rebs
1 a 5 8
2 a 8 8
3 b 14 9
4 b 18 3
5 b 5 8
6 c 7 7
7 c 7 4
#find mean points scored by team
agg_df = aggregate(df$pts, list(df$team), FUN=mean)
Group.1 x
1 a 6.50000
2 b 12.33333
3 c 7.00000
What I want to do is create a new column in df
using a logic similar to the following pseudo-code:
df$pts[i] > agg_df$x[i]
then df$performance = 'overperformed'
else df$performance = 'underperformed'
.
But this is not exactly what I want. I want to compare row 1 and 2's points to the mean points for team a in agg_df
. Similarly, rows 3-5 in df
should be compared to the mean points for group b in agg_df
.
The final result would look like:
> df
team pts rebs performance
1 a 5 8 under
2 a 8 8 over
3 b 14 9 over
4 b 18 3 over
5 b 5 8 under
6 c 7 7 average
7 c 7 4 average
I am a little puzzled as to how to achieve this, or if it is even achievable, so any help is very much appreciated.
CodePudding user response:
You can do:
library(tidyverse)
df %>%
group_by(team) %>%
mutate(performance = case_when(pts > mean(pts) ~ "over",
pts == mean(pts) ~ "average",
pts < mean(pts) ~ "under")) %>%
ungroup()
which gives:
# A tibble: 7 x 4
team pts rebs performance
<chr> <dbl> <dbl> <chr>
1 a 5 8 under
2 a 8 8 over
3 b 14 9 over
4 b 18 3 over
5 b 5 8 under
6 c 7 7 average
7 c 7 4 average
CodePudding user response:
Or in base way with merge()
.
# Merge data
db <- merge(df, agg_df, by.x = "team", by.y = 'Group.1')
db$performance <- ifelse(db$pts == db$x, 'average',
ifelse(db$pts > db$x, 'over', 'under'))
db$x <- NULL
db