Home > Net >  R - Compare values in a dataframe to aggregated dataframe
R - Compare values in a dataframe to aggregated dataframe

Time:11-12

I'm trying to figure out how I can compare the values in a dataframe row-by-row that correspond to those given by the aggregate() function.

For example:

#create data frame
df <- data.frame(team=c('a', 'a', 'b', 'b', 'b', 'c', 'c'),
                 pts=c(5, 8, 14, 18, 5, 7, 7),
                 rebs=c(8, 8, 9, 3, 8, 7, 4))

#view data frame
df

  team pts rebs
1    a   5    8
2    a   8    8
3    b  14    9
4    b  18    3
5    b   5    8
6    c   7    7
7    c   7    4

#find mean points scored by team
agg_df = aggregate(df$pts, list(df$team), FUN=mean)

  Group.1        x
1       a  6.50000
2       b 12.33333
3       c  7.00000

What I want to do is create a new column in df using a logic similar to the following pseudo-code: df$pts[i] > agg_df$x[i] then df$performance = 'overperformed' else df$performance = 'underperformed'.

But this is not exactly what I want. I want to compare row 1 and 2's points to the mean points for team a in agg_df. Similarly, rows 3-5 in df should be compared to the mean points for group b in agg_df.

The final result would look like:

> df
  team pts rebs performance
1    a   5    8       under
2    a   8    8        over
3    b  14    9        over
4    b  18    3        over
5    b   5    8       under
6    c   7    7     average
7    c   7    4     average

I am a little puzzled as to how to achieve this, or if it is even achievable, so any help is very much appreciated.

CodePudding user response:

You can do:

library(tidyverse)
df %>%
  group_by(team) %>%
  mutate(performance = case_when(pts  > mean(pts) ~ "over",
                                 pts == mean(pts) ~ "average",
                                 pts  < mean(pts) ~ "under")) %>%
  ungroup()

which gives:

# A tibble: 7 x 4
  team    pts  rebs performance
  <chr> <dbl> <dbl> <chr>      
1 a         5     8 under      
2 a         8     8 over       
3 b        14     9 over       
4 b        18     3 over       
5 b         5     8 under      
6 c         7     7 average    
7 c         7     4 average  

CodePudding user response:

Or in base way with merge().

# Merge data
db <- merge(df, agg_df, by.x = "team", by.y = 'Group.1')
db$performance <- ifelse(db$pts == db$x, 'average',
                         ifelse(db$pts > db$x, 'over', 'under'))
db$x <- NULL
db
  • Related