I have a dataframe with a column for region and a column for an economic indicator.
like
region | indicator |
---|---|
A | 100 |
B | 50 |
C | 10 |
D | 20 |
B | 102 |
C | 10 |
Say that A and B belong to one group, and C and D belong to another. I want to create a new column for the comparative performance that will quantify how well a region is doing compared to the group average.
so for example the average for group 1 is 84, the average for group 2 is 13.33. so the output should be
region | comparative_indicator |
---|---|
A | 16 |
B | -34 |
C | -3.33 |
D | 6.67 |
B | 18 |
C | -3.33 |
I have no idea what to do here
test dataframe
region<- c('A','B','C','D','B','C')
indicator <- c(100,50,10,20,102,10)
df <- data.frame(region,indicator)
CodePudding user response:
You may divide the regions
in different groups. For each group subtract the average indicator
value of the group with it's value to get comparative_indicator
.
library(dplyr)
df %>%
mutate(groups = case_when(region %in% c('A', 'B') ~ 'group1',
region %in% c('C', 'D') ~ 'group2')) %>%
group_by(groups) %>%
mutate(comparative_indicator = indicator - mean(indicator)) %>%
ungroup
# region indicator groups comparative_indicator
# <chr> <dbl> <chr> <dbl>
#1 A 100 group1 16
#2 B 50 group1 -34
#3 C 10 group2 -3.33
#4 D 20 group2 6.67
#5 B 102 group1 18
#6 C 10 group2 -3.33
You may remove/rename columns according to your preference in the output.