Say I have two dataframes, A and B, and they are produced like this:
library(dplyr)

# Example Data A
{
  set.seed(123)
  index = rep(c(1:30),
              each = 15*360)
  month = rep(c(1:12),
              each = 15,
              times = 30)
  day = rep(c(1:15),
            each = 1,
            times = 360)
  variable_of_interest = runif(n = 15*360*30,
                               min = 0,
                               max = 100)
  Data_A = as.data.frame(cbind(index,
                               month,
                               day,
                               variable_of_interest))
}

# Example Data B
{
  Data_B = Data_A %>%
    group_by(index, month) %>%
    summarise(classification_threshold = mean(variable_of_interest))
}
Data_A and Data_B share two columns, index and month, but have different numbers of rows.
What I want is to use the classification_threshold column of Data_B to add a new column to Data_A that indicates whether each observation of variable_of_interest exceeds its threshold (value = 1) or not (value = 0).
The columns index and month should be used to match each observation with the correct classification_threshold value.
CodePudding user response:
Do a left join of Data_A with the summarised Data_B by 'index' and 'month', then create the new column by comparing variable_of_interest with classification_threshold:
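library(dplyr)
# Attach each row's threshold from Data_B, then flag values above it.
# as.integer() turns TRUE/FALSE into the 1/0 coding asked for in the question.
Data_A_new <- left_join(Data_A, ungroup(Data_B), by = c("index", "month")) %>%
  mutate(flag = as.integer(variable_of_interest > classification_threshold))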
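To check the logic by hand, here is a minimal sketch on a tiny made-up data set. The names toy_A, toy_B, joined and grouped are hypothetical and not part of the original question. It also shows a possible alternative, assuming the threshold is always the group mean of variable_of_interest as in Data_B: a grouped mutate() computes the threshold and the flag in one step, without a separate summary table.

```r
library(dplyr)

# Tiny hypothetical data set (not the original Data_A), small enough to verify by hand
toy_A <- data.frame(
  index = c(1, 1, 1, 1, 2, 2),
  month = c(1, 1, 2, 2, 1, 1),
  variable_of_interest = c(10, 30, 5, 15, 40, 60)
)

# Approach 1: summarise per (index, month), join back, compare
toy_B <- toy_A %>%
  group_by(index, month) %>%
  summarise(classification_threshold = mean(variable_of_interest),
            .groups = "drop")  # return an ungrouped table

joined <- toy_A %>%
  left_join(toy_B, by = c("index", "month")) %>%
  mutate(flag = as.integer(variable_of_interest > classification_threshold))

# Approach 2: grouped mutate, no separate summary table needed
grouped <- toy_A %>%
  group_by(index, month) %>%
  mutate(classification_threshold = mean(variable_of_interest),
         flag = as.integer(variable_of_interest > classification_threshold)) %>%
  ungroup()

print(joined)
```

Both approaches keep one row per observation of the input. If the thresholds came from somewhere other than Data_A itself, only the join approach would apply.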