Mutate dataframe A based on dataframe B?

Time:01-07

Say I have two dataframes, A and B, and they are produced like this:

library(dplyr)
# Example Data A
{
  set.seed(123)
  
  index = rep(c(1:30),
              each = 15*360)
  
  month = rep(c(1:12), 
              each = 15, 
              times = 30)
  
  day = rep(c(1:15),
            each = 1,
            times = 360)
  
  variable_of_interest = runif(n = 15*360*30,
                               min = 0,
                               max = 100)
  
  Data_A = as.data.frame(cbind(index,
                             month,
                             day,
                             variable_of_interest)) 
}

# Example Data B
{
  Data_B = Data_A %>% group_by(index,
                               month) %>% summarise(classification_threshold = mean(variable_of_interest))
}
  

Data_A and Data_B share two columns, index and month, but have different numbers of rows.

I would like to use the classification_threshold column of Data_B to add a new column to Data_A that indicates whether each observation of variable_of_interest exceeds its threshold (value = 1) or not (value = 0).

The columns index and month should identify which classification_threshold value each variable_of_interest is compared against.

CodePudding user response:

Do a left join of Data_A with the summarised Data_B on 'index' and 'month', then create the new column by comparing variable_of_interest with classification_threshold:

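library(dplyr)
# ungroup() drops the grouping left over from summarise(); as.integer() turns TRUE/FALSE into 1/0
Data_A_new <- left_join(Data_A, ungroup(Data_B), by = c("index", "month")) %>% 
   mutate(flag = as.integer(variable_of_interest > classification_threshold))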
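
To check the join logic on something small enough to verify by hand, here is a minimal sketch on toy data. The names toy_A, toy_B and toy_flagged and the four values are made up for illustration and are not part of the question's data; `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data: two index/month groups with two observations each
toy_A <- data.frame(
  index = c(1, 1, 1, 1),
  month = c(1, 1, 2, 2),
  variable_of_interest = c(10, 30, 50, 90)
)

# One threshold per index/month pair (the group mean, as in the question)
toy_B <- toy_A %>%
  group_by(index, month) %>%
  summarise(classification_threshold = mean(variable_of_interest),
            .groups = "drop")

# Attach each row's threshold, then flag the values that exceed it
toy_flagged <- toy_A %>%
  left_join(toy_B, by = c("index", "month")) %>%
  mutate(flag = as.integer(variable_of_interest > classification_threshold))

print(toy_flagged$flag)
# [1] 0 1 0 1
```

The thresholds are 20 for (index 1, month 1) and 70 for (index 1, month 2), so the second and fourth rows are flagged. The left join keeps every row of toy_A, so the result has the same number of rows as the original data.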
  • Related