How can I average two vectors only if their individual difference is less than 5 in R?

Time:07-19

I have a data frame that looks like this:

time_stamp sensor_index humidity temperature pm2.5_a pm2.5_b
2022-07-15 15:00:00 51377 37.434 102.834 18.209 17.264
2022-07-11 22:00:00 51377 31.267 102.367 7.982 8.971
2022-07-11 00:00:00 51377 43.533 91.5 10.518 12.260
2022-07-11 14:00:00 51377 51.433 95.7 14.168 20.168

I'm trying to apply a correction factor that averages pm2.5_a and pm2.5_b only if their difference is within ±5, that is, if the absolute difference is less than 5.

The formula is: if abs(pm2.5_a - pm2.5_b) < 5, then pm_cor = 0.52*(average of pm2.5_a and pm2.5_b) - 0.085*humidity + 5.71; otherwise nothing.

My desired output would look like this:

time_stamp sensor_index humidity temperature pm2.5_a pm2.5_b pm_cor
2022-07-15 15:00:00 51377 37.434 102.834 18.209 17.264 11.75
2022-07-11 22:00:00 51377 31.267 102.367 7.982 8.971 7.46
2022-07-11 00:00:00 51377 43.533 91.5 10.518 12.260 7.93
2022-07-11 14:00:00 51377 51.433 95.7 14.168 20.168

CodePudding user response:

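We may use case_when or ifelse. With case_when, rows that match no condition are returned as NA, which gives the "else nothing" behaviour.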

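library(dplyr)

# compute pm_cor only where the two channels differ by less than 5;
# rows that match no case_when() condition are left as NA
df1 <- df1 %>%
  mutate(pm_cor = case_when(abs(pm2.5_a - pm2.5_b) < 5 ~
    0.52 * rowMeans(cbind(pm2.5_a, pm2.5_b), na.rm = TRUE) -
      0.085 * humidity + 5.71))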

-output

df1
          time_stamp sensor_index humidity temperature pm2.5_a pm2.5_b    pm_cor
1 2022-07-15 15:00:00        51377   37.434     102.834  18.209  17.264 11.751090
2 2022-07-11 22:00:00        51377   31.267     102.367   7.982   8.971  7.460085
3 2022-07-11 00:00:00        51377   43.533      91.500  10.518  12.260  7.931975
4 2022-07-11 14:00:00        51377   51.433      95.700  14.168  20.168        NA
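
As a quick sanity check, the first and last rows can be worked by hand with the values from the question. This is only a sketch; the names a1, b1, h1, row1, and row4_within_tol are made up for illustration.

```r
# Row 1: the two channels differ by less than 5, so the correction applies
a1 <- 18.209; b1 <- 17.264; h1 <- 37.434
row1 <- 0.52 * mean(c(a1, b1)) - 0.085 * h1 + 5.71
print(round(row1, 5))   # 11.75109

# Row 4: the channels differ by about 6, so the condition fails and pm_cor stays NA
row4_within_tol <- abs(14.168 - 20.168) < 5
print(row4_within_tol)  # FALSE
```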

data

df1 <- structure(list(time_stamp = c("2022-07-15 15:00:00", "2022-07-11 22:00:00", 
"2022-07-11 00:00:00", "2022-07-11 14:00:00"), sensor_index = c(51377L, 
51377L, 51377L, 51377L), humidity = c(37.434, 31.267, 43.533, 
51.433), temperature = c(102.834, 102.367, 91.5, 95.7), pm2.5_a = c(18.209, 
7.982, 10.518, 14.168), pm2.5_b = c(17.264, 8.971, 12.26, 20.168
)), class = "data.frame", row.names = c(NA, -4L))
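
The ifelse option mentioned above is not shown, so here is a minimal base R sketch of it. It assumes the same column names as df1, rebuilds only the three columns the formula needs so that it runs on its own, and uses a helper variable avg that is introduced here purely for illustration.

```r
# Only the columns used by the formula, copied from the data above
df1 <- data.frame(
  humidity = c(37.434, 31.267, 43.533, 51.433),
  pm2.5_a  = c(18.209, 7.982, 10.518, 14.168),
  pm2.5_b  = c(17.264, 8.971, 12.26, 20.168)
)

# Element-wise average of the two channels
avg <- (df1$pm2.5_a + df1$pm2.5_b) / 2

# Apply the correction where the channels agree to within 5, NA otherwise
df1$pm_cor <- ifelse(abs(df1$pm2.5_a - df1$pm2.5_b) < 5,
                     0.52 * avg - 0.085 * df1$humidity + 5.71,
                     NA_real_)

print(df1)
```

Unlike case_when, ifelse needs the "else" value spelled out, hence the explicit NA_real_.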