I have a data frame that looks like this:
time_stamp | sensor_index | humidity | temperature | pm2.5_a | pm2.5_b |
---|---|---|---|---|---|
2022-07-15 15:00:00 | 51377 | 37.434 | 102.834 | 18.209 | 17.264 |
2022-07-11 22:00:00 | 51377 | 31.267 | 102.367 | 7.982 | 8.971 |
2022-07-11 00:00:00 | 51377 | 43.533 | 91.5 | 10.518 | 12.260 |
2022-07-11 14:00:00 | 51377 | 51.433 | 95.7 | 14.168 | 20.168 |
I'm trying to apply a correction factor that averages pm2.5_a and pm2.5_b if their absolute difference is less than 5.
The formula is: if |pm2.5_a - pm2.5_b| < 5, then 0.52 * (average of pm2.5_a and pm2.5_b) - 0.085 * humidity + 5.71; otherwise leave the value empty.
My desired output would look like this:
time_stamp | sensor_index | humidity | temperature | pm2.5_a | pm2.5_b | pm_cor |
---|---|---|---|---|---|---|
2022-07-15 15:00:00 | 51377 | 37.434 | 102.834 | 18.209 | 17.264 | 11.75 |
2022-07-11 22:00:00 | 51377 | 31.267 | 102.367 | 7.982 | 8.971 | 7.46 |
2022-07-11 00:00:00 | 51377 | 43.533 | 91.5 | 10.518 | 12.260 | 7.93 |
2022-07-11 14:00:00 | 51377 | 51.433 | 95.7 | 14.168 | 20.168 | |
CodePudding user response:
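We may use case_when (or ifelse). With case_when, rows where no condition matches get NA by default, so no explicit "else" branch is needed.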
library(dplyr)
df1 <- df1 %>%
  # apply the correction only where the two channels differ by less than 5
  mutate(pm_cor = case_when(abs(pm2.5_a - pm2.5_b) < 5 ~
    0.52 * rowMeans(cbind(pm2.5_a, pm2.5_b), na.rm = TRUE) -
      0.085 * humidity + 5.71))
-output
df1
time_stamp sensor_index humidity temperature pm2.5_a pm2.5_b pm_cor
1 2022-07-15 15:00:00 51377 37.434 102.834 18.209 17.264 11.751090
2 2022-07-11 22:00:00 51377 31.267 102.367 7.982 8.971 7.460085
3 2022-07-11 00:00:00 51377 43.533 91.500 10.518 12.260 7.931975
4 2022-07-11 14:00:00 51377 51.433 95.700 14.168 20.168 NA
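The ifelse alternative mentioned above is not shown in the answer, so here is a minimal base R sketch of it. It assumes the same column names and builds a small stand-in data frame with only the three columns the formula needs (values copied from the question), so it runs on its own. The average is written as (a + b) / 2 rather than rowMeans, which is an illustrative choice, not the answer's code.

```r
# Stand-in data: only the columns used by the formula, values from the question
df1 <- data.frame(
  humidity = c(37.434, 31.267, 43.533, 51.433),
  pm2.5_a  = c(18.209, 7.982, 10.518, 14.168),
  pm2.5_b  = c(17.264, 8.971, 12.26, 20.168)
)

# ifelse needs an explicit "else" value; NA_real_ keeps the column numeric
df1$pm_cor <- ifelse(
  abs(df1$pm2.5_a - df1$pm2.5_b) < 5,
  0.52 * (df1$pm2.5_a + df1$pm2.5_b) / 2 - 0.085 * df1$humidity + 5.71,
  NA_real_
)

df1
```

Unlike case_when, ifelse requires the third argument, which makes the "else nothing" part of the rule explicit.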
data
df1 <- structure(list(time_stamp = c("2022-07-15 15:00:00", "2022-07-11 22:00:00",
"2022-07-11 00:00:00", "2022-07-11 14:00:00"), sensor_index = c(51377L,
51377L, 51377L, 51377L), humidity = c(37.434, 31.267, 43.533,
51.433), temperature = c(102.834, 102.367, 91.5, 95.7), pm2.5_a = c(18.209,
7.982, 10.518, 14.168), pm2.5_b = c(17.264, 8.971, 12.26, 20.168
)), class = "data.frame", row.names = c(NA, -4L))