Home > database >  Assign NA to a variable based on Group and ifelse condition
Assign NA to a variable based on Group and ifelse condition

Time:05-20

I'm trying to assign NA value in column RAIN_15, if any row in RAIN_15 has value > 100 then all observations corresponding to that gr in RAIN_15 will be replaced by NA. Sample data is as given below:

   TRGCODE RAIN   DATE    TIME gr RAIN_15
    6155   0.0 2015-06-18 0000 17     0.0
    6155   0.0 2015-06-18 0015 17     0.0
    6155   0.0 2015-06-18 0030 17     0.0
    6155   0.0 2015-06-18 0045 17     0.0
    6155   0.0 2015-06-18 0100 17     0.0
    6155   0.0 2015-06-18 0115 17     0.0
    6155   0.0 2015-06-18 0130 17     0.0
    6155   0.0 2015-06-18 0145 17     0.0
    6155   0.0 2015-06-18 0200 17     0.0
    6155   0.0 2015-06-18 0215 17     0.0
    6155   0.0 2015-06-18 0230 17     0.0
    6155   0.0 2015-06-18 0245 17     0.0
    6155   0.2 2015-06-18 0300 17     0.2
    6155 123.7 2015-06-18 0315 17   123.5
    6155 127.0 2015-06-18 0330 17     3.3
    6155 127.0 2015-06-18 0345 17     0.0
    6155 127.0 2015-06-18 0400 17     0.0
    6155 127.0 2015-06-18 0415 17     0.0
    6155 127.0 2015-06-18 0430 17     0.0
    6155 127.0 2015-06-18 0445 17     0.0
    6155 127.0 2015-06-18 0500 17     0.0
    6155 141.7 2015-06-18 0515 17    14.7
    6155 594.3 2015-06-18 0530 17   452.6
    6155    NA 2015-06-18 0545 17      NA
    6155    NA 2015-06-18 0600 17      NA

Data has thousands of such rows where gr is group based on TIME. Here RAIN_15 has value > 100, so all observations in RAIN_15 for corresponding to gr 17 should be replaced by NA. I tried with

df_v1 <- df %>% group_by(TRGCODE, gr) %>% mutate(RAIN_15 = ifelse(any(RAIN_15 > 100), NA, RAIN_15)) 

but it is replacing RAIN_15 values with 0. If I remove any in ifelse then it is just replacing values > 100 in RAIN_15 by NA.

CodePudding user response:

With data.table:

library(data.table)
setDT(df_v1)
df_v1[,RAIN_15:=fifelse(rep(any(RAIN_15>100,na.rm=T),.N),NA_real_,RAIN_15),by=gr][]

   TRGCODE  RAIN       DATE TIME gr RAIN_15
 1:    6155   0.0 2015-06-18    0 17      NA
 2:    6155   0.0 2015-06-18   15 17      NA
 3:    6155   0.0 2015-06-18   30 17      NA
 4:    6155   0.0 2015-06-18   45 17      NA
 5:    6155   0.0 2015-06-18  100 17      NA
 6:    6155   0.0 2015-06-18  115 17      NA
 7:    6155   0.0 2015-06-18  130 17      NA
 8:    6155   0.0 2015-06-18  145 17      NA
 9:    6155   0.0 2015-06-18  200 17      NA
10:    6155   0.0 2015-06-18  215 17      NA
11:    6155   0.0 2015-06-18  230 17      NA
12:    6155   0.0 2015-06-18  245 17      NA
13:    6155   0.2 2015-06-18  300 17      NA
14:    6155 123.7 2015-06-18  315 17      NA
15:    6155 127.0 2015-06-18  330 17      NA
16:    6155 127.0 2015-06-18  345 17      NA
17:    6155 127.0 2015-06-18  400 17      NA
18:    6155 127.0 2015-06-18  415 17      NA
19:    6155 127.0 2015-06-18  430 17      NA
20:    6155 127.0 2015-06-18  445 17      NA
21:    6155 127.0 2015-06-18  500 17      NA
22:    6155 141.7 2015-06-18  515 17      NA
23:    6155 594.3 2015-06-18  530 17      NA
24:    6155    NA 2015-06-18  545 17      NA
25:    6155    NA 2015-06-18  600 17      NA
    TRGCODE  RAIN       DATE TIME gr RAIN_15

CodePudding user response:

df <- read.table(header = TRUE, text = 'TRGCODE RAIN   DATE    TIME gr RAIN_15
    6155   0.0 2015-06-18 0000 17     0.0
    6155   0.0 2015-06-18 0015 17     0.0
    6155   0.0 2015-06-18 0030 17     0.0
    6155   0.0 2015-06-18 0045 17     0.0
    6155   0.0 2015-06-18 0100 17     0.0
    6155   0.0 2015-06-18 0115 17     0.0
    6155   0.0 2015-06-18 0130 17     0.0
    6155   0.0 2015-06-18 0145 17     0.0
    6155   0.0 2015-06-18 0200 17     0.0
    6155   0.0 2015-06-18 0215 17     0.0
    6155   0.0 2015-06-18 0230 17     0.0
    6155   0.0 2015-06-18 0245 17     0.0
    6155   0.2 2015-06-18 0300 17     0.2
    6155 123.7 2015-06-18 0315 17   123.5
    6155 127.0 2015-06-18 0330 17     3.3
    6155 127.0 2015-06-18 0345 17     0.0
    6155 127.0 2015-06-18 0400 17     0.0
    6155 127.0 2015-06-18 0415 17     0.0
    6155 127.0 2015-06-18 0430 17     0.0
    6155 127.0 2015-06-18 0445 17     0.0
    6155 127.0 2015-06-18 0500 17     0.0
    6155 141.7 2015-06-18 0515 17    14.7
    6155 594.3 2015-06-18 0530 17   452.6
    6155    NA 2015-06-18 0545 17      NA
    6155    NA 2015-06-18 0600 17      NA')

x=aggregate(RAIN_15 ~ gr, data = df, max)

x$RAIN_15=ifelse(x$RAIN_15>100, NA,x$RAIN_15)

df$RAIN_15=merge(df,x, all.x = TRUE, by="gr")[7]
  • Related