I'm trying to assign NA value in column RAIN_15
, if any row in RAIN_15
has value > 100 then all observations corresponding to that gr
in RAIN_15
will be replaced by NA
. Sample data is as given below:
TRGCODE RAIN DATE TIME gr RAIN_15
6155 0.0 2015-06-18 0000 17 0.0
6155 0.0 2015-06-18 0015 17 0.0
6155 0.0 2015-06-18 0030 17 0.0
6155 0.0 2015-06-18 0045 17 0.0
6155 0.0 2015-06-18 0100 17 0.0
6155 0.0 2015-06-18 0115 17 0.0
6155 0.0 2015-06-18 0130 17 0.0
6155 0.0 2015-06-18 0145 17 0.0
6155 0.0 2015-06-18 0200 17 0.0
6155 0.0 2015-06-18 0215 17 0.0
6155 0.0 2015-06-18 0230 17 0.0
6155 0.0 2015-06-18 0245 17 0.0
6155 0.2 2015-06-18 0300 17 0.2
6155 123.7 2015-06-18 0315 17 123.5
6155 127.0 2015-06-18 0330 17 3.3
6155 127.0 2015-06-18 0345 17 0.0
6155 127.0 2015-06-18 0400 17 0.0
6155 127.0 2015-06-18 0415 17 0.0
6155 127.0 2015-06-18 0430 17 0.0
6155 127.0 2015-06-18 0445 17 0.0
6155 127.0 2015-06-18 0500 17 0.0
6155 141.7 2015-06-18 0515 17 14.7
6155 594.3 2015-06-18 0530 17 452.6
6155 NA 2015-06-18 0545 17 NA
6155 NA 2015-06-18 0600 17 NA
Data has thousands of such rows where gr
is group based on TIME
. Here RAIN_15
has value > 100, so all observations in RAIN_15
for corresponding to gr
17 should be replaced by NA
.
I tried with
df_v1 <- df %>% group_by(TRGCODE, gr) %>% mutate(RAIN_15 = ifelse(any(RAIN_15 > 100), NA, RAIN_15))
but it is replacing RAIN_15 values with 0. If I remove any
in ifelse
then it is just replacing values > 100 in RAIN_15
by NA.
CodePudding user response:
With data.table
:
library(data.table)
setDT(df_v1)
df_v1[,RAIN_15:=fifelse(rep(any(RAIN_15>100,na.rm=T),.N),NA_real_,RAIN_15),by=gr][]
TRGCODE RAIN DATE TIME gr RAIN_15
1: 6155 0.0 2015-06-18 0 17 NA
2: 6155 0.0 2015-06-18 15 17 NA
3: 6155 0.0 2015-06-18 30 17 NA
4: 6155 0.0 2015-06-18 45 17 NA
5: 6155 0.0 2015-06-18 100 17 NA
6: 6155 0.0 2015-06-18 115 17 NA
7: 6155 0.0 2015-06-18 130 17 NA
8: 6155 0.0 2015-06-18 145 17 NA
9: 6155 0.0 2015-06-18 200 17 NA
10: 6155 0.0 2015-06-18 215 17 NA
11: 6155 0.0 2015-06-18 230 17 NA
12: 6155 0.0 2015-06-18 245 17 NA
13: 6155 0.2 2015-06-18 300 17 NA
14: 6155 123.7 2015-06-18 315 17 NA
15: 6155 127.0 2015-06-18 330 17 NA
16: 6155 127.0 2015-06-18 345 17 NA
17: 6155 127.0 2015-06-18 400 17 NA
18: 6155 127.0 2015-06-18 415 17 NA
19: 6155 127.0 2015-06-18 430 17 NA
20: 6155 127.0 2015-06-18 445 17 NA
21: 6155 127.0 2015-06-18 500 17 NA
22: 6155 141.7 2015-06-18 515 17 NA
23: 6155 594.3 2015-06-18 530 17 NA
24: 6155 NA 2015-06-18 545 17 NA
25: 6155 NA 2015-06-18 600 17 NA
TRGCODE RAIN DATE TIME gr RAIN_15
CodePudding user response:
df <- read.table(header = TRUE, text = 'TRGCODE RAIN DATE TIME gr RAIN_15
6155 0.0 2015-06-18 0000 17 0.0
6155 0.0 2015-06-18 0015 17 0.0
6155 0.0 2015-06-18 0030 17 0.0
6155 0.0 2015-06-18 0045 17 0.0
6155 0.0 2015-06-18 0100 17 0.0
6155 0.0 2015-06-18 0115 17 0.0
6155 0.0 2015-06-18 0130 17 0.0
6155 0.0 2015-06-18 0145 17 0.0
6155 0.0 2015-06-18 0200 17 0.0
6155 0.0 2015-06-18 0215 17 0.0
6155 0.0 2015-06-18 0230 17 0.0
6155 0.0 2015-06-18 0245 17 0.0
6155 0.2 2015-06-18 0300 17 0.2
6155 123.7 2015-06-18 0315 17 123.5
6155 127.0 2015-06-18 0330 17 3.3
6155 127.0 2015-06-18 0345 17 0.0
6155 127.0 2015-06-18 0400 17 0.0
6155 127.0 2015-06-18 0415 17 0.0
6155 127.0 2015-06-18 0430 17 0.0
6155 127.0 2015-06-18 0445 17 0.0
6155 127.0 2015-06-18 0500 17 0.0
6155 141.7 2015-06-18 0515 17 14.7
6155 594.3 2015-06-18 0530 17 452.6
6155 NA 2015-06-18 0545 17 NA
6155 NA 2015-06-18 0600 17 NA')
x=aggregate(RAIN_15 ~ gr, data = df, max)
x$RAIN_15=ifelse(x$RAIN_15>100, NA,x$RAIN_15)
df$RAIN_15=merge(df,x, all.x = TRUE, by="gr")[7]