if() statement with paste0() or grep() in R

I made a minimal reproducible example, but my real data is much larger.


ac_1 <-c(0.1, 0.3, 0.03, 0.03)
ac_2 <-c(0.2, 0.4, 0.1, 0.008)
ac_3 <-c(0.8, 0.043, 0.7, 0.01)
ac_4 <-c(0.2, 0.73, 0.1, 0.1)
c_2<-c(1,2,5,23)
check_1<-c(0.01, 0.902,0.02,0.07)
check_2<-c(0.03, 0.042,0.002,0.00001)
check_3<-c(0.01, 0.02,0.5,0.001)
check_4<-c(0.001, 0.042,0.02,0.2)
id<-1:4


df<-data.frame(id,ac_1, ac_2,ac_3,ac_4,c_2,check_1,check_2,check_3,check_4)

So the data frame looks like this:

> df
  id ac_1  ac_2  ac_3 ac_4 c_2 check_1 check_2 check_3 check_4
1  1 0.10 0.200 0.800 0.20   1   0.010 0.03000   0.010   0.001
2  2 0.30 0.400 0.043 0.73   2   0.902 0.04200   0.020   0.042
3  3 0.03 0.100 0.700 0.10   5   0.020 0.00200   0.500   0.020
4  4 0.03 0.008 0.010 0.10  23   0.070 0.00001   0.001   0.200

What I want to do is this:

If check_1 is 0.02, I want to set the corresponding ac_1 value to missing (NA). If check_2 is 0.02, I want to set the corresponding ac_2 value to missing. I want to do the same for every pair of "check" and "ac" columns.

For example, in the check_1 column, the person with id 3 has 0.02, so this person's ac_1 score (0.03) should become NA.

In the check_3 column, the person with id 2 has 0.02, so this person's ac_3 score should become NA.

In the check_4 column, the person with id 3 has 0.02, so this person's ac_4 score should become NA.

So what I did is as follows:



for(i in 1:4){
  
  if(paste0("df$check_",i)==0.02){
    paste0("df$ac_",i)==NA
  }
}

But it did not work.

CodePudding user response：

You're really close, but you're off on a few fundamentals.

1. You can't (easily) use strings to refer to objects, so "df$check_1" won't work. You can use strings to refer to column names, but not with $; you need to use [ or [[, so df[["check_1"]] will work.
2. if() isn't vectorized, so it won't work on each value in a column. Use ifelse() instead, or, even better in this case, skip the if entirely and use logical indexing.
3. Using == on non-integer numbers is risky due to floating-point precision issues. We'll use a tolerance instead.
4. Minor issue: paste0("df$ac_",i)==NA isn't good, because == is for checking equality. You need = or <- for assignment on that line.
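The points about strings versus column names and about == on non-integer numbers can be checked directly at the console. This is only a small sketch: the data frame d and its values are made up for illustration and are not the asker's data.

```r
# Toy data frame (hypothetical, just for this demonstration)
d <- data.frame(check_1 = c(0.01, 0.02), ac_1 = c(0.5, 0.6))
i <- 1

# A pasted string is just a string; it is not the column
paste0("d$check_", i)        # the character value "d$check_1"

# [[ looks the column up by name, so the name can be built with paste0()
d[[paste0("check_", i)]]     # the numeric column check_1

# == on doubles can fail even when the values "look" equal
0.1 + 0.2 == 0.3                  # FALSE
abs((0.1 + 0.2) - 0.3) < 1e-10    # TRUE
```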

Addressing all of these issues:

for(i in 1:4){  
  df[
    ## rows to replace
    abs(df[[paste0("check_", i)]] - 0.02) < 1e-10,
    ## column to replace
    paste0("ac_", i)
  ] <- NA
}

df
#   id ac_1  ac_2 ac_3 ac_4 c_2 check_1 check_2 check_3 check_4
# 1  1 0.10 0.200 0.80 0.20   1   0.010 0.03000   0.010   0.001
# 2  2 0.30 0.400   NA 0.73   2   0.902 0.04200   0.020   0.042
# 3  3   NA 0.100 0.70   NA   5   0.020 0.00200   0.500   0.020
# 4  4 0.03 0.008 0.01 0.10  23   0.070 0.00001   0.001   0.200
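Since the real data is described as much larger, hard-coding 1:4 may not be practical. One possible variation, sketched here under the assumption that every check_* column has an ac_* partner with the same suffix, finds the columns with grep() and replaces all matching cells in one step instead of looping. The names check_cols, ac_cols and hit are chosen here for illustration.

```r
# Rebuild the example data from the question
df <- data.frame(
  id      = 1:4,
  ac_1    = c(0.1, 0.3, 0.03, 0.03),
  ac_2    = c(0.2, 0.4, 0.1, 0.008),
  ac_3    = c(0.8, 0.043, 0.7, 0.01),
  ac_4    = c(0.2, 0.73, 0.1, 0.1),
  c_2     = c(1, 2, 5, 23),
  check_1 = c(0.01, 0.902, 0.02, 0.07),
  check_2 = c(0.03, 0.042, 0.002, 0.00001),
  check_3 = c(0.01, 0.02, 0.5, 0.001),
  check_4 = c(0.001, 0.042, 0.02, 0.2)
)

# Find the check columns by name and derive the matching ac column names
check_cols <- grep("^check_", names(df), value = TRUE)
ac_cols    <- sub("^check_", "ac_", check_cols)
stopifnot(all(ac_cols %in% names(df)))  # assumption: every check has an ac partner

# Logical matrix: TRUE where a check value is (within tolerance) 0.02
hit <- abs(as.matrix(df[check_cols]) - 0.02) < 1e-10

# Set the ac cells in the same positions to NA
df[ac_cols][hit] <- NA
```

Because the two name vectors are built in the same order, position [r, k] of hit lines up with row r of the k-th ac column.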

CodePudding user response：

It's often better to work with long-format data, even if just temporarily. Here is an example of doing so, using dplyr and tidyr:

library(dplyr)
library(tidyr)

pivot_longer(df, -c(id, c_2)) %>%
  separate(name, into = c("type", "pos")) %>%   # "ac_1" -> type "ac", pos "1"
  pivot_wider(names_from = type, values_from = value) %>%
  mutate(ac = if_else(near(check, 0.02), NA_real_, ac)) %>%   # near() compares with a tolerance
  pivot_wider(names_from = pos, values_from = ac:check)

(Updated with near() thanks to Gregor)

Output:

     id   c_2  ac_1  ac_2  ac_3  ac_4 check_1 check_2 check_3 check_4
  <int> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1     1     1  0.1  0.2    0.8   0.2    0.01  0.03      0.01    0.001
2     2     2  0.3  0.4   NA     0.73   0.902 0.042     0.02    0.042
3     3     5 NA    0.1    0.7  NA      0.02  0.002     0.5     0.02 
4     4    23  0.03 0.008  0.01  0.1    0.07  0.00001   0.001   0.2
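To see why the long format helps, it can be useful to stop the pipeline before the mutate() step and look at the intermediate table. The sketch below uses a reduced copy of the question's data with only two ac/check pairs. The names df_small and long are made up for this illustration.

```r
library(dplyr)
library(tidyr)

# Reduced copy of the question's data (two ac/check pairs only)
df_small <- data.frame(
  id      = 1:4,
  ac_1    = c(0.1, 0.3, 0.03, 0.03),
  ac_2    = c(0.2, 0.4, 0.1, 0.008),
  check_1 = c(0.01, 0.902, 0.02, 0.07),
  check_2 = c(0.03, 0.042, 0.002, 0.00001)
)

long <- df_small %>%
  pivot_longer(-id) %>%                           # one row per id/column pair
  separate(name, into = c("type", "pos")) %>%     # split "ac_1" into "ac" and "1"
  pivot_wider(names_from = type, values_from = value)

# One row per (id, pos) pair, with the ac and check values side by side
long
```

In this shape each ac value sits in the same row as its check value, so a single mutate() can handle every pair without a loop over column names.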