Replace multiple column values based on the logical set of the same dataframe

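I have a data frame df. In the columns PhysicalActivity_yn_agesurvey, smoker_former_or_never_yn_agesurvey, NOT_RiskyHeavyDrink_yn_agesurvey, Not_obese_yn_agesurvey and HEALTHY_Diet_yn_agesurvey, I want to replace with NA every value that differs from SURVEY_MIN in the same row, i.e. wherever df[c("PhysicalActivity_yn_agesurvey", "smoker_former_or_never_yn_agesurvey", "NOT_RiskyHeavyDrink_yn_agesurvey", "Not_obese_yn_agesurvey", "HEALTHY_Diet_yn_agesurvey")] != df$SURVEY_MIN is TRUE. How do I do that in R?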

df <- structure(list(PhysicalActivity_yn_agesurvey = c(58, 47, 47, 
50, 53, 59), smoker_former_or_never_yn_agesurvey = c(58, 47, 
47, 50, 53, 59), NOT_RiskyHeavyDrink_yn_agesurvey = c(59, 48, 
47, 50, 53, 59), Not_obese_yn_agesurvey = c(58, 47, 47, 50, 53, 
59), HEALTHY_Diet_yn_agesurvey = c(58, 47, 47, 50, 53, 59), SURVEY_MIN = c(58, 
47, 47, 50, 53, 59)), row.names = c(NA, 6L), class = "data.frame")

This is what I tried:

df[lapply(df, function(x) ifelse(x != df$SURVEY_MIN, TRUE, FALSE))] <- NA

Also tried:

df[c("PhysicalActivity_yn_agesurvey", "smoker_former_or_never_yn_agesurvey", "NOT_RiskyHeavyDrink_yn_agesurvey",
                "Not_obese_yn_agesurvey", "HEALTHY_Diet_yn_agesurvey")] [df[c("PhysicalActivity_yn_agesurvey", "smoker_former_or_never_yn_agesurvey", "NOT_RiskyHeavyDrink_yn_agesurvey",
                 "Not_obese_yn_agesurvey", "HEALTHY_Diet_yn_agesurvey")] != df$SURVEY_MIN] <- NA

CodePudding user response：

Writing explicit for loops is usually bad practice in R (99% of the time); a vectorized comparison is shorter here.

df[df != df$SURVEY_MIN] <- NA

will do the trick.
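
The one-liner above compares every column of df, including SURVEY_MIN with itself. If the data frame may hold other columns that should stay untouched, the comparison can be restricted to the survey columns. A minimal sketch using the data from the question; cols and mismatch are names introduced here for illustration:

```r
# Data from the question
df <- data.frame(
  PhysicalActivity_yn_agesurvey       = c(58, 47, 47, 50, 53, 59),
  smoker_former_or_never_yn_agesurvey = c(58, 47, 47, 50, 53, 59),
  NOT_RiskyHeavyDrink_yn_agesurvey    = c(59, 48, 47, 50, 53, 59),
  Not_obese_yn_agesurvey              = c(58, 47, 47, 50, 53, 59),
  HEALTHY_Diet_yn_agesurvey           = c(58, 47, 47, 50, 53, 59),
  SURVEY_MIN                          = c(58, 47, 47, 50, 53, 59)
)

# Every column except the reference column
cols <- setdiff(names(df), "SURVEY_MIN")

# Logical matrix (rows x selected columns): TRUE where a cell differs
# from SURVEY_MIN in the same row
mismatch <- df[cols] != df$SURVEY_MIN

# Blank out only the mismatching cells in the selected columns
df[cols][mismatch] <- NA
```

This assumes the selected columns contain no missing values to begin with.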

CodePudding user response：

I hope I understand your question correctly, but this should do the trick:

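# Assumes SURVEY_MIN is the last column and the data contain no NA values
for (i in seq_len(nrow(df))) {
  for (j in seq_len(ncol(df) - 1)) {
    # Blank out any cell that differs from SURVEY_MIN in the same row
    if (df[i, j] != df$SURVEY_MIN[i]) {
      df[i, j] <- NA
    }
  }
}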
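
Note that if() stops with an error when the comparison yields NA, so the loop only works on complete data. The same idea can be written column by column with lapply(); since lapply() returns a list, the result has to be assigned back to the columns rather than used as an index. A rough sketch on toy data (the data frame toy and its columns a and b are hypothetical, made up to include a missing value):

```r
# Hypothetical toy data with one missing value in column a
toy <- data.frame(
  a          = c(58, 47, NA, 50),
  b          = c(59, 47, 47, 51),
  SURVEY_MIN = c(58, 47, 47, 50)
)

# Columns to check: everything except the reference column
cols <- setdiff(names(toy), "SURVEY_MIN")

# For each column, set values that differ from SURVEY_MIN to NA.
# which() drops NA comparisons, so existing NAs are simply left as they are.
toy[cols] <- lapply(toy[cols], function(x) {
  replace(x, which(x != toy$SURVEY_MIN), NA)
})
```

Selecting the columns by name also removes the assumption that SURVEY_MIN is the last column.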

CodePudding user response：

First create a data frame of zeros to hold the result, then fill it cell by cell according to your condition. This needs a loop that compares each cell with the value of SURVEY_MIN in the same row. So I start with a data frame called df_result that leaves out the comparison column (SURVEY_MIN); you can join it back later:

df_result <- data.frame(PhysicalActivity_yn_agesurvey = numeric(nrow(df)), 
                    smoker_former_or_never_yn_agesurvey = numeric(nrow(df)), 
                    NOT_RiskyHeavyDrink_yn_agesurvey = numeric(nrow(df)), 
                    Not_obese_yn_agesurvey = numeric(nrow(df)), 
                    HEALTHY_Diet_yn_agesurvey = numeric(nrow(df)))

Then loop over every cell of df, compare it with SURVEY_MIN in the same row, and store either the original value or NA in df_result:

for (i in seq_len(nrow(df))) {
  for (j in 1:5) {  # the five survey columns
    if (df[i, j] == df$SURVEY_MIN[i]) {
      df_result[i, j] <- df[i, j]  # value matches: keep it
    } else {
      df_result[i, j] <- NA        # value differs: blank it out
    }
  }
}

This shows that only two values differ from the corresponding row value in SURVEY_MIN, and both are in NOT_RiskyHeavyDrink_yn_agesurvey:

df_result
PhysicalActivity_yn_agesurvey smoker_former_or_never_yn_agesurvey NOT_RiskyHeavyDrink_yn_agesurvey Not_obese_yn_agesurvey HEALTHY_Diet_yn_agesurvey
                           58                                  58                               NA                     58                        58
                           47                                  47                               NA                     47                        47
                           47                                  47                               47                     47                        47
                           50                                  50                               50                     50                        50
                           53                                  53                               53                     53                        53
                           59                                  59                               59                     59                        59
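
The same df_result can probably be built without a loop, and the number of blanked-out cells per column can be counted directly. A minimal sketch, assuming the data from the question and no missing values in it; cols and n_mismatch are names introduced here for illustration:

```r
# Data from the question
df <- data.frame(
  PhysicalActivity_yn_agesurvey       = c(58, 47, 47, 50, 53, 59),
  smoker_former_or_never_yn_agesurvey = c(58, 47, 47, 50, 53, 59),
  NOT_RiskyHeavyDrink_yn_agesurvey    = c(59, 48, 47, 50, 53, 59),
  Not_obese_yn_agesurvey              = c(58, 47, 47, 50, 53, 59),
  HEALTHY_Diet_yn_agesurvey           = c(58, 47, 47, 50, 53, 59),
  SURVEY_MIN                          = c(58, 47, 47, 50, 53, 59)
)

# Copy the survey columns so that df itself is left unchanged
cols <- setdiff(names(df), "SURVEY_MIN")
df_result <- df[cols]

# Set cells that differ from SURVEY_MIN in the same row to NA
df_result[df_result != df$SURVEY_MIN] <- NA

# Count how many cells were blanked out in each column
n_mismatch <- colSums(is.na(df_result))
n_mismatch
```

Working on a copy keeps the original values available in df for checking.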