I have a data frame df. I want to replace with NA any value for which df[c("PhysicalActivity_yn_agesurvey", "smoker_former_or_never_yn_agesurvey", "NOT_RiskyHeavyDrink_yn_agesurvey", "Not_obese_yn_agesurvey", "HEALTHY_Diet_yn_agesurvey")] != df$SURVEY_MIN is TRUE. How do I do that in R?
df <- structure(list(PhysicalActivity_yn_agesurvey = c(58, 47, 47,
50, 53, 59), smoker_former_or_never_yn_agesurvey = c(58, 47,
47, 50, 53, 59), NOT_RiskyHeavyDrink_yn_agesurvey = c(59, 48,
47, 50, 53, 59), Not_obese_yn_agesurvey = c(58, 47, 47, 50, 53,
59), HEALTHY_Diet_yn_agesurvey = c(58, 47, 47, 50, 53, 59), SURVEY_MIN = c(58,
47, 47, 50, 53, 59)), row.names = c(NA, 6L), class = "data.frame")
This is the code I tried:
df[lapply(df, function(x) ifelse(x != df$SURVEY_MIN, TRUE, FALSE))] <- NA
I also tried:
df[c("PhysicalActivity_yn_agesurvey", "smoker_former_or_never_yn_agesurvey", "NOT_RiskyHeavyDrink_yn_agesurvey",
"Not_obese_yn_agesurvey", "HEALTHY_Diet_yn_agesurvey")] [df[c("PhysicalActivity_yn_agesurvey", "smoker_former_or_never_yn_agesurvey", "NOT_RiskyHeavyDrink_yn_agesurvey",
"Not_obese_yn_agesurvey", "HEALTHY_Diet_yn_agesurvey")] != df$SURVEY_MIN] <- NA
CodePudding user response:
Explicit for loops are rarely the best choice in R (99% of the time a vectorized expression is simpler).
df[df != df$SURVEY_MIN] <- NA
will do the trick.
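If only the five named columns should be checked, so that any other columns in a larger data frame are left alone, the same vectorized idea can be applied column by column. A minimal base R sketch, assuming the sample df from the question; the name cols is only illustrative:

```r
# Sample data from the question
df <- data.frame(
  PhysicalActivity_yn_agesurvey       = c(58, 47, 47, 50, 53, 59),
  smoker_former_or_never_yn_agesurvey = c(58, 47, 47, 50, 53, 59),
  NOT_RiskyHeavyDrink_yn_agesurvey    = c(59, 48, 47, 50, 53, 59),
  Not_obese_yn_agesurvey              = c(58, 47, 47, 50, 53, 59),
  HEALTHY_Diet_yn_agesurvey           = c(58, 47, 47, 50, 53, 59),
  SURVEY_MIN                          = c(58, 47, 47, 50, 53, 59)
)

# Columns to check (illustrative name): everything except SURVEY_MIN
cols <- setdiff(names(df), "SURVEY_MIN")

# In each selected column, set entries that differ from SURVEY_MIN to NA
df[cols] <- lapply(df[cols], function(x) replace(x, x != df$SURVEY_MIN, NA))
```

SURVEY_MIN itself is never touched here, because it is not in cols.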
CodePudding user response:
I hope I understand your question correctly, but this should do the trick:
# Assumes SURVEY_MIN is the last column of df
for (i in seq_len(nrow(df))) {
  for (j in seq_len(ncol(df) - 1)) {
    if (df[i, j] != df$SURVEY_MIN[i]) {
      df[i, j] <- NA
    }
  }
}
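The nested loop depends on SURVEY_MIN being the last column. As a sketch of a variant that does not, the columns can be picked by name and looped over one at a time, with the comparison running over all rows of a column at once. This assumes the sample df from the question; target_cols and mismatch are illustrative names:

```r
# Sample data from the question
df <- data.frame(
  PhysicalActivity_yn_agesurvey       = c(58, 47, 47, 50, 53, 59),
  smoker_former_or_never_yn_agesurvey = c(58, 47, 47, 50, 53, 59),
  NOT_RiskyHeavyDrink_yn_agesurvey    = c(59, 48, 47, 50, 53, 59),
  Not_obese_yn_agesurvey              = c(58, 47, 47, 50, 53, 59),
  HEALTHY_Diet_yn_agesurvey           = c(58, 47, 47, 50, 53, 59),
  SURVEY_MIN                          = c(58, 47, 47, 50, 53, 59)
)

# Every column except the reference column, wherever it sits in df
target_cols <- setdiff(names(df), "SURVEY_MIN")

for (col in target_cols) {
  # TRUE for rows where this column differs from SURVEY_MIN
  mismatch <- df[[col]] != df$SURVEY_MIN
  df[[col]][mismatch] <- NA
}
```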
CodePudding user response:
One approach is to first create a data frame of zeros, which is then filled in according to your condition (an if/else statement in R). This requires a loop in which each cell is compared to the corresponding value in the SURVEY_MIN column. So first I create a data frame called df_result that excludes the column you want to compare against (SURVEY_MIN); you can join it back later:
df_result <- data.frame(
  PhysicalActivity_yn_agesurvey = numeric(nrow(df)),
  smoker_former_or_never_yn_agesurvey = numeric(nrow(df)),
  NOT_RiskyHeavyDrink_yn_agesurvey = numeric(nrow(df)),
  Not_obese_yn_agesurvey = numeric(nrow(df)),
  HEALTHY_Diet_yn_agesurvey = numeric(nrow(df))
)
Then we loop over the cells of df, keep each value that matches SURVEY_MIN in the same row, set it to NA otherwise, and save the result in df_result:
for (i in seq_len(nrow(df))) {
  for (j in 1:5) {
    if (df[i, j] == df$SURVEY_MIN[i]) {
      # Keep values that match SURVEY_MIN in the same row
      df_result[i, j] <- df[i, j]
    } else {
      df_result[i, j] <- NA
    }
  }
}
This tells me there are only two values that are different from the corresponding row value in SURVEY_MIN, and they are from NOT_RiskyHeavyDrink_yn_agesurvey:
df_result
  PhysicalActivity_yn_agesurvey smoker_former_or_never_yn_agesurvey NOT_RiskyHeavyDrink_yn_agesurvey Not_obese_yn_agesurvey HEALTHY_Diet_yn_agesurvey
1                            58                                  58                               NA                     58                        58
2                            47                                  47                               NA                     47                        47
3                            47                                  47                               47                     47                        47
4                            50                                  50                               50                     50                        50
5                            53                                  53                               53                     53                        53
6                            59                                  59                               59                     59                        59
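To locate the mismatching cells directly instead of reading them off the printed result, the logical comparison matrix can be passed to which() with arr.ind = TRUE. A small sketch, assuming the sample df from the question; cols, mismatch and pos are illustrative names:

```r
# Sample data from the question
df <- data.frame(
  PhysicalActivity_yn_agesurvey       = c(58, 47, 47, 50, 53, 59),
  smoker_former_or_never_yn_agesurvey = c(58, 47, 47, 50, 53, 59),
  NOT_RiskyHeavyDrink_yn_agesurvey    = c(59, 48, 47, 50, 53, 59),
  Not_obese_yn_agesurvey              = c(58, 47, 47, 50, 53, 59),
  HEALTHY_Diet_yn_agesurvey           = c(58, 47, 47, 50, 53, 59),
  SURVEY_MIN                          = c(58, 47, 47, 50, 53, 59)
)

cols <- setdiff(names(df), "SURVEY_MIN")

# Logical matrix: TRUE where a cell differs from SURVEY_MIN in the same row
mismatch <- df[cols] != df$SURVEY_MIN

# Row and column positions of the TRUE cells
pos <- which(mismatch, arr.ind = TRUE)

# Names of the columns that contain at least one mismatch
mismatch_cols <- cols[unique(pos[, "col"])]
```

On the sample data, pos has one row per mismatching cell and mismatch_cols names the single affected column.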