How to replace zeros and ones instead of words into a data frame that has quantitative and qualitati

Time:09-24

My data frame has six columns named exam 1, exam 2, exam 3, result exam 1, result exam 2 and result exam 3. The first three columns contain numbers and NAs; the last three contain Pass, Fail and NAs. I want to replace every Pass with 1, and every Fail and every NA with 0.

I have used multiple approaches in R but I can't make it work.

df[df == 'NA'] <- 0
df[df == NA] <- 0

df[df$"result exam 1" == "Pass",]$"result exam 1" = 1
df[df$"result exam 1" == "Fail",]$"result exam 1" = 0 

None of this code works.

Would someone be able to please help with this problem?

Thank you

CodePudding user response:

You really need to get a better grasp of basic R syntax:

  1. You are putting the subset operator [ in the wrong place: subset the column vector (df$col[cond]), not the rows of the data frame (df[cond, ]$col).
  2. Because the column contains NAs, the row condition contains NAs as well, and the assignment throws an error (that you should have posted): missing values are not allowed in a subscripted assignment on a data frame.
  3. You are testing for missingness with x == NA, which makes no sense: any comparison with NA returns NA, never TRUE or FALSE. You must use the is.na() function.
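The point about missing values can be checked directly in the console. This is a minimal sketch; the vector x is made-up data standing in for one of the result columns, not the asker's actual data frame.

```r
# Hypothetical vector standing in for one "result exam" column
x <- c("Pass", NA, "Fail")

# Comparing with NA never yields TRUE or FALSE, only NA
x == NA        # NA NA NA

# is.na() is the reliable test for missing values
is.na(x)       # FALSE TRUE FALSE

# Subset the vector itself, then assign
x[is.na(x)] <- 0
x              # "Pass" "0" "Fail"
```

Note that x stays a character vector, so the replacement value is stored as "0".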

Here is what you should have done (with just a bit of help from basic R tutorials):

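# Subset the column vector, not the data frame
df$`result exam 1`[df$`result exam 1` == "Pass"] <- 1
df$`result exam 1`[df$`result exam 1` == "Fail"] <- 0
# Test for missing values with is.na(), not with == NA
df$`result exam 1`[is.na(df$`result exam 1`)] <- 0
df$`exam 1`[is.na(df$`exam 1`)] <- 0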

CodePudding user response:

Assuming the data frame is named dt, first make a vector with the names of the result columns:

library(dplyr)
result <- c("result exam 1", "result exam 2", "result exam 3")
# %in% returns FALSE for NA, so Fail and NA both become 0
dt <- dt %>% mutate(across(all_of(result), ~ ifelse(.x %in% "Pass", 1, 0)))

This replaces every "Pass" with 1 and everything else (Fail and NA) with 0. For the NAs in the other columns:

dt[c("exam 1", "exam 2", "exam 3")][is.na(dt[c("exam 1", "exam 2", "exam 3")])] <- 0
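One detail worth checking when recoding with ifelse(): a comparison with == propagates NA, whereas %in% treats NA as "no match". The base R sketch below shows the difference on a small made-up data frame (two of the six columns, with hypothetical values).

```r
# Hypothetical sample data; check.names = FALSE keeps the spaces in the names
dt <- data.frame(
  "exam 1" = c(70, NA, 40),
  "result exam 1" = c("Pass", NA, "Fail"),
  check.names = FALSE
)

# A comparison with == propagates NA through ifelse()
ifelse(dt[["result exam 1"]] == "Pass", 1, 0)    # 1 NA 0

# %in% treats NA as "no match", so NA falls into the 0 branch
dt[["result exam 1"]] <- ifelse(dt[["result exam 1"]] %in% "Pass", 1, 0)

# Remaining NAs in the numeric column become 0
dt[["exam 1"]][is.na(dt[["exam 1"]])] <- 0
dt
```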

CodePudding user response:

To do this for multiple columns in one go, you can use the following:

cols <- grep('result', names(df))
df[cols][is.na(df[cols])] <- 0
df[cols][df[cols] == 'Fail'] <- 0
df[cols][df[cols] == 'Pass'] <- 1
df
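After these assignments the result columns still hold character values ("1" and "0"), and the NAs in the exam score columns are untouched. A possible follow-up, sketched on made-up data with only two of the six columns, converts the result columns to numeric and then fills the remaining NAs. It assumes the result columns are character rather than factor.

```r
# Hypothetical sample data with the asker's column naming
df <- data.frame(
  "exam 1" = c(70, NA, 40),
  "result exam 1" = c("Pass", NA, "Fail"),
  check.names = FALSE,
  stringsAsFactors = FALSE   # factors would reject the new values
)

cols <- grep("result", names(df))
df[cols][is.na(df[cols])] <- 0
df[cols][df[cols] == "Fail"] <- 0
df[cols][df[cols] == "Pass"] <- 1

# The result columns are still character; convert them to numeric
df[cols] <- lapply(df[cols], as.numeric)

# Replace the NAs left in the exam score columns
df[is.na(df)] <- 0
str(df)
```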
Tags: r