Discard 200 random healthy instances. How do I implement this in RStudio?
This is the data frame:
https://www.kaggle.com/code/jamaltariqcheema/model-performance-and-comparison/data
I tried this, but I got an error:
kidney_disease$hd <- ifelse(test=kidney_disease$hd == 0, yes="Healthy", no="Unhealthy")
CodePudding user response:
Maybe the following solves the problem.
Choose row numbers at random with sample(), assign a default value "Healthy" to the new column hd, and assign the value "Unhealthy" to the randomly chosen rows.
set.seed(2022) # Make results reproducible
i <- sample(nrow(kidney_disease), 200)
kidney_disease$hd <- "Healthy"
kidney_disease$hd[i] <- "Unhealthy"
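Note that the code above only relabels 200 random rows; it does not remove any. If the goal is to literally discard 200 random healthy rows, a minimal sketch could look like the following. It assumes a column hd that already holds the labels "Healthy" and "Unhealthy"; the data frame toy and the names n_drop, healthy_rows, drop_rows and toy_reduced are made up for illustration and are not part of the Kaggle data.

```r
set.seed(2022)  # make the random draw reproducible

# Hypothetical stand-in for kidney_disease: 300 healthy and 200 unhealthy rows
toy <- data.frame(
  id = 1:500,
  hd = rep(c("Healthy", "Unhealthy"), times = c(300, 200))
)

n_drop <- 200

# Row indices of the healthy instances only
healthy_rows <- which(toy$hd == "Healthy")

# Stop early if there are fewer healthy rows than we want to discard
stopifnot(length(healthy_rows) >= n_drop)

# Draw 200 of those indices without replacement and drop them
# with negative indexing; unhealthy rows are never touched
drop_rows <- sample(healthy_rows, n_drop)
toy_reduced <- toy[-drop_rows, ]

# Class counts after discarding
table(toy_reduced$hd)
```

With the real data, replace toy by kidney_disease and adapt the column name and label to whatever marks a healthy instance there. The stopifnot() check matters: if the data contains fewer than 200 healthy rows, the request cannot be satisfied as stated.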