Conditionally filling missing data based on other variables in R

Time:06-29


Sorry for adding the screenshot; I downloaded the data from https://www.kaggle.com/datasets/rikdifos/credit-card-approval-prediction

Can someone tell me how to fill the NA values in the occupation column? I created a new variable, is_working, to indicate whether an applicant is working. Where is_working is zero for an observation, I want to set the NA value in occupation to zero, and leave the other NA values as they are.

df <- data.frame (occupation  = c("NA","NA","Drivers","Accountants","NA","Drivers","Laborers","Cleaning staff","Drivers","Drivers"),
                  is_working = c("1","0","1","1","1","1","1","1","1","1")
                  )

In short, if the value in the is_working column is zero, I want to set the NA value in occupation to zero. If the value in is_working is 1, I want to assign "other" to the NA value in occupation.

CodePudding user response:

library(dplyr)
df %>%
  mutate(
    # change string "NA" to missing values NA
    occupation = ifelse(occupation == "NA", NA, occupation),
    # replace NAs where is_working is "0" with "0" (is_working is stored as character)
    occupation = ifelse(is.na(occupation) & is_working == "0", "0", occupation)
  )
#        occupation is_working
# 1            <NA>          1
# 2               0          0
# 3         Drivers          1
# 4     Accountants          1
# 5            <NA>          1
# 6         Drivers          1
# 7        Laborers          1
# 8  Cleaning staff          1
# 9         Drivers          1
# 10        Drivers          1
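
The output above leaves NA in the rows where is_working is 1. If you also want the "other" label mentioned at the end of the question for those rows, one possible variant is sketched below using dplyr's na_if() and case_when(). It assumes the same df as in the question, with both columns stored as character; the name result is only illustrative.

```r
library(dplyr)

# Same sample data as in the question (the string "NA" stands in for missing values)
df <- data.frame(
  occupation = c("NA", "NA", "Drivers", "Accountants", "NA", "Drivers",
                 "Laborers", "Cleaning staff", "Drivers", "Drivers"),
  is_working = c("1", "0", "1", "1", "1", "1", "1", "1", "1", "1")
)

result <- df %>%
  mutate(
    # turn the string "NA" into a real missing value
    occupation = na_if(occupation, "NA"),
    occupation = case_when(
      # missing occupation and not working -> "0"
      is.na(occupation) & is_working == "0" ~ "0",
      # missing occupation but working -> "other"
      is.na(occupation) & is_working == "1" ~ "other",
      # everything else keeps its original value
      TRUE ~ occupation
    )
  )

print(result)
```

With this version, rows 1 and 5 should become "other" and row 2 should become "0". case_when() checks the conditions in order, so the final TRUE branch only applies to rows that already have an occupation.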