I am working with a data frame (imported from a CSV file) that has NA values in its numerical columns. I want to replace all of the NA values with the column medians and update the data frame. For example, in this picture, the column "Speed_of_maximum_wind_gust(km/h)", which is a numerical column, has an NA value (circled in red):
Then I went ahead and added the following code:
test2 %>% mutate_if(is.numeric, function(x) ifelse(is.na(x), median(x, na.rm = T), x))
The code above runs without errors, but it only prints this output:
And the mentioned NA value remains the same:
I don't know what I did wrong here, but I want the result to be saved in the data frame so that I can export it as a new .csv file later. Would anyone be able to help me out? Thanks a lot!
CodePudding user response:
Based on your screenshots, it looks like you're just going back to the RStudio viewer window to look at the data frame again. If so, the issue is this:
When you write test2 %>% mutate_if(...), you're telling R to compute a modified copy of test2 and return the result (roughly meaning, in this context, to just print the result and show it to you). What you're not telling it to do is to save that result anywhere, so test2 itself is left unchanged.
You would want something like test2 <- test2 %>% mutate_if(...) to overwrite the existing test2 data frame in your global environment, or something like test3 <- test2 %>% mutate_if(...) to give it a new name and store the modified data frame as a separate object while retaining the old one.
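To make the difference concrete, here is a minimal sketch of the whole flow. The small test2 data frame, its column names, and the output file name are made up for illustration (your real data comes from your CSV), and it assumes dplyr is installed:

```r
library(dplyr)

# Hypothetical stand-in for the imported data; the real test2 comes from your CSV.
test2 <- data.frame(
  station = c("A", "B", "C", "D"),
  gust    = c(30, NA, 50, 40)
)

# Same imputation as in the question, wrapped in a helper so it can be reused.
fill_with_median <- function(x) ifelse(is.na(x), median(x, na.rm = TRUE), x)

# Without assignment: the result is printed, but test2 is not modified.
test2 %>% mutate_if(is.numeric, fill_with_median)
anyNA(test2)  # TRUE, the NA is still in test2

# With assignment: the modified data frame replaces the old test2.
test2 <- test2 %>% mutate_if(is.numeric, fill_with_median)
anyNA(test2)  # FALSE

# Export the updated data frame (hypothetical file name).
out_path <- "test2_imputed.csv"
write.csv(test2, out_path, row.names = FALSE)
```

If you'd rather keep the original around, assign to a new name such as test3 instead and pass that to write.csv(). Also note that in recent dplyr versions mutate_if() is superseded by mutate(across(where(is.numeric), ...)); it still works, so this sketch keeps the call from the question.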
Lastly, I would echo Andrea M's concern that you might not want to do this at all. Imputing missing data with averages is, on a good day, risky.