head(df)
county state_abr population unemp health_ins poverty SNAP no_comp no_internet home_broad broad_num broad_avail broad_cost price_bbn
1 Autauga AL 55869 2.7 7.1 15.4 12.7 NA 20.9 78.9 0 0.0 67.32586 35.00
2 Baldwin AL 223234 2.7 10.2 10.6 7.5 NA 21.3 78.1 0 0.0 67.32586 35.00
3 Barbour AL 24686 3.8 11.2 28.9 27.4 NA 38.9 60.4 4 99.2 74.99000 35.00
4 Bibb AL 22394 3.1 7.9 14.0 12.4 23.7 33.8 66.1 0 0.0 67.32586 35.00
5 Blount AL 57826 2.7 11.0 NA 9.5 21.3 NA 68.5 0 0.0 67.32586 35.00
6 Bullock AL 10101 3.6 10.8 31.4 25.9 27.1 40.1 58.9 1 40.1 57.99000 71.95
Sublette WY 9831 4.4 13.4 8.4 2.2 5.4 17.5 81.7 3 19.5 59.65
Sweetwater WY 42343 3.9 12.0 12.0 5.8 7.7 16.1 82.4 5 95.1 63.30
Teton WY 23464 2.7 10.0 7.1 2.1 4.2 13.6 85.9 6 96.0 69.99
Uinta WY 20226 3.9 12.2 12.5 7.1 6.1 11.5 88.2 5 73.9 63.30
Washakie WY 7805 3.9 15.4 12.4 4.9 12.1 21.5 78.3 5 86.1 64.36
Weston WY 6927 2.9 13.3 17.4 4.7 13.8 26.1 73.3 2 52.0 66.67
I have this data frame in R and I want to replace the NA values with numeric values. The easy way is to take the mean of the column and use it in place of each NA, but I want to be more precise.
Because my data frame is split up by state (in this case I'm just using a subset of WY and AL), I want to calculate the mean for each state and apply it to the NA values in that state.
So, for example, there's an NA for no_comp in row 1, with state_abr AL. If I took the mean of the whole no_comp column, it would also include the WY values, which I don't want. I want to calculate the mean of no_comp for state_abr 'AL' only and apply it to the corresponding NA value.
CodePudding user response:
We may group by 'state_abr', loop over the numeric columns in mutate with across, and replace the NA with the mean value using na.aggregate from zoo. By default, na.aggregate uses FUN = mean.
library(zoo)
library(dplyr)
# 'df' is the data frame from the question
df <- df %>%
  group_by(state_abr) %>%
  # na.aggregate fills each NA with the mean of its group
  mutate(across(where(is.numeric), na.aggregate)) %>%
  ungroup()
Or, if we do not want to use an additional package:
df <- df %>%
  group_by(state_abr) %>%
  # replace each NA with the mean of the non-missing values in its state
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x),
                                             mean(.x, na.rm = TRUE)))) %>%
  ungroup()
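As a quick sanity check of the grouped replace() approach, here is a minimal sketch on a toy data frame. The name toy and its values are made up for illustration and are not from the question; only the column names are borrowed from it.

```r
library(dplyr)

# Toy data: hypothetical values, chosen so the group means are easy to verify
toy <- data.frame(
  state_abr = c("AL", "AL", "AL", "WY", "WY"),
  no_comp   = c(NA, 20, 30, 5, NA)
)

filled <- toy %>%
  group_by(state_abr) %>%
  # within each state, swap NA for that state's mean of the non-missing values
  mutate(across(where(is.numeric),
                ~ replace(.x, is.na(.x), mean(.x, na.rm = TRUE)))) %>%
  ungroup()

filled$no_comp
# [1] 25 20 30  5  5
```

The AL gap is filled with 25 (the mean of 20 and 30) and the WY gap with 5, so neither state's values leak into the other. One thing to keep in mind: if a state has no non-missing values at all in a column, mean(.x, na.rm = TRUE) returns NaN, so those entries would become NaN rather than a usable number.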
CodePudding user response:
Or you can use this base R one-liner:
df$no_comp[is.na(df$no_comp)] <- ave(df$no_comp, df$state_abr, FUN = function(x) mean(x, na.rm = TRUE))[is.na(df$no_comp)]
#Data:
county <-c("Autauga","Baldwin","Barbour","Bibb","Blount","Bullock","Sublette","Teton","Uinta","Washakie","Weston")
state_abr <- c(rep("AL",6),rep("WY",5))
population <- c(55869,223234,24686,22394,57826,10101,9831,23464,20226,7805,6927)
unemp <- c(2.7,2.7,3.8,3.1,2.7,3.6,4.4,2.7,3.9,3.9,2.9)
no_comp <- c(NA,NA,NA,23.7,21.3,27.1,5.4,4.2,6.1,12.1,13.8)
df <- data.frame(county,state_abr,population,unemp,no_comp)
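To see which values end up in the gaps for this sample data, one option is to compute the per-state means explicitly with tapply and index into them. This is only a sketch of an equivalent base R route, not the one-liner itself; the helper names state_means and na_idx are introduced here for illustration, and only the two columns needed for the check are rebuilt.

```r
# Rebuild just the two relevant columns (values copied from the sample data)
state_abr <- c(rep("AL", 6), rep("WY", 5))
no_comp <- c(NA, NA, NA, 23.7, 21.3, 27.1, 5.4, 4.2, 6.1, 12.1, 13.8)
df <- data.frame(state_abr, no_comp)

# Mean of no_comp per state, ignoring NA; the result is named by state
state_means <- tapply(df$no_comp, df$state_abr, mean, na.rm = TRUE)

# Look up each missing row's own state mean by name and fill it in
na_idx <- is.na(df$no_comp)
df$no_comp[na_idx] <- state_means[as.character(df$state_abr[na_idx])]

round(df$no_comp[1:3], 2)
# [1] 24.03 24.03 24.03
```

The three AL rows receive the AL mean, (23.7 + 21.3 + 27.1) / 3, roughly 24.03, while the WY mean (8.32) is never used because WY has no missing no_comp values in this sample.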