How do I impute values by factor levels using 'missForest'?

Time:01-08

I am trying to impute missing values in my dataframe with the non-parametric method available in missForest. My data (OneDrive link) consists of one categorical variable and five continuous variables.

head(data)
               phylo     sv1 sv2      sv3 sv4 sv5
1 Phaon_camerunensis 6.03803  NA 5121.257  NA  70
2   Umma_longistigma 6.03803  NA 5121.257  NA  53
3   Umma_longistigma 6.03803  NA 5121.257  NA  64
4   Umma_longistigma 6.03803  NA 5121.257  NA  63
5      Sapho_ciliata 6.03803  NA 5121.257  NA  63
6     Sapho_gloriosa 6.03803  NA 5121.257  NA  63

At first I was successful using missForest() on the numeric columns:

imp <- missForest(data[2:6])

However, instead of imputing over the whole data frame at once, I would like to impute missing values separately for each level of phylo.

I tried data[2:6] %>% group_by(phylo) %>% and sapply(split(data[2:6], data$phylo)) %>% but without success.

Any guess on how to deal with it?

CodePudding user response:

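If you want to run missForest for each group, you can use group_map from dplyr. It passes each group's rows to the function without the grouping column, so group the full data frame (not data[2:6], which no longer contains phylo). Converting each group with as.data.frame() is a precaution, since group_map hands over tibbles and missForest expects a plain data frame or matrix:

imp <- data %>% group_by(phylo) %>% group_map(~ missForest(as.data.frame(.x)))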

To keep only the first element of each result (ximp, the imputed data):

imp2 <- t(sapply(imp, "[[", 1))
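
Because each element of imp is a full missForest result, it can be easier to write the per-group step by hand and put the rows back in their original order. Here is a minimal sketch using base R plus missForest. The toy data frame and the helper impute_group() are made up for illustration and are not from the question. Also keep in mind that missForest has nothing to learn from in a group where a column is entirely NA (as sv2 and sv4 appear to be in the head() output) or where the group has only a row or two, so such groups would probably need separate handling.

```r
library(missForest)

set.seed(1)

# Hypothetical toy data: three groups, three numeric columns, a few NAs
toy <- data.frame(
  phylo = rep(c("A", "B", "C"), each = 20),
  sv1   = rnorm(60),
  sv2   = rnorm(60)
)
toy$sv3 <- toy$sv1 + rnorm(60, sd = 0.1)
toy$sv1[c(3, 25, 47)] <- NA
toy$sv3[c(8, 30, 55)] <- NA

# Hypothetical helper: impute the numeric columns of one group,
# leaving the grouping column untouched
impute_group <- function(d) {
  num <- as.data.frame(d[setdiff(names(d), "phylo")])
  d[names(num)] <- missForest(num)$ximp  # ximp holds the imputed data
  d
}

# Split by group, impute each piece, then restore the original row order
pieces  <- lapply(split(toy, toy$phylo), impute_group)
toy_imp <- unsplit(pieces, toy$phylo)
```

toy_imp should have the same rows and columns as toy, with the NAs filled in by models fit within each phylo level only.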