I am trying to impute missing values in my data frame with the non-parametric method available in missForest.
My data (OneDrive link) consists of one categorical variable and five continuous variables.
head(data)
phylo sv1 sv2 sv3 sv4 sv5
1 Phaon_camerunensis 6.03803 NA 5121.257 NA 70
2 Umma_longistigma 6.03803 NA 5121.257 NA 53
3 Umma_longistigma 6.03803 NA 5121.257 NA 64
4 Umma_longistigma 6.03803 NA 5121.257 NA 63
5 Sapho_ciliata 6.03803 NA 5121.257 NA 63
6 Sapho_gloriosa 6.03803 NA 5121.257 NA 63
I was successful at first using missForest()
imp <- missForest(data[2:6])
However, instead of imputing over the whole data frame at once, I would like to impute missing values separately within each phylo group.
I tried data[2:6] %>% group_by(phylo) %>%
and sapply(split(data[2:6], data$phylo)) %>%
but no success.
Any guess on how to deal with it?
CodePudding user response:
If you want to run missForest for each group, you can use group_map, which drops the grouping column before passing each group to the function:
imp <- data %>% group_by(phylo) %>% group_map(~ missForest(as.data.frame(.x)))
To get only the first item (ximp, the imputed data) from each group's result:
imp2 <- t(sapply(imp, "[[", 1))
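If you would rather end up with a single data frame that still has the phylo column, one possible variant is group_modify instead of group_map. This is only a rough sketch: impute_one_group and impute_by_phylo are hypothetical helper names (not part of missForest or dplyr), and skipping columns that are entirely NA within a group is my assumption about how you would want to handle them, since a group with no observed values for a variable has nothing to impute from. Very small groups may still make missForest fail or give unreliable values.

```r
library(dplyr)
library(missForest)

# Hypothetical helper: impute the columns of one group.
# .x is the group's data without the grouping column, .y is the group key.
impute_one_group <- function(.x, .y) {
  x <- as.data.frame(.x)                  # plain data frame rather than a tibble
  ok <- colSums(!is.na(x)) > 0            # columns with at least one observed value
  # Assumption: only impute when there is something to impute and enough data.
  if (any(is.na(x[ok])) && sum(ok) > 1 && nrow(x) > 1) {
    x[ok] <- missForest(x[ok])$ximp       # ximp holds the imputed data
  }
  x                                       # all-NA columns are returned unchanged
}

# Hypothetical wrapper: run the helper per phylo and stack the groups again.
impute_by_phylo <- function(data) {
  data %>%
    group_by(phylo) %>%
    group_modify(impute_one_group) %>%
    ungroup()
}
```

Usage would be something like imputed <- impute_by_phylo(data). Variables that are NA for every row of a phylo stay NA, so you may still need a global imputation step for those.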