Imputing missing values in an R data frame

I am trying to impute missing values in my dataset by matching against values in another dataset.

This is my data:

df1 %>% head()
      V1 V2
1  apple NA
2 cheese NA
3 butter NA

df2
      V1       V2
1  apple    jacks
2 cheese     whiz
3 butter   scotch
4  apple turnover
5 cheese  sliders
6 butter  chicken
7  apple    sauce
8 cheese  doodles
9 butter     milk

This is what I want df1 to look like:

      V1                     V2
1  apple jacks, turnover, sauce
2 cheese whiz, sliders, doodles
3 butter  scotch, chicken, milk

This is my code:

df1$V2[is.na(df1$V2)] <- df2$V2[match(df1$V1,df2$V1)][which(is.na(df1$V2))]

This code runs without errors, but it only pulls in the first matching value from df2 and ignores the rest, because match() returns only the position of the first match for each key.
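One way to keep the match()-based approach is to collapse all df2 values for a key into a single string first, so that match() only ever needs one entry per key. This is a minimal sketch, assuming the columns are named V1 (key) and V2 (value) as in the code above and that the sample data below matches yours; only the rows of df1 whose V2 is missing are filled.

```r
# Sample data rebuilt from the question (assumed column names: V1 = key, V2 = value)
df1 <- data.frame(V1 = c("apple", "cheese", "butter"),
                  V2 = NA_character_,
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = rep(c("apple", "cheese", "butter"), times = 3),
                  V2 = c("jacks", "whiz", "scotch",
                         "turnover", "sliders", "chicken",
                         "sauce", "doodles", "milk"),
                  stringsAsFactors = FALSE)

# Collapse all df2 values per key into one string, e.g. "jacks, turnover, sauce".
# split() groups V2 by V1; vapply() pastes each group into a single character value.
collapsed <- vapply(split(df2$V2, df2$V1), paste, character(1), collapse = ", ")

# Each key now appears once in names(collapsed), so match() finds the full string.
missing <- is.na(df1$V2)
df1$V2[missing] <- collapsed[match(df1$V1[missing], names(collapsed))]

print(df1)
```

Keys in df1 that do not occur in df2 stay NA, since match() returns NA for them.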

Answer:

I don't think you need df1 at all in this case; you can build the result entirely from df2 (this uses dplyr):

library(dplyr)
df1 <- df2 %>% group_by(V1) %>% summarise(V2 = paste0(V2, collapse = ", "))
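If df1 should keep its own rows (for example, if it holds only some of the keys or already has some non-missing V2 values), one option is to summarise df2 as above and then join the result back onto df1, filling only the missing entries. This is a sketch assuming dplyr 1.0 or later (for the .groups argument) and the same V1/V2 column names and sample data as in the question; the helper column name V2_all is purely illustrative.

```r
library(dplyr)

# Sample data rebuilt from the question (assumed column names: V1 = key, V2 = value)
df1 <- data.frame(V1 = c("apple", "cheese", "butter"),
                  V2 = NA_character_,
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = rep(c("apple", "cheese", "butter"), times = 3),
                  V2 = c("jacks", "whiz", "scotch",
                         "turnover", "sliders", "chicken",
                         "sauce", "doodles", "milk"),
                  stringsAsFactors = FALSE)

# One row per key, with all of that key's values collapsed into a single string
collapsed <- df2 %>%
  group_by(V1) %>%
  summarise(V2_all = paste(V2, collapse = ", "), .groups = "drop")

# Join onto df1 (keeping df1's row order) and fill only the NA entries of V2
df1 <- df1 %>%
  left_join(collapsed, by = "V1") %>%
  mutate(V2 = coalesce(V2, V2_all)) %>%
  select(-V2_all)

print(df1)
```

coalesce() takes the first non-missing value, so any V2 value df1 already has is left untouched.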

Answer:

Another solution using only base R:

aggregate(df2$V2, list(df2$V1), c, simplify = FALSE)
  Group.1                      x
1   apple jacks, turnover, sauce
2  butter  scotch, chicken, milk
3  cheese whiz, sliders, doodles
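With c and simplify = FALSE, the x column returned above is a list column that merely prints like comma-separated text. If a plain character column is wanted instead, a sketch using the formula interface of aggregate() can forward collapse to paste() (again assuming the columns are named V1 and V2):

```r
# Sample data rebuilt from the question (assumed column names: V1 = key, V2 = value)
df2 <- data.frame(V1 = rep(c("apple", "cheese", "butter"), times = 3),
                  V2 = c("jacks", "whiz", "scotch",
                         "turnover", "sliders", "chicken",
                         "sauce", "doodles", "milk"),
                  stringsAsFactors = FALSE)

# Group V2 by V1; the extra argument collapse = ", " is passed on to paste()
res <- aggregate(V2 ~ V1, data = df2, FUN = paste, collapse = ", ")

print(res)
```

Because paste() returns one string per group, res$V2 is an ordinary character vector, with the groups sorted alphabetically (apple, butter, cheese).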