Home > Software engineering >  Return most common value in column by group, replace null in that column with that value
Return most common value in column by group, replace null in that column with that value

Time:05-17

I'd like to replace the na values in my df column with the most common value by group

#Ex:

df <- data.frame(Home_Abbr = c('PHI', 'PHI', 'DAL', 'PHI'),
                 Home_City = c('Philadelphia', 'Philadelphia', 'Dallas', NULL))

#Desired Result

Home_Abbr   Home_City

PHI         Philadelphia
PHI         Philadelphia
DAL         Dallas
PHI         Philadelphia

Here is what I've tried so far:

df <- df %>%
  group_by(Home_Abbr) %>%
  mutate(Home_City = names(which.max(table(Home_City))))

But when I run this I get a 'Can't combine NULL and non NULL results' Error.

CodePudding user response:

We can use Mode function

 Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

and then replace

library(dplyr)
df %>% 
  group_by(Home_Abbr) %>%
  mutate(Home_City = replace(Home_City, is.na(Home_City), 
      Mode(Home_City))) %>%
  ungroup

-output

# A tibble: 4 × 2
  Home_Abbr Home_City   
  <chr>     <chr>       
1 PHI       Philadelphia
2 PHI       Philadelphia
3 DAL       Dallas      
4 PHI       Philadelphia

data

df <- structure(list(Home_Abbr = c("PHI", "PHI", "DAL", "PHI"), Home_City = c("Philadelphia", 
"Philadelphia", "Dallas", NA)), class = "data.frame", row.names = c(NA, 
-4L))
  • Related