Home > OS >  R filtering rows where multiple column values do not match
R filtering rows where multiple column values do not match

Time:12-15

In R, I am trying to filter rows of a dataframe where many columns are not equal.

Example: (my actual dataframe has many more columns)

data.frame(id = c(1, 2, 3, 4, 5), xx = c("no", "no", "yes", "yes", "no"), x1 = c("m", "f", "f", "f", "m"), x2 = c("m", "f", "f", "m", "m"), x3 = c("m", "u", "f", "m", NA), x4 = c("m", "f", "m", "m", "f"), x5 = c(NA, "f", "m", "u", "f"))

id  xx x1 x2   x3 x4   x5
1  1  no  m  m    m  m <NA>
2  2  no  f  f    u  f    f
3  3 yes  f  f    f  m    m
4  4 yes  f  m    m  m    u
5  5  no  m  m <NA>  f    f

#what I want
id  xx x1  x2 x3 x4 x5
3  yes  f  f  f  m  m
4  yes  f  m  m  m  u
5  no  m  m <NA> f  f

I want to compare x1:x5 (not id or xx). If all the columns are equal (NA values don't count), filter out the row (row 1 in the example). If all the columns are equal but the combination is m and u OR f and u, filter out these rows too (row 2 in the example). I only want to keep rows where there are both m and f, in any combination. I would prefer a tidyverse solution, but anything would work.

Thanks!

CodePudding user response:

It's bit messy but works.

df %>%
  rowwise %>%
  mutate(a = paste0(unique(unlist(list(across(x1:x5, ~.x))) %>% na.omit) %>% sort, collapse = "")) %>%
  filter(nchar(a) != 1, !a %in% c("fu", "mu")) %>%
  select(-a)

     id xx    x1    x2    x3    x4    x5   
  <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1     3 yes   f     f     f     m     m    
2     4 yes   f     m     m     m     u    
3     5 no    m     m     NA    f     f  

CodePudding user response:

I think you can get the rowSums("x_n" columns) == "f" > 0 AND rowSums("x_n"'s) >0 as a logical index:

xcols <- dfrm[grepl("x[1-5]", names(dfrm))]

dfrm[ rowSums(xcols =="f", na.rm=TRUE)>0 & rowSums(xcols =="m", na.rm=TRUE) >0  , ]
  id  xx x1 x2   x3 x4 x5
3  3 yes  f  f    f  m  m
4  4 yes  f  m    m  m  u
5  5  no  m  m <NA>  f  f
  • Related