In R, I am trying to filter rows of a dataframe where many columns are not equal.
Example: (my actual dataframe has many more columns)
data.frame(id = c(1, 2, 3, 4, 5), xx = c("no", "no", "yes", "yes", "no"), x1 = c("m", "f", "f", "f", "m"), x2 = c("m", "f", "f", "m", "m"), x3 = c("m", "u", "f", "m", NA), x4 = c("m", "f", "m", "m", "f"), x5 = c(NA, "f", "m", "u", "f"))
id xx x1 x2 x3 x4 x5
1 1 no m m m m <NA>
2 2 no f f u f f
3 3 yes f f f m m
4 4 yes f m m m u
5 5 no m m <NA> f f
#what I want
id xx x1 x2 x3 x4 x5
3 yes f f f m m
4 yes f m m m u
5 no m m <NA> f f
I want to compare x1:x5 (not id or xx). If all the columns are equal (NA values don't count), filter out the row (row 1 in the example). If all the columns are equal but the combination is m and u OR f and u, filter out these rows too (row 2 in the example). I only want to keep rows where there are both m and f, in any combination. I would prefer a tidyverse solution, but anything would work.
Thanks!
CodePudding user response:
It's bit messy but works.
df %>%
rowwise %>%
mutate(a = paste0(unique(unlist(list(across(x1:x5, ~.x))) %>% na.omit) %>% sort, collapse = "")) %>%
filter(nchar(a) != 1, !a %in% c("fu", "mu")) %>%
select(-a)
id xx x1 x2 x3 x4 x5
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 3 yes f f f m m
2 4 yes f m m m u
3 5 no m m NA f f
CodePudding user response:
I think you can get the rowSums("x_n" columns) == "f" > 0 AND rowSums("x_n"'s) >0 as a logical index:
xcols <- dfrm[grepl("x[1-5]", names(dfrm))]
dfrm[ rowSums(xcols =="f", na.rm=TRUE)>0 & rowSums(xcols =="m", na.rm=TRUE) >0 , ]
id xx x1 x2 x3 x4 x5
3 3 yes f f f m m
4 4 yes f m m m u
5 5 no m m <NA> f f