R - Remove all rows that do not match certain criteria


Below is what my data looks like. My goal is to find all rows that have matching x, z, and m but different y. I need to keep both (or all) of the rows in each such group.

How can I do that?

x <- c("A","B","C","B","D","E","E")
y <- c(0,10,10,10,10,12,0)
z <- c("A1","B1","B1","B1","B1","C1","C1")
m <- c(rep("2017-12-28",7))

df <- data.frame(x,y,z,m)

# Below is the goal

CodePudding user response:

base R

df[ave(df$y, df[,c("x","z","m")], FUN = function(y) length(unique(y))) > 1,]
#   x  y  z          m
# 6 E 12 C1 2017-12-28
# 7 E  0 C1 2017-12-28

Note that ave coerces its return value to the same class as its first argument, so if y is something other than numeric or integer (a factor, for example), this may not work as intended.
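One way to sidestep the coercion mentioned above is to hand ave the row indices instead of y itself, and look y up inside FUN. This is only a sketch, not part of the original answer: the conversion of y to a factor is an assumption made purely to demonstrate a non-numeric column, and n_y and res are illustrative names.

```r
df <- data.frame(
  x = c("A", "B", "C", "B", "D", "E", "E"),
  y = c(0, 10, 10, 10, 10, 12, 0),
  z = c("A1", "B1", "B1", "B1", "B1", "C1", "C1"),
  m = rep("2017-12-28", 7)
)

# Assumption for illustration: make y a factor, a class that ave()
# cannot store group counts in.
df$y <- factor(df$y)

# ave() works on integer row indices, so its result stays integer
# regardless of the class of y; y is looked up by index inside FUN.
n_y <- ave(seq_len(nrow(df)), df$x, df$z, df$m,
           FUN = function(i) length(unique(df$y[i])))

# Keep only rows whose (x, z, m) group has more than one distinct y
res <- df[n_y > 1, ]
res
```

Because the first argument is always an integer vector, the comparison n_y > 1 does not depend on what y holds.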

Also, for code-golf or readability, you can replace the FUN= argument with FUN=dplyr::n_distinct or FUN=data.table::uniqueN, if you prefer an ave solution and already have one of those packages installed.
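As a rough sketch of that substitution, the snippet below builds the filter both ways and checks that they agree. The requireNamespace guard and the names grp, keep_base, and keep_dt are assumptions added for illustration, so the example still runs when data.table is not installed.

```r
df <- data.frame(
  x = c("A", "B", "C", "B", "D", "E", "E"),
  y = c(0, 10, 10, 10, 10, 12, 0),
  z = c("A1", "B1", "B1", "B1", "B1", "C1", "C1"),
  m = rep("2017-12-28", 7)
)
grp <- df[, c("x", "z", "m")]

# Baseline: the anonymous-function form from the answer
keep_base <- ave(df$y, grp, FUN = function(y) length(unique(y))) > 1

# Same filter with a package helper, run only if data.table is available
if (requireNamespace("data.table", quietly = TRUE)) {
  keep_dt <- ave(df$y, grp, FUN = data.table::uniqueN) > 1
  stopifnot(identical(keep_base, keep_dt))
}

df[keep_base, ]
```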


dplyr

library(dplyr)
df %>%
  group_by(x, z, m) %>%
  filter(n_distinct(y) > 1) %>%
  ungroup()
# # A tibble: 2 x 4
#   x         y z     m         
#   <chr> <dbl> <chr> <chr>     
# 1 E        12 C1    2017-12-28
# 2 E         0 C1    2017-12-28


data.table

library(data.table)
as.data.table(df)[, .SD[uniqueN(y) > 1,], by = .(x, z, m)]
#         x      z          m     y
#    <char> <char>     <char> <num>
# 1:      E     C1 2017-12-28    12
# 2:      E     C1 2017-12-28     0


Data

df <- structure(list(
  x = c("A", "B", "C", "B", "D", "E", "E"),
  y = c(0, 10, 10, 10, 10, 12, 0),
  z = c("A1", "B1", "B1", "B1", "B1", "C1", "C1"),
  m = c("2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28")
), class = "data.frame", row.names = c(NA, -7L))