R - Remove all rows that do not match certain criteria

Time:11-03

Below is what my data looks like. My goal is to find all rows that share the same x, z, and m values but have different y values, and to keep every row in such a group (both rows, or all of them if there are more than two).

How can I do that?

x <- c("A","B","C","B","D","E","E")
y <- c(0,10,10,10,10,12,0)
z <- c("A1","B1","B1","B1","B1","C1","C1")
m <- rep("2017-12-28", 7)

df <- data.frame(x,y,z,m)
df

# Below is the goal
df[6:7,]
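To make the criterion concrete, here is a minimal base-R sketch (not part of the original question) that counts the distinct y values within each (x, z, m) group using aggregate(). Only the E/C1 group has more than one distinct y, which is why rows 6 and 7 are the goal. The name `counts` is just an illustrative variable.

```r
# Rebuild the question's data with character columns
df <- data.frame(
  x = c("A", "B", "C", "B", "D", "E", "E"),
  y = c(0, 10, 10, 10, 10, 12, 0),
  z = c("A1", "B1", "B1", "B1", "B1", "C1", "C1"),
  m = rep("2017-12-28", 7),
  stringsAsFactors = FALSE
)

# One row per (x, z, m) combination; column y holds the number of distinct y values
counts <- aggregate(y ~ x + z + m, data = df,
                    FUN = function(v) length(unique(v)))
counts
```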

Answer:

base R

df[ave(df$y, df[,c("x","z","m")], FUN = function(y) length(unique(y))) > 1,]
#   x  y  z          m
# 6 E 12 C1 2017-12-28
# 7 E  0 C1 2017-12-28
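To see what the filter is doing, the sketch below (assuming the same df as in the question) stores the intermediate vector that ave() returns. Each row gets the number of distinct y values in its (x, z, m) group, so keeping rows where that number exceeds 1 selects rows 6 and 7. The names `n_y` and `result` are illustrative only.

```r
df <- data.frame(
  x = c("A", "B", "C", "B", "D", "E", "E"),
  y = c(0, 10, 10, 10, 10, 12, 0),
  z = c("A1", "B1", "B1", "B1", "B1", "C1", "C1"),
  m = rep("2017-12-28", 7),
  stringsAsFactors = FALSE
)

# One value per row: distinct y count within that row's (x, z, m) group
n_y <- ave(df$y, df[, c("x", "z", "m")], FUN = function(y) length(unique(y)))
n_y
# [1] 1 1 1 1 1 2 2

# Keep only rows whose group has more than one distinct y
result <- df[n_y > 1, ]
result
```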

Note that ave() coerces its return value to the class of its first argument. If y is not numeric or integer (for example, a character or factor column), the group counts are coerced to that type, so the > 1 comparison may not behave as intended.
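One way around that coercion, offered here as a suggestion rather than as part of the original answer, is to pass row indices (always integer) as ave()'s first argument and look up y inside FUN. The sketch deliberately stores y as character to show that the counts stay integer; the names `n_y` and `result` are illustrative.

```r
# Same data, but y is character on purpose
df <- data.frame(
  x = c("A", "B", "C", "B", "D", "E", "E"),
  y = c("0", "10", "10", "10", "10", "12", "0"),
  z = c("A1", "B1", "B1", "B1", "B1", "C1", "C1"),
  m = rep("2017-12-28", 7),
  stringsAsFactors = FALSE
)

# ave() now returns integers, because its first argument is an integer index vector;
# FUN receives the row indices of one group and counts the distinct y values there
n_y <- ave(seq_len(nrow(df)), df$x, df$z, df$m,
           FUN = function(i) length(unique(df$y[i])))

result <- df[n_y > 1, ]
result
```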

Also, for brevity or readability, you can replace the anonymous function with FUN = dplyr::n_distinct or FUN = data.table::uniqueN if you prefer the ave solution. Because of the :: prefix, the package only needs to be installed, not loaded.

dplyr

library(dplyr)
df %>%
  group_by(x, z, m) %>%
  filter(n_distinct(y) > 1) %>%
  ungroup()
# # A tibble: 2 x 4
#   x         y z     m         
#   <chr> <dbl> <chr> <chr>     
# 1 E        12 C1    2017-12-28
# 2 E         0 C1    2017-12-28

data.table

library(data.table)
as.data.table(df)[, .SD[uniqueN(y) > 1,], by = .(x, z, m)]
#         x      z          m     y
#    <char> <char>     <char> <num>
# 1:      E     C1 2017-12-28    12
# 2:      E     C1 2017-12-28     0

Data

df <- structure(list(
  x = c("A", "B", "C", "B", "D", "E", "E"),
  y = c(0, 10, 10, 10, 10, 12, 0),
  z = c("A1", "B1", "B1", "B1", "B1", "C1", "C1"),
  m = c("2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28")
), class = "data.frame", row.names = c(NA, -7L))