Below is what my data looks like. My goal is to find all rows that share the same x, z, and m but have different y, and to keep every row in each such group (both of them, or all of them if there are more than two).
How can I do that?
x <- c("A","B","C","B","D","E","E")
y <- c(0,10,10,10,10,12,0)
z <- c("A1","B1","B1","B1","B1","C1","C1")
m <- rep("2017-12-28", 7)
df <- data.frame(x,y,z,m)
df
# Below is the goal
df[6:7,]
Answer:
base R
df[ave(df$y, df[,c("x","z","m")], FUN = function(y) length(unique(y))) > 1,]
# x y z m
# 6 E 12 C1 2017-12-28
# 7 E 0 C1 2017-12-28
Note that because ave coerces its return value to the same class as its first argument, this may not work as desired if y is something other than numeric or integer.
Also, for code golf or readability, you can replace the FUN= argument with FUN=dplyr::n_distinct or FUN=data.table::uniqueN, if you prefer an ave solution and already have one of those packages installed.
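If y is not numeric (character, for example), one possible workaround for the coercion caveat is to pass an integer row index to ave as its first argument and look up y by that index inside FUN, so the returned counts stay integer. This is only a sketch: df_chr, n_y, and res are names made up for illustration, and the character y column is an assumption rather than part of the original data.

```r
# Same data as in the question, but with y stored as character
# (an assumption made only to illustrate the coercion issue).
df_chr <- data.frame(
  x = c("A", "B", "C", "B", "D", "E", "E"),
  y = as.character(c(0, 10, 10, 10, 10, 12, 0)),
  z = c("A1", "B1", "B1", "B1", "B1", "C1", "C1"),
  m = rep("2017-12-28", 7),
  stringsAsFactors = FALSE
)

# ave() returns something of the same class as its first argument,
# so hand it an integer row index and count distinct y values
# for those rows inside FUN.
n_y <- ave(
  seq_len(nrow(df_chr)),
  df_chr[, c("x", "z", "m")],
  FUN = function(i) length(unique(df_chr$y[i]))
)

# Keep every row whose (x, z, m) group has more than one distinct y.
res <- df_chr[n_y > 1, ]
res
```

With the sample data this should again keep rows 6 and 7. The same index trick should carry over to other column types, such as factors or dates, since y itself is never passed through ave.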
dplyr
library(dplyr)
df %>%
group_by(x, z, m) %>%
filter(n_distinct(y) > 1) %>%
ungroup()
# # A tibble: 2 x 4
# x y z m
# <chr> <dbl> <chr> <chr>
# 1 E 12 C1 2017-12-28
# 2 E 0 C1 2017-12-28
data.table
library(data.table)
as.data.table(df)[, .SD[uniqueN(y) > 1,], by = .(x, z, m)]
# x z m y
# <char> <char> <char> <num>
# 1: E C1 2017-12-28 12
# 2: E C1 2017-12-28 0
Data
df <- structure(list(x = c("A", "B", "C", "B", "D", "E", "E"), y = c(0, 10, 10, 10, 10, 12, 0), z = c("A1", "B1", "B1", "B1", "B1", "C1", "C1"), m = c("2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28")), class = "data.frame", row.names = c(NA, -7L))