How to find duplicates in two columns only while another column is different

Time:02-23

I want to find cases where two (or more) rows have the same x and y (location) but different IDs.

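In the table below, I would like to get back only the last two rows. The first two rows also share a location, but they have the same ID, so they should not be returned.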

x y id
1 2 1
1 2 1
1 3 4
2 3 1
2 3 2
# example data
x <- read.table(text = "x   y   id
1   2   1
1   2   1
1   3   4
2   3   1
2   3   2", header = TRUE)
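To see why a plain duplicate check is not enough here, the following base R sketch (not part of the original post) flags every row whose (x, y) pair occurs more than once, ignoring id. It also flags rows 1 and 2, which repeat the same id, so an extra condition on id is needed.

```r
# Example data from the question
x <- read.table(text = "x   y   id
1   2   1
1   2   1
1   3   4
2   3   1
2   3   2", header = TRUE)

# TRUE for every row whose (x, y) pair appears more than once;
# checking from both ends means the first occurrence is flagged too
xy <- x[, c("x", "y")]
dup_xy <- duplicated(xy) | duplicated(xy, fromLast = TRUE)
which(dup_xy)
# [1] 1 2 4 5
```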

Answer:

Using dplyr, group by x and y and keep only the groups that contain more than one distinct id:

library(dplyr)

x %>% 
  group_by(x, y) %>% 
  filter(n_distinct(id) > 1)

# A tibble: 2 x 3
# Groups:   x, y [1]
      x     y    id
  <int> <int> <int>
1     2     3     1
2     2     3     2
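If you also want a per-location summary before pulling out the rows, a base R sketch with aggregate() and merge() could look like this. It is an alternative to the dplyr answer rather than part of it, and the column name n_ids is only an illustrative choice.

```r
x <- read.table(text = "x   y   id
1   2   1
1   2   1
1   3   4
2   3   1
2   3   2", header = TRUE)

# Number of distinct ids at each (x, y) location
counts <- aggregate(id ~ x + y, data = x, FUN = function(v) length(unique(v)))
names(counts)[names(counts) == "id"] <- "n_ids"  # illustrative column name

# Locations that carry more than one distinct id
conflicts <- counts[counts$n_ids > 1, c("x", "y")]

# Join back to recover the original rows at those locations
res <- merge(x, conflicts, by = c("x", "y"))
res
```

Note that merge() sorts by the key columns and drops the original row names; the ave() approach keeps them if you need to know which rows matched.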

Answer:

Base R: group by the x and y columns, count the unique values of id within each group with ave(), and keep the rows where that count is greater than 1:

x[ ave(x[, "id"], x[, c("x", "y") ], FUN = function(i) length(unique(i))) > 1, ]
#   x y id
# 4 2 3  1
# 5 2 3  2
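It can help to look at the intermediate vector: ave() computes the number of distinct ids for each (x, y) group and writes that value back onto every row of the group, so the result lines up with the rows of x. This sketch passes x$x and x$y as separate grouping arguments, which should be equivalent to passing the two-column data frame as in the answer above.

```r
x <- read.table(text = "x   y   id
1   2   1
1   2   1
1   3   4
2   3   1
2   3   2", header = TRUE)

# One value per row: how many distinct ids share that row's (x, y)
n_ids <- ave(x$id, x$x, x$y, FUN = function(i) length(unique(i)))
n_ids
# [1] 1 1 1 2 2

# Keep rows whose location has more than one distinct id
x[n_ids > 1, ]
```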

Answer:

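Using data.table:

library(data.table)
# setDT() converts x to a data.table in place (by reference)
# .I[uniqueN(id) > 1] gives the row numbers of each (x, y) group with more than one distinct id
i1 <- setDT(x)[, .I[uniqueN(id) > 1], .(x, y)]$V1
x[i1]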
       x     y    id
   <int> <int> <int>
1:     2     3     1
2:     2     3     2
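Another base R option, offered here as a sketch rather than one of the original answers, avoids per-group functions entirely: drop exact repeats of (x, y, id) first, and then any (x, y) pair that still appears more than once must carry at least two different ids. Pasting x and y into one string key is an assumption that works for simple numeric columns like these.

```r
x <- read.table(text = "x   y   id
1   2   1
1   2   1
1   3   4
2   3   1
2   3   2", header = TRUE)

# Remove exact repeats so identical (x, y, id) rows count only once
u <- unique(x[, c("x", "y", "id")])

# (x, y) pairs still duplicated after that must have differing ids
dup_xy <- u[duplicated(u[, c("x", "y")]), c("x", "y")]

# Keep the original rows whose (x, y) pair is in that set
key <- paste(x$x, x$y)
res <- x[key %in% paste(dup_xy$x, dup_xy$y), ]
res
```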