Home > front end >  Find unique IDs based on matching dates, data.table r
Find unique IDs based on matching dates, data.table r

Time:12-01

I have some data, like so

library(data.table)

data1 <- data.table(
  id = c(seq(1:5)),
  date = c(as.Date("2021-05-05"),as.Date("2021-05-04"),as.Date("2021-05-03"),as.Date("2021-05-02"),as.Date("2021-05-01")),
  var = c(rep("var1",5))
)

data2 <- data.table(
  id = c(seq(1:5)),
  date = c(as.Date("2021-05-05"),as.Date("2020-05-04"),as.Date("2020-05-03"),as.Date("2020-05-02"),as.Date("2020-05-01")),
  var = c(rep("var2",5))
)

data3 <- data.table(
  id = c(seq(1:5)),
  date = c(as.Date("2021-05-05"),as.Date("2020-07-04"),as.Date("2020-07-03"),as.Date("2020-07-02"),as.Date("2020-07-01")),
  var = c(rep("var3",5))
)

data4 <- data.table(
  id = c(seq(1:5)),
  date = c(as.Date("2021-05-05"),as.Date("2020-07-04"),as.Date("2020-07-03"),as.Date("2020-07-02"),as.Date("2020-07-01")),
  var = c(rep("var4",5))
)

data <- rbind(data1,data2,data3,data4)
data <- data[order(id)]

I would like to find and keep those unique IDs (including keeping the dates), where date for var1, var2, and var3 is equal to one another. I dont care about var4. In the above data, only id 1 would match these criteria, so I want to end up with something like this:

data_final <- data.table(
  id = 1,
  date_var1 = as.Date("2021-05-05"),
  date_var2 = as.Date("2021-05-05"),
  date_var3 = as.Date("2021-05-05")
)

CodePudding user response:

Use dcast to reshape to wide format after filtering - grouped by 'id', filter the 'id's having all values in 'date' are same. Then do the dcast on the filtered data to reshape back to wide format

data1 <- data[data[, .I[uniqueN(date[var %in% paste0("var", 1:3)]) == 1],id]$V1
         ][var != 'var4']
out1 <- dcast(data1, id ~ paste0('date_', var), value.var = 'date')

-output

> out1
   id  date_var1  date_var2  date_var3
1:  1 2021-05-05 2021-05-05 2021-05-05
  • Related