Home > Blockchain >  Creating unique id if the difference between dates is more than 7 in R
Creating unique id if the difference between dates is more than 7 in R

Time:02-02

I would like to go over each row, and if the difference between the current row and the previous row is more than 7, then I would like to assign the current row a unique id, otherwise the current and the previous row would have the same id. Please note that the zipcode is not relevant for the unique id. Here's my data:

z<- structure(list(zipcode = c(96717L, 96730L, 96825L, 96826L, 96720L, 
96756L, 96740L, 96819L, 96734L, 96740L, 96714L, 96714L, 96703L, 
90017L, 96796L, 96714L, 96714L, 96761L, 96712L, 96712L), date = structure(c(8809, 
8809, 8847, 8848, 8989, 9041, 9161, 9188, 9201, 9293, 9403, 9437, 
9437, 9437, 9437, 9443, 9444, 9457, 9457, 9483), class = "Date")), row.names = c(NA, 
-20L), class = c("data.table", "data.frame"))

enter image description here

Here's my desired output:

y<- structure(list(zipcode = c(96717, 96730, 96825, 96826, 96720, 
96756, 96740, 96819, 96734, 96740, 96714, 96714, 96703, 90017, 
96796, 96714, 96714, 96761, 96712, 96712), date = structure(c(761097600, 
761097600, 764380800, 764467200, 776649600, 781142400, 791510400, 
793843200, 794966400, 802915200, 812419200, 815356800, 815356800, 
815356800, 815356800, 815875200, 815961600, 817084800, 817084800, 
819331200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), difference_date = c("NA", 
"0", "38", "1", "141", "52", "120", "27", "13", "92", "110", 
"34", "0", "0", "0", "6", "1", "13", "0", "26"), id = c(1, 1, 
2, 2, 3, 4, 5, 6, 8, 9, 10, 11, 11, 11, 11, 11, 11, 12, 12, 13
)), class = c("data.table", "data.frame"))

enter image description here

Here's what I did:

library(data.table)

z[,date:=as.Date(date)][,diff_days:=c(NA,diff.Date(date,lag=1L,differences=1L))][, event_id :=1:.N,.(diff_days<=7)]

Here's what I got:

enter image description here

CodePudding user response:

First I converted your example data to data.table format:

library(data.table)
z <- as.data.table(z)

Then calculate the difference date:

z[, date_diff := as.numeric(difftime(date, lag(date), units = "days"))]

And finally, add the id :

z[, event_id := cumsum(c(TRUE, diff(date) > 7))]
  • Related