I would like to go over each row, and if the difference between the current row and the previous row is more than 7, then I would like to assign the current row a unique id, otherwise the current and the previous row would have the same id. Please note that the zipcode is not relevant for the unique id. Here's my data:
z<- structure(list(zipcode = c(96717L, 96730L, 96825L, 96826L, 96720L,
96756L, 96740L, 96819L, 96734L, 96740L, 96714L, 96714L, 96703L,
90017L, 96796L, 96714L, 96714L, 96761L, 96712L, 96712L), date = structure(c(8809,
8809, 8847, 8848, 8989, 9041, 9161, 9188, 9201, 9293, 9403, 9437,
9437, 9437, 9437, 9443, 9444, 9457, 9457, 9483), class = "Date")), row.names = c(NA,
-20L), class = c("data.table", "data.frame"))
Here's my desired output:
y<- structure(list(zipcode = c(96717, 96730, 96825, 96826, 96720,
96756, 96740, 96819, 96734, 96740, 96714, 96714, 96703, 90017,
96796, 96714, 96714, 96761, 96712, 96712), date = structure(c(761097600,
761097600, 764380800, 764467200, 776649600, 781142400, 791510400,
793843200, 794966400, 802915200, 812419200, 815356800, 815356800,
815356800, 815356800, 815875200, 815961600, 817084800, 817084800,
819331200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), difference_date = c("NA",
"0", "38", "1", "141", "52", "120", "27", "13", "92", "110",
"34", "0", "0", "0", "6", "1", "13", "0", "26"), id = c(1, 1,
2, 2, 3, 4, 5, 6, 8, 9, 10, 11, 11, 11, 11, 11, 11, 12, 12, 13
)), class = c("data.table", "data.frame"))
Here's what I did:
library(data.table)
z[,date:=as.Date(date)][,diff_days:=c(NA,diff.Date(date,lag=1L,differences=1L))][, event_id :=1:.N,.(diff_days<=7)]
Here's what I got:
CodePudding user response:
First I converted your example data to data.table format:
library(data.table)
z <- as.data.table(z)
Then calculate the difference date:
z[, date_diff := as.numeric(difftime(date, lag(date), units = "days"))]
And finally, add the id :
z[, event_id := cumsum(c(TRUE, diff(date) > 7))]