average timespread between spells-CodePudding

I have a data frame as follows and would like to create a time spread variable based on max spread of events.

 A<- c('1244', '1244', '1244', '1245', '1245', '1245', '1245', '1245', '1245', '1245')
 sequence<- c(1,1,0, 1,1,0,0,1,1,1)
   # 1= lived locally and 0 lived internationally 
date<- c('19/Oct/12', '19/Oct/12', '20/Oct/12', '19/Oct/11', '19/Oct/11', '22/Nov/12', 
  '24/Nov/12', '29/Nov/12','2/Dec/12', '29/Dec/12')

 df<- data.frame(A,sequence, date)

I would like to calculate the average time spread of these people in locations both locally and internationally. For example if we see the person 1244 she/he has sequence 110 (i.e., they lived in two spells in locally and 1 spell internationally and if we want to calculate the time spread the avg_local_timespread= total time in all spells/count of events in spells which translates to 0 day/2(which is difference between the last date- start date (so, 0 days= 19/Oct/12-19/Oct/12) and avg_international_timespread= 0/1=0 (20/Oct/12-20/Oct/12)

And for person 1245 it is 1100111 so they have lived locally in two spells, the first spell is 11 and second one is 111. As I want to get the average time spread locally in this case it is the time difference between start date and final date for first spell(11) last spell(111)/total spells (5). Hence avg_local_timespread= 6 days (30 0)/5 and avg_international_timespread= 1 day (2/2)('24/Nov/12'-'22/Nov/12')

I am expecting an output as follows:

 A  avg_local_timespread avg_international_timespread total_local_timespread 
1244         0                     0                         0
1245         6                     1                         30
                                                        
         total_international_timespread    

1244              0
1245              2

CodePudding user response：

I cannot replicate your desired output becasue i do not understand the calculation you provide in your question. But I believe this will get you started.

library(data.table)
library(lubridate)
# convert to data.table format
setDT(df)
# convert dates to date-format
df[, date := lubridate::dmy(date, locale = "English_United States")]
# create sequence-groups by A
df[, seq_group := rleid(sequence == 1), by = .(A)][]
# summarise
ans <- df[, .(start = min(date), end = max(date)), by = .(A, seq_group, sequence)]
# add diration of spell
ans[, duration := end - start]
#       A seq_group sequence      start        end duration
# 1: 1244         1        1 2012-10-19 2012-10-19   0 days
# 2: 1244         2        0 2012-10-20 2012-10-20   0 days
# 3: 1245         1        1 2011-10-19 2011-10-19   0 days
# 4: 1245         2        0 2012-11-22 2012-11-24   2 days
# 5: 1245         3        1 2012-11-29 2012-12-29  30 days