calculating count for a column of dates

Time:06-29

I want to calculate the mean and standard deviation for the number of dates (or visits) that people have. Sample data are:

id   date
1    2015-02-23
1    2015-04-24
2    2018-05-23
2    2022-12-05
2    2022-12-06
3    2021-05-21

ID1 has 2 visits (evidenced by 2 dates), ID2 has 3 visits, and ID3 has 1 visit, so the mean would be (2 + 3 + 1)/3 = 2.

Does anyone know how to calculate the mean and standard deviation? I tried to do this using the summarize function, but I can't get it to work.

CodePudding user response:

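You can use the rle function on the id column: the run lengths are the number of visits per id. This assumes the rows for each id are adjacent, as in the sample data.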

  • data
df <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L), date = c("2015-02-23", 
"2015-04-24", "2018-05-23", "2022-12-05", "2022-12-06", "2021-05-21"
)), class = "data.frame", row.names = c(NA, -6L))
# Run lengths of consecutive equal ids = number of visits per id
visits <- rle(df$id)$lengths
mean(visits)  # 2
sd(visits)    # 1
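
If rows for the same id might not be adjacent, rle would split one person's visits into several runs. A minimal sketch of an order-independent alternative, using base R's table() to tally rows per id, could look like the following. The shuffled row order and the names visits, mean_visits, and sd_visits are illustrative assumptions, not part of the original answer.

```r
# Sample data from the question, deliberately shuffled so that
# rows for the same id are no longer adjacent (illustrative assumption)
df <- data.frame(
  id = c(2L, 1L, 2L, 3L, 1L, 2L),
  date = c("2018-05-23", "2015-02-23", "2022-12-05",
           "2021-05-21", "2015-04-24", "2022-12-06")
)

# table() counts rows per id regardless of row order;
# as.vector() drops the table class to leave a plain integer vector
visits <- as.vector(table(df$id))

mean_visits <- mean(visits)  # mean number of visits per id
sd_visits <- sd(visits)      # sample standard deviation (n - 1 denominator)
```

Each row is counted as one visit here; if the same id and date could appear more than once, applying unique(df) first would be one way to avoid double counting.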