I work in a hospital and I'm having a problem. I was asked to report the number of patients who were in the hospital (not the number of admissions) in each month filtered by the year in R. How could I do this? Here is an example data frame:
id <- c("1","2","3","4","5")
hospitalization_date <- c(as.Date("2010-01-01"), as.Date("2011-03-01"),as.Date("2010-04-01"),as.Date("2010-02-01"),as.Date("2011-06-01"))
discharged_date <- c(as.Date("2010-05-11"), as.Date("2011-08-04"),as.Date("2010-06-13"),as.Date("2010-08-02"),as.Date("2011-06-23"))
cid <- c("F29","F33","F71","F29","F09")
gender <- c("F","M","M","M","F")
df <- data.frame(id,hospitalization_date,discharged_date,cid,gender)
I would like the result to be like this, filtering by the year 2010:
monthly number_of_patients
1 jan 1
2 feb 2
3 mar 2
4 apr 3
5 may 3
6 jun 2
7 jul 1
8 aug 1
9 sep 0
10 oct 0
11 nov 0
12 dec 0
notice that the patient id=1 is in the months jan, feb, mar, apr and may, the patient id=3 is in the months apr, may and jun and the patient id=4 is in the months feb, mar, apr, may , Jun, Jul and Aug all from the year 2010.
and so, filtered by the year 2011:
monthly number_of_patients
1 jan 0
2 feb 0
3 mar 1
4 apr 1
5 may 1
6 jun 2
7 jul 1
8 aug 1
9 sep 0
10 oct 0
11 nov 0
12 dec 0
Help me please.
CodePudding user response:
Perhaps this helps
library(dplyr)
library(lubridate)
library(tidyr)
library(purrr)
out <- df %>%
mutate(across(ends_with('_date'), ymd)) %>%
transmute(id,
dates = map2(hospitalization_date, discharged_date,
~ seq(.x, .y, by = 'month'))) %>%
unnest(dates) %>%
mutate(year = year(dates), monthly = format(dates, '%b')) %>%
count(year, monthly) %>%
group_by(year) %>%
complete(monthly = month.abb, fill = list(n = 0)) %>%
arrange(match(monthly, month.abb), .by_group = TRUE) %>%
ungroup
-output
> as.data.frame(out)
year monthly n
1 2010 Jan 1
2 2010 Feb 2
3 2010 Mar 2
4 2010 Apr 3
5 2010 May 3
6 2010 Jun 2
7 2010 Jul 1
8 2010 Aug 1
9 2010 Sep 0
10 2010 Oct 0
11 2010 Nov 0
12 2010 Dec 0
13 2011 Jan 0
14 2011 Feb 0
15 2011 Mar 1
16 2011 Apr 1
17 2011 May 1
18 2011 Jun 2
19 2011 Jul 1
20 2011 Aug 1
21 2011 Sep 0
22 2011 Oct 0
23 2011 Nov 0
24 2011 Dec 0
CodePudding user response:
library(tidyverse); library(lubridate)
df %>%
pivot_longer(contains("date")) %>%
mutate(chg = ifelse(name == "hospitalization_date", 1, -1),
month = if_else(chg == -1,
ceiling_date(value, "month"),
floor_date(value, "month"))) %>%
count(month, wt = chg, name = "change") %>%
complete(month = seq.Date(min(month), max(month),
by = "month"), fill = list(change=0)) %>%
mutate(count = cumsum(change))
Result
month change count
1 2010-01-01 1 1
2 2010-02-01 1 2
3 2010-03-01 0 2
4 2010-04-01 1 3
5 2010-05-01 0 3
6 2010-06-01 -1 2
7 2010-07-01 -1 1
8 2010-08-01 0 1
9 2010-09-01 -1 0
10 2010-10-01 0 0
11 2010-11-01 0 0
12 2010-12-01 0 0
13 2011-01-01 0 0
14 2011-02-01 0 0
15 2011-03-01 1 1
16 2011-04-01 0 1
17 2011-05-01 0 1
18 2011-06-01 1 2
19 2011-07-01 -1 1
20 2011-08-01 0 1
21 2011-09-01 -1 0