I do not know how to create a new variable based on multiple conditions.
The example can interprete my problem.
There are 3 main variables.
id
means people. The same id
means same people.
date
means the date that access to hospital.
outcome
means if a person have a specific disease. 1
means sick and 0
means no disease.
df = data.frame(
id = c(1,1,1,1, 2,2,2,2, 3,3,3,3, 4,4,4,4),
date = c('2013-01-01', '2014-01-01', '2015-01-01', '2016-01-01',
'2013-01-01', '2014-01-01', '2015-01-01', '2016-01-01',
'2013-01-01', '2014-01-01', '2015-01-01', '2016-01-01',
'2013-01-01', '2014-01-01', '2015-01-01', '2016-01-01'),
outcome = c(0,0,1,1, 0,1,0,0, 1,1,0,0, 0,0,0,0)
) %>%
mutate(date = as.Date(date)) %>%
as_tibble()
# A tibble: 16 × 3
id date outcome
<dbl> <date> <dbl>
1 1 2013-01-01 0
2 1 2014-01-01 0
3 1 2015-01-01 1
4 1 2016-01-01 1
5 2 2013-01-01 0
6 2 2014-01-01 1
7 2 2015-01-01 0
8 2 2016-01-01 0
9 3 2013-01-01 1
10 3 2014-01-01 1
11 3 2015-01-01 0
12 3 2016-01-01 0
13 4 2013-01-01 0
14 4 2014-01-01 0
15 4 2015-01-01 0
16 4 2016-01-01 0
I want to create a new variable time.to.event
which means the time difference between the date of first accident and date of first time access to hospital.
for example:
for id = 1
, the date of first accident is 2015-01-01
;
the date of first time access to hospital is 2013-01-01
.
time.to.event
should be 730
(2015-01-01
minus 2013-01-01
).
for id = 2
, the date of first accident is 2014-01-01
;
the date of first time access to hospital is 2013-01-01
.
time.to.event
should be 365
(2014-01-01
minus 2013-01-01
).
for id = 3
, the date of first accident is 2013-01-01
;
the date of first time access to hospital is 2013-01-01
.
time.to.event
should be 0
(2013-01-01
minus 2013-01-01
).
for id = 4
, there is no date of first accident.
time.to.event
should be 1095
(max(date)
minus min(date)
).
I have tried the code below but failed.
df %>%
group_by(id) %>%
mutate(time.to.event =
case_when(sum(outcome) == 0 ~max(date) - min(date),
sum(outcome) != 0 ~ first(date[outcome == 1]) - min(date)))
The final data should like this:
# A tibble: 12 × 4
id date outcome time.to.event
<dbl> <date> <dbl> <dbl>
1 1 2013-01-01 0 730
2 1 2014-01-01 0 730
3 1 2015-01-01 1 730
4 1 2016-01-01 1 730
5 2 2013-01-01 0 365
6 2 2014-01-01 1 365
7 2 2015-01-01 0 365
8 2 2016-01-01 0 365
9 3 2013-01-01 1 0
10 3 2014-01-01 1 0
11 3 2015-01-01 0 0
12 3 2016-01-01 0 0
13 4 2013-01-01 0 1095
14 4 2014-01-01 0 1095
15 4 2015-01-01 0 1095
16 4 2016-01-01 0 1095
CodePudding user response:
Try with dplyr::first
.
library(dplyr)
df %>%
group_by(id) %>%
mutate(time.to.event = first(date[outcome == 1]) - min(date))
# Groups: id [3]
id date outcome time.to.event
<dbl> <date> <dbl> <drtn>
1 1 2013-01-01 0 730 days
2 1 2014-01-01 0 730 days
3 1 2015-01-01 1 730 days
4 1 2016-01-01 1 730 days
5 2 2013-01-01 0 365 days
6 2 2014-01-01 1 365 days
7 2 2015-01-01 0 365 days
8 2 2016-01-01 0 365 days
9 3 2013-01-01 1 0 days
10 3 2014-01-01 1 0 days
11 3 2015-01-01 0 0 days
12 3 2016-01-01 0 0 days
CodePudding user response:
We may wrap with which
or use match
to get the index where the value is 1. If there is none present, return NA with [1]
and then coalesce
with the diff
or range
of 'date'
library(dplyr)
df <- df %>%
group_by(id) %>%
mutate(time_to_event = as.numeric(coalesce(date[match(1,
outcome )][1] - min(date), diff(range(date))))) %>%
ungroup
-output
df
# A tibble: 16 × 4
id date outcome time_to_event
<dbl> <date> <dbl> <dbl>
1 1 2013-01-01 0 730
2 1 2014-01-01 0 730
3 1 2015-01-01 1 730
4 1 2016-01-01 1 730
5 2 2013-01-01 0 365
6 2 2014-01-01 1 365
7 2 2015-01-01 0 365
8 2 2016-01-01 0 365
9 3 2013-01-01 1 0
10 3 2014-01-01 1 0
11 3 2015-01-01 0 0
12 3 2016-01-01 0 0
13 4 2013-01-01 0 1095
14 4 2014-01-01 0 1095
15 4 2015-01-01 0 1095
16 4 2016-01-01 0 1095
CodePudding user response:
You can group by id, and subtract the minimum date from the minimum date where outcome==1; I've implemented in a helper function, f
f <- function(x,o) {
if(1 %in% o) min(x[o==1]) - min(x)
else max(x) - min(x)
}
Now, just apply the function to df
by id
group_by(df, id) %>% mutate(days = f(date,outcome))
Output:
# A tibble: 16 × 4
# Groups: id [4]
id date outcome days
<dbl> <date> <dbl> <drtn>
1 1 2013-01-01 0 730 days
2 1 2014-01-01 0 730 days
3 1 2015-01-01 1 730 days
4 1 2016-01-01 1 730 days
5 2 2013-01-01 0 365 days
6 2 2014-01-01 1 365 days
7 2 2015-01-01 0 365 days
8 2 2016-01-01 0 365 days
9 3 2013-01-01 1 0 days
10 3 2014-01-01 1 0 days
11 3 2015-01-01 0 0 days
12 3 2016-01-01 0 0 days
13 4 2013-01-01 0 1095 days
14 4 2014-01-01 0 1095 days
15 4 2015-01-01 0 1095 days
16 4 2016-01-01 0 1095 days