I have a dataset recording the years in which different subjects received a certain treatment. I need a new column that holds, for each subject, the first year of treatment, or 0 if the subject has never been treated.
Let's say I have this dataset:
subject <- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C")
year <- c(2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2004)
treat <- c(0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0)
df1 <- data.frame(subject, year, treat)
I want to obtain this:
subject <- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C")
year <- c(2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2004)
treat <- c(0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0)
first_treat <- c(2003, 2003, 2003, 2003, 2003, 2001, 2001, 2001, 2001, 2001, 0, 0, 0, 0, 0)
df1 <- data.frame(subject, year, treat, first_treat)
My original dataset has many subjects, so I would like code that does this without referring to specific rows or column values.
Thanks!
CodePudding user response:
Here is an option with ifelse:
library(dplyr)
df1 %>%
group_by(subject) %>%
mutate(first_treat = ifelse(1 %in% treat, year[treat==1], 0))
subject year treat first_treat
<chr> <dbl> <dbl> <dbl>
1 A 2000 0 2003
2 A 2001 0 2003
3 A 2002 0 2003
4 A 2003 1 2003
5 A 2004 1 2003
6 B 2000 0 2001
7 B 2001 1 2001
8 B 2002 1 2001
9 B 2003 1 2001
10 B 2004 1 2001
11 C 2000 0 0
12 C 2001 0 0
13 C 2002 0 0
14 C 2003 0 0
15 C 2004 0 0
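Note that ifelse() with a length-one test returns only the first element of year[treat==1], so the result above relies on the rows being sorted by year within each subject. A minimal sketch that drops that assumption by taking min() instead (the shuffled data frame and the name res are only for illustration, not part of the original answer):

```r
library(dplyr)

# Example data from the question (subject labels quoted as strings)
df1 <- data.frame(
  subject = rep(c("A", "B", "C"), each = 5),
  year    = rep(c(2000, 2001, 2002, 2003, 2004), times = 3),
  treat   = c(0, 0, 0, 1, 1,  0, 1, 1, 1, 1,  0, 0, 0, 0, 0)
)

# Shuffle the rows to show the result does not depend on row order
set.seed(1)
shuffled <- df1[sample(nrow(df1)), ]

res <- shuffled %>%
  group_by(subject) %>%
  # Earliest treated year per subject, or 0 if the subject was never treated
  mutate(first_treat = if (any(treat == 1)) min(year[treat == 1]) else 0) %>%
  ungroup() %>%
  arrange(subject, year)
```

The if/else returns a single value per group, which mutate() recycles across all rows of that subject.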
CodePudding user response:
We can use a grouped approach: grouped by 'subject', create a logical vector with treat == 1, use it to subset 'year', and extract the first element ([1]). For a subject with no treated year this returns NA.
library(dplyr)
df1 <- df1 %>%
group_by(subject) %>%
mutate(first_treat = year[treat == 1][1]) %>%
ungroup()
-output
df1
# A tibble: 15 × 4
subject year treat first_treat
<chr> <dbl> <dbl> <dbl>
1 A 2000 0 2003
2 A 2001 0 2003
3 A 2002 0 2003
4 A 2003 1 2003
5 A 2004 1 2003
6 B 2000 0 2001
7 B 2001 1 2001
8 B 2002 1 2001
9 B 2003 1 2001
10 B 2004 1 2001
11 C 2000 0 NA
12 C 2001 0 NA
13 C 2002 0 NA
14 C 2003 0 NA
15 C 2004 0 NA
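The NA for the never-treated subject comes from indexing an empty vector: when no row satisfies treat == 1, the subset has length zero, and asking for its first element yields NA. A small standalone sketch of that behaviour (the variable names treated_years and first_year are illustrative only):

```r
year  <- c(2000, 2001, 2002, 2003, 2004)
treat <- c(0, 0, 0, 0, 0)           # never treated, like subject C

treated_years <- year[treat == 1]   # zero-length numeric vector: nothing matches
first_year    <- treated_years[1]   # indexing past the end gives NA

length(treated_years)
is.na(first_year)
```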
If we want subjects with no treated year to get 0 instead of NA, one option is coalesce:
df1 %>%
group_by(subject) %>%
mutate(first_treat = coalesce(year[treat == 1][1], 0)) %>%
ungroup()
# A tibble: 15 × 4
subject year treat first_treat
<chr> <dbl> <dbl> <dbl>
1 A 2000 0 2003
2 A 2001 0 2003
3 A 2002 0 2003
4 A 2003 1 2003
5 A 2004 1 2003
6 B 2000 0 2001
7 B 2001 1 2001
8 B 2002 1 2001
9 B 2003 1 2001
10 B 2004 1 2001
11 C 2000 0 0
12 C 2001 0 0
13 C 2002 0 0
14 C 2003 0 0
15 C 2004 0 0
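If dplyr is not available, the same grouped logic can probably be expressed in base R with ave(). This is only a sketch under that assumption, not part of the answer above; the helper vector treated_year is introduced purely for illustration:

```r
# Example data from the question (subject labels quoted as strings)
df1 <- data.frame(
  subject = rep(c("A", "B", "C"), each = 5),
  year    = rep(c(2000, 2001, 2002, 2003, 2004), times = 3),
  treat   = c(0, 0, 0, 1, 1,  0, 1, 1, 1, 1,  0, 0, 0, 0, 0)
)

# Keep the year only on treated rows; untreated rows become NA
treated_year <- ifelse(df1$treat == 1, df1$year, NA)

# Per subject: earliest treated year, or 0 when every value is NA
df1$first_treat <- ave(
  treated_year, df1$subject,
  FUN = function(y) if (all(is.na(y))) 0 else min(y, na.rm = TRUE)
)
```

ave() returns a vector the same length as its input, so the per-subject value is repeated on every row of that subject, matching the layout requested in the question.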