How to create a new column in a dataframe depending on other columns values

Time:05-10

I have a dataset recording the years in which different subjects received a certain treatment. I need a column that holds each subject's first year of treatment, or 0 if the subject was never treated.

Let's say I have this dataset:

subject <- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C")
year <- c(2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2004)
treat <- c(0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0)
df1 <- data.frame(subject, year, treat)

I want to obtain this:

subject <- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C")
year <- c(2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2004)
treat <- c(0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0)
first_treat <- c(2003, 2003, 2003, 2003, 2003, 2001, 2001, 2001, 2001, 2001, 0, 0, 0, 0, 0)
df1 <- data.frame(subject, year, treat, first_treat)

My original dataset has many subjects, so I would like code that does this without referring to specific rows or column values.

Thanks!

Answer:

Here is an option with ifelse:

library(dplyr)

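df1 %>% 
  group_by(subject) %>% 
  # ifelse() with a length-1 test returns only the first element of
  # year[treat == 1], i.e. the year of the first treated row
  mutate(first_treat = ifelse(1 %in% treat, year[treat == 1], 0))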
# A tibble: 15 × 4
# Groups:   subject [3]
   subject  year treat first_treat
   <chr>   <dbl> <dbl>       <dbl>
 1 A        2000     0        2003
 2 A        2001     0        2003
 3 A        2002     0        2003
 4 A        2003     1        2003
 5 A        2004     1        2003
 6 B        2000     0        2001
 7 B        2001     1        2001
 8 B        2002     1        2001
 9 B        2003     1        2001
10 B        2004     1        2001
11 C        2000     0           0
12 C        2001     0           0
13 C        2002     0           0
14 C        2003     0           0
15 C        2004     0           0
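The `ifelse()` call above returns the year of the first treated row, so it equals the earliest treated year only when rows are sorted by year within each subject. If that ordering is not guaranteed, taking `min()` of the treated years is one option. A minimal sketch, assuming `treat` is coded 0/1 with no missing values; `first_treat_year` is a hypothetical helper name, not part of dplyr:

```r
# Hypothetical helper, not from either answer: earliest treated year per
# subject, or 0 if the subject was never treated.
# Assumes treat is coded 0/1 with no NA values.
first_treat_year <- function(year, treat) {
  if (any(treat == 1)) min(year[treat == 1]) else 0
}

# Rows deliberately out of year order for one subject
first_treat_year(c(2004, 2003, 2000), c(1, 1, 0))  # 2003
# A subject that was never treated
first_treat_year(c(2000, 2001, 2002), c(0, 0, 0))  # 0
```

Inside the grouped pipeline it would be called as `mutate(first_treat = first_treat_year(year, treat))`.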

Answer:

We can use a grouped approach: group by 'subject', build a logical vector with treat == 1, use it to subset 'year', and extract the first element ([1]).

library(dplyr)
df1 <- df1 %>% 
  group_by(subject) %>% 
  mutate(first_treat = year[treat == 1][1]) %>%
  ungroup

-output

df1
# A tibble: 15 × 4
   subject  year treat first_treat
   <chr>   <dbl> <dbl>       <dbl>
 1 a        2000     0        2003
 2 a        2001     0        2003
 3 a        2002     0        2003
 4 a        2003     1        2003
 5 a        2004     1        2003
 6 b        2000     0        2001
 7 b        2001     1        2001
 8 b        2002     1        2001
 9 b        2003     1        2001
10 b        2004     1        2001
11 c        2000     0          NA
12 c        2001     0          NA
13 c        2002     0          NA
14 c        2003     0          NA
15 c        2004     0          NA
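The NA values for subject c come from indexing an empty vector: when no row has treat == 1, the subset is empty, and asking for its first element returns NA. The snippet below traces the subsetting steps on plain vectors for one treated and one never-treated subject (the variable names are illustrative only):

```r
# Subject A: treated from 2003 onwards
year_a  <- c(2000, 2001, 2002, 2003, 2004)
treat_a <- c(0, 0, 0, 1, 1)
year_a[treat_a == 1]       # 2003 2004
year_a[treat_a == 1][1]    # 2003

# Subject C: never treated
year_c  <- c(2000, 2001, 2002, 2003, 2004)
treat_c <- c(0, 0, 0, 0, 0)
year_c[treat_c == 1]       # numeric(0): nothing selected
year_c[treat_c == 1][1]    # NA: element 1 of an empty vector
```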

To return 0 instead of NA for subjects that were never treated, one option is coalesce():

df1 %>% 
  group_by(subject) %>% 
  mutate(first_treat = coalesce(year[treat == 1][1], 0)) %>%
  ungroup
# A tibble: 15 × 4
   subject  year treat first_treat
   <chr>   <dbl> <dbl>       <dbl>
 1 a        2000     0        2003
 2 a        2001     0        2003
 3 a        2002     0        2003
 4 a        2003     1        2003
 5 a        2004     1        2003
 6 b        2000     0        2001
 7 b        2001     1        2001
 8 b        2002     1        2001
 9 b        2003     1        2001
10 b        2004     1        2001
11 c        2000     0           0
12 c        2001     0           0
13 c        2002     0           0
14 c        2003     0           0
15 c        2004     0           0
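For readers who prefer to avoid the dplyr dependency, here is a base R sketch (not from either answer) using `ave()`: untreated rows get `Inf` so that the per-subject `min()` ignores them, and any subject whose minimum is still `Inf` was never treated and is set to 0. It rebuilds `df1` from the question's values, with the subject IDs quoted, so it runs on its own:

```r
# Rebuild the question's data (subject IDs quoted as strings)
df1 <- data.frame(
  subject = rep(c("A", "B", "C"), each = 5),
  year    = rep(2000:2004, times = 3),
  treat   = c(0, 0, 0, 1, 1,  0, 1, 1, 1, 1,  0, 0, 0, 0, 0)
)

# Keep the year only on treated rows; Inf elsewhere so min() skips them
treated_year <- ifelse(df1$treat == 1, df1$year, Inf)

# ave() applies min() within each subject and repeats the result on every row
df1$first_treat <- ave(treated_year, df1$subject, FUN = min)

# Subjects with no treated rows are left at Inf; set them to 0
df1$first_treat[is.infinite(df1$first_treat)] <- 0

df1
```

Because it takes the minimum, this version does not depend on the rows being sorted by year.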