R - Create new column differentially based on another column

Time:05-28

I have the following dataset:

ID year start_year
a  1    1
a  2    1
a  3    1
b  1    2
b  2    2
b  3    2
c  1    3
c  2    3
c  3    3

I want to create a new dummy column, present, that for each ID is 1-1-1 if start_year is 1, 0-1-1 if start_year is 2, and 0-0-1 if start_year is 3. My goal is to get the following table:

ID year start_year present
a  1    1          1
a  2    1          1
a  3    1          1
b  1    2          0
b  2    2          1
b  3    2          1
c  1    3          0
c  2    3          0
c  3    3          1

I guess this should be fairly easy for most of you, but I'm really stuck. Many thanks for your help!

CodePudding user response:

An easier option is to create a key/value list and then subset it with the first element of 'start_year' for each 'ID' (this assumes there are exactly 3 rows per group, ordered by 'year'):

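library(dplyr)
# Lookup: start_year (as a name) -> 'present' pattern for years 1 to 3
lst1 <- list(`1` = c(1, 1, 1), `2` = c(0, 1, 1), `3` = c(0, 0, 1))
df1 %>%
   group_by(ID) %>% 
   # pick the pattern that matches this ID's start_year
   mutate(present = lst1[[as.character(first(start_year))]]) %>%
   ungroup()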

-output

# A tibble: 9 × 4
  ID     year start_year present
  <chr> <int>      <int>   <dbl>
1 a         1          1       1
2 a         2          1       1
3 a         3          1       1
4 b         1          2       0
5 b         2          2       1
6 b         3          2       1
7 c         1          3       0
8 c         2          3       0
9 c         3          3       1

data

df1 <- structure(list(ID = c("a", "a", "a", "b", "b", "b", "c", "c", 
"c"), year = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), start_year = c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L)), class = "data.frame", row.names = c(NA, 
-9L))
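
The lookup above depends on every ID having exactly three rows sorted by year. As a minimal sketch, the same idea can be written in base R with those assumptions checked first. The stopifnot() checks and the use of ave() are additions for illustration and are not part of the answer above.

```r
# Same data as df1 above, rebuilt here so the snippet runs on its own
df1 <- data.frame(
  ID = rep(c("a", "b", "c"), each = 3),
  year = rep(1:3, times = 3),
  start_year = rep(1:3, each = 3)
)
lst1 <- list(`1` = c(1, 1, 1), `2` = c(0, 1, 1), `3` = c(0, 0, 1))

# Assumption checks: 3 rows per ID, years ascending within each ID,
# and every start_year has an entry in the lookup list
stopifnot(all(table(df1$ID) == 3))
stopifnot(all(tapply(df1$year, df1$ID, function(y) !is.unsorted(y))))
stopifnot(all(as.character(df1$start_year) %in% names(lst1)))

# For each ID, replace start_year with the pattern for its first value
df1$present <- ave(df1$start_year, df1$ID,
                   FUN = function(s) lst1[[as.character(s[1])]])
```

If any check fails, the script stops before a misaligned pattern can be written into the column.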

CodePudding user response:

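A possible approach is to compare the two columns row by row: present is 1 whenever start_year <= year, so no grouping or lookup is needed.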

library(tidyverse)

df <- tribble(
  ~ID, ~year, ~start_year,
  "a", 1, 1,
  "a", 2, 1,
  "a", 3, 1,
  "b", 1, 2,
  "b", 2, 2,
  "b", 3, 2,
  "c", 1, 3,
  "c", 2, 3,
  "c", 3, 3
)

df |> mutate(present = if_else(start_year <= year, 1, 0))
#> # A tibble: 9 × 4
#>   ID     year start_year present
#>   <chr> <dbl>      <dbl>   <dbl>
#> 1 a         1          1       1
#> 2 a         2          1       1
#> 3 a         3          1       1
#> 4 b         1          2       0
#> 5 b         2          2       1
#> 6 b         3          2       1
#> 7 c         1          3       0
#> 8 c         2          3       0
#> 9 c         3          3       1

Created on 2022-05-27 by the reprex package (v2.0.1)
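
Because the comparison is done row by row, it should not depend on how many years each ID has. A minimal base R sketch of the same comparison, using a small made-up data frame (not the question's data) with unequal group sizes and returning an integer column instead of a double:

```r
# Hypothetical data for illustration: ID "a" has 2 years, ID "b" has 4
df <- data.frame(
  ID = c("a", "a", "b", "b", "b", "b"),
  year = c(1, 2, 1, 2, 3, 4),
  start_year = c(1, 1, 3, 3, 3, 3)
)

# The comparison gives TRUE/FALSE; as.integer() turns that into 1/0
df$present <- as.integer(df$year >= df$start_year)
```

Rows with a missing year or start_year come out as NA rather than 0, which may or may not be what you want.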
