How to create time-invariant variable using dplyr (R)

Time:03-20

I have a dataset with two waves of data. I want to make age time-invariant, so that the time 1 row takes the age value from time 2. What is the best way to do this with dplyr?

library(dplyr)
df <- tibble(ID = c(1001, 1001, 1002, 1002), time = c(1,2,1,2), age = c(23,25,54,56))

Table:

ID time age
1001 1 23
1001 2 25
1002 1 54
1002 2 56

Desired Table:

ID time age
1001 1 25
1001 2 25
1002 1 56
1002 2 56

Answer:

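Group by 'ID' and take the max of 'age' in mutate. This works here because age does not decrease between waves, so within each ID the maximum is the age at time 2: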

library(dplyr)
df %>%
   group_by(ID) %>% 
   mutate(age = max(age)) %>% 
   ungroup
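The max() shortcut relies on age never decreasing within an ID. A minimal sketch on the question's data that checks that assumption before applying it (the `monotone` check is an illustrative addition, not part of the approach above):

```r
library(dplyr)

df <- tibble(ID = c(1001, 1001, 1002, 1002),
             time = c(1, 2, 1, 2),
             age = c(23, 25, 54, 56))

# max(age) equals the time-2 age only if age never decreases within an ID.
# Illustrative sanity check: one TRUE/FALSE per ID.
monotone <- df %>%
  arrange(ID, time) %>%
  group_by(ID) %>%
  summarise(ok = all(diff(age) >= 0)) %>%
  pull(ok)
stopifnot(all(monotone))

# Overwrite age with the largest age observed for each ID
res <- df %>%
  group_by(ID) %>%
  mutate(age = max(age)) %>%
  ungroup()

res$age  # 25 25 56 56
```

If the check fails for some ID (for example because of a data-entry error), the time-2 subsetting approach is the safer choice.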

Or, if the value must come specifically from time 2, subset 'age' with a logical expression on 'time' and take the first element. This also returns NA for any 'ID' that has no row with time 2:

df %>%
    group_by(ID) %>%
    mutate(age = age[time == 2][1]) %>%
    ungroup

Output:

# A tibble: 4 × 3
     ID  time   age
  <dbl> <dbl> <dbl>
1  1001     1    25
2  1001     2    25
3  1002     1    56
4  1002     2    56
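To see the NA behavior, here is a small sketch with a hypothetical extra ID 1003 that was only observed at time 1:

```r
library(dplyr)

# Hypothetical data: ID 1003 has no time-2 row
df3 <- tibble(ID = c(1001, 1001, 1003),
              time = c(1, 2, 1),
              age = c(23, 25, 40))

res <- df3 %>%
  group_by(ID) %>%
  # For ID 1003, age[time == 2] is numeric(0), and numeric(0)[1] is NA
  mutate(age = age[time == 2][1]) %>%
  ungroup()

res$age  # 25 25 NA
```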

Another option is to arrange by 'ID' and 'time' and take the last element per ID (this assumes 'time' only takes the values 1 and 2, and that every 'ID' has a time 2 row):

df %>% 
   arrange(ID, time) %>%
   group_by(ID) %>%
   mutate(age = last(age)) %>%
   ungroup
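Row order matters for last(). A sketch with the question's rows deliberately shuffled, comparing no sorting, sorting with arrange(), and the `order_by` argument of dplyr's `last()`, which picks the value with the largest `time` without reordering the rows (assuming a dplyr version where `last()` accepts `order_by`, as current releases do):

```r
library(dplyr)

# Same values as the question, but rows are not in time order
df_shuffled <- tibble(ID = c(1001, 1002, 1001, 1002),
                      time = c(2, 1, 1, 2),
                      age = c(25, 54, 23, 56))

# Without arrange(), last() takes whichever row happens to come last per ID
no_sort <- df_shuffled %>%
  group_by(ID) %>%
  mutate(age = last(age)) %>%
  ungroup()

# Sorting by ID and time first makes the last row of each ID the time-2 row
sorted <- df_shuffled %>%
  arrange(ID, time) %>%
  group_by(ID) %>%
  mutate(age = last(age)) %>%
  ungroup()

# order_by picks the age at the largest time while keeping the row order
by_order <- df_shuffled %>%
  group_by(ID) %>%
  mutate(age = last(age, order_by = time)) %>%
  ungroup()

no_sort$age   # 23 56 23 56 (wrong for ID 1001)
sorted$age    # 25 25 56 56
by_order$age  # 25 56 25 56
```

Note that `by_order` keeps the shuffled row order, while `sorted` returns rows sorted by ID and time.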