I have a dataset with two waves of data. I want to make age time-invariant by assigning each ID's time 2 value to its time 1 row as well. What is the best way to do this using dplyr?
library(dplyr)
df <- tibble(ID = c(1001, 1001, 1002, 1002), time = c(1,2,1,2), age = c(23,25,54,56))
Table:
ID | time | age |
---|---|---|
1001 | 1 | 23 |
1001 | 2 | 25 |
1002 | 1 | 54 |
1002 | 2 | 56 |
Desired Table:
ID | time | age |
---|---|---|
1001 | 1 | 25 |
1001 | 2 | 25 |
1002 | 1 | 56 |
1002 | 2 | 56 |
CodePudding user response:
We can group by 'ID' and take the max of 'age' in mutate (this assumes the time 2 age is always the larger of the two values)
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(age = max(age)) %>%
  ungroup()
Or, if the value should come specifically from 'time' 2, subset 'age' with a logical expression on 'time' and select the first element (this also returns NA if there is no 'time' value of 2 for a particular 'ID')
df %>%
  group_by(ID) %>%
  mutate(age = age[time == 2][1]) %>%
  ungroup()
-output
# A tibble: 4 × 3
ID time age
<dbl> <dbl> <dbl>
1 1001 1 25
2 1001 2 25
3 1002 1 56
4 1002 2 56
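To illustrate the NA behaviour of the `age[time == 2][1]` approach, here is a minimal sketch. The data frame `df_missing` and the ID 1003 are made up for illustration and are not part of the question's data.

```r
library(dplyr)

# Hypothetical data: ID 1003 has only a time 1 row
df_missing <- tibble(ID = c(1001, 1001, 1003),
                     time = c(1, 2, 1),
                     age = c(23, 25, 40))

res_missing <- df_missing %>%
  group_by(ID) %>%
  # age[time == 2] is empty for ID 1003, so taking [1] gives NA
  mutate(age = age[time == 2][1]) %>%
  ungroup()

res_missing$age
```

Here both rows of ID 1001 get 25, while the single row of ID 1003 becomes NA rather than keeping its original value.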
Or another option is to arrange by 'ID' and 'time' and select the last element (assuming 'time' contains only 1 and 2, and that every 'ID' has a time 2 row)
df %>%
  arrange(ID, time) %>%
  group_by(ID) %>%
  mutate(age = last(age)) %>%
  ungroup()
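If the original row order should be kept, a possible variant is to pass `order_by` to `last()` instead of calling `arrange()` first. This is a sketch under the same assumption that every 'ID' has a time 2 row; the data frame `df_unsorted` is a hypothetical reordering of the question's data.

```r
library(dplyr)

# Hypothetical data: same values as the question, but ID 1001 is unsorted
df_unsorted <- tibble(ID = c(1001, 1001, 1002, 1002),
                      time = c(2, 1, 1, 2),
                      age = c(25, 23, 54, 56))

res_last <- df_unsorted %>%
  group_by(ID) %>%
  # last() picks the age belonging to the largest 'time' within each ID
  mutate(age = last(age, order_by = time)) %>%
  ungroup()

res_last
```

The rows stay in their input order, and each ID's age is taken from its row with the highest 'time'.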