I have a dataframe that contains variables from a collection of follow-up visits from patients after transplantation.
For sake of simplicity the number of variables are 2: 1) the patient identification, 2) the number of days from the transplantation up until the follow-up event.
Each row is a follow-up visit. There can be multiple follow-ups for each patient in the data frame. The amount of follow-ups for each patient vary. The days after transplantation when these follow-up visits happen vary as well.
I would like to extract the number of days from the last follow up of each patient and write it in a separate column in every follow-up observation of the patient.
In the real dataset the amount of patients is around 15,000. I tried to extract the values from a nested dataframe, but it was not possible for me.
Example:
patient_ID <- c("A", "A", "A", "A", "B", "B", "C", "C", "C")
days_tx_followup <- c(0, 5, 10, 15, 2, 4, 1, 2, 3)
df <- data.frame(patient_ID, days_tx_followup)
patient_ID days_tx_followup
1 A 0
2 A 5
3 A 10
4 A 15
5 B 2
6 B 4
7 C 1
8 C 2
9 C 3
What I would like to have:
patient_ID days_tx_followup last_followup
1 A 0 15
2 A 5 15
3 A 10 15
4 A 15 15
5 B 2 4
6 B 4 4
7 C 1 3
8 C 2 3
9 C 3 3
CodePudding user response:
Thankfully dplyr
has a function called last
that can do just this.
df %>%
group_by(patient_ID) %>%
mutate(
last_followup = last(days_tx_followup)
)
#> # A tibble: 9 × 3
#> # Groups: patient_ID [3]
#> patient_ID days_tx_followup last_followup
#> <chr> <dbl> <dbl>
#> 1 A 0 15
#> 2 A 5 15
#> 3 A 10 15
#> 4 A 15 15
#> 5 B 2 4
#> 6 B 4 4
#> 7 C 1 3
#> 8 C 2 3
#> 9 C 3 3
Created on 2022-08-23 by the reprex package (v2.0.1)
CodePudding user response:
Using by
and tail
.
by(df, df$patient_ID, \(x) cbind(x, last_followup=tail(x$days_tx_followup, 1))) |> unsplit(df$patient_ID)
# patient_ID days_tx_followup last_followup
# 1 A 0 15
# 2 A 5 15
# 3 A 10 15
# 4 A 15 15
# 5 B 2 4
# 6 B 4 4
# 7 C 1 3
# 8 C 2 3
# 9 C 3 3