In R: How to extract a specific (e.g. last) value from a dataframe with multiple rows belonging to o-CodePudding

I have a dataframe that contains variables from a collection of follow-up visits from patients after transplantation.

For sake of simplicity the number of variables are 2: 1) the patient identification, 2) the number of days from the transplantation up until the follow-up event.

Each row is a follow-up visit. There can be multiple follow-ups for each patient in the data frame. The amount of follow-ups for each patient vary. The days after transplantation when these follow-up visits happen vary as well.

I would like to extract the number of days from the last follow up of each patient and write it in a separate column in every follow-up observation of the patient.

In the real dataset the amount of patients is around 15,000. I tried to extract the values from a nested dataframe, but it was not possible for me.

Example:

patient_ID <- c("A", "A", "A", "A", "B", "B", "C", "C", "C")

days_tx_followup <- c(0, 5, 10, 15, 2, 4, 1, 2, 3)

df <- data.frame(patient_ID, days_tx_followup)

  patient_ID days_tx_followup
1          A                0
2          A                5
3          A               10
4          A               15
5          B                2
6          B                4
7          C                1
8          C                2
9          C                3

What I would like to have:

  patient_ID days_tx_followup last_followup
1          A                0            15
2          A                5            15
3          A               10            15
4          A               15            15
5          B                2             4
6          B                4             4
7          C                1             3
8          C                2             3
9          C                3             3

CodePudding user response：

Thankfully dplyr has a function called last that can do just this.

df %>%
  group_by(patient_ID) %>%
  mutate(
    last_followup = last(days_tx_followup)
  )
#> # A tibble: 9 × 3
#> # Groups:   patient_ID [3]
#>   patient_ID days_tx_followup last_followup
#>   <chr>                 <dbl>         <dbl>
#> 1 A                         0            15
#> 2 A                         5            15
#> 3 A                        10            15
#> 4 A                        15            15
#> 5 B                         2             4
#> 6 B                         4             4
#> 7 C                         1             3
#> 8 C                         2             3
#> 9 C                         3             3

^{Created on 2022-08-23 by the reprex package (v2.0.1)}

CodePudding user response：

Using by and tail.

by(df, df$patient_ID, \(x) cbind(x, last_followup=tail(x$days_tx_followup, 1))) |> unsplit(df$patient_ID)
#   patient_ID days_tx_followup last_followup
# 1          A                0            15
# 2          A                5            15
# 3          A               10            15
# 4          A               15            15
# 5          B                2             4
# 6          B                4             4
# 7          C                1             3
# 8          C                2             3
# 9          C                3             3