This question is related to:
Context:
I have a dataframe that contains variables from a collection of follow-up visits from patients after transplantation.
There are now 3 relevant variables: 1) the patient identification, 2) the unique transplantation identification, 3) the number of days between transplantation and follow-up
Each row is a follow-up visit. There can be multiple follow-ups for each patient in the data frame. The amount of follow-ups for each patient vary.
The transplantation identification is missing for some follow-ups. Sometimes patients have had two transplantations so the transplantations IDs differ. It is not clear what the maximum number of different transplantation IDs per patient are.
Questions:
First I would like to know, which patients have zero, one, two... etc. transplantation IDs.
After that I would like to extract the transplantation identification and put it into a separate column for all follow ups of a specific patient.
Example:
patient_ID <- c("A", "A", "A", "A", "B", "B", "C", "C", "C")
tx_ID <- c("123", NA, NA, "123", NA, NA, "456", NA, "789")
days_from_tx <- c(0, 5, 10, 15, 2, 4, 1, 2, 3)
df <- data.frame(patient_ID, tx_ID, days_from_tx)
patient_ID tx_ID days_from_tx
1 A 123 0
2 A <NA> 5
3 A <NA> 10
4 A 123 15
5 B <NA> 2
6 B <NA> 4
7 C 456 3
8 C <NA> 10
9 C 789 2
What I would like to have:
patient_ID tx_ID days_from_tx nr_tx_ID tx1 tx2
1 A 123 0 1 123 <NA>
2 A <NA> 5 1 123 <NA>
3 A <NA> 10 1 123 <NA>
4 A 123 15 1 123 <NA>
5 B <NA> 2 0 <NA> <NA>
6 B <NA> 4 0 <NA> <NA>
7 C 456 3 2 456 789
8 C <NA> 10 2 456 789
9 C 789 2 2 456 789
What I have tried:
I tried to group the patients, then find out the unique transplantation IDs and write them in a new column. I used the code from the answer to my last question and modified it slightly.
df %>%
group_by(patient_ID) %>%
mutate(
last_followup = n_distinct(tx_ID, na.rm = TRUE)
)
Unfortunatley this did not work. R prints the number of unique transplantation IDs of all patients into the column.
CodePudding user response:
You could use mutate()
to create the nr_tx_id
column, and separately, using filter()
and distinct() with
pivot_wider()` to get the distincte tx_IDs in wide format, and then join the two results:
df %>%
group_by(patient_ID) %>%
mutate(nr_tx_ID = length(unique(tx_ID[!is.na(tx_ID)]))) %>%
ungroup()
left_join(
df %>%
filter(!is.na(tx_ID)) %>%
distinct(patient_ID,tx_ID) %>%
group_by(patient_ID) %>%
mutate(id = row_number()) %>%
ungroup()
pivot_wider(names_from = id, values_from = tx_ID,names_prefix = "tx")
)
Output:
patient_ID tx_ID days_from_tx nr_tx_ID tx1 tx2
<chr> <chr> <dbl> <int> <chr> <chr>
1 A 123 0 1 123 NA
2 A NA 5 1 123 NA
3 A NA 10 1 123 NA
4 A 123 15 1 123 NA
5 B NA 2 0 NA NA
6 B NA 4 0 NA NA
7 C 456 1 2 456 789
8 C NA 2 2 456 789
9 C 789 3 2 456 789
CodePudding user response:
Using by
and unsplit
.
by(df, df$patient_ID, \(x) {
tx <- sort(unique(x$tx_ID))
cbind(x, nr_tx_ID=length(tx), tx1=tx[1], tx2=tx[2])
}) |> unsplit(df$patient_ID)
# patient_ID tx_ID days_from_tx nr_tx_ID tx1 tx2
# 1 A 123 0 1 123 <NA>
# 2 A <NA> 5 1 123 <NA>
# 3 A <NA> 10 1 123 <NA>
# 4 A 123 15 1 123 <NA>
# 5 B <NA> 2 0 <NA> <NA>
# 6 B <NA> 4 0 <NA> <NA>
# 7 C 456 1 2 456 789
# 8 C <NA> 2 2 456 789
# 9 C 789 3 2 456 789
Data:
df <- structure(list(patient_ID = c("A", "A", "A", "A", "B", "B", "C",
"C", "C"), tx_ID = c("123", NA, NA, "123", NA, NA, "456", NA,
"789"), days_from_tx = c(0, 5, 10, 15, 2, 4, 1, 2, 3)), class = "data.frame", row.names = c(NA,
-9L))