The following df represents treatments that a single patient received during the course of a study. They first received drug-v, followed by drug-w, then drug-x, and so on.
original <- tibble::tribble(
~treatment_administered,
"drug-v",
"drug-w",
"drug-x",
"drug-y",
"drug-z",
"drug-l"
)
original
My aim is to keep a cumulative record of prior treatment exposures that belong to a specific class of treatment; let's call this "class A". In this example, drug-v, drug-x and drug-z belong to class A. Here is the final df I wish to create.
final <- tibble::tribble(
~prior_classA_details, ~treatment_administered,
"", "drug-v",
"drug-v", "drug-w",
"drug-v", "drug-x",
"drug-v,drug-x", "drug-y",
"drug-v,drug-x", "drug-z",
"drug-v,drug-x,drug-z", "drug-l"
)
final
As you can see, prior_classA_details tracks treatment_administered on the previous row: if that treatment belongs to class A, its name is appended on the following row. The process is iterative going down the list, so prior_classA_details grows as class A treatments are administered.
There are several other data columns in this df that I have not included here (only the relevant columns are shown). Ideally I am looking for a dplyr solution, please.
CodePudding user response:
Here's one way -
library(dplyr)
library(purrr)
classA <- c("drug-v", "drug-x", "drug-z")
original %>%
  mutate(prior_classA_details = lag(map_chr(row_number(), ~{
    # class A drugs among rows 1 to the current row, joined into one string
    toString(keep(treatment_administered[seq_len(.x)], function(y) y %in% classA))
  }), default = ''), .before = 1)
# prior_classA_details treatment_administered
# <chr> <chr>
#1 "" drug-v
#2 "drug-v" drug-w
#3 "drug-v" drug-x
#4 "drug-v, drug-x" drug-y
#5 "drug-v, drug-x" drug-z
#6 "drug-v, drug-x, drug-z" drug-l
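We create a vector of classA drugs and, for each row, keep only the values up to and including that row that belong to classA, concatenating them into one string. lag then shifts the result down by one row, so each row only lists treatments given before it. Note that toString separates entries with ", " (comma and space); use paste(..., collapse = ",") instead if you need the exact format shown in the question.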
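As an alternative, the same running record can be built in a single pass with purrr::accumulate. This is only a sketch under the question's assumptions (one patient, rows already in administration order); add_if_classA is a hypothetical helper name introduced here for illustration, not part of dplyr or purrr.

```r
library(dplyr)
library(purrr)

classA <- c("drug-v", "drug-x", "drug-z")

original <- tibble::tribble(
  ~treatment_administered,
  "drug-v", "drug-w", "drug-x", "drug-y", "drug-z", "drug-l"
)

# Hypothetical helper: extend the running string only when `drug` is class A.
add_if_classA <- function(so_far, drug) {
  if (!drug %in% classA) return(so_far)
  if (so_far == "") drug else paste(so_far, drug, sep = ",")
}

result <- original %>%
  mutate(
    # With .init = "", accumulate() returns n + 1 running values, starting
    # with the empty string. Dropping the last one leaves, for each row,
    # the class A drugs given strictly before it, so no lag() is needed.
    prior_classA_details = head(
      accumulate(treatment_administered, add_if_classA, .init = ""),
      -1
    ),
    .before = 1
  )

result
```

Because each step reuses the previous row's string instead of rescanning rows 1 to n, this avoids repeated subsetting, and the helper joins with "," so the result follows the comma-only format of the desired output.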