Apologies I am not sure this warrants another question. But am basing this off of another questions (Group by ID and Outcome and take the earliest earliest Dates of specific outcomes and assign numbers (i.e outcome1, outcome2)).
mydata = data.frame (Id =c (1,1,1,1,1,1,1,1,2,2,2,2),
Date = c("2001-01-31", "2001-02-13","2001-05-31",
"2001-06-02","2018-01-31","2018-03-31","2018-07-31",
"2019-04-04","2014-01-31","2014-02-02","2014-04-31",
"2014-05-18"),Outcome = c("Relapse","CR","Relapse","Relapse",
"CR","CR","CR","Relapse","CR", "CR","Relapse","CR"))
Which outputs the below.
Id Date Outcome
1 2001-01-31 Relapse
1 2001-02-13 CR
1 2001-05-31 Relapse
1 2001-06-02 Relapse
1 2018-01-31 CR
1 2018-03-31 CR
1 2018-07-31 CR
1 2019-04-04 Relapse
2 2014-01-31 CR
2 2014-02-02 CR
2 2014-04-31 Relapse
2 2014-05-18 CR
As you can see each patient is in certain phases at different times and I would like to capture the earliest dates of when each new phase starts per patient. I would like then to rename these phases to CR1, Relapse1, CR2, Relapse2 and so forth. But I would only like to start naming AFTER patient achieves their first CR1. So if a patient has a relapse before CR, those would be ignored.
Im hoping to get an output like this:
Id CR1 Relapse1 CR2 Relapse2
1 2001-02-13 2001-05-31 2018-01-31 2019-04-04
2 2014-01-31 2014-04-31 2014-05-18 NA
As you see patient 1 had a relapse at 2001-01-31, but was ignored because it occured before the first CR1.
The following code was so helpful to the previous question, but was wondering if I can modify it so that it will start the counting after the first CR? Thank you!!
mydata %>%
group_by(Id) %>%
mutate(Grp = rleid(Outcome)) %>%
group_by(Grp, .add = T) %>%
slice(1) %>%
group_by(Id, Outcome) %>%
mutate(n = row_number()) %>%
pivot_wider(id_cols = Id, names_from = c(Outcome, n), values_from = Date)
CodePudding user response:
Using data.table
package
library(data.table)
mydata <- setDT(mydata)
# identify consecutive same Outcome for each Id
mydata[, flag:=rleid(Id, Outcome)]
# collapse identified consecutive values (keep min date)
tmp <- mydata[, by="flag,Id,Outcome", .(Date=min(Date))]
# mark relapse if first
tmp <- tmp[, by=Id,
flag := (Outcome[1]=="Relapse") & (seq_len(.N)==1)
]
# remove marked relapse
tmp <- tmp[(!flag)]
# Outcome numbering
tmp[, by="Id,Outcome", n:=seq_len(.N)]
# cast (widen)
dcast(tmp, Id~paste0(Outcome, n), value.var = "Date")
Got
Id CR1 CR2 Relapse1 Relapse2
1: 1 2001-02-13 2018-01-31 2001-05-31 2019-04-04
2: 2 2014-01-31 2014-05-18 2014-04-31 <NA>
using %>% notation
mydata[, flag:=rleid(Id,Outcome)] %>%
.[, by="flag,Id,Outcome", .( Date=min(Date) )] %>%
.[, by="Id", flag := (Outcome[1]=="Relapse") & (seq_len(.N)==1)] %>%
.[(!flag)] %>%
.[, by="Id,Outcome", n := seq_len(.N)] %>%
dcast(., Id ~ paste0(Outcome, n), value.var = "Date")
CodePudding user response:
That is my approach :
library(data.table)
my_df <- data.frame (Id =c (1,1,1,1,1,1,1,1,2,2,2,2),
Date = c("2001-01-31", "2001-02-13","2001-05-31",
"2001-06-02","2018-01-31","2018-03-31","2018-07-31",
"2019-04-04","2014-01-31","2014-02-02","2014-04-31",
"2014-05-18"),
Outcome = c("Relapse","Relapse","Relapse","Relapse",
"CR","CR","CR","Relapse","CR", "CR","Relapse","CR"),
stringsAsFactors = FALSE)
my_df <- my_df %>% group_by(Id) %>% arrange(Id, Date)
my_df <- my_df %>%
group_by(Id) %>%
mutate(Value = seq_along(Outcome)) %>%
mutate(first_v = Outcome == "CR" & !duplicated(Outcome == "CR")) %>%
mutate(first_a = first_v == "FALSE" & Value > Value[Outcome == "CR" & !duplicated(Outcome == "CR")] | Outcome == "CR") %>%
filter(first_a == "TRUE")
my_df %>%
group_by(Id) %>%
mutate(Grp = rleid(Outcome)) %>%
group_by(Grp, .add = T) %>%
group_by(Id, Outcome) %>%
mutate(n = row_number()) %>%
pivot_wider(id_cols = Id, names_from = c(Outcome, n), values_from = Date)
CodePudding user response:
You could try adding:
filter(Date >= first(Date[Outcome == "CR"]))
to filter
out rows before the first "CR". This assumes your Date
is sorted/arranged first.
library(tidyverse)
library(data.table)
mydata %>%
group_by(Id) %>%
filter(Date >= first(Date[Outcome == "CR"])) %>%
mutate(Grp = rleid(Outcome)) %>%
group_by(Grp, .add = T) %>%
slice(1) %>%
group_by(Id, Outcome) %>%
mutate(n = row_number()) %>%
pivot_wider(id_cols = Id, names_from = c(Outcome, n), values_from = Date)
Output
Id CR_1 Relapse_1 CR_2 Relapse_2
<dbl> <chr> <chr> <chr> <chr>
1 1 2001-02-13 2001-05-31 2018-01-31 2019-04-04
2 2 2014-01-31 2014-04-31 2014-05-18 NA