I have looked for answer to this but I have not been able to find one that will help me with a column that has numbers and strings. My data look like this
Name <- c("Doe, John","Doe, John","Doe, John", "Doe, Jane", "Doe, Jane","Doe, Jane","Parker, Peter","Parker, Peter","Parker, Peter", "Stark, Tony","Stark, Tony","Stark, Tony")
Accession <- c(123, 234, 345, 456, 567, 678, 789, 8910, 1023, 1134, 1567, 1769)
MRN <-c(55555, 55555, 55555, 66666, 66666, 66666, 77777, 77777, 77777, 88888, 88888, 88888)
Collected <-c("2022-01-05", "2022-01-06", "2022-01-07", "2022-01-08", "2022-01-09", "2022-01-10", "2022-01-11", "2022-01-12", "2022-01-13", "2022-01-14", "2022-01-15", "2022-01-16")
Result <-c(137, "Not Detected", 356, 1025, 1405, 538, "Not Detected", "Not Detected", "Not Detected", "Not Detected", 137, "Not Detected")
CV <- data.frame(Name, Accession, MRN, Collected, Result)
I have multiple observations on a group of people (sometimes up to 100 per person) I would like to begin counting from their first observation based on the date when they have a positive result and I would like to make this a new column Days_till_Pos
, from that first positive result date I would then like to count the number of days until the results is "Not Detected" creating a new column Days_till_Neg
.
I would like the data to look like this, where the results are grouped by MRN
.
In this scenario if there is a number value in the Result
column it would be considered positive, if it says "Not Detected" it would be considered negative. Also, If their first result is positive then the Days_till_Pos
should say NA
and if they never have a positive result then the Days_till_Neg
column should say NA
as well.
Name<- c("Doe, John","Doe, Jane","Parker, Peter", "Stark, Tony")
MRN<- c(55555, 66666, 77777, 88888)
Days_till_Pos<- c(NA, NA, NA, 1)
Days_till_Neg<- c(1,0,NA, 1)
CV1<- data.frame(Name, MRN, Days_till_Pos, Days_till_Neg)
Name MRN Days_till_Pos Days_till_Neg
Doe,John 55555 NA 1
Doe, Jane 66666 NA 0
Parker, Peter 77777 NA NA
Stark, Tony 88888 1 1
CodePudding user response:
I'm not entirely sure this is correct, but happy to modify further as needed.
The first mutate
is to make sure you have Collected
in date format.
After grouping by Name
and MRN
, you can use summarise for each patient.
For Days_till_Pos
, check if the first Result
is negative. If it is, then take the difference between the first collection date when positive, and the first collection date (which is known to be negative). Alternatively, if first result is positive, then just include NA
.
For Days_till_Neg
, check if all
results are negative. If so, include NA
. If not, then take the date for when the Result
is negative, but the lag
(previous date) result was positive. This would provide the date of transition from positive to negative. Then subtract the first date of a positive test.
Note that this assumes that patients do not oscillate multiple times between negative and positive tests.
library(tidyverse)
CV %>%
mutate(Collected = as.Date(Collected)) %>%
group_by(Name, MRN) %>%
summarise(
Days_till_Pos = ifelse(first(Result) == "Not Detected",
Collected[Result != "Not Detected"][1] - Collected[1],
NA),
Days_till_Neg = ifelse(all(Result == "Not Detected"),
NA,
Collected[Result == "Not Detected" & lag(Result, default = "Not Detected") != "Not Detected"][1] - Collected[Result != "Not Detected"][1])
)
Output
Name MRN Days_till_Pos Days_till_Neg
<chr> <dbl> <dbl> <dbl>
1 Doe, Jane 66666 NA NA
2 Doe, John 55555 NA 1
3 Parker, Peter 77777 NA NA
4 Stark, Tony 88888 1 1