Home > other >  How to calculate days between positive and negative results and creating new columns for each with R
How to calculate days between positive and negative results and creating new columns for each with R

Time:01-12

I have looked for answer to this but I have not been able to find one that will help me with a column that has numbers and strings. My data look like this

Name <- c("Doe, John","Doe, John","Doe, John", "Doe, Jane", "Doe, Jane","Doe, Jane","Parker, Peter","Parker, Peter","Parker, Peter", "Stark, Tony","Stark, Tony","Stark, Tony")
Accession <- c(123, 234, 345, 456, 567, 678, 789, 8910, 1023, 1134, 1567, 1769)
MRN <-c(55555, 55555, 55555, 66666, 66666, 66666, 77777, 77777, 77777, 88888, 88888, 88888)
Collected <-c("2022-01-05", "2022-01-06", "2022-01-07", "2022-01-08", "2022-01-09", "2022-01-10", "2022-01-11", "2022-01-12", "2022-01-13", "2022-01-14", "2022-01-15", "2022-01-16")
Result <-c(137, "Not Detected", 356, 1025, 1405, 538, "Not Detected", "Not Detected", "Not Detected", "Not Detected", 137, "Not Detected")

CV <- data.frame(Name, Accession, MRN, Collected, Result)

I have multiple observations on a group of people (sometimes up to 100 per person) I would like to begin counting from their first observation based on the date when they have a positive result and I would like to make this a new column Days_till_Pos, from that first positive result date I would then like to count the number of days until the results is "Not Detected" creating a new column Days_till_Neg.

I would like the data to look like this, where the results are grouped by MRN.

In this scenario if there is a number value in the Result column it would be considered positive, if it says "Not Detected" it would be considered negative. Also, If their first result is positive then the Days_till_Pos should say NA and if they never have a positive result then the Days_till_Neg column should say NA as well.

Name<- c("Doe, John","Doe, Jane","Parker, Peter", "Stark, Tony")
MRN<- c(55555, 66666, 77777, 88888)
Days_till_Pos<- c(NA, NA, NA, 1)
Days_till_Neg<- c(1,0,NA, 1)

CV1<- data.frame(Name, MRN, Days_till_Pos, Days_till_Neg)

Name                MRN          Days_till_Pos          Days_till_Neg
Doe,John            55555              NA                    1
Doe, Jane           66666              NA                    0
Parker, Peter       77777              NA                    NA 
Stark, Tony         88888              1                     1

CodePudding user response:

I'm not entirely sure this is correct, but happy to modify further as needed.

The first mutate is to make sure you have Collected in date format.

After grouping by Name and MRN, you can use summarise for each patient.

For Days_till_Pos, check if the first Result is negative. If it is, then take the difference between the first collection date when positive, and the first collection date (which is known to be negative). Alternatively, if first result is positive, then just include NA.

For Days_till_Neg, check if all results are negative. If so, include NA. If not, then take the date for when the Result is negative, but the lag (previous date) result was positive. This would provide the date of transition from positive to negative. Then subtract the first date of a positive test.

Note that this assumes that patients do not oscillate multiple times between negative and positive tests.

library(tidyverse)

CV %>%
  mutate(Collected = as.Date(Collected)) %>%
  group_by(Name, MRN) %>%
  summarise(
    Days_till_Pos = ifelse(first(Result) == "Not Detected", 
                           Collected[Result != "Not Detected"][1] - Collected[1], 
                           NA),
    Days_till_Neg = ifelse(all(Result == "Not Detected"), 
                           NA, 
                           Collected[Result == "Not Detected" & lag(Result, default = "Not Detected") != "Not Detected"][1] - Collected[Result != "Not Detected"][1])
  )

Output

  Name            MRN Days_till_Pos Days_till_Neg
  <chr>         <dbl>         <dbl>         <dbl>
1 Doe, Jane     66666            NA            NA
2 Doe, John     55555            NA             1
3 Parker, Peter 77777            NA            NA
4 Stark, Tony   88888             1             1
  • Related