When working with multiple rows with same Id in R, how to assign different values for different grou

Time:07-01

I am working with patient data in R. For patients who stayed normal, I need the time between the first visit and the last visit. For patients who progressed to the disease, I need the time between the first visit and the date of the first disease diagnosis. My attempts so far have not worked, and I would really appreciate some help.

My data looks like "patient" below, where visit_number is the visit order and followup_days is the number of days between the first visit and each follow-up visit.

patient<-data.frame(patient_ID=c(1,1,2,2,2,3,3,4,4,4),age=c(63,64,60,61,63,61,62,77,77,79),
visit_number=c(1,2,1,2,3,1,2,1,2,3), followup_days=c(0,504,0,390,798,0,379,0,310,621),diagnosis=c(0,0,0,1,1,0,0,0,0,1))


The new data needs to look like "patient1". I need to create a new variable "time".

For patients with a normal status, the time is the number of days between the first visit and the last visit.

For patients with a disease diagnosis (diagnosis=1), the time is the number of days between the first visit and the FIRST visit where diagnosis is 1.

patient1 <-data.frame(patient_ID=c(1,1,2,2,2,3,3,4,4,4),age=c(63,64,60,61,63,61,62,77,77,79),
visit_number=c(1,2,1,2,3,1,2,1,2,3), followup_days=c(0,504,0,390,798,0,379,0,310,621),
diagnosis=c(0,0,0,1,1,0,0,0,0,1), time=c(504,504,390,390,390,379,379,621,621,621))


Lastly, for the final data set, I would like to keep only the first visit for each patient, with the "time" column added.

    new_patient <-data.frame(patient_ID=c(1,2,3,4),age=c(63,60,61,77),
    visit_number=c(1,1,1,1), followup_days=c(0,0,0,0),diagnosis=c(0,0,0,0), time=c(504,390,379,621))


Any ideas how to make it happen? Thank you

CodePudding user response:

To create the patient1 data, we first load the dplyr package and define a function that returns the minimum positive value. We then group the rows by patient and create the time variable conditional on the diagnosis variable:

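library(dplyr)

# Smallest strictly positive value in x
minpositive <- function(x) min(x[x > 0])

patient1 <- patient %>%
  group_by(patient_ID) %>%
  # followup_days * diagnosis is 0 for undiagnosed visits, so the smallest
  # positive product is the day of the first diagnosed visit; patients
  # without any diagnosis get their last follow-up day instead
  mutate(time = ifelse(sum(diagnosis) > 0,
                       minpositive(followup_days * diagnosis),
                       max(followup_days)))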

To create the final dataset we filter based on visit_number:

new_patient <- patient1 %>% filter(visit_number == 1)

This should create the desired output.
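
One caveat with the product-based approach: it treats day 0 the same as "no diagnosis", so a patient diagnosed at the very first visit would get the wrong time. The sketch below is one possible way around that, assuming diagnosis is always coded 0/1 with no missing values. The helper name time_to_event and the day-0 patient are hypothetical and are not part of the question's data.

```r
# Hypothetical helper (not from the original answer): days to the first
# diagnosed visit, or to the last visit if the patient was never diagnosed.
time_to_event <- function(followup_days, diagnosis) {
  if (any(diagnosis == 1)) {
    min(followup_days[diagnosis == 1])
  } else {
    max(followup_days)
  }
}

# Patient 2 from the question: first diagnosed at the second visit
time_to_event(c(0, 390, 798), c(0, 1, 1))   # 390

# Hypothetical patient diagnosed at the first visit (day 0)
time_to_event(c(0, 300), c(1, 1))           # 0

# The product-based version skips day 0 for that same patient
minpositive <- function(x) min(x[x > 0])
minpositive(c(0, 300) * c(1, 1))            # 300
```

Inside the grouped pipeline the helper could be used as mutate(time = time_to_event(followup_days, diagnosis)).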

CodePudding user response:

Group by patient_ID, then use ifelse() to compute the time variable, depending on whether the diagnosis column contains a 1:

library(dplyr)

patient %>% 
  group_by(patient_ID) %>% 
  mutate(time = ifelse(1 %in% diagnosis,  min(followup_days[diagnosis==1]),max(followup_days))) %>% 
  filter(visit_number==1)

Output:

  patient_ID   age visit_number followup_days diagnosis  time
       <dbl> <dbl>        <dbl>         <dbl>     <dbl> <dbl>
1          1    63            1             0         0   504
2          2    60            1             0         0   390
3          3    61            1             0         0   379
4          4    77            1             0         0   621
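
If dplyr is not available, the same grouping logic can be written in base R. This is only a sketch under the question's assumptions (diagnosis coded 0/1, one row per visit). The intermediate name time_by_patient is made up for illustration.

```r
# Data from the question
patient <- data.frame(
  patient_ID    = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4),
  age           = c(63, 64, 60, 61, 63, 61, 62, 77, 77, 79),
  visit_number  = c(1, 2, 1, 2, 3, 1, 2, 1, 2, 3),
  followup_days = c(0, 504, 0, 390, 798, 0, 379, 0, 310, 621),
  diagnosis     = c(0, 0, 0, 1, 1, 0, 0, 0, 0, 1)
)

# One time value per patient: day of the first diagnosed visit if there is
# one, otherwise the last follow-up day
time_by_patient <- sapply(split(patient, patient$patient_ID), function(d) {
  if (any(d$diagnosis == 1)) {
    min(d$followup_days[d$diagnosis == 1])
  } else {
    max(d$followup_days)
  }
})

# Copy the per-patient value onto every row of that patient
patient1 <- patient
patient1$time <- unname(time_by_patient[as.character(patient1$patient_ID)])

# Keep only the first visit of each patient
new_patient <- patient1[patient1$visit_number == 1, ]
rownames(new_patient) <- NULL
new_patient
```

The lookup by as.character(patient_ID) relies on split() naming each piece after its patient ID, so the order of rows in the input does not matter.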