Home > Enterprise >  R function to find difference between 2 values in a column when the value in another column is the s
R function to find difference between 2 values in a column when the value in another column is the s

Time:07-15

I have a data set that looks like the following:

Subject ID Density Disease Date
1 0.5 AD 1/1/2020
1 0.45 AD 2/3/2021
2 0.4 CN 1/4/2018
2 0.3 MCI 5/6/2019
3 0.67 MCI 4/7/2020
3 0.61 MCI 3/8/2021
3 0.54 AD 9/21/2021

I would like to find a code in R that will tell me the difference in densities for a particular subject only when the subject has a change disease status between dates. So I would like the output to look something like:

Subject ID Original Disease New Disease Change in density
2 CN MCI -0.1
3 MCI AD -0.07

So this would exclude subject 1 and would also only calculate the difference in densities for subject 3 between dates 3/8/2021 and 9/21/2021 as those were the dates in which the disease changed.

Sorry if the question is not formatted correctly. Also any help with any of the code would be great even if it will not get me the exact desired output. I just need to keep track of the disease status and changes in densities.

CodePudding user response:

library(dplyr)
df %>%
  group_by(Subject.ID) %>%
  mutate(New_Disease = lead(Disease),
         New_Density = lead(Density),
         Change_Density = New_Density - Density) %>%
  filter(Disease != lead(Disease)) %>%
  ungroup() %>%
  select(Subject.ID, Original_Disease = Disease, New_Disease, Change_Density)

Result

# A tibble: 2 × 4
  Subject.ID Original_Disease New_Disease Change_Density
       <int> <chr>            <chr>                <dbl>
1          2 CN               MCI                -0.1   
2          3 MCI              AD                 -0.0700

Using this data:

df <- data.frame(
  stringsAsFactors = FALSE,
        Subject.ID = c(1L, 1L, 2L, 2L, 3L, 3L, 3L),
           Density = c(0.5, 0.45, 0.4, 0.3, 0.67, 0.61, 0.54),
           Disease = c("AD", "AD", "CN", "MCI", "MCI", "MCI", "AD"),
              Date = c("1/1/2020","2/3/2021",
                       "1/4/2018","5/6/2019","4/7/2020","3/8/2021","9/21/2021")
)

CodePudding user response:

Here is an option

library(dplyr)
library(tidyr)
library(lubridate)
df1 %>% 
  filter(SubjectID != 1) %>%
  mutate(Date = mdy(Date)) %>% 
  group_by(SubjectID) %>% 
  mutate(Change_in_density =  if(cur_group()$SubjectID == 3) 
    diff(Density[between(Date, as.Date('2021-03-08'), 
   as.Date('2021-09-21'))]) else diff(Density)) %>% 
  ungroup %>% 
  select(-Density, -Date) %>% 
  distinct() %>% 
  mutate(nm1 = rep(c("Original_Disease", "New Disease"),
     length.out = n()) ) %>%
  pivot_wider(names_from = nm1, values_from = Disease)

-output

# A tibble: 2 × 4
  SubjectID Change_in_density Original_Disease `New Disease`
      <int>             <dbl> <chr>            <chr>        
1         2           -0.1    CN               MCI          
2         3           -0.0700 MCI              AD        

data

df1 <- structure(list(SubjectID = c(1L, 1L, 2L, 2L, 3L, 3L, 3L), Density = c(0.5, 
0.45, 0.4, 0.3, 0.67, 0.61, 0.54), Disease = c("AD", "AD", "CN", 
"MCI", "MCI", "MCI", "AD"), Date = c("1/1/2020", "2/3/2021", 
"1/4/2018", "5/6/2019", "4/7/2020", "3/8/2021", "9/21/2021")),
 class = "data.frame", row.names = c(NA, 
-7L))

CodePudding user response:

Another possible solution:

library(dplyr)

df %>% 
  group_by(SubjectID) %>% 
  summarise(if (n_distinct(Disease) == 1)  NA else 
   tibble(original = first(Disease), final = last(Disease)),
   change = Density[n()] - 
            Density[which.max(Disease != lag(Disease, default = Disease[1])) - 1],
  .groups = "drop")

#> # A tibble: 2 × 4
#>   SubjectID original final  change
#>       <int> <chr>    <chr>   <dbl>
#> 1         2 CN       MCI   -0.1   
#> 2         3 MCI      AD    -0.0700
  •  Tags:  
  • r
  • Related