In R, how do I analyse patients with follow-up at any given time point?

I have 26,000 patients with baseline pain scores and follow-up scores at 6 months, 1 year and 2 years (each in its own column). There is missing data at each of those time points. I can easily analyse pain score outcomes at 6 months, one year or two years separately, excluding missing values. What I would like to do is analyse outcomes in patients who have data at ANY of 6 months, one year or two years. Some patients will have more than one follow-up score and some will be missing all three. Any ideas how to code this? Maybe a new column created with mutate(), called 'vas.outcome', that holds the one-year score; if that is missing, the two-year score; and if that is also missing, the 6-month score. If all three are missing, it should be NA.

# A tibble: 6 x 4
        vas.base      vas.6mth       vas.year      vas.two
          <dbl>         <dbl>         <dbl>         <dbl>
1           5            NA              NA           4
2           9            2.3             1.2          NA
3           8.1          NA              NA           NA
4           10           NA              NA           3.3
5           6.5          6.5             NA           NA
6           8            NA              NA           3

CodePudding user response：

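One approach:

library(dplyr)

# coalesce() returns the first non-missing value, reading its arguments left to right,
# so the argument order sets the priority. The order below follows the question:
# one year, then two years, then 6 months. The result is NA if all three are missing.
your_data_frame %>%
  mutate(vas.outcome = coalesce(vas.year, vas.two, vas.6mth))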
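Since the argument order of coalesce() decides which visit wins, it can help to record which visit each value came from. The following is a minimal sketch, assuming the sample data from the question and the priority stated there (one year, then two years, then 6 months). The column name vas.source and its labels are hypothetical, added only for illustration.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  vas.base = c(5, 9, 8.1, 10, 6.5, 8),
  vas.6mth = c(NA, 2.3, NA, NA, 6.5, NA),
  vas.year = c(NA, 1.2, NA, NA, NA, NA),
  vas.two  = c(4, NA, NA, 3.3, NA, 3)
)

res <- df %>%
  mutate(
    # Priority: one year, then two years, then 6 months; NA if all are missing
    vas.outcome = coalesce(vas.year, vas.two, vas.6mth),
    # vas.source (hypothetical name) records which visit supplied the value
    vas.source = case_when(
      !is.na(vas.year) ~ "year",
      !is.na(vas.two)  ~ "two",
      !is.na(vas.6mth) ~ "6mth",
      TRUE             ~ NA_character_
    )
  )

res
```

Keeping the source alongside the value makes it possible to check later whether results differ by the time point used.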

CodePudding user response：

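You could use fcase() from data.table or case_when() from dplyr:

library(data.table)
library(dplyr)

# data.table: dt is assumed to be a data.table; pain is added by reference
dt[, pain := fcase(
  !is.na(vas.year), vas.year,
  !is.na(vas.two), vas.two,
  !is.na(vas.6mth), vas.6mth,
  default = NA
)]

# dplyr: the final TRUE branch returns vas.6mth, which is NA when all three are missing
dt %>%
  mutate(pain = case_when(
    !is.na(vas.year) ~ vas.year,
    !is.na(vas.two) ~ vas.two,
    TRUE ~ vas.6mth
  ))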

Output:

   vas.base vas.6mth vas.year vas.two pain
1:      5.0       NA       NA     4.0  4.0
2:      9.0      2.3      1.2      NA  1.2
3:      8.1       NA       NA      NA   NA
4:     10.0       NA       NA     3.3  3.3
5:      6.5      6.5       NA      NA  6.5
6:      8.0       NA       NA     3.0  3.0
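If neither data.table nor dplyr is available, the same priority rule can be sketched in base R with nested ifelse() calls. This is only an illustration on the sample data from the question, not a replacement for the approaches above.

```r
# Sample data copied from the question
df <- data.frame(
  vas.base = c(5, 9, 8.1, 10, 6.5, 8),
  vas.6mth = c(NA, 2.3, NA, NA, 6.5, NA),
  vas.year = c(NA, 1.2, NA, NA, NA, NA),
  vas.two  = c(4, NA, NA, 3.3, NA, 3)
)

# One year first, then two years, otherwise 6 months
# (which is itself NA when all three follow-ups are missing)
df$pain <- with(df, ifelse(!is.na(vas.year), vas.year,
                    ifelse(!is.na(vas.two), vas.two, vas.6mth)))

df
```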

CodePudding user response：

I'm not 100% sure what you want your final dataset to look like, and I'm sure there are more elegant ways, but to choose the first occurrence of an outcome (after baseline), you can do:

Data

df <- read.table(text = "id        vas.base      vas.6mth       vas.year      vas.two
1           5            NA              NA           4
2           9            2.3             1.2          NA
3           8.1          NA              NA           NA
4           10           NA              NA           3.3
5           6.5          6.5             NA           NA
6           8            NA              NA           3", header = TRUE)

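dplyr/tidyr approach:

library(dplyr)
library(tidyr)

df %>%
  # stack the three follow-up columns (everything except id and baseline)
  pivot_longer(c(vas.6mth, vas.year, vas.two), names_to = "visit") %>%
  group_by(id) %>%
  # first non-missing follow-up in chronological order; NA if there is none
  mutate(vas.outcome = first(na.omit(value))) %>%
  slice(1) %>%
  select(id, vas.outcome) %>%
  left_join(df, by = "id")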

Output:

#      id vas.outcome vas.base vas.6mth vas.year vas.two
#   <int>       <dbl>    <dbl>    <dbl>    <dbl>   <dbl>
# 1     1         4        5       NA       NA       4
# 2     2         2.3      9        2.3      1.2    NA
# 3     3        NA        8.1     NA       NA      NA
# 4     4         3.3     10       NA       NA       3.3
# 5     5         6.5      6.5      6.5     NA      NA
# 6     6         3        8       NA       NA       3
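Once a single outcome column exists, the analysis the question asks for (patients with data at any follow-up) comes down to dropping the rows where that column is NA. Below is a rough base R sketch on the sample data, using the first available follow-up in chronological order as in this answer. The names follow.up, any.followup, analysed and change are hypothetical, and the change-from-baseline summary is just one possible analysis.

```r
# Sample data copied from the question, with an id column
df <- data.frame(
  id       = 1:6,
  vas.base = c(5, 9, 8.1, 10, 6.5, 8),
  vas.6mth = c(NA, 2.3, NA, NA, 6.5, NA),
  vas.year = c(NA, 1.2, NA, NA, NA, NA),
  vas.two  = c(4, NA, NA, 3.3, NA, 3)
)

follow.up <- c("vas.6mth", "vas.year", "vas.two")

# TRUE when at least one follow-up score is present
df$any.followup <- rowSums(!is.na(df[follow.up])) > 0

# First non-missing follow-up per row, in column order; NA if none
df$vas.outcome <- apply(df[follow.up], 1, function(x) x[!is.na(x)][1])

# Keep only patients with some follow-up and compute change from baseline
analysed <- df[df$any.followup, ]
analysed$change <- analysed$vas.outcome - analysed$vas.base

nrow(analysed)         # number of patients included in the analysis
mean(analysed$change)  # mean change from baseline among those patients
```

With 26,000 patients it is worth reporting how many rows were excluded, since patients missing all three follow-ups may differ from those who were followed up.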