Column difference from baseline-CodePudding

This may be a duplicate question, I wouldn't be surprised if that's the case,

This is an example of a dataset I am dealing with

ID    Type    Time1     Time2     Time3
1     A1      12.23     NA        NA
2     A1       0.35     0.53      NA
2     A2       5.78     NA        10.25
3     A5       NA       NA        4.19
4     A3       NA       3.18      7.15
5     A5       10.91    4.56      2.45

My goal is to create two columns [Delta1, Delta2] like this

Delta1 : This column stores the difference between values in Time2-Time1 only among rows where all three values : Time1, Time2, Time3 are available. For example last row ID 5 has values for all three time, time1,time2,time3 so Delta1 = 4.56-10.91 = -6.35

Delta2 : This column stores the difference between either Time2-Time1 or Time3-Time1 or Time3-Time2. If a row does not have any two time values, then it is 0

Final expected output

ID    Type    Time1     Time2     Time3     Delta1     Delta2
1     A1      12.23     NA        NA                   0
2     A1       0.35     0.53      NA                   0.18
2     A2       5.78     NA        10.25                4.47  
3     A5       NA       NA        4.19                 0
4     A3       NA       3.18      7.15                 3.97
5     A5       10.91    4.56      2.45      -6.35     -2.11

Any help is much appreciated , thanks in advance.

CodePudding user response：

df$Delta1 <- ifelse(!is.na(df$Time1) & !is.na(df$Time2) & !is.na(df$Time3),
                    df$Time2 - df$Time1,
                    NA)

df$Delta2 <- vapply(seq_len(nrow(df)), \(x){
                                              x = na.omit(c(df$Time3[x], df$Time2[x], df$Time1[x]))
                                              x = x[1] - x[2]
                                              if(is.na(x)) return(0)
                                              return(x)
                                            }, 0)

Result:

> df
  ID Type Time1 Time2 Time3 Delta1 Delta2
1  1   A1 12.23    NA    NA     NA   0.00
2  2   A1  0.35  0.53    NA     NA   0.18
3  2   A2  5.78    NA 10.25     NA   4.47
4  3   A5    NA    NA  4.19     NA   0.00
5  4   A3    NA  3.18  7.15     NA   3.97
6  5   A5 10.91  4.56  2.45  -6.35  -2.11

CodePudding user response：

The criteria you have stated is non-deterministic for the output of Delta2, I've gone for the max range.

library(dplyr)

data %>% 
  rowwise() %>% 
  mutate(
    Delta1 = if_else(
      anyNA(c_across(starts_with("Time"))),
      0,
      Time2 - Time1
    ),
    Delta2 = diff(range(na.omit(c_across(starts_with("Time")))))
  ) 
#> # A tibble: 6 × 7
#> # Rowwise: 
#>      ID Type  Time1 Time2 Time3 Delta1 Delta2
#>   <dbl> <chr> <dbl> <dbl> <dbl>  <dbl>  <dbl>
#> 1     1 A1    12.2  NA    NA      0      0   
#> 2     2 A1     0.35  0.53 NA      0      0.18
#> 3     2 A2     5.78 NA    10.2    0      4.47
#> 4     3 A5    NA    NA     4.19   0      0   
#> 5     4 A3    NA     3.18  7.15   0      3.97
#> 6     5 A5    10.9   4.56  2.45  -6.35   8.46

CodePudding user response：

For producing column Delta 1 you can also try the following...


data <- data %>% 
  dplyr::mutate(Delta1 = case_when(
    Time1 & Time2 & Time3!="NA" ~ Time2 - Time1))

CodePudding user response：

With dplyr, you could use coalesce() to find the first non-missing element.

library(dplyr)

df %>%
  mutate(Delta1 = ifelse(if_any(starts_with("Time"), is.na), NA, Time2-Time1),
         Delta2 = coalesce(Time3-Time2, Time3-Time1, Time2-Time1, 0))

#   ID Type Time1 Time2 Time3 Delta1 Delta2
# 1  1   A1 12.23    NA    NA     NA   0.00
# 2  2   A1  0.35  0.53    NA     NA   0.18
# 3  2   A2  5.78    NA 10.25     NA   4.47
# 4  3   A5    NA    NA  4.19     NA   0.00
# 5  4   A3    NA  3.18  7.15     NA   3.97
# 6  5   A5 10.91  4.56  2.45  -6.35  -2.11

CodePudding user response：

Not pretty but this will work for Delta2 using sapply.

df$Delta2 <- sapply(seq_len(nrow(df)), function(x){
  items <- na.exclude(unlist(df[x,c("Time1", "Time2", "Time3")]))
  if(length(items) < 2) return(0)
  else return(items[length(items)] - items[length(items)-1])
})


  ID Type Time1 Time2 Time3 Delta2
1  1   A1 12.23    NA    NA   0.00
2  2   A1  0.35  0.53    NA   0.18
3  2   A2  5.78    NA 10.25   4.47
4  3   A5    NA    NA  4.19   0.00
5  4   A3    NA  3.18  7.15   3.97
6  5   A5 10.91  4.56  2.40  -2.16