This may be a duplicate question, I wouldn't be surprised if that's the case,
This is an example of a dataset I am dealing with
ID Type Time1 Time2 Time3
1 A1 12.23 NA NA
2 A1 0.35 0.53 NA
2 A2 5.78 NA 10.25
3 A5 NA NA 4.19
4 A3 NA 3.18 7.15
5 A5 10.91 4.56 2.45
My goal is to create two columns [Delta1, Delta2] like this
Delta1 : This column stores the difference between values in Time2-Time1 only among rows where all three values : Time1, Time2, Time3 are available. For example last row ID 5 has values for all three time, time1,time2,time3 so Delta1 = 4.56-10.91 = -6.35
Delta2 : This column stores the difference between either Time2-Time1 or Time3-Time1 or Time3-Time2. If a row does not have any two time values, then it is 0
Final expected output
ID Type Time1 Time2 Time3 Delta1 Delta2
1 A1 12.23 NA NA 0
2 A1 0.35 0.53 NA 0.18
2 A2 5.78 NA 10.25 4.47
3 A5 NA NA 4.19 0
4 A3 NA 3.18 7.15 3.97
5 A5 10.91 4.56 2.45 -6.35 -2.11
Any help is much appreciated , thanks in advance.
CodePudding user response:
df$Delta1 <- ifelse(!is.na(df$Time1) & !is.na(df$Time2) & !is.na(df$Time3),
df$Time2 - df$Time1,
NA)
df$Delta2 <- vapply(seq_len(nrow(df)), \(x){
x = na.omit(c(df$Time3[x], df$Time2[x], df$Time1[x]))
x = x[1] - x[2]
if(is.na(x)) return(0)
return(x)
}, 0)
Result:
> df
ID Type Time1 Time2 Time3 Delta1 Delta2
1 1 A1 12.23 NA NA NA 0.00
2 2 A1 0.35 0.53 NA NA 0.18
3 2 A2 5.78 NA 10.25 NA 4.47
4 3 A5 NA NA 4.19 NA 0.00
5 4 A3 NA 3.18 7.15 NA 3.97
6 5 A5 10.91 4.56 2.45 -6.35 -2.11
CodePudding user response:
The criteria you have stated is non-deterministic for the output of Delta2
, I've gone for the max range.
library(dplyr)
data %>%
rowwise() %>%
mutate(
Delta1 = if_else(
anyNA(c_across(starts_with("Time"))),
0,
Time2 - Time1
),
Delta2 = diff(range(na.omit(c_across(starts_with("Time")))))
)
#> # A tibble: 6 × 7
#> # Rowwise:
#> ID Type Time1 Time2 Time3 Delta1 Delta2
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 A1 12.2 NA NA 0 0
#> 2 2 A1 0.35 0.53 NA 0 0.18
#> 3 2 A2 5.78 NA 10.2 0 4.47
#> 4 3 A5 NA NA 4.19 0 0
#> 5 4 A3 NA 3.18 7.15 0 3.97
#> 6 5 A5 10.9 4.56 2.45 -6.35 8.46
CodePudding user response:
For producing column Delta 1 you can also try the following...
data <- data %>%
dplyr::mutate(Delta1 = case_when(
Time1 & Time2 & Time3!="NA" ~ Time2 - Time1))
CodePudding user response:
With dplyr
, you could use coalesce()
to find the first non-missing element.
library(dplyr)
df %>%
mutate(Delta1 = ifelse(if_any(starts_with("Time"), is.na), NA, Time2-Time1),
Delta2 = coalesce(Time3-Time2, Time3-Time1, Time2-Time1, 0))
# ID Type Time1 Time2 Time3 Delta1 Delta2
# 1 1 A1 12.23 NA NA NA 0.00
# 2 2 A1 0.35 0.53 NA NA 0.18
# 3 2 A2 5.78 NA 10.25 NA 4.47
# 4 3 A5 NA NA 4.19 NA 0.00
# 5 4 A3 NA 3.18 7.15 NA 3.97
# 6 5 A5 10.91 4.56 2.45 -6.35 -2.11
CodePudding user response:
Not pretty but this will work for Delta2 using sapply
.
df$Delta2 <- sapply(seq_len(nrow(df)), function(x){
items <- na.exclude(unlist(df[x,c("Time1", "Time2", "Time3")]))
if(length(items) < 2) return(0)
else return(items[length(items)] - items[length(items)-1])
})
ID Type Time1 Time2 Time3 Delta2
1 1 A1 12.23 NA NA 0.00
2 2 A1 0.35 0.53 NA 0.18
3 2 A2 5.78 NA 10.25 4.47
4 3 A5 NA NA 4.19 0.00
5 4 A3 NA 3.18 7.15 3.97
6 5 A5 10.91 4.56 2.40 -2.16