Calculate difference between rows in long data-CodePudding

My database

| ID   | Time   | Score |
|:---- |:------:| -----:|
| 1    | 1      | 105   |
| 1    | 2      | 155   |
| 1    | 3      | 100   |
| 2    | 1      | 105   |
| 2    | 2      | 150   |
| 2    | 3      | Na    |

How do I calculate the difference between last AVAIABLE score (Highest Time) and first score (Lowest Time)? My code so Far

DB = DB %>% 
group_by(ID) %>%
dplyr::mutate (Diff_score = Score[max(Time, na.rm= T) - Score [min(Time, na.rm=T)]) %>%
ungroup()

CodePudding user response：

We can modify the code to get the index of 'Time' with which.max and which.min and use it to the subset the 'Score' to find the difference (inorder to remove the NA cases, create an index 'i1' based on the NA values)

library(dplyr)
DB %>%
    na_if("Na") %>%
    type.convert(as.is = TRUE) %>%
    group_by(ID) %>%
    mutate(i1 = complete.cases(Score),
      Diff_score = Score[i1][which.max(Time[i1])] -
         Score[i1][which.min(Time[i1])], i1 = NULL) %>%
    ungroup

-output

# A tibble: 6 × 4
     ID  Time Score Diff_score
  <int> <int> <int>      <int>
1     1     1   105         -5
2     1     2   155         -5
3     1     3   100         -5
4     2     1   105         45
5     2     2   150         45
6     2     3    NA         45

Or use the range after converting 'Time' to NA where the 'Score' values are NA and make use of na.rm = TRUE in range for subsetting

DB %>% 
  na_if("Na") %>% 
  type.convert(as.is = TRUE) %>%
  group_by(ID) %>%
  mutate(Diff_score = diff(Score[range(NA^is.na(Score) *Time, 
        na.rm = TRUE)])) %>% 
  ungroup

CodePudding user response：

We could also use first and last (with ordering by Time) within the groups:

library(dplyr)

DB |>
group_by(ID) |>
  mutate(diff = last(Score[!is.na(Score)], order_by = Time) - first(Score[!is.na(Score)], order_by = Time)) |>
ungroup()

Output:

# A tibble: 6 × 4
     ID  Time Score  diff
  <dbl> <dbl> <dbl> <dbl>
1     1     1   105    -5
2     1     2   155    -5
3     1     3   100    -5
4     2     1   105    45
5     2     2   150    45
6     2     3    NA    45

Data:

library(readr)

DB <- read_delim("ID   | Time   | Score
         1    | 1      | 105   
1    | 2      | 155   
1    | 3      | 100  
2    | 1      | 105  
2    | 2      | 150  
2    | 3      | NA   ", delim = "|", trim_ws = TRUE)