My database
| ID | Time | Score |
|:---- |:------:| -----:|
| 1 | 1 | 105 |
| 1 | 2 | 155 |
| 1 | 3 | 100 |
| 2 | 1 | 105 |
| 2 | 2 | 150 |
| 2 | 3 | Na |
How do I calculate the difference between last AVAIABLE score (Highest Time) and first score (Lowest Time)? My code so Far
DB = DB %>%
group_by(ID) %>%
dplyr::mutate (Diff_score = Score[max(Time, na.rm= T) - Score [min(Time, na.rm=T)]) %>%
ungroup()
CodePudding user response:
We can modify the code to get the index of 'Time' with which.max
and which.min
and use it to the subset the 'Score' to find the difference (inorder to remove the NA cases, create an index 'i1' based on the NA values)
library(dplyr)
DB %>%
na_if("Na") %>%
type.convert(as.is = TRUE) %>%
group_by(ID) %>%
mutate(i1 = complete.cases(Score),
Diff_score = Score[i1][which.max(Time[i1])] -
Score[i1][which.min(Time[i1])], i1 = NULL) %>%
ungroup
-output
# A tibble: 6 × 4
ID Time Score Diff_score
<int> <int> <int> <int>
1 1 1 105 -5
2 1 2 155 -5
3 1 3 100 -5
4 2 1 105 45
5 2 2 150 45
6 2 3 NA 45
Or use the range
after converting 'Time' to NA where the 'Score' values are NA and make use of na.rm = TRUE
in range
for subsetting
DB %>%
na_if("Na") %>%
type.convert(as.is = TRUE) %>%
group_by(ID) %>%
mutate(Diff_score = diff(Score[range(NA^is.na(Score) *Time,
na.rm = TRUE)])) %>%
ungroup
CodePudding user response:
We could also use first
and last
(with ordering by Time) within the groups:
library(dplyr)
DB |>
group_by(ID) |>
mutate(diff = last(Score[!is.na(Score)], order_by = Time) - first(Score[!is.na(Score)], order_by = Time)) |>
ungroup()
Output:
# A tibble: 6 × 4
ID Time Score diff
<dbl> <dbl> <dbl> <dbl>
1 1 1 105 -5
2 1 2 155 -5
3 1 3 100 -5
4 2 1 105 45
5 2 2 150 45
6 2 3 NA 45
Data:
library(readr)
DB <- read_delim("ID | Time | Score
1 | 1 | 105
1 | 2 | 155
1 | 3 | 100
2 | 1 | 105
2 | 2 | 150
2 | 3 | NA ", delim = "|", trim_ws = TRUE)