Here is the reproducible dataset:
id<- c("U1", "U2", "U3", "U2", "U5", "U5")
date<- c("2020-02-01", "2020-05-06", "2020-04-01", "2020-07-09", "2020-11-01", "2020-12-01")
result<- c(1:6)
dt<- data.frame(id, date, result)
> dt
id date result
1 U1 2020-02-01 1
2 U2 2020-05-06 2
3 U3 2020-04-01 3
4 U2 2020-07-09 4
5 U5 2020-11-01 5
6 U5 2020-12-01 6
I want to create a loop (or maybe there is another way) that looks at the Unique ID's and test dates, and adds a new column to tell me which is the test number, ordered by those test dates. So the output would look like this:
id date result type
1 U1 2020-02-01 1 Result 1
2 U2 2020-05-06 2 Result 1
3 U3 2020-04-01 3 Result 1
4 U2 2020-07-09 4 Result 2
5 U5 2020-11-01 5 Result 1
6 U5 2020-12-01 6 Result 2
U2 has two results, ordered by their test date, and U5 has two results, ordered by their test date. As a bonus question, I would also love to find the time difference between the various tests for each Unique ID, again as a separate column. So it would look like this:
id date result type time
1 U1 2020-02-01 1 Result 1 First Test
2 U2 2020-05-06 2 Result 1 First Test
3 U3 2020-04-01 3 Result 1 First Test
4 U2 2020-07-09 4 Result 2 64 Days
5 U5 2020-11-01 5 Result 1 First Test
6 U5 2020-12-01 6 Result 2 30 Days
CodePudding user response:
We can group_by(id)
, arrange
by date, then use row_number()
. It is always advisable to convert dates to proper date class columns before any transformations.
library(dplyr)
library(glue)
dt %>%
mutate(date = as.Date(date)) %>%
group_by(id) %>%
arrange(date) %>%
mutate(type = glue("Result {row_number()}")) %>%
ungroup()
# A tibble: 6 × 4
id date result type
<chr> <date> <int> <glue>
1 U1 2020-02-01 1 Result 1
2 U3 2020-04-01 3 Result 1
3 U2 2020-05-06 2 Result 1
4 U2 2020-07-09 4 Result 2
5 U5 2020-11-01 5 Result 1
6 U5 2020-12-01 6 Result 2
For the bonus question,we can use a simple subtraction date - first(date)
:
dt %>%
mutate(date = as.Date(date)) %>%
group_by(id) %>%
arrange(date) %>%
mutate(type = glue("Result {row_number()}"),
time = date - first(date)) %>%
ungroup()
# A tibble: 6 × 5
id date result type time
<chr> <date> <int> <glue> <drtn>
1 U1 2020-02-01 1 Result 1 0 days
2 U3 2020-04-01 3 Result 1 0 days
3 U2 2020-05-06 2 Result 1 0 days
4 U2 2020-07-09 4 Result 2 64 days
5 U5 2020-11-01 5 Result 1 0 days
6 U5 2020-12-01 6 Result 2 30 days