I am working with a dataset containing observations of animals during given time periods, how much each animal ate during that period, and some metadata about the animal, such as its ID number.
Here is a subset of the dataset:
start end id uptake
1: 2017-01-29 10:16:00 2017-01-29 12:25:00 14 2.04
2: 2017-01-29 10:16:00 2017-01-29 12:25:00 21 1.53
3: 2017-01-29 10:16:00 2017-01-29 12:25:00 12 0.17
4: 2017-01-29 10:16:00 2017-01-29 12:25:00 20 1.19
5: 2017-01-29 10:16:00 2017-01-29 12:25:00 19 0.85
6: 2017-01-31 09:48:00 2017-01-31 11:59:00 21 5.27
7: 2017-01-31 09:48:00 2017-01-31 11:59:00 34 1.87
8: 2017-01-31 11:50:00 2017-01-31 14:59:00 21 1.00
9: 2017-01-31 11:50:00 2017-01-31 14:59:00 34 0.14
10: 2017-01-31 11:50:00 2017-01-31 14:59:00 20 1.00
11: 2017-01-31 11:50:00 2017-01-31 14:59:00 19 0.43
12: 2017-01-31 11:50:00 2017-01-31 14:59:00 14 3.43
13: 2017-01-31 15:15:00 2017-01-31 16:21:00 12 1.00
14: 2017-01-31 15:15:00 2017-01-31 16:21:00 20 0.72
15: 2017-01-31 15:15:00 2017-01-31 16:21:00 14 0.86
16: 2017-01-31 15:15:00 2017-01-31 16:21:00 21 0.43
17: 2017-01-31 15:15:00 2017-01-31 16:21:00 19 0.57
18: 2017-02-01 09:55:00 2017-02-01 11:47:00 34 1.62
19: 2017-02-01 09:55:00 2017-02-01 11:47:00 21 3.06
20: 2017-02-01 12:03:00 2017-02-01 15:02:00 19 1.29
21: 2017-02-01 12:03:00 2017-02-01 15:02:00 14 3.86
Normally there is at most one row per individual per day, because there was only one observation period. On some days, however, there were multiple observation periods, so some ids have more than one row on those days. For those days I would like to collapse the multiple rows per individual into one, keeping the earliest start and the latest end timestamp of that day's observation periods, summing the uptake values, and leaving the id unchanged.
I am looking for something like this:
X start end id uptake
1 1 2017-01-29 10:16 2017-01-29 12:25 14 2.04
2 2 2017-01-29 10:16 2017-01-29 12:25 21 1.53
3 3 2017-01-29 10:16 2017-01-29 12:25 12 0.17
4 4 2017-01-29 10:16 2017-01-29 12:25 20 1.19
5 5 2017-01-29 10:16 2017-01-29 12:25 19 0.85
6 6 2017-01-31 09:48 2017-01-31 16:21 21 6.70
7 7 2017-01-31 09:48 2017-01-31 16:21 34 2.01
8 10 2017-01-31 11:50 2017-01-31 16:21 20 1.72
9 11 2017-01-31 11:50 2017-01-31 16:21 19 1.00
10 12 2017-01-31 11:50 2017-01-31 16:21 14 4.29
11 13 2017-01-31 15:15 2017-01-31 16:21 12 1.00
12 18 2017-02-01 09:55 2017-02-01 15:02 34 1.62
13 19 2017-02-01 09:55 2017-02-01 15:02 21 3.06
14 20 2017-02-01 12:03 2017-02-01 15:02 19 1.29
15 21 2017-02-01 12:03 2017-02-01 15:02 14 3.86
CodePudding user response:
Within dplyr this is a task for group_by() and summarise():
library(dplyr)
library(lubridate)

df |>
  # group by animal and calendar day; name the day column "date"
  # (without "date =", the column would be named `lubridate::date(start)`)
  group_by(id, date = lubridate::date(start)) |>
  summarise(start  = min(start),     # earliest start of the day
            end    = max(end),       # latest end of the day
            uptake = sum(uptake)) |> # total uptake across that day's periods
  ungroup() |>
  arrange(start)
Output:
# A tibble: 15 × 5
id date start end uptake
<dbl> <date> <dttm> <dttm> <dbl>
1 12 2017-01-29 2017-01-29 10:16:00 2017-01-29 12:25:00 0.17
2 14 2017-01-29 2017-01-29 10:16:00 2017-01-29 12:25:00 2.04
3 19 2017-01-29 2017-01-29 10:16:00 2017-01-29 12:25:00 0.85
4 20 2017-01-29 2017-01-29 10:16:00 2017-01-29 12:25:00 1.19
5 21 2017-01-29 2017-01-29 10:16:00 2017-01-29 12:25:00 1.53
6 21 2017-01-31 2017-01-31 09:48:00 2017-01-31 16:21:00 6.7
7 34 2017-01-31 2017-01-31 09:48:00 2017-01-31 14:59:00 2.01
8 14 2017-01-31 2017-01-31 11:50:00 2017-01-31 16:21:00 4.29
9 19 2017-01-31 2017-01-31 11:50:00 2017-01-31 16:21:00 1
10 20 2017-01-31 2017-01-31 11:50:00 2017-01-31 16:21:00 1.72
11 12 2017-01-31 2017-01-31 15:15:00 2017-01-31 16:21:00 1
12 21 2017-02-01 2017-02-01 09:55:00 2017-02-01 11:47:00 3.06
13 34 2017-02-01 2017-02-01 09:55:00 2017-02-01 11:47:00 1.62
14 14 2017-02-01 2017-02-01 12:03:00 2017-02-01 15:02:00 3.86
15 19 2017-02-01 2017-02-01 12:03:00 2017-02-01 15:02:00 1.29
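The 1:, 2:, ... row prefixes in the question look like data.table printing, so here is an equivalent data.table sketch as well. It assumes the data live in a data.table named dt with the same columns; for brevity only the three 2017-01-31 rows for id 21 are reproduced:

```r
library(data.table)

# Minimal reproduction: the three observation periods for id 21 on 2017-01-31
dt <- data.table(
  start  = as.POSIXct(c("2017-01-31 09:48:00", "2017-01-31 11:50:00",
                        "2017-01-31 15:15:00"), tz = "UTC"),
  end    = as.POSIXct(c("2017-01-31 11:59:00", "2017-01-31 14:59:00",
                        "2017-01-31 16:21:00"), tz = "UTC"),
  id     = c(21, 21, 21),
  uptake = c(5.27, 1.00, 0.43)
)

# Collapse to one row per id per day: earliest start, latest end, summed uptake
res <- dt[, .(start = min(start), end = max(end), uptake = sum(uptake)),
          by = .(id, day = as.Date(start))]
```

This is the same grouping logic as the dplyr answer, just expressed in data.table's by= syntax; for id 21 on 2017-01-31 it yields one row spanning 09:48 to 16:21 with uptake 5.27 + 1.00 + 0.43 = 6.70.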
Data (please include your data using dput() next time):
library(readr)
library(dplyr)
df <- read_delim("idx,start,end,id,uptake
1:,2017-01-29 10:16:00,2017-01-29 12:25:00,14,2.04
2:,2017-01-29 10:16:00,2017-01-29 12:25:00,21,1.53
3:,2017-01-29 10:16:00,2017-01-29 12:25:00,12,0.17
4:,2017-01-29 10:16:00,2017-01-29 12:25:00,20,1.19
5:,2017-01-29 10:16:00,2017-01-29 12:25:00,19,0.85
6:,2017-01-31 09:48:00,2017-01-31 11:59:00,21,5.27
7:,2017-01-31 09:48:00,2017-01-31 11:59:00,34,1.87
8:,2017-01-31 11:50:00,2017-01-31 14:59:00,21,1.00
9:,2017-01-31 11:50:00,2017-01-31 14:59:00,34,0.14
10:,2017-01-31 11:50:00,2017-01-31 14:59:00,20,1.00
11:,2017-01-31 11:50:00,2017-01-31 14:59:00,19,0.43
12:,2017-01-31 11:50:00,2017-01-31 14:59:00,14,3.43
13:,2017-01-31 15:15:00,2017-01-31 16:21:00,12,1.00
14:,2017-01-31 15:15:00,2017-01-31 16:21:00,20,0.72
15:,2017-01-31 15:15:00,2017-01-31 16:21:00,14,0.86
16:,2017-01-31 15:15:00,2017-01-31 16:21:00,21,0.43
17:,2017-01-31 15:15:00,2017-01-31 16:21:00,19,0.57
18:,2017-02-01 09:55:00,2017-02-01 11:47:00,34,1.62
19:,2017-02-01 09:55:00,2017-02-01 11:47:00,21,3.06
20:,2017-02-01 12:03:00,2017-02-01 15:02:00,19,1.29
21:,2017-02-01 12:03:00,2017-02-01 15:02:00,14,3.86", delim = ",") |> select(-idx)