Home > Enterprise >  How to merge rows using start and finish timestamps of a consecutive id count in R, but per-group ba
How to merge rows using start and finish timestamps of a consecutive id count in R, but per-group ba

Time:11-07

The data:

      id station start               stop               
   <int>   <dbl> <dttm>              <dttm>             
 1     1       1 1899-12-31 00:05:04 1899-12-31 00:13:36
 2     2       1 1899-12-31 00:14:04 1899-12-31 00:21:32
 3     3       1 1899-12-31 00:21:32 1899-12-31 00:26:56
 4     4       1 1899-12-31 00:26:57 1899-12-31 00:27:10
 5     1       1 1899-12-31 00:38:32 1899-12-31 00:38:38
 6     3       1 1899-12-31 00:43:23 1899-12-31 00:43:47
 7     5       1 1899-12-31 00:53:00 1899-12-31 00:53:15
 8     6       1 1899-12-31 00:53:29 1899-12-31 00:55:49
 9     6       1 1899-12-31 00:56:15 1899-12-31 00:56:42
10     7       1 1899-12-31 01:14:30 1899-12-31 01:28:09
11     3       1 1899-12-31 01:28:09 1899-12-31 01:31:17
12     3       2 1899-12-31 01:34:22 1899-12-31 01:35:11
13     7       3 1899-12-31 01:36:44 1899-12-31 01:39:54
14     8       3 1899-12-31 01:41:49 1899-12-31 01:44:45
15     9       2 1899-12-31 01:44:47 1899-12-31 01:52:07

The need:

I need to merge the times of repeated id's so that they count as just one. For example, id #6 has two consecutive actions, each with a start and stop times (lines 8 and 9). The desired output would be that instead of this:

11     3       1 1899-12-31 01:28:09 1899-12-31 01:31:17
12     3       2 1899-12-31 01:34:22 1899-12-31 01:35:11

Something like this would be shown

11     3       1 1899-12-31 01:28:09 1899-12-31 01:35:11

So, both events as a single event with the first start timestamp, and the stop of the second one.

The caveat

This solution should merge times only if they belong to the same station, so for example in the sample data id #3 in line 11 would not be merged with line 12, since they belong to different stations.

A solution using tools from tidyverse is preferred but not necessary.

CodePudding user response:

Perhaps you could try the following. Assign a unique second id id2 that will be the same for repeat, consecutive id's. Then, after grouping by the new id2, summarise and take the first start and the last stop dates/times.

library(tidyverse)
library(data.table)

df %>%
  mutate(id2 = rleid(id)) %>%
  group_by(id2) %>%
  summarise(id = first(id), start = first(start), stop = last(stop)) %>%
  ungroup() %>%
  select(id, start, stop)

Output

      id start               stop               
   <int> <dttm>              <dttm>             
 1     1 1899-12-31 00:05:04 1899-12-31 00:13:36
 2     2 1899-12-31 00:14:04 1899-12-31 00:21:32
 3     3 1899-12-31 00:21:32 1899-12-31 00:26:56
 4     4 1899-12-31 00:26:57 1899-12-31 00:27:10
 5     1 1899-12-31 00:38:32 1899-12-31 00:38:38
 6     3 1899-12-31 00:43:23 1899-12-31 00:43:47
 7     5 1899-12-31 00:53:00 1899-12-31 00:53:15
 8     6 1899-12-31 00:53:29 1899-12-31 00:56:42
 9     7 1899-12-31 01:14:30 1899-12-31 01:28:09
10     3 1899-12-31 01:28:09 1899-12-31 01:35:11
11     7 1899-12-31 01:36:44 1899-12-31 01:39:54
12     8 1899-12-31 01:41:49 1899-12-31 01:44:45
13     9 1899-12-31 01:44:47 1899-12-31 01:52:07

For the caveat with different stations for a given id you can add station to rleid, such that you will get a unique id2 for each combination of id and station combined.

In addition, you can use unique as alternative to first in summarise to keep the column values you need (just an option).

df %>%
  mutate(id2 = rleid(id, station)) %>%
  group_by(id2) %>%
  summarise(id = unique(id), station = unique(station), start = first(start), stop = last(stop)) %>%
  ungroup() %>%
  select(id, station, start, stop)

Output

      id station start               stop               
   <int>   <dbl> <dttm>              <dttm>             
 1     1       1 1899-12-31 00:05:04 1899-12-31 00:13:36
 2     2       1 1899-12-31 00:14:04 1899-12-31 00:21:32
 3     3       1 1899-12-31 00:21:32 1899-12-31 00:26:56
 4     4       1 1899-12-31 00:26:57 1899-12-31 00:27:10
 5     1       1 1899-12-31 00:38:32 1899-12-31 00:38:38
 6     3       1 1899-12-31 00:43:23 1899-12-31 00:43:47
 7     5       1 1899-12-31 00:53:00 1899-12-31 00:53:15
 8     6       1 1899-12-31 00:53:29 1899-12-31 00:56:42
 9     7       1 1899-12-31 01:14:30 1899-12-31 01:28:09
10     3       1 1899-12-31 01:28:09 1899-12-31 01:31:17
11     3       2 1899-12-31 01:34:22 1899-12-31 01:35:11
12     7       3 1899-12-31 01:36:44 1899-12-31 01:39:54
13     8       3 1899-12-31 01:41:49 1899-12-31 01:44:45
14     9       2 1899-12-31 01:44:47 1899-12-31 01:52:07
  • Related