Home > Back-end >  How do I select certain time interval before and after complete.cases from two merged dataframes?
How do I select certain time interval before and after complete.cases from two merged dataframes?

Time:10-08

I have merged two data frames by time:

Mergeboth <- merge(test, deeptest, by.x="date.time", by.y="Bottom.Start", all=TRUE)

The test dataframe is very large (over 1 million datapoints) while the deeptest is only around 100. Checking with complete.cases() it correctly merged the two dataframes. Now that the dataframes are merged, I would like to select a certain time interval before and after the complete.cases (let's say 2 minutes for ease of output- but my actual goal is 12 hours), assign a new number to all of those newly selected rows and output it into a new dataframe: df_new.

So my current dataframe looks like this:

MergeBoth()
 Bottom.Start           Dive Max.Depth Depth
 1: 2015-12-22 01:07:00   NA        NA 311.0
 2: 2015-12-22 01:07:10   NA        NA 308.5
 3: 2015-12-22 01:07:20   NA        NA 307.0
 4: 2015-12-22 01:07:30   NA        NA 306.5
 5: 2015-12-22 01:07:40   NA        NA 305.5
 6: 2015-12-22 01:07:50   NA        NA 308.5
 7: 2015-12-22 01:08:00   NA        NA 318.5
 8: 2015-12-22 01:08:10   NA        NA 331.0
 9: 2015-12-22 01:08:20   NA        NA 345.5
10: 2015-12-22 01:08:30   NA        NA 361.0
11: 2015-12-22 01:08:40   NA        NA 376.5
12: 2015-12-22 01:08:50   NA        NA 392.0
13: 2015-12-22 01:09:00   NA        NA 408.5
14: 2015-12-22 01:09:10   NA        NA 425.5
15: 2015-12-22 01:09:20   NA        NA 442.5
16: 2015-12-22 01:09:30   NA        NA 459.0
17: 2015-12-22 01:09:40   NA        NA 475.0
18: 2015-12-22 01:09:50   NA        NA 491.0
19: 2015-12-22 01:10:00 1238       727 508.0
20: 2015-12-22 01:10:10   NA        NA 523.5
21: 2015-12-22 01:10:20   NA        NA 540.5
22: 2015-12-22 01:10:30   NA        NA 556.5
23: 2015-12-22 01:10:40   NA        NA 572.5
24: 2015-12-22 01:10:50   NA        NA 581.0
25: 2015-12-22 01:11:00   NA        NA 583.5
26: 2015-12-22 01:11:10   NA        NA 582.0
27: 2015-12-22 01:11:20   NA        NA 581.0
28: 2015-12-22 01:11:30   NA        NA 581.0
29: 2015-12-22 01:11:40   NA        NA 581.0
30: 2015-12-22 01:11:50   NA        NA 581.0
31: 2015-12-22 01:12:00   NA        NA 582.0
32: 2015-12-22 01:12:10   NA        NA 581.5
33: 2015-12-22 01:12:20   NA        NA 581.5
34: 2015-12-22 01:12:30   NA        NA 592.5
35: 2015-12-22 01:12:40   NA        NA 606.5
36: 2015-12-22 01:12:50   NA        NA 621.5
37: 2015-12-22 01:13:00   NA        NA 637.5
38: 2015-12-22 01:13:10   NA        NA 655.0
39: 2015-12-23 07:17:00   NA        NA 863.0
40: 2015-12-23 07:17:10   NA        NA 863.5
41: 2015-12-23 07:17:20   NA        NA 865.0
42: 2015-12-23 07:17:30   NA        NA 866.0
43: 2015-12-23 07:17:40   NA        NA 867.0
44: 2015-12-23 07:17:50   NA        NA 867.5
45: 2015-12-23 07:18:00   NA        NA 868.5
46: 2015-12-23 07:18:10   NA        NA 870.0
47: 2015-12-23 07:18:20   NA        NA 870.5
48: 2015-12-23 07:18:30   NA        NA 871.0
49: 2015-12-23 07:18:40   NA        NA 872.0
50: 2015-12-23 07:18:50   NA        NA 872.0
51: 2015-12-23 07:19:00 1267       970 874.0
52: 2015-12-23 07:19:10   NA        NA 875.0
53: 2015-12-23 07:19:20   NA        NA 875.0
54: 2015-12-23 07:19:30   NA        NA 876.0
55: 2015-12-23 07:19:40   NA        NA 876.5
56: 2015-12-23 07:19:50   NA        NA 876.0
57: 2015-12-23 07:20:00   NA        NA 878.0
58: 2015-12-23 07:20:10   NA        NA 876.0
59: 2015-12-23 07:20:20   NA        NA 875.5
60: 2015-12-23 07:20:30   NA        NA 875.5
61: 2015-12-23 07:20:40   NA        NA 874.0
62: 2015-12-23 07:20:50   NA        NA 872.5
63: 2015-12-23 07:21:00   NA        NA 870.5
64: 2015-12-23 07:21:10   NA        NA 867.0
65: 2015-12-23 07:21:20   NA        NA 863.5
66: 2015-12-23 07:21:30   NA        NA 860.5
67: 2015-12-23 07:21:40   NA        NA 859.0
68: 2015-12-23 07:21:50   NA        NA 861.0
69: 2015-12-23 07:22:00   NA        NA 864.5
70: 2015-12-23 07:22:10   NA        NA 868.5
71: 2015-12-23 07:22:20   NA        NA 874.5
72: 2015-12-23 07:22:30   NA        NA 882.0
73: 2015-12-23 07:22:40   NA        NA 894.0
74: 2015-12-23 07:22:50   NA        NA 907.0
75: 2015-12-23 07:23:00   NA        NA 922.5

And I am trying to get this:

df_new()
            Bottom.Start Dive Max.Depth Depth DiveNumber
 1: 2015-12-22 01:08:00   NA        NA 318.5          1
 2: 2015-12-22 01:08:10   NA        NA 331.0          1
 3: 2015-12-22 01:08:20   NA        NA 345.5          1
 4: 2015-12-22 01:08:30   NA        NA 361.0          1
 5: 2015-12-22 01:08:40   NA        NA 376.5          1
 6: 2015-12-22 01:08:50   NA        NA 392.0          1
 7: 2015-12-22 01:09:00   NA        NA 408.5          1
 8: 2015-12-22 01:09:10   NA        NA 425.5          1
 9: 2015-12-22 01:09:20   NA        NA 442.5          1
10: 2015-12-22 01:09:30   NA        NA 459.0          1
11: 2015-12-22 01:09:40   NA        NA 475.0          1
12: 2015-12-22 01:09:50   NA        NA 491.0          1
13: 2015-12-22 01:10:00 1238       727 508.0          1
14: 2015-12-22 01:10:10   NA        NA 523.5          1
15: 2015-12-22 01:10:20   NA        NA 540.5          1
16: 2015-12-22 01:10:30   NA        NA 556.5          1
17: 2015-12-22 01:10:40   NA        NA 572.5          1
18: 2015-12-22 01:10:50   NA        NA 581.0          1
19: 2015-12-22 01:11:00   NA        NA 583.5          1
20: 2015-12-22 01:11:10   NA        NA 582.0          1
21: 2015-12-22 01:11:20   NA        NA 581.0          1
22: 2015-12-22 01:11:30   NA        NA 581.0          1
23: 2015-12-22 01:11:40   NA        NA 581.0          1
24: 2015-12-22 01:11:50   NA        NA 581.0          1
25: 2015-12-22 01:12:00   NA        NA 582.0          1
26: 2015-12-23 07:17:00   NA        NA 863.0          2
27: 2015-12-23 07:17:10   NA        NA 863.5          2
28: 2015-12-23 07:17:20   NA        NA 865.0          2
29: 2015-12-23 07:17:30   NA        NA 866.0          2
30: 2015-12-23 07:17:40   NA        NA 867.0          2
31: 2015-12-23 07:17:50   NA        NA 867.5          2
32: 2015-12-23 07:18:00   NA        NA 868.5          2
33: 2015-12-23 07:18:10   NA        NA 870.0          2
34: 2015-12-23 07:18:20   NA        NA 870.5          2
35: 2015-12-23 07:18:30   NA        NA 871.0          2
36: 2015-12-23 07:18:40   NA        NA 872.0          2
37: 2015-12-23 07:18:50   NA        NA 872.0          2
38: 2015-12-23 07:19:00 1267       970 874.0          2
39: 2015-12-23 07:19:10   NA        NA 875.0          2
40: 2015-12-23 07:19:20   NA        NA 875.0          2
41: 2015-12-23 07:19:30   NA        NA 876.0          2
42: 2015-12-23 07:19:40   NA        NA 876.5          2
43: 2015-12-23 07:19:50   NA        NA 876.0          2
44: 2015-12-23 07:20:00   NA        NA 878.0          2
45: 2015-12-23 07:20:10   NA        NA 876.0          2
46: 2015-12-23 07:20:20   NA        NA 875.5          2
47: 2015-12-23 07:20:30   NA        NA 875.5          2
48: 2015-12-23 07:20:40   NA        NA 874.0          2
49: 2015-12-23 07:20:50   NA        NA 872.5          2
50: 2015-12-23 07:21:00   NA        NA 870.5          2

Here is my dput() for the dataframe:

structure(list(Bottom.Start = structure(c(1450746420, 1450746430, 
1450746440, 1450746450, 1450746460, 1450746470, 1450746480, 1450746490, 
1450746500, 1450746510, 1450746520, 1450746530, 1450746540, 1450746550, 
1450746560, 1450746570, 1450746580, 1450746590, 1450746600, 1450746610, 
1450746620, 1450746630, 1450746640, 1450746650, 1450746660, 1450746670, 
1450746680, 1450746690, 1450746700, 1450746710, 1450746720, 1450746730, 
1450746740, 1450746750, 1450746760, 1450746770, 1450746780, 1450746790, 
1450855020, 1450855030, 1450855040, 1450855050, 1450855060, 1450855070, 
1450855080, 1450855090, 1450855100, 1450855110, 1450855120, 1450855130, 
1450855140, 1450855150, 1450855160, 1450855170, 1450855180, 1450855190, 
1450855200, 1450855210, 1450855220, 1450855230, 1450855240, 1450855250, 
1450855260, 1450855270, 1450855280, 1450855290, 1450855300, 1450855310, 
1450855320, 1450855330, 1450855340, 1450855350, 1450855360, 1450855370, 
1450855380), tzone = "UTC", class = c("POSIXct", "POSIXt")), 
    Dive = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, 1238L, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, 1267L, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), Max.Depth = c(NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 727, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    970, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Depth = c(311, 
    308.5, 307, 306.5, 305.5, 308.5, 318.5, 331, 345.5, 361, 
    376.5, 392, 408.5, 425.5, 442.5, 459, 475, 491, 508, 523.5, 
    540.5, 556.5, 572.5, 581, 583.5, 582, 581, 581, 581, 581, 
    582, 581.5, 581.5, 592.5, 606.5, 621.5, 637.5, 655, 863, 
    863.5, 865, 866, 867, 867.5, 868.5, 870, 870.5, 871, 872, 
    872, 874, 875, 875, 876, 876.5, 876, 878, 876, 875.5, 875.5, 
    874, 872.5, 870.5, 867, 863.5, 860.5, 859, 861, 864.5, 868.5, 
    874.5, 882, 894, 907, 922.5)), row.names = c(NA, -75L), class = c("data.table", 
"data.frame"))

I have tried lag() and tail() and a combination of dplyr arguments including case_when() and mutate() but cannot seem to get my desired output. I am relatively new to R and appreciate the help!

CodePudding user response:

You may try using difftime lag and cumsum -

library(dplyr)

df %>%
  mutate(DiveNumber = cumsum(c(TRUE, difftime(Bottom.Start, 
                             lag(Bottom.Start), units = 'mins')[-1] > 2)))

#           Bottom.Start Dive Max.Depth Depth DiveNumber
# 1: 2015-12-22 01:07:00   NA        NA 311.0          1
# 2: 2015-12-22 01:07:10   NA        NA 308.5          1
# 3: 2015-12-22 01:07:20   NA        NA 307.0          1
# 4: 2015-12-22 01:07:30   NA        NA 306.5          1
# 5: 2015-12-22 01:07:40   NA        NA 305.5          1
# 6: 2015-12-22 01:07:50   NA        NA 308.5          1
# 7: 2015-12-22 01:08:00   NA        NA 318.5          1
# 8: 2015-12-22 01:08:10   NA        NA 331.0          1
#...
#...
#37: 2015-12-22 01:13:00   NA        NA 637.5          1
#38: 2015-12-22 01:13:10   NA        NA 655.0          1
#39: 2015-12-23 07:17:00   NA        NA 863.0          2
#40: 2015-12-23 07:17:10   NA        NA 863.5          2
#41: 2015-12-23 07:17:20   NA        NA 865.0          2
#42: 2015-12-23 07:17:30   NA        NA 866.0          2
#43: 2015-12-23 07:17:40   NA        NA 867.0          2
#...
#...
#74: 2015-12-23 07:22:50   NA        NA 907.0          2
#75: 2015-12-23 07:23:00   NA        NA 922.5          2

For your actual goal of 12 hours you may change units = 'mins' to units = 'hours' and > 2 to > 12.

CodePudding user response:

I'm not sure what your original data.frames look like before the merge, but this is something to consider (using data.table which may be relatively faster).

First, you can create (or you may already have?) a small data.frame that includes complete cases. You can enumerate the dives and determine your time ranges of interest for these rows of data.

Then, you can perform a non-equi join and select rows of data that have Bottom.Start times within the determined time range.

library(data.table)
library(lubridate)

df_c <- df[complete.cases(df), ]
df_c[, `:=` (DiveNumber = seq_len(.N), Start = Bottom.Start - minutes(2), End = Bottom.Start   minutes(2))]

df[df_c, .(Bottom.Start = x.Bottom.Start, Dive, Max.Depth, Depth, DiveNumber), on = .(Bottom.Start >= Start, Bottom.Start <= End)]

Output

           Bottom.Start Dive Max.Depth Depth DiveNumber
 1: 2015-12-22 01:08:00   NA        NA 318.5          1
 2: 2015-12-22 01:08:10   NA        NA 331.0          1
 3: 2015-12-22 01:08:20   NA        NA 345.5          1
 4: 2015-12-22 01:08:30   NA        NA 361.0          1
 5: 2015-12-22 01:08:40   NA        NA 376.5          1
 6: 2015-12-22 01:08:50   NA        NA 392.0          1
 7: 2015-12-22 01:09:00   NA        NA 408.5          1
 8: 2015-12-22 01:09:10   NA        NA 425.5          1
 9: 2015-12-22 01:09:20   NA        NA 442.5          1
10: 2015-12-22 01:09:30   NA        NA 459.0          1
11: 2015-12-22 01:09:40   NA        NA 475.0          1
12: 2015-12-22 01:09:50   NA        NA 491.0          1
13: 2015-12-22 01:10:00 1238       727 508.0          1
14: 2015-12-22 01:10:10   NA        NA 523.5          1
15: 2015-12-22 01:10:20   NA        NA 540.5          1
16: 2015-12-22 01:10:30   NA        NA 556.5          1
17: 2015-12-22 01:10:40   NA        NA 572.5          1
18: 2015-12-22 01:10:50   NA        NA 581.0          1
19: 2015-12-22 01:11:00   NA        NA 583.5          1
20: 2015-12-22 01:11:10   NA        NA 582.0          1
21: 2015-12-22 01:11:20   NA        NA 581.0          1
22: 2015-12-22 01:11:30   NA        NA 581.0          1
23: 2015-12-22 01:11:40   NA        NA 581.0          1
24: 2015-12-22 01:11:50   NA        NA 581.0          1
25: 2015-12-22 01:12:00   NA        NA 582.0          1
26: 2015-12-23 07:17:00   NA        NA 863.0          2
27: 2015-12-23 07:17:10   NA        NA 863.5          2
28: 2015-12-23 07:17:20   NA        NA 865.0          2
29: 2015-12-23 07:17:30   NA        NA 866.0          2
30: 2015-12-23 07:17:40   NA        NA 867.0          2
31: 2015-12-23 07:17:50   NA        NA 867.5          2
32: 2015-12-23 07:18:00   NA        NA 868.5          2
33: 2015-12-23 07:18:10   NA        NA 870.0          2
34: 2015-12-23 07:18:20   NA        NA 870.5          2
35: 2015-12-23 07:18:30   NA        NA 871.0          2
36: 2015-12-23 07:18:40   NA        NA 872.0          2
37: 2015-12-23 07:18:50   NA        NA 872.0          2
38: 2015-12-23 07:19:00 1267       970 874.0          2
39: 2015-12-23 07:19:10   NA        NA 875.0          2
40: 2015-12-23 07:19:20   NA        NA 875.0          2
41: 2015-12-23 07:19:30   NA        NA 876.0          2
42: 2015-12-23 07:19:40   NA        NA 876.5          2
43: 2015-12-23 07:19:50   NA        NA 876.0          2
44: 2015-12-23 07:20:00   NA        NA 878.0          2
45: 2015-12-23 07:20:10   NA        NA 876.0          2
46: 2015-12-23 07:20:20   NA        NA 875.5          2
47: 2015-12-23 07:20:30   NA        NA 875.5          2
48: 2015-12-23 07:20:40   NA        NA 874.0          2
49: 2015-12-23 07:20:50   NA        NA 872.5          2
50: 2015-12-23 07:21:00   NA        NA 870.5          2
           Bottom.Start Dive Max.Depth Depth DiveNumber
  • Related