I have merged two data frames by time:
Mergeboth <- merge(test, deeptest, by.x="date.time", by.y="Bottom.Start", all=TRUE)
The test
dataframe is very large (over 1 million datapoints) while the deeptest
is only around 100. Checking with complete.cases()
it correctly merged the two dataframes. Now that the dataframes are merged, I would like to select a certain time interval before and after the complete.cases (let's say 2 minutes for ease of output- but my actual goal is 12 hours), assign a new number to all of those newly selected rows and output it into a new dataframe: df_new
.
So my current dataframe looks like this:
MergeBoth()
Bottom.Start Dive Max.Depth Depth
1: 2015-12-22 01:07:00 NA NA 311.0
2: 2015-12-22 01:07:10 NA NA 308.5
3: 2015-12-22 01:07:20 NA NA 307.0
4: 2015-12-22 01:07:30 NA NA 306.5
5: 2015-12-22 01:07:40 NA NA 305.5
6: 2015-12-22 01:07:50 NA NA 308.5
7: 2015-12-22 01:08:00 NA NA 318.5
8: 2015-12-22 01:08:10 NA NA 331.0
9: 2015-12-22 01:08:20 NA NA 345.5
10: 2015-12-22 01:08:30 NA NA 361.0
11: 2015-12-22 01:08:40 NA NA 376.5
12: 2015-12-22 01:08:50 NA NA 392.0
13: 2015-12-22 01:09:00 NA NA 408.5
14: 2015-12-22 01:09:10 NA NA 425.5
15: 2015-12-22 01:09:20 NA NA 442.5
16: 2015-12-22 01:09:30 NA NA 459.0
17: 2015-12-22 01:09:40 NA NA 475.0
18: 2015-12-22 01:09:50 NA NA 491.0
19: 2015-12-22 01:10:00 1238 727 508.0
20: 2015-12-22 01:10:10 NA NA 523.5
21: 2015-12-22 01:10:20 NA NA 540.5
22: 2015-12-22 01:10:30 NA NA 556.5
23: 2015-12-22 01:10:40 NA NA 572.5
24: 2015-12-22 01:10:50 NA NA 581.0
25: 2015-12-22 01:11:00 NA NA 583.5
26: 2015-12-22 01:11:10 NA NA 582.0
27: 2015-12-22 01:11:20 NA NA 581.0
28: 2015-12-22 01:11:30 NA NA 581.0
29: 2015-12-22 01:11:40 NA NA 581.0
30: 2015-12-22 01:11:50 NA NA 581.0
31: 2015-12-22 01:12:00 NA NA 582.0
32: 2015-12-22 01:12:10 NA NA 581.5
33: 2015-12-22 01:12:20 NA NA 581.5
34: 2015-12-22 01:12:30 NA NA 592.5
35: 2015-12-22 01:12:40 NA NA 606.5
36: 2015-12-22 01:12:50 NA NA 621.5
37: 2015-12-22 01:13:00 NA NA 637.5
38: 2015-12-22 01:13:10 NA NA 655.0
39: 2015-12-23 07:17:00 NA NA 863.0
40: 2015-12-23 07:17:10 NA NA 863.5
41: 2015-12-23 07:17:20 NA NA 865.0
42: 2015-12-23 07:17:30 NA NA 866.0
43: 2015-12-23 07:17:40 NA NA 867.0
44: 2015-12-23 07:17:50 NA NA 867.5
45: 2015-12-23 07:18:00 NA NA 868.5
46: 2015-12-23 07:18:10 NA NA 870.0
47: 2015-12-23 07:18:20 NA NA 870.5
48: 2015-12-23 07:18:30 NA NA 871.0
49: 2015-12-23 07:18:40 NA NA 872.0
50: 2015-12-23 07:18:50 NA NA 872.0
51: 2015-12-23 07:19:00 1267 970 874.0
52: 2015-12-23 07:19:10 NA NA 875.0
53: 2015-12-23 07:19:20 NA NA 875.0
54: 2015-12-23 07:19:30 NA NA 876.0
55: 2015-12-23 07:19:40 NA NA 876.5
56: 2015-12-23 07:19:50 NA NA 876.0
57: 2015-12-23 07:20:00 NA NA 878.0
58: 2015-12-23 07:20:10 NA NA 876.0
59: 2015-12-23 07:20:20 NA NA 875.5
60: 2015-12-23 07:20:30 NA NA 875.5
61: 2015-12-23 07:20:40 NA NA 874.0
62: 2015-12-23 07:20:50 NA NA 872.5
63: 2015-12-23 07:21:00 NA NA 870.5
64: 2015-12-23 07:21:10 NA NA 867.0
65: 2015-12-23 07:21:20 NA NA 863.5
66: 2015-12-23 07:21:30 NA NA 860.5
67: 2015-12-23 07:21:40 NA NA 859.0
68: 2015-12-23 07:21:50 NA NA 861.0
69: 2015-12-23 07:22:00 NA NA 864.5
70: 2015-12-23 07:22:10 NA NA 868.5
71: 2015-12-23 07:22:20 NA NA 874.5
72: 2015-12-23 07:22:30 NA NA 882.0
73: 2015-12-23 07:22:40 NA NA 894.0
74: 2015-12-23 07:22:50 NA NA 907.0
75: 2015-12-23 07:23:00 NA NA 922.5
And I am trying to get this:
df_new()
Bottom.Start Dive Max.Depth Depth DiveNumber
1: 2015-12-22 01:08:00 NA NA 318.5 1
2: 2015-12-22 01:08:10 NA NA 331.0 1
3: 2015-12-22 01:08:20 NA NA 345.5 1
4: 2015-12-22 01:08:30 NA NA 361.0 1
5: 2015-12-22 01:08:40 NA NA 376.5 1
6: 2015-12-22 01:08:50 NA NA 392.0 1
7: 2015-12-22 01:09:00 NA NA 408.5 1
8: 2015-12-22 01:09:10 NA NA 425.5 1
9: 2015-12-22 01:09:20 NA NA 442.5 1
10: 2015-12-22 01:09:30 NA NA 459.0 1
11: 2015-12-22 01:09:40 NA NA 475.0 1
12: 2015-12-22 01:09:50 NA NA 491.0 1
13: 2015-12-22 01:10:00 1238 727 508.0 1
14: 2015-12-22 01:10:10 NA NA 523.5 1
15: 2015-12-22 01:10:20 NA NA 540.5 1
16: 2015-12-22 01:10:30 NA NA 556.5 1
17: 2015-12-22 01:10:40 NA NA 572.5 1
18: 2015-12-22 01:10:50 NA NA 581.0 1
19: 2015-12-22 01:11:00 NA NA 583.5 1
20: 2015-12-22 01:11:10 NA NA 582.0 1
21: 2015-12-22 01:11:20 NA NA 581.0 1
22: 2015-12-22 01:11:30 NA NA 581.0 1
23: 2015-12-22 01:11:40 NA NA 581.0 1
24: 2015-12-22 01:11:50 NA NA 581.0 1
25: 2015-12-22 01:12:00 NA NA 582.0 1
26: 2015-12-23 07:17:00 NA NA 863.0 2
27: 2015-12-23 07:17:10 NA NA 863.5 2
28: 2015-12-23 07:17:20 NA NA 865.0 2
29: 2015-12-23 07:17:30 NA NA 866.0 2
30: 2015-12-23 07:17:40 NA NA 867.0 2
31: 2015-12-23 07:17:50 NA NA 867.5 2
32: 2015-12-23 07:18:00 NA NA 868.5 2
33: 2015-12-23 07:18:10 NA NA 870.0 2
34: 2015-12-23 07:18:20 NA NA 870.5 2
35: 2015-12-23 07:18:30 NA NA 871.0 2
36: 2015-12-23 07:18:40 NA NA 872.0 2
37: 2015-12-23 07:18:50 NA NA 872.0 2
38: 2015-12-23 07:19:00 1267 970 874.0 2
39: 2015-12-23 07:19:10 NA NA 875.0 2
40: 2015-12-23 07:19:20 NA NA 875.0 2
41: 2015-12-23 07:19:30 NA NA 876.0 2
42: 2015-12-23 07:19:40 NA NA 876.5 2
43: 2015-12-23 07:19:50 NA NA 876.0 2
44: 2015-12-23 07:20:00 NA NA 878.0 2
45: 2015-12-23 07:20:10 NA NA 876.0 2
46: 2015-12-23 07:20:20 NA NA 875.5 2
47: 2015-12-23 07:20:30 NA NA 875.5 2
48: 2015-12-23 07:20:40 NA NA 874.0 2
49: 2015-12-23 07:20:50 NA NA 872.5 2
50: 2015-12-23 07:21:00 NA NA 870.5 2
Here is my dput()
for the dataframe:
structure(list(Bottom.Start = structure(c(1450746420, 1450746430,
1450746440, 1450746450, 1450746460, 1450746470, 1450746480, 1450746490,
1450746500, 1450746510, 1450746520, 1450746530, 1450746540, 1450746550,
1450746560, 1450746570, 1450746580, 1450746590, 1450746600, 1450746610,
1450746620, 1450746630, 1450746640, 1450746650, 1450746660, 1450746670,
1450746680, 1450746690, 1450746700, 1450746710, 1450746720, 1450746730,
1450746740, 1450746750, 1450746760, 1450746770, 1450746780, 1450746790,
1450855020, 1450855030, 1450855040, 1450855050, 1450855060, 1450855070,
1450855080, 1450855090, 1450855100, 1450855110, 1450855120, 1450855130,
1450855140, 1450855150, 1450855160, 1450855170, 1450855180, 1450855190,
1450855200, 1450855210, 1450855220, 1450855230, 1450855240, 1450855250,
1450855260, 1450855270, 1450855280, 1450855290, 1450855300, 1450855310,
1450855320, 1450855330, 1450855340, 1450855350, 1450855360, 1450855370,
1450855380), tzone = "UTC", class = c("POSIXct", "POSIXt")),
Dive = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 1238L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1267L, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), Max.Depth = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 727, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
970, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Depth = c(311,
308.5, 307, 306.5, 305.5, 308.5, 318.5, 331, 345.5, 361,
376.5, 392, 408.5, 425.5, 442.5, 459, 475, 491, 508, 523.5,
540.5, 556.5, 572.5, 581, 583.5, 582, 581, 581, 581, 581,
582, 581.5, 581.5, 592.5, 606.5, 621.5, 637.5, 655, 863,
863.5, 865, 866, 867, 867.5, 868.5, 870, 870.5, 871, 872,
872, 874, 875, 875, 876, 876.5, 876, 878, 876, 875.5, 875.5,
874, 872.5, 870.5, 867, 863.5, 860.5, 859, 861, 864.5, 868.5,
874.5, 882, 894, 907, 922.5)), row.names = c(NA, -75L), class = c("data.table",
"data.frame"))
I have tried lag()
and tail()
and a combination of dplyr arguments including case_when()
and mutate()
but cannot seem to get my desired output. I am relatively new to R and appreciate the help!
CodePudding user response:
You may try using difftime
lag
and cumsum
-
library(dplyr)
df %>%
mutate(DiveNumber = cumsum(c(TRUE, difftime(Bottom.Start,
lag(Bottom.Start), units = 'mins')[-1] > 2)))
# Bottom.Start Dive Max.Depth Depth DiveNumber
# 1: 2015-12-22 01:07:00 NA NA 311.0 1
# 2: 2015-12-22 01:07:10 NA NA 308.5 1
# 3: 2015-12-22 01:07:20 NA NA 307.0 1
# 4: 2015-12-22 01:07:30 NA NA 306.5 1
# 5: 2015-12-22 01:07:40 NA NA 305.5 1
# 6: 2015-12-22 01:07:50 NA NA 308.5 1
# 7: 2015-12-22 01:08:00 NA NA 318.5 1
# 8: 2015-12-22 01:08:10 NA NA 331.0 1
#...
#...
#37: 2015-12-22 01:13:00 NA NA 637.5 1
#38: 2015-12-22 01:13:10 NA NA 655.0 1
#39: 2015-12-23 07:17:00 NA NA 863.0 2
#40: 2015-12-23 07:17:10 NA NA 863.5 2
#41: 2015-12-23 07:17:20 NA NA 865.0 2
#42: 2015-12-23 07:17:30 NA NA 866.0 2
#43: 2015-12-23 07:17:40 NA NA 867.0 2
#...
#...
#74: 2015-12-23 07:22:50 NA NA 907.0 2
#75: 2015-12-23 07:23:00 NA NA 922.5 2
For your actual goal of 12 hours you may change units = 'mins'
to units = 'hours'
and > 2
to > 12
.
CodePudding user response:
I'm not sure what your original data.frames look like before the merge, but this is something to consider (using data.table
which may be relatively faster).
First, you can create (or you may already have?) a small data.frame that includes complete cases. You can enumerate the dives and determine your time ranges of interest for these rows of data.
Then, you can perform a non-equi join and select rows of data that have Bottom.Start
times within the determined time range.
library(data.table)
library(lubridate)
df_c <- df[complete.cases(df), ]
df_c[, `:=` (DiveNumber = seq_len(.N), Start = Bottom.Start - minutes(2), End = Bottom.Start minutes(2))]
df[df_c, .(Bottom.Start = x.Bottom.Start, Dive, Max.Depth, Depth, DiveNumber), on = .(Bottom.Start >= Start, Bottom.Start <= End)]
Output
Bottom.Start Dive Max.Depth Depth DiveNumber
1: 2015-12-22 01:08:00 NA NA 318.5 1
2: 2015-12-22 01:08:10 NA NA 331.0 1
3: 2015-12-22 01:08:20 NA NA 345.5 1
4: 2015-12-22 01:08:30 NA NA 361.0 1
5: 2015-12-22 01:08:40 NA NA 376.5 1
6: 2015-12-22 01:08:50 NA NA 392.0 1
7: 2015-12-22 01:09:00 NA NA 408.5 1
8: 2015-12-22 01:09:10 NA NA 425.5 1
9: 2015-12-22 01:09:20 NA NA 442.5 1
10: 2015-12-22 01:09:30 NA NA 459.0 1
11: 2015-12-22 01:09:40 NA NA 475.0 1
12: 2015-12-22 01:09:50 NA NA 491.0 1
13: 2015-12-22 01:10:00 1238 727 508.0 1
14: 2015-12-22 01:10:10 NA NA 523.5 1
15: 2015-12-22 01:10:20 NA NA 540.5 1
16: 2015-12-22 01:10:30 NA NA 556.5 1
17: 2015-12-22 01:10:40 NA NA 572.5 1
18: 2015-12-22 01:10:50 NA NA 581.0 1
19: 2015-12-22 01:11:00 NA NA 583.5 1
20: 2015-12-22 01:11:10 NA NA 582.0 1
21: 2015-12-22 01:11:20 NA NA 581.0 1
22: 2015-12-22 01:11:30 NA NA 581.0 1
23: 2015-12-22 01:11:40 NA NA 581.0 1
24: 2015-12-22 01:11:50 NA NA 581.0 1
25: 2015-12-22 01:12:00 NA NA 582.0 1
26: 2015-12-23 07:17:00 NA NA 863.0 2
27: 2015-12-23 07:17:10 NA NA 863.5 2
28: 2015-12-23 07:17:20 NA NA 865.0 2
29: 2015-12-23 07:17:30 NA NA 866.0 2
30: 2015-12-23 07:17:40 NA NA 867.0 2
31: 2015-12-23 07:17:50 NA NA 867.5 2
32: 2015-12-23 07:18:00 NA NA 868.5 2
33: 2015-12-23 07:18:10 NA NA 870.0 2
34: 2015-12-23 07:18:20 NA NA 870.5 2
35: 2015-12-23 07:18:30 NA NA 871.0 2
36: 2015-12-23 07:18:40 NA NA 872.0 2
37: 2015-12-23 07:18:50 NA NA 872.0 2
38: 2015-12-23 07:19:00 1267 970 874.0 2
39: 2015-12-23 07:19:10 NA NA 875.0 2
40: 2015-12-23 07:19:20 NA NA 875.0 2
41: 2015-12-23 07:19:30 NA NA 876.0 2
42: 2015-12-23 07:19:40 NA NA 876.5 2
43: 2015-12-23 07:19:50 NA NA 876.0 2
44: 2015-12-23 07:20:00 NA NA 878.0 2
45: 2015-12-23 07:20:10 NA NA 876.0 2
46: 2015-12-23 07:20:20 NA NA 875.5 2
47: 2015-12-23 07:20:30 NA NA 875.5 2
48: 2015-12-23 07:20:40 NA NA 874.0 2
49: 2015-12-23 07:20:50 NA NA 872.5 2
50: 2015-12-23 07:21:00 NA NA 870.5 2
Bottom.Start Dive Max.Depth Depth DiveNumber