I am trying to clean up a time series with multiple data points. The data is arranged by day and by 'beep'. I want to keep only observations that are one beep apart from another observation on the same day.
In order to do this I have created a dummy variable by multiplying day number by 10 and adding the beep number to it. Because beeps only reach 7 there is no risk of overlap between days.
E.g. day 2, beep 4 becomes 24 in a variable called day_beep; day 2, beep 5 becomes 25, etc.
I now want to use a function called lagvar from an ESM package to create a time-lagged series. Before doing this I want to make sure that any values in day_beep that differ by more than 1 from both of their adjacent neighbours are removed.
E.g. take the following rows and day_beep values:
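For reference, a minimal sketch of that encoding in base R. The data frame name `esm`, the column names `day` and `beep`, and the example values are assumptions made for illustration:

```r
# Hypothetical example data: one day with five recorded beeps
esm <- data.frame(day = c(2, 2, 2, 2, 2), beep = c(1, 2, 4, 6, 7))

# Combine day and beep into a single number.
# This only avoids collisions because beep never exceeds 7 (a single digit).
esm$day_beep <- esm$day * 10 + esm$beep

esm$day_beep
# [1] 21 22 24 26 27
```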
1 21
2 22
3 24
4 26
5 27
In this instance I would want to remove row 3, as its value (24) is more than 1 away from both of its neighbours (22 and 26).
What would be the easiest way to do this for the entire dataframe?
CodePudding user response:
With dplyr:
library(dplyr)
df <- tibble(row_number = 1:5, beep = c(21, 22, 24, 26, 27))
# Keep rows whose value is within 1 of the previous or the next value
filter(df, abs(beep - lag(beep)) <= 1 | abs(beep - lead(beep)) <= 1)
#> # A tibble: 4 × 2
#>   row_number  beep
#>        <int> <dbl>
#> 1          1    21
#> 2          2    22
#> 3          4    26
#> 4          5    27
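If you would rather avoid a package dependency, the same neighbour rule can be sketched in base R with `diff()`. This is only an illustration on the example values from the question, not a drop-in replacement for the dplyr call; the variable names are made up:

```r
day_beep <- c(21, 22, 24, 26, 27)

# Is each gap between consecutive rows at most 1?
close_gap <- abs(diff(day_beep)) <= 1

# The first row has no previous neighbour, the last row has no next neighbour
near_prev <- c(FALSE, close_gap)
near_next <- c(close_gap, FALSE)

# Keep a row if it is adjacent to at least one neighbour
keep <- near_prev | near_next

day_beep[keep]
# [1] 21 22 26 27
```

To subset a whole data frame, the logical vector `keep` can be used as a row index, e.g. `df[keep, ]`.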
CodePudding user response:
Here is an alternative with data.table (df must be a data.table) that keeps rows where the lead minus the lag is less than 4, or is NA (the first and last rows):
df[Reduce(`-`, data.table::shift(beep, c(-1, 1))) %in% c(NA, 1:3), ]
Output:
      id  beep
   <int> <int>
1:     1    21
2:     2    22
3:     4    26
4:     5    27
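Note that the lead-minus-lag test and a direct neighbour test agree on the example data but not on every input. A small base R check on made-up values (two adjacent pairs separated by a gap) shows the difference, so it is worth confirming which rule matches your data before relying on either:

```r
x <- c(21, 22, 25, 26)  # made-up values, not from the question
n <- length(x)

# Lead-minus-lag rule, written without data.table
lead_x <- c(x[-1], NA)
lag_x  <- c(NA, x[-n])
keep_span <- (lead_x - lag_x) %in% c(NA, 1:3)

# Direct rule: within 1 of the previous or the next value
close_gap <- abs(diff(x)) <= 1
keep_near <- c(FALSE, close_gap) | c(close_gap, FALSE)

x[keep_span]
# [1] 21 26
x[keep_near]
# [1] 21 22 25 26
```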