Removing rows in R where the contiguous values of rows in one column are further than 1 numeric valu

Time:08-12

I am trying to clean up a time series with multiple data points. The data is arranged by day and by 'beep'. I want to keep only items that are within 1 beep of each other on the same day.

In order to do this I have created a dummy variable by multiplying day number by 10 and adding the beep number to it. Because beeps only reach 7 there is no risk of overlap between days.

E.g. Day 2, beep 4 becomes 24 in a variable called day_beep; Day 2, beep 5 becomes 25, etc.
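The encoding described above could be sketched as follows. The data frame `esm` and its columns `day` and `beep` are made-up names for illustration, not taken from the original data.

```r
# Hypothetical example data: one row per prompt, with day and beep numbers.
esm <- data.frame(
  day  = c(2, 2, 2, 2, 2),
  beep = c(1, 2, 4, 6, 7)
)

# Combine day and beep into a single index. Because beep never exceeds 7,
# each day occupies its own block of ten, so values cannot collide across days.
esm$day_beep <- esm$day * 10 + esm$beep

esm$day_beep
```

Assuming beeps are numbered from 1 to 7, the last beep of one day and the first beep of the next differ by 4 (for example 17 and 21), so a gap test of 1 never links two different days.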

I now want to use a function called lagvar from an ESM package to create a time-lagged series. Before doing this I want to make sure that any values in day_beep that differ by more than 1 from their contiguous neighbours are removed.

E.g. take the following rows and day_beep values:

1                          21
2                          22
3                          24
4                          26
5                          27

In this instance I would want to remove the data from row 3, as its value is more than 1 away from both of its neighbours.

What would be the easiest way to do this for the entire dataframe?

CodePudding user response:

With dplyr:

library(dplyr)

df <- tibble(row_number = 1:5, beep = c(21, 22, 24, 26, 27))

# Keep a row if it is within 1 of the previous value or of the next value
filter(df, abs(beep - lag(beep)) <= 1 | abs(beep - lead(beep)) <= 1)

#> # A tibble: 4 × 2
#>   row_number  beep
#>        <int> <dbl>
#> 1          1    21
#> 2          2    22
#> 3          4    26
#> 4          5    27
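For comparison, the same rule can be sketched in base R without any packages. This is only an illustration of the idea, and the helper names `gap_prev`, `gap_next` and `kept` are assumptions introduced here.

```r
# Same example values as in the answer above.
df <- data.frame(row_number = 1:5, beep = c(21, 22, 24, 26, 27))

# Absolute gap to the previous and to the next row. The missing neighbour at
# each end is treated as infinitely far away, so it can never pass the test.
gap_prev <- c(Inf, abs(diff(df$beep)))
gap_next <- c(abs(diff(df$beep)), Inf)

# Keep a row if it is within 1 of at least one neighbour.
kept <- df[gap_prev <= 1 | gap_next <= 1, ]
kept
```

Padding with `Inf` rather than `NA` means the first and last rows are kept only when their single neighbour is within 1.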

CodePudding user response:

Here is an alternative using data.table (df must be a data.table) that keeps rows where the lead minus the lag of beep is less than 4, or NA (which is the case for the first and last rows):

df[Reduce(`-`,data.table::shift(beep,c(-1,1))) %in% c(NA,1:3),]

Output:

      id  beep
   <int> <int>
1:     1    21
2:     2    22
3:     4    26
4:     5    27
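One thing worth checking with a lead-minus-lag rule is how it treats the first and last rows, whose difference is always NA. The sketch below emulates the rule in base R on a made-up vector (`beep`, `lead_minus_lag` and `keep` are illustrative names) rather than calling data.table itself.

```r
# Hypothetical values where no row is within 1 of a neighbour.
beep <- c(21, 23, 25)

# Emulate lead(beep) - lag(beep) with NA padding at the ends.
lead_minus_lag <- c(beep[-1], NA) - c(NA, beep[-length(beep)])

# The rule keeps a row when the difference is NA, 1, 2 or 3.
keep <- lead_minus_lag %in% c(NA, 1:3)
beep[keep]
```

Here the two end rows are retained because their difference is NA, even though each is 2 away from its only neighbour. If that matters for the data at hand, the ends may need a separate check.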