Filter conditionally, where if a value is exceeded in column A, further observations beyond the resp

Time:10-23

I am working with fish telemetry data in R, with a simplified dataset provided below:

df <- structure(list(DATE.TIME = structure(c(1560900051, 1560900101, 
1560927373, 1560927504, 1560927533, 1560927585, 1560927689, 1560899962, 
1560900026, 1560900026, 1560900076, 1560927328, 1560927498, 1560927529, 
1560927558, 1560907660, 1560907720, 1560908037, 1560925131, 1560925260, 
1560931034, 1560907630, 1560907695, 1560907746, 1560907804, 1560908189, 
1560908268, 1560925097, 1560925300, 1560925426), class = c("POSIXct", 
"POSIXt"), tzone = "Canada/Atlantic"), TAG = c(1310230L, 1310230L, 
1310230L, 1310230L, 1310230L, 1310230L, 1310230L, 1310230L, 1310230L, 
1310230L, 1310230L, 1310230L, 1310230L, 1310230L, 1310230L, 1311038L, 
1311038L, 1311038L, 1311038L, 1311038L, 1311038L, 1311038L, 1311038L, 
1311038L, 1311038L, 1311038L, 1311038L, 1311038L, 1311038L, 1311038L
), SENSOR.ID = c(5665L, 5665L, 5665L, 5665L, 5665L, 5665L, 5665L, 
5666L, 5666L, 5666L, 5666L, 5666L, 5666L, 5666L, 5666L, 5821L, 
5821L, 5821L, 5821L, 5821L, 5821L, 5822L, 5822L, 5822L, 5822L, 
5822L, 5822L, 5822L, 5822L, 5822L), SENSOR.VALUE = c(8.1796, 
8.1796, 35.0095, 35.0095, 35.0095, 35.0095, 35.0095, 0.9024, 
0, 0, 0, 34.2986, 0.9024, 18.9544, 18.9544, 8.4934, 8.4934, 8.4934, 
35.0095, 35.0095, 35.0095, 0, 0, 0, 0, 0, 0, 13.5388, 1.805, 
1.805), SENSOR = c("temp", "temp", "temp", "temp", "temp", "temp", 
"temp", "depth", "depth", "depth", "depth", "depth", "depth", 
"depth", "depth", "temp", "temp", "temp", "temp", "temp", "temp", 
"depth", "depth", "depth", "depth", "depth", "depth", "depth", 
"depth", "depth")), row.names = c(435151L, 435152L, 435203L, 
435204L, 435205L, 435206L, 435207L, 435614L, 435615L, 435616L, 
435617L, 435664L, 435665L, 435666L, 435667L, 455286L, 455287L, 
455288L, 455295L, 455296L, 455297L, 455553L, 455554L, 455555L, 
455556L, 455557L, 455558L, 455568L, 455569L, 455570L), class = "data.frame")

The data are structured as follows:

  • DATE.TIME=timestamp of the fish detection
  • TAG=unique ID for the acoustic tag implanted in a fish
  • SENSOR.ID=unique ID for each sensor (temp and depth), 2 sensor IDs per TAG
  • SENSOR.VALUE=recorded temp (C) or depth (m)
  • SENSOR=categorical variable to indicate sensor type (temp or depth)

What I would like to do is subset/filter these data so that once the temperature rises above 30C (indicative of predation), all subsequent detections are removed for both the temperature and depth sensors. Each TAG implanted in a fish alternates between transmitting its temperature and its depth SENSOR.ID and SENSOR.VALUE. I can do this filter for the temperature sensor data:

dfsub <- subset(df, SENSOR=="temp" & SENSOR.VALUE<30)

But this still allows detections on the depth sensor after the predation event, which at that point reflect the movement of the predator. Ideally, a filter would find the timestamp of the first instance where temperature rose above 30C and remove all observations after that time for each individual fish (i.e., TAG). After filtering, I would like to end up with the dataset below.

df <- structure(list(DATE.TIME = structure(c(1560900051, 1560900101, 
1560899962, 1560900026, 1560900026, 1560900076, 1560907660, 1560907720, 
1560908037, 1560907630, 1560907695, 1560907746, 1560907804, 1560908189, 
1560908268, 1560925097), class = c("POSIXct", "POSIXt"), tzone = "Canada/Atlantic"), 
    TAG = c(1310230L, 1310230L, 1310230L, 1310230L, 1310230L, 
    1310230L, 1311038L, 1311038L, 1311038L, 1311038L, 1311038L, 
    1311038L, 1311038L, 1311038L, 1311038L, 1311038L), SENSOR.ID = c(5665L, 
    5665L, 5666L, 5666L, 5666L, 5666L, 5821L, 5821L, 5821L, 5822L, 
    5822L, 5822L, 5822L, 5822L, 5822L, 5822L), SENSOR.VALUE = c(8.1796, 
    8.1796, 0.9024, 0, 0, 0, 8.4934, 8.4934, 8.4934, 0, 0, 0, 
    0, 0, 0, 13.5388), SENSOR = c("temp", "temp", "depth", "depth", 
    "depth", "depth", "temp", "temp", "temp", "depth", "depth", 
    "depth", "depth", "depth", "depth", "depth")), row.names = c(435151L, 
435152L, 435614L, 435615L, 435616L, 435617L, 455286L, 455287L, 
455288L, 455553L, 455554L, 455555L, 455556L, 455557L, 455558L, 
455568L), class = "data.frame")

Appreciate the insight!

CodePudding user response:

Cool dataset! Here is one option using tidyr::fill. I edited and slimmed down your data a bit to make a better reprex (note that TAG is called FISH.TAG here).

I also separated it into steps for teaching purposes; in practice, you would do this in a single pipe chain.

library(tidyverse)

fishdat <- tibble::tribble(
  ~DATE.TIME,        ~FISH.TAG, ~SENSOR.ID, ~SENSOR.VALUE, ~SENSOR,
  "2019-06-18 20:19:41",   1,      65,            9,     "temp",
  "2019-06-18 20:20:51",   1,      65,            37,    "temp",
  "2019-06-18 20:19:22",   1,      66,            1,    "depth",
  "2019-06-18 20:21:16",   1,      66,            0,    "depth",
  "2019-06-18 22:27:40",   2,      21,           35,     "temp",
  "2019-06-18 22:33:57",   2,      21,           38,     "temp",
  "2019-06-18 22:27:10",   2,      22,            0,    "depth",
  "2019-06-19 3:18:17",    2,      22,           13,    "depth"
  )

Tag the values indicating predation

fishdat_marked <- 
  fishdat %>% 
  mutate(predated = ifelse(SENSOR == "temp" & SENSOR.VALUE > 30, 
                           "predated", 
                           NA_character_)) 
fishdat_marked

#> # A tibble: 8 × 6
#>   DATE.TIME           FISH.TAG SENSOR.ID SENSOR.VALUE SENSOR predated
#>   <chr>                  <dbl>     <dbl>        <dbl> <chr>  <chr>   
#> 1 2019-06-18 20:19:41        1        65            9 temp   <NA>    
#> 2 2019-06-18 20:20:51        1        65           37 temp   predated
#> 3 2019-06-18 20:19:22        1        66            1 depth  <NA>    
#> 4 2019-06-18 20:21:16        1        66            0 depth  <NA>    
#> 5 2019-06-18 22:27:40        2        21           35 temp   predated
#> 6 2019-06-18 22:33:57        2        21           38 temp   predated
#> 7 2019-06-18 22:27:10        2        22            0 depth  <NA>    
#> 8 2019-06-19 3:18:17         2        22           13 depth  <NA>

Cascade down the predation marker

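fishdat_filled <- 
  fishdat_marked %>% 
  group_by(FISH.TAG) %>% ## for each fish
  ## DATE.TIME is character here, so this sort is alphabetical;
  ## parse it to a date-time first if hours are not zero-padded
  arrange(DATE.TIME, .by_group = TRUE) %>% 
  fill(predated, .direction = "down")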

fishdat_filled

#> # A tibble: 8 × 6
#> # Groups:   FISH.TAG [2]
#>   DATE.TIME           FISH.TAG SENSOR.ID SENSOR.VALUE SENSOR predated
#>   <chr>                  <dbl>     <dbl>        <dbl> <chr>  <chr>   
#> 1 2019-06-18 20:19:22        1        66            1 depth  <NA>    
#> 2 2019-06-18 20:19:41        1        65            9 temp   <NA>    
#> 3 2019-06-18 20:20:51        1        65           37 temp   predated
#> 4 2019-06-18 20:21:16        1        66            0 depth  predated
#> 5 2019-06-18 22:27:10        2        22            0 depth  <NA>    
#> 6 2019-06-18 22:27:40        2        21           35 temp   predated
#> 7 2019-06-18 22:33:57        2        21           38 temp   predated
#> 8 2019-06-19 3:18:17         2        22           13 depth  predated
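
To see the cascading step in isolation, here is a minimal sketch of what fill() does to a single column. The `marker` tibble and its `step` column are made up for illustration and are not part of the data above.

```r
library(tidyr)

# Hypothetical toy column: one marker in the middle, NA everywhere else
marker <- tibble::tibble(
  step     = 1:5,
  predated = c(NA, NA, "predated", NA, NA)
)

# .direction = "down" copies the last non-NA value into the NAs below it;
# NAs above the first marker are left untouched
filled <- fill(marker, predated, .direction = "down")
filled$predated
```

Because the NAs before the first marker stay NA, the later `is.na(predated)` filter keeps only the rows that precede the first predation marker within each group.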

Now, filter

fishdat_filled %>% 
  filter(is.na(predated))
#> # A tibble: 3 × 6
#> # Groups:   FISH.TAG [2]
#>   DATE.TIME           FISH.TAG SENSOR.ID SENSOR.VALUE SENSOR predated
#>   <chr>                  <dbl>     <dbl>        <dbl> <chr>  <chr>   
#> 1 2019-06-18 20:19:22        1        66            1 depth  <NA>    
#> 2 2019-06-18 20:19:41        1        65            9 temp   <NA>    
#> 3 2019-06-18 22:27:10        2        22            0 depth  <NA>
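
The three steps above could be combined into a single pipe chain, roughly as sketched here. This is one possible arrangement, not a definitive version: parsing DATE.TIME with as.POSIXct(), the "UTC" time zone, the zero-padded last timestamp, and the name `fishdat_clean` are my own assumptions, added so the sort is chronological rather than alphabetical.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the answer; the last timestamp is zero-padded and
# the column is parsed to POSIXct (assumption: UTC) so arrange() sorts by time
fishdat <- tibble::tribble(
  ~DATE.TIME,            ~FISH.TAG, ~SENSOR.ID, ~SENSOR.VALUE, ~SENSOR,
  "2019-06-18 20:19:41", 1,         65,          9,            "temp",
  "2019-06-18 20:20:51", 1,         65,         37,            "temp",
  "2019-06-18 20:19:22", 1,         66,          1,            "depth",
  "2019-06-18 20:21:16", 1,         66,          0,            "depth",
  "2019-06-18 22:27:40", 2,         21,         35,            "temp",
  "2019-06-18 22:33:57", 2,         21,         38,            "temp",
  "2019-06-18 22:27:10", 2,         22,          0,            "depth",
  "2019-06-19 03:18:17", 2,         22,         13,            "depth"
) %>%
  mutate(DATE.TIME = as.POSIXct(DATE.TIME, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))

fishdat_clean <- fishdat %>%
  # 1. mark temperature readings above the predation threshold
  mutate(predated = ifelse(SENSOR == "temp" & SENSOR.VALUE > 30,
                           "predated",
                           NA_character_)) %>%
  # 2. within each fish, sort by time and carry the marker downwards
  group_by(FISH.TAG) %>%
  arrange(DATE.TIME, .by_group = TRUE) %>%
  fill(predated, .direction = "down") %>%
  ungroup() %>%
  # 3. keep only rows recorded before the first marker, then drop the helper
  filter(is.na(predated)) %>%
  select(-predated)

fishdat_clean
```

The ungroup() call is there so that later operations on `fishdat_clean` are not silently done per fish; drop it if you want to keep the grouping.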

Created on 2021-10-22 by the reprex package (v2.0.1)
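
The question describes the filter as finding the timestamp of the first reading above 30C for each TAG and dropping everything after it. That could also be written more literally with a join instead of fill(), as in the sketch below. The `detections` data, the "UTC" time zone, and the names `first_predation`, `PREDATION.TIME` and `kept` are hypothetical and only illustrate the idea; the column names follow the question's data.

```r
library(dplyr)

# Hypothetical detections: tag 1 is predated at t = 200 s, tag 2 never is
detections <- tibble::tibble(
  DATE.TIME    = as.POSIXct(c(100, 110, 200, 210, 100, 150),
                            origin = "1970-01-01", tz = "UTC"),
  TAG          = c(1L, 1L, 1L, 1L, 2L, 2L),
  SENSOR.VALUE = c(8, 1, 35, 19, 9, 2),
  SENSOR       = c("temp", "depth", "temp", "depth", "temp", "depth")
)

# Earliest time at which each tag's temperature sensor exceeded 30C
first_predation <- detections %>%
  filter(SENSOR == "temp", SENSOR.VALUE > 30) %>%
  group_by(TAG) %>%
  summarise(PREDATION.TIME = min(DATE.TIME), .groups = "drop")

# Tags without a predation event get NA after the join and are kept whole;
# for the others, only detections strictly before the event are kept
kept <- detections %>%
  left_join(first_predation, by = "TAG") %>%
  filter(is.na(PREDATION.TIME) | DATE.TIME < PREDATION.TIME) %>%
  select(-PREDATION.TIME)

kept
```

A side benefit of this form is that `first_predation` is a small table of predation times per tag, which may be useful to keep for reporting.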
