Calling an index case from a list of lists in R

I have a problem that I can brute-force, but I would like to learn a cleaner way, which I think requires calling observations from within a list of lists.

I am tracking particles across a surface, where each observation is a particular particle at a given time, along with experimental interventions.

I've made lists of particle.ids that have traveled a given distance (00 mm, 1 mm, 3 mm, 5 mm, etc.) and would like to see how long it took each particle to get to that distance.

library(tidyverse)
library(here)

load(here("outputs", "master.muc.RData")) #all particles with all data 
load(here("outputs", "max.disp.RData")) #one observation per particle, using slice_max(displacement) 

#links below

link to master.muc, which includes all particle observations https://www.dropbox.com/s/77h4aajfmfvpeb5/master.muc.RData?dl=0

link to max.disp, a single observation per particle based on maximum displacement https://www.dropbox.com/s/y6qmt85wskmj9mg/max.disp.RData?dl=0

Here's how I created my distance lists. I'm sure this could be simplified, and I'd be happy for the feedback. I've also tried this with list(), and with select() instead of pull().

disp.00 <- max.disp %>% 
  filter(displacement < 0.03) %>% 
  pull(particle.id)

disp.03 <- max.disp %>% 
  filter(displacement >= 0.03) %>% 
  pull(particle.id)

disp.05 <- max.disp %>% 
  filter(displacement >= 0.05) %>% 
  pull(particle.id)

disp.10 <- max.disp %>% 
  filter(displacement >= 0.10) %>% 
  pull(particle.id)

disp.15 <- max.disp %>% 
  filter(displacement >= 0.15) %>% 
  pull(particle.id)

disp.20 <- max.disp %>% 
  filter(displacement >= 0.20) %>% 
  pull(particle.id)

disp.25 <- max.disp %>% 
  filter(displacement >= 0.25) %>% 
  pull(particle.id)

disp.30 <- max.disp %>% 
  filter(displacement >= 0.30) %>% 
  pull(particle.id)

disp.50 <- max.disp %>% 
  filter(displacement >= 0.50) %>% 
  pull(particle.id)

disp.75 <- max.disp %>% 
  filter(displacement >= 0.75) %>% 
  pull(particle.id)

disp.99 <- max.disp %>% 
  filter(displacement > 0.99) %>% 
  pull(particle.id)

Next, I create a tibble to populate with the data:

particle.displacement <- master.muc %>% select(particle.id) %>% unique()

particle.displacement <- particle.displacement %>% add_column(disp.00 = NA, 
                                     disp.03 = NA, 
                                     disp.05 = NA, 
                                     disp.10 = NA, 
                                     disp.15 = NA, 
                                     disp.20 = NA, 
                                     disp.25 = NA, 
                                     disp.30 = NA, 
                                     disp.50 = NA, 
                                     disp.75 = NA, 
                                     disp.99 = NA)

time.min.part.disp <- particle.displacement 
time.max.part.disp <- particle.displacement

Then I'd like to add the minimum (and maximum) elapsed time ∆t, stored in the dt column, for each particle that appears in a given list. Particles that don't appear in a list should remain NA.

displacements <- c(disp.00, disp.03, disp.05, disp.10, disp.15, disp.20, disp.25, disp.30, disp.50, disp.75, disp.99) #i've tried this as a list as well. 

for(j in 1:length(displacements)){
  #j <- 8
  dt.min <- master.muc %>% 
    filter(particle.id %in% paste(displacements[j])) %>% #this command works if i call the list directly, for example: %in% disp.05, but not as a loop
    slice_min(dt) %>% 
    select(particle.id, dt)
  dt.max <- master.muc %>% group_by(particle.id) %>% 
    filter(particle.id %in% displacements[j]) %>% 
    slice_max(dt) %>% 
    select(particle.id, dt)
  
time.min.part.disp <- left_join(time.min.part.disp, dt.min, by = particle.id)
time.max.part.disp <- left_join(time.max.part.disp, dt.max, by = particle.id)

}

I was going to do this manually for each list, as below, but I'd rather not, both to avoid manual errors and in the hope of learning something.

d.00.min <- master.muc %>% group_by(particle.id) %>% 
  filter(particle.id %in% disp.00) %>% 
  slice_min(dt) %>% 
  select(particle.id, dt)
d.00.max <- master.muc %>% group_by(particle.id) %>% 
  filter(particle.id %in% disp.00) %>% 
  slice_max(dt) %>%  
  select(particle.id, dt)

Thanks for the help!

Answer:

You can create a table with one row per particle, or one row per particle-displacement combination, and use mutate to calculate, for example, the ids displaced by at least a given amount. Here is some code for inspiration:

library(tidyverse)

load("master.muc.RData")
load("max.disp.RData")

displacements <- c(0.03, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.75, 0.99)
particle_ids <- master.muc %>% pull(particle.id) %>% unique()

displaced_particles <-
  tibble(displacement = displacements) %>%
  mutate(
    particle.id = displacement %>% map(~ {
      max.disp %>% 
        filter(displacement >= .x) %>% 
        pull(particle.id)
    })
  ) %>%
  unnest(particle.id)
displaced_particles
#> # A tibble: 119,081 x 2
#>    displacement particle.id    
#>           <dbl> <chr>          
#>  1         0.03 100-135-001-0  
#>  2         0.03 100-135-001-1  
#>  3         0.03 100-135-001-10 
#>  4         0.03 100-135-001-101
#>  5         0.03 100-135-001-102
#>  6         0.03 100-135-001-103
#>  7         0.03 100-135-001-104
#>  8         0.03 100-135-001-105
#>  9         0.03 100-135-001-106
#> 10         0.03 100-135-001-106
#> # … with 119,071 more rows

particle_durations <-
  master.muc %>%
  group_by(particle.id) %>%
  summarise(
    min_elapsed_time = min(dt),
    max_elapsed_time = max(dt)
  )
particle_durations
#> # A tibble: 14,594 x 3
#>    particle.id     min_elapsed_time max_elapsed_time
#>    <chr>                      <dbl>            <dbl>
#>  1 100-135-001-0                  0             21.9
#>  2 100-135-001-1                  0             33  
#>  3 100-135-001-10                 0             22.8
#>  4 100-135-001-101                0             39.9
#>  5 100-135-001-102                0             20.1
#>  6 100-135-001-103                0             23.4
#>  7 100-135-001-104                0             23.1
#>  8 100-135-001-105                0             25.5
#>  9 100-135-001-106                0            137. 
#> 10 100-135-001-108                0             31.5
#> # … with 14,584 more rows

particle_durations %>%
  left_join(displaced_particles)
#> Joining, by = "particle.id"
#> # A tibble: 123,303 x 4
#>    particle.id   min_elapsed_time max_elapsed_time displacement
#>    <chr>                    <dbl>            <dbl>        <dbl>
#>  1 100-135-001-0                0             21.9         0.03
#>  2 100-135-001-0                0             21.9         0.05
#>  3 100-135-001-0                0             21.9         0.1 
#>  4 100-135-001-0                0             21.9         0.15
#>  5 100-135-001-0                0             21.9         0.2 
#>  6 100-135-001-0                0             21.9         0.25
#>  7 100-135-001-1                0             33           0.03
#>  8 100-135-001-1                0             33           0.05
#>  9 100-135-001-1                0             33           0.1 
#> 10 100-135-001-1                0             33           0.15
#> # … with 123,293 more rows

displaced_particles %>%
  # name the nested column explicitly; unnamed nest() arguments are deprecated
  nest(data = particle.id) %>%
  mutate(
    data = data %>% map(~ {
      master.muc %>%
        # filter before group_by is much faster
        filter(particle.id %in% .x$particle.id) %>% 
        group_by(particle.id) %>% 
        slice_min(dt) %>% 
        select(particle.id, dt)
    })
  ) %>%
  unnest(data)
#> # A tibble: 57,666 x 3
#>    displacement particle.id        dt
#>           <dbl> <chr>           <dbl>
#>  1         0.03 100-135-001-0       0
#>  2         0.03 100-135-001-1       0
#>  3         0.03 100-135-001-10      0
#>  4         0.03 100-135-001-101     0
#>  5         0.03 100-135-001-102     0
#>  6         0.03 100-135-001-103     0
#>  7         0.03 100-135-001-104     0
#>  8         0.03 100-135-001-105     0
#>  9         0.03 100-135-001-106     0
#> 10         0.03 100-135-001-108     0
#> # … with 57,656 more rows

Created on 2021-12-14 by the reprex package (v2.0.1)
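
Note that the minimum dt is 0 for every particle in the output above, so it does not say when a particle reached a given displacement. If the goal is the time at which each particle first reaches each threshold, one possible approach is sketched below on made-up data. It assumes master.muc has a per-observation displacement column (implied by the slice_max(displacement) used to build max.disp); the tibble obs, its values, and the column name dt.first are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for master.muc; assumes a per-observation displacement column
obs <- tibble(
  particle.id  = c("a", "a", "a", "b", "b", "b"),
  dt           = c(0, 10, 20, 0, 10, 20),
  displacement = c(0, 0.04, 0.12, 0, 0.02, 0.06)
)

thresholds <- c(0.03, 0.05, 0.10)

first_reached <- obs %>%
  # pair every observation with every threshold
  expand_grid(threshold = thresholds) %>%
  # keep only observations at or beyond the threshold
  filter(displacement >= threshold) %>%
  # earliest such observation per particle and threshold
  group_by(particle.id, threshold) %>%
  summarise(dt.first = min(dt), .groups = "drop")

first_reached
```

Particles that never reach a threshold simply have no row for it. The expanded table has one row per observation per threshold, so on the full data it may be worth filtering to the relevant particles first.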

You can use nest and unnest to split your table into row groups. Usually, it is much better to have normalized (3NF) long tables like displaced_particles, with just one value per cell. This makes it much easier, for example, to join tables and bring together properties of the same particles from different tables.
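
As for the loop in the question: c() concatenates the id vectors into one long character vector, so displacements[j] is a single id rather than the j-th group. A named list keeps the groups separate. The sketch below shows that idea on made-up data; obs, id_groups, and their values are hypothetical stand-ins for master.muc and the disp.* vectors.

```r
library(dplyr)

# Toy stand-in for master.muc (hypothetical values)
obs <- tibble(
  particle.id = c("a", "a", "b", "b", "c", "c"),
  dt          = c(0, 10, 0, 20, 0, 30)
)

# list() keeps each id vector as its own element; c() would flatten them
id_groups <- list(
  disp.03 = c("a", "b", "c"),
  disp.05 = c("b", "c"),
  disp.10 = "c"
)

# Summarise each group separately, then stack the results;
# .id = "group" stores the list names in a column called group
dt_range <- bind_rows(
  lapply(id_groups, function(ids) {
    obs %>%
      filter(particle.id %in% ids) %>%
      group_by(particle.id) %>%
      summarise(dt.min = min(dt), dt.max = max(dt), .groups = "drop")
  }),
  .id = "group"
)

dt_range
```

Inside an explicit for loop, the equivalent fix would be to index the list with double brackets, id_groups[[j]], since single brackets return a one-element list rather than the id vector itself.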

Since there is just one object in the file max.disp.RData, you should save it as RDS. Also consider that other people might not use R for data analysis; file formats like feather or even CSV make your data much more compatible with other tools.
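
A minimal base-R sketch of the RDS and CSV round trips, using a made-up data frame and temporary files (max.disp.toy and the paths are hypothetical):

```r
# Toy stand-in for max.disp (hypothetical values)
max.disp.toy <- data.frame(
  particle.id  = c("a", "b"),
  displacement = c(0.12, 0.06)
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

# RDS stores a single object; the reader chooses the variable name on load
saveRDS(max.disp.toy, rds_path)
restored <- readRDS(rds_path)

# CSV is plain text and readable by most other tools
write.csv(max.disp.toy, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)
```

Unlike load(), readRDS() returns the object instead of silently creating variables in the workspace, so the name used in the script is always visible in the code.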