I have a problem that I can brute-force my way through, but I would like to learn a cleaner way, which I think requires calling observations from within a list of lists.
I am tracking particles across a surface, where each observation is a particular particle at a given time, along with experimental interventions.
I've made lists of particle.ids that have traveled a given distance (0 mm, 1 mm, 3 mm, 5 mm, etc.) and would like to see how long it took each particle to get to that distance.
library(tidyverse)
library(here)
load(here("outputs", "master.muc.RData")) #all particles with all data
load(here("outputs", "max.disp.RData")) #one observation per particle, using slice_max(displacement)
#links below
Link to master.muc, which includes all particle observations: https://www.dropbox.com/s/77h4aajfmfvpeb5/master.muc.RData?dl=0
Link to max.disp, a single observation per particle based on maximum displacement: https://www.dropbox.com/s/y6qmt85wskmj9mg/max.disp.RData?dl=0
Here's how I created my distance lists. I'm sure this could be simplified, and I'd be happy for the feedback. I've also tried this with list(), and with select() instead of pull().
disp.00 <- max.disp %>%
filter(displacement < 0.03) %>%
pull(particle.id)
disp.03 <- max.disp %>%
filter(displacement >= 0.03) %>%
pull(particle.id)
disp.05 <- max.disp %>%
filter(displacement >= 0.05) %>%
pull(particle.id)
disp.10 <- max.disp %>%
filter(displacement >= 0.10) %>%
pull(particle.id)
disp.15 <- max.disp %>%
filter(displacement >= 0.15) %>%
pull(particle.id)
disp.20 <- max.disp %>%
filter(displacement >= 0.20) %>%
pull(particle.id)
disp.25 <- max.disp %>%
filter(displacement >= 0.25) %>%
pull(particle.id)
disp.30 <- max.disp %>%
filter(displacement >= 0.30) %>%
pull(particle.id)
disp.50 <- max.disp %>%
filter(displacement >= 0.50) %>%
pull(particle.id)
disp.75 <- max.disp %>%
filter(displacement >= 0.75) %>%
pull(particle.id)
disp.99 <- max.disp %>%
filter(displacement > 0.99) %>%
pull(particle.id)
Next, I create a tibble to populate with the results:
particle.displacement <- master.muc %>% select(particle.id) %>% unique()
particle.displacement <- particle.displacement %>% add_column(disp.00 = NA,
disp.03 = NA,
disp.05 = NA,
disp.10 = NA,
disp.15 = NA,
disp.20 = NA,
disp.25 = NA,
disp.30 = NA,
disp.50 = NA,
disp.75 = NA,
disp.99 = NA)
time.min.part.disp <- particle.displacement
time.max.part.disp <- particle.displacement
Then I'd like to add the minimum elapsed time (∆t, dt) for each particle that appears in a given list; particles that don't appear in a list should remain NA.
displacements <- c(disp.00, disp.03, disp.05, disp.10, disp.15, disp.20, disp.25, disp.30, disp.50, disp.75, disp.99) #i've tried this as a list as well.
for(j in 1:length(displacements)){
#j <- 8
dt.min <- master.muc %>%
filter(particle.id %in% paste(displacements[j])) %>% #this command works if i call the list directly, for example: %in% disp.05, but not as a loop
slice_min(dt) %>%
select(particle.id, dt)
dt.max <- master.muc %>% group_by(particle.id) %>%
filter(particle.id %in% displacements[j]) %>%
slice_max(dt) %>%
select(particle.id, dt)
time.min.part.disp <- left_join(time.min.part.disp, dt.min, by = particle.id)
time.max.part.disp <- left_join(time.max.part.disp, dt.max, by = particle.id)
}
I was going to do this manually for each list, as below, but I'd rather not: it risks manual errors, and I'm hoping to learn something.
d.00.min <- master.muc %>% group_by(particle.id) %>%
filter(particle.id %in% disp.00) %>%
slice_min(dt) %>%
select(particle.id, dt)
d.00.max <- master.muc %>% group_by(particle.id) %>%
filter(particle.id %in% disp.00) %>%
slice_max(dt) %>%
select(particle.id, dt)
thanks for the help!
CodePudding user response:
You can create a table with one particle, or one particle-displacement combination, per row and use mutate to calculate, e.g., the ids displaced at least that much. Here is some code for inspiration:
library(tidyverse)
load("master.muc.RData")
load("max.disp.RData")
displacements <- c(0.03, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.75, 0.99)
particle_ids <- master.muc %>% pull(particle.id) %>% unique()
displaced_particles <-
tibble(displacement = displacements) %>%
mutate(
particle.id = displacement %>% map(~ {
max.disp %>%
filter(displacement >= .x) %>%
pull(particle.id)
})
) %>%
unnest(particle.id)
displaced_particles
#> # A tibble: 119,081 x 2
#> displacement particle.id
#> <dbl> <chr>
#> 1 0.03 100-135-001-0
#> 2 0.03 100-135-001-1
#> 3 0.03 100-135-001-10
#> 4 0.03 100-135-001-101
#> 5 0.03 100-135-001-102
#> 6 0.03 100-135-001-103
#> 7 0.03 100-135-001-104
#> 8 0.03 100-135-001-105
#> 9 0.03 100-135-001-106
#> 10 0.03 100-135-001-106
#> # … with 119,071 more rows
particle_durations <-
master.muc %>%
group_by(particle.id) %>%
summarise(
min_elapsed_time = min(dt),
max_elapsed_time = max(dt)
)
particle_durations
#> # A tibble: 14,594 x 3
#> particle.id min_elapsed_time max_elapsed_time
#> <chr> <dbl> <dbl>
#> 1 100-135-001-0 0 21.9
#> 2 100-135-001-1 0 33
#> 3 100-135-001-10 0 22.8
#> 4 100-135-001-101 0 39.9
#> 5 100-135-001-102 0 20.1
#> 6 100-135-001-103 0 23.4
#> 7 100-135-001-104 0 23.1
#> 8 100-135-001-105 0 25.5
#> 9 100-135-001-106 0 137.
#> 10 100-135-001-108 0 31.5
#> # … with 14,584 more rows
particle_durations %>%
left_join(displaced_particles)
#> Joining, by = "particle.id"
#> # A tibble: 123,303 x 4
#> particle.id min_elapsed_time max_elapsed_time displacement
#> <chr> <dbl> <dbl> <dbl>
#> 1 100-135-001-0 0 21.9 0.03
#> 2 100-135-001-0 0 21.9 0.05
#> 3 100-135-001-0 0 21.9 0.1
#> 4 100-135-001-0 0 21.9 0.15
#> 5 100-135-001-0 0 21.9 0.2
#> 6 100-135-001-0 0 21.9 0.25
#> 7 100-135-001-1 0 33 0.03
#> 8 100-135-001-1 0 33 0.05
#> 9 100-135-001-1 0 33 0.1
#> 10 100-135-001-1 0 33 0.15
#> # … with 123,293 more rows
displaced_particles %>%
# name the list-column explicitly; a bare nest(particle.id) triggers a warning
nest(data = c(particle.id)) %>%
mutate(
data = data %>% map(~ {
master.muc %>%
# filter before group_by is much faster
filter(particle.id %in% .x$particle.id) %>%
group_by(particle.id) %>%
slice_min(dt) %>%
select(particle.id, dt)
})
) %>%
unnest(data)
#> # A tibble: 57,666 x 3
#> displacement particle.id dt
#> <dbl> <chr> <dbl>
#> 1 0.03 100-135-001-0 0
#> 2 0.03 100-135-001-1 0
#> 3 0.03 100-135-001-10 0
#> 4 0.03 100-135-001-101 0
#> 5 0.03 100-135-001-102 0
#> 6 0.03 100-135-001-103 0
#> 7 0.03 100-135-001-104 0
#> 8 0.03 100-135-001-105 0
#> 9 0.03 100-135-001-106 0
#> 10 0.03 100-135-001-108 0
#> # … with 57,656 more rows
Created on 2021-12-14 by the reprex package (v2.0.1)
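A note on why the loop in the question does not work: c() concatenates all the id vectors into one long flat vector, so displacements[j] is a single id rather than the j-th group of ids. If you prefer to keep separate vectors, a named list() preserves them, and [[ ]] (or a map over the list) gives you each vector back. The following is only a minimal sketch on made-up toy data; the object names master and id_lists and all values are assumptions for illustration, not taken from your files.

```r
library(tidyverse)

# Toy stand-ins for master.muc and the id vectors (hypothetical values)
master <- tibble(
  particle.id = c("a", "a", "b", "b", "c", "c"),
  dt          = c(0, 2, 0, 5, 0, 9)
)
disp.03 <- c("a", "b", "c")
disp.05 <- c("b", "c")

# c() flattens everything into one character vector of length 5,
# so the boundaries between the groups are lost
flat <- c(disp.03, disp.05)
length(flat)

# list() keeps each vector as its own named element
id_lists <- list(disp.03 = disp.03, disp.05 = disp.05)

# id_lists[["disp.05"]] is the character vector itself,
# whereas id_lists["disp.05"] is a list of length one
ids_05 <- id_lists[["disp.05"]]

# Apply the same summary to every element; bind_rows(.id = ...) records
# the list names so each row knows which threshold it belongs to
max_dt <- id_lists %>%
  map(function(ids) {
    master %>%
      filter(particle.id %in% ids) %>%
      group_by(particle.id) %>%
      summarise(max_dt = max(dt), .groups = "drop")
  }) %>%
  bind_rows(.id = "threshold")
```

The result is again a long table with one row per threshold and particle, which is the same shape as the joined tables above.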
You can use nest and unnest to split your table into groups of rows. Usually, it is much better to have long, 3NF-normalized tables like displaced_particles, with just one value per cell. This makes it much easier, for example, to join tables in order to gather properties of the same particles from different tables.
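To make the nest/unnest round trip concrete, and to show that a long table can still be turned into the one-column-per-threshold layout from the question, here is a small sketch. The table long and its values are invented for illustration, and the disp_ column prefix is just a naming choice.

```r
library(tidyverse)

# Hypothetical long table: one row per threshold/particle pair
long <- tibble(
  displacement = c(0.03, 0.03, 0.05),
  particle.id  = c("a", "b", "b"),
  dt           = c(1.5, 2.0, 4.0)
)

# nest() collapses the rows of each displacement value into a list-column
nested <- long %>% nest(data = c(particle.id, dt))

# unnest() expands the list-column back into the original long table
restored <- nested %>% unnest(data)

# pivot_wider() spreads the thresholds into columns; a particle that
# never reached a threshold gets NA in that column
wide <- long %>%
  pivot_wider(
    id_cols      = particle.id,
    names_from   = displacement,
    values_from  = dt,
    names_prefix = "disp_"
  )
```

Keeping the long table as the working format and pivoting only at the end, for display, tends to avoid pre-filling a wide tibble with NA columns by hand.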
Since there is just one object in the file max.disp.RData, you could save it as RDS instead. Also consider that other people might not use R for data analysis; file formats like feather or even CSV make your data much more compatible with other tools.
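As a rough sketch of what that could look like (the tibble below is a made-up stand-in for max.disp, and the temporary file paths are placeholders): unlike load(), readRDS() returns the object, so whoever reads the file chooses the variable name.

```r
library(tidyverse)

# Hypothetical stand-in for the real max.disp table
max.disp <- tibble(
  particle.id  = c("a", "b"),
  displacement = c(0.04, 0.31)
)

# Placeholder paths; in a project these might live under outputs/
rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

# RDS stores exactly one R object and keeps its column types
saveRDS(max.disp, rds_path)
max_disp_copy <- readRDS(rds_path)

# CSV is plain text that non-R tools can read; column types are
# re-guessed on import
write_csv(max.disp, csv_path)
max_disp_csv <- read_csv(csv_path, show_col_types = FALSE)
```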