Rerun function and save outputs to later plot?

Time:10-15

I'm pretty new to R and could use some guidance, help, or solutions!

My actual dataset is large so I have this sample dataset with two columns that looks like this:

plot <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
ID <- c("S", "S", "S", "C", "T", "S", "SP", "T", "C", "T", "S", "SP", "T", "S", "C")
dat <- data.frame(plot, ID)

I am trying to randomly remove one entry per plot, calculate the frequency of each ID, randomly remove another entry per plot, calculate the frequency and continue repeating.

So far, with some help, I was able to use the following (with dplyr loaded) to randomly remove one entry from each plot:

dat %>%
  group_by(plot) %>%
  sample_n(n() - 1) %>%
  ungroup()

I was also able to use this to calculate the frequency of each ID:

dat %>%                               
  group_by(ID) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

I need to be able to repeat these two steps and look at the results each time. This sample dataset has only 15 rows, but my actual dataset is a lot larger, so writing the steps out by hand more than 100 times seems inefficient.

Is it possible to loop or rerun both steps together a set number of times and produce an output each time? For the sample data I provided, it could be a total of 4 times. I tried "for" loops but couldn't get them to work (most likely user error).

Thanks for any help!

CodePudding user response:

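Combine the two operations in one function that removes p random entries per plot and then computes the frequencies, and use purrr::map to call it once for each value of p. Note that each call samples afresh from the full data rather than removing one more entry from the previous result.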

library(dplyr)

run_sample <- function(dat, p) {
  dat %>%
    group_by(plot) %>%
    sample_n(n() - p) %>%
    ungroup() %>%
    count(ID) %>%
    mutate(freq = n / sum(n))
}

set.seed(123)
# p takes the values 1 to 4 here (the sample data has 4 distinct IDs)
result <- purrr::map(seq(n_distinct(dat$ID)), run_sample, dat = dat)

This returns a list of tibbles.

result

#[[1]]
# A tibble: 4 x 3
#  ID        n  freq
#  <chr> <int> <dbl>
#1 C         2 0.167
#2 S         5 0.417
#3 SP        2 0.167
#4 T         3 0.25 

#[[2]]
# A tibble: 3 x 3
#  ID        n  freq
#  <chr> <int> <dbl>
#1 C         1 0.111
#2 S         6 0.667
#3 T         2 0.222

#[[3]]
# A tibble: 3 x 3
#  ID        n  freq
#  <chr> <int> <dbl>
#1 S         2 0.333
#2 SP        2 0.333
#3 T         2 0.333

#[[4]]
# A tibble: 2 x 3
#  ID        n  freq
#  <chr> <int> <dbl>
#1 C         1 0.333
#2 S         2 0.667
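
To save the outputs for plotting later, one option is to stack the list into a single long data frame with a column recording how many entries were removed per plot. The following is only a sketch under some assumptions: the names `removed` and `all_runs` are made up for illustration, the number of runs is hard-coded to 4 to match the sample data, and the plot is drawn only if ggplot2 happens to be installed.

```r
library(dplyr)
library(purrr)

# Sample data from the question.
dat <- data.frame(
  plot = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3),
  ID   = c("S", "S", "S", "C", "T", "S", "SP", "T", "C", "T",
           "S", "SP", "T", "S", "C")
)

# Same function as in the answer: drop p random rows per plot,
# then compute the frequency of each ID in what is left.
run_sample <- function(dat, p) {
  dat %>%
    group_by(plot) %>%
    sample_n(n() - p) %>%
    ungroup() %>%
    count(ID) %>%
    mutate(freq = n / sum(n))
}

set.seed(123)
removed <- 1:4  # assumption: 4 runs, as suggested for the sample data
result <- map(removed, run_sample, dat = dat)

# Name each list element by the number of rows removed per plot,
# then stack everything into one long data frame.
names(result) <- removed
all_runs <- bind_rows(result, .id = "removed") %>%
  mutate(removed = as.integer(removed))

# Optional: draw frequency against rows removed, one line per ID.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(all_runs, aes(removed, freq, colour = ID)) +
    geom_line() +
    geom_point()
  print(p)
}
```

Keeping everything in one data frame means the results can be written out with write.csv(all_runs, ...) and plotted in a later session without rerunning the sampling.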