mutate and map to create runif distributions

Time:10-16

I am trying to create distributions in R by mapping over a nested data frame.

I first have:

# A tibble: 3 × 4
  Species      min   max    sd
  <fct>      <dbl> <dbl> <dbl>
1 setosa       4.3   5.8 0.352
2 versicolor   4.9   7   0.516
3 virginica    4.9   7.9 0.636

So, for each group I want to create 100 random samples between the min and max of each row.

How can I correctly combine mutate() and map() to create these distributions?

  mutate(
    out = map(runif(min = min, max = max, n = 100))
  )

The expected output is an extra column in the data frame that holds, for each row, a vector of n values generated with runif().

My full attempt so far:

iris %>% 
  group_by(Species) %>% 
  summarise(
    min = min(Sepal.Length),
    max = max(Sepal.Length),
    sd = sd(Sepal.Length)
  ) %>% 
  ungroup() %>%
  group_by(Species) %>% 
  nest() %>% 
  mutate(
    out = map(data, ~ runif(min = min, max = max, n = 100))
  )

Answer:

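We could do this row by row: nest_by() returns a rowwise data frame, so each mutate() expression is evaluated one row at a time. Because runif() returns a vector of 100 values, the result has to be wrapped in list() to fit into a single cell of a list column. Apart from that, the code stays very readable.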

library(dplyr)

iris %>% 
  nest_by(Species) %>% 
  mutate(
    min = min(data$Sepal.Length),
    max = max(data$Sepal.Length),
    sd = sd(data$Sepal.Length)
  ) %>% 
  mutate(
    out = list(runif(n = 100, min = min, max = max))
  )

#> # A tibble: 3 × 6
#> # Rowwise:  Species
#>   Species                  data   min   max    sd out        
#>   <fct>      <list<tibble[,4]>> <dbl> <dbl> <dbl> <list>     
#> 1 setosa               [50 × 4]   4.3   5.8 0.352 <dbl [100]>
#> 2 versicolor           [50 × 4]   4.9   7   0.516 <dbl [100]>
#> 3 virginica            [50 × 4]   4.9   7.9 0.636 <dbl [100]>
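
If the nested `data` column is not needed afterwards, the same row-by-row idea should also work directly on the summarised table. The following is a minimal sketch, not part of the original answer: it swaps nest_by() for an explicit rowwise() call, and the `set.seed()` line and the name `res` are assumptions added only to make the example reproducible and easy to inspect.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so repeated runs give the same samples

res <- iris %>%
  group_by(Species) %>%
  summarise(
    min = min(Sepal.Length),
    max = max(Sepal.Length),
    sd = sd(Sepal.Length)
  ) %>%
  rowwise() %>%  # evaluate the next mutate() one row at a time
  mutate(
    # list() keeps the 100 draws together in a single list-column cell
    out = list(runif(n = 100, min = min, max = max))
  ) %>%
  ungroup()      # drop the rowwise grouping again

# each element of `out` is a numeric vector of length 100
lengths(res$out)
```

Calling ungroup() at the end is optional; it only prevents later verbs from being applied row by row unintentionally.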

With {purrr} we could use map2():

library(dplyr)
library(purrr)


iris %>% 
  group_by(Species) %>% 
  summarise(
    min = min(Sepal.Length),
    max = max(Sepal.Length),
    sd = sd(Sepal.Length)
  ) %>% 
  mutate(out = map2(min, max, 
                    ~ runif(n = 100, min = .x, max = .y))
  )

#> # A tibble: 3 × 5
#>   Species      min   max    sd out        
#>   <fct>      <dbl> <dbl> <dbl> <list>     
#> 1 setosa       4.3   5.8 0.352 <dbl [100]>
#> 2 versicolor   4.9   7   0.516 <dbl [100]>
#> 3 virginica    4.9   7.9 0.636 <dbl [100]>
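
For comparison, the nested attempt from the question can probably be repaired with a small change. After nest(), the `min` and `max` columns live inside each tibble in `data`, so, as far as I can tell, the bare names `min` and `max` inside the lambda resolve to the base functions rather than to numbers, which runif() cannot use. The sketch below pulls the values out of each nested tibble with `.x$min` and `.x$max`; the object name `nested` is an assumption for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

nested <- iris %>%
  group_by(Species) %>%
  summarise(
    min = min(Sepal.Length),
    max = max(Sepal.Length),
    sd = sd(Sepal.Length)
  ) %>%
  group_by(Species) %>%
  nest() %>%  # min, max and sd now sit inside the `data` list column
  mutate(
    # .x is the one-row tibble for the current group
    out = map(data, ~ runif(n = 100, min = .x$min, max = .x$max))
  )

# one vector of 100 draws per species
lengths(nested$out)
```

The map2() version above is shorter because it never nests the summary columns in the first place; this variant is mainly useful when the nested structure is needed for other reasons.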

Created on 2022-10-15 by the reprex package (v0.3.0)
