I am trying to create distributions in R by mapping over a nested data frame.
I first have:
# A tibble: 3 × 4
Species min max sd
<fct> <dbl> <dbl> <dbl>
1 setosa 4.3 5.8 0.352
2 versicolor 4.9 7 0.516
3 virginica 4.9 7.9 0.636
So, for each group I want to create 100 random samples between the min and max of each row.
How can I correctly apply mutate() and map() to create these distributions? This is what I tried:
mutate(
  out = map(runif(min = min, max = max, n = 100))
)
The expected output is an extra column in the data frame that holds, for each row, a vector c(1, 2, 3, ..., n) of values generated with runif(). Here is the full pipeline:
iris %>%
  group_by(Species) %>%
  summarise(
    min = min(Sepal.Length),
    max = max(Sepal.Length),
    sd = sd(Sepal.Length)
  ) %>%
  ungroup() %>%
  group_by(Species) %>%
  nest() %>%
  mutate(
    out = map(data, ~ runif(min = min, max = max, n = 100))
  )
Answer:
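We could do it rowwise(), which is applied automatically when we use nest_by(). It's important to wrap the result in list(): in rowwise mode each row's result must have length 1, so the 100 draws have to be packed into a single list element to form a list-column. Apart from that, the code is very readable.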
library(dplyr)
iris %>%
  nest_by(Species) %>%
  mutate(
    min = min(data$Sepal.Length),
    max = max(data$Sepal.Length),
    sd = sd(data$Sepal.Length)
  ) %>%
  mutate(
    out = list(runif(n = 100, min = min, max = max))
  )
#> # A tibble: 3 × 6
#> # Rowwise: Species
#> Species data min max sd out
#> <fct> <list<tibble[,4]>> <dbl> <dbl> <dbl> <list>
#> 1 setosa [50 × 4] 4.3 5.8 0.352 <dbl [100]>
#> 2 versicolor [50 × 4] 4.9 7 0.516 <dbl [100]>
#> 3 virginica [50 × 4] 4.9 7.9 0.636 <dbl [100]>
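To sanity-check the rowwise() result, or to turn the list-column into a long table with one row per draw, something like the following sketch should work. It assumes {tidyr} is installed (for unnest()); set.seed(1) is only there to make the draws reproducible, and the object names res and long are arbitrary.

```r
library(dplyr)
library(tidyr)  # assumed installed; provides unnest()

set.seed(1)  # fix the RNG so the draws are reproducible

res <- iris %>%
  nest_by(Species) %>%
  mutate(
    min = min(data$Sepal.Length),
    max = max(data$Sepal.Length),
    # list() packs the 100 draws into one list element per row
    out = list(runif(n = 100, min = min, max = max))
  ) %>%
  ungroup()  # drop the rowwise grouping before reshaping

# Each element of the list-column should hold exactly 100 draws
lengths(res$out)

# Flatten the list-column: one row per draw (3 species x 100 draws)
long <- res %>%
  select(Species, min, max, out) %>%
  unnest(out)

# Check that every draw lies inside its group's [min, max] range
all(long$out >= long$min & long$out <= long$max)
```

After unnest(), out is an ordinary numeric column, which is usually easier to plot or summarise than a list-column.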
With {purrr} we could use map2():
library(dplyr)
library(purrr)
iris %>%
  group_by(Species) %>%
  summarise(
    min = min(Sepal.Length),
    max = max(Sepal.Length),
    sd = sd(Sepal.Length)
  ) %>%
  mutate(out = map2(min, max,
                    ~ runif(n = 100, min = .x, max = .y))
  )
#> # A tibble: 3 × 5
#> Species min max sd out
#> <fct> <dbl> <dbl> <dbl> <list>
#> 1 setosa 4.3 5.8 0.352 <dbl [100]>
#> 2 versicolor 4.9 7 0.516 <dbl [100]>
#> 3 virginica 4.9 7.9 0.636 <dbl [100]>
Created on 2022-10-15 by the reprex package (v0.3.0)
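The nest()/map() approach from the question can also be made to work. In the original attempt, min and max inside the lambda do not refer to the nested values (those live inside each data tibble), so they resolve to base R's min() and max() functions instead. A minimal sketch, assuming {tidyr} is loaded for nest(), pulls the bounds out of each nested tibble with .x$min and .x$max; res_nested is just an illustrative name.

```r
library(dplyr)
library(tidyr)  # nest() comes from {tidyr}
library(purrr)

res_nested <- iris %>%
  group_by(Species) %>%
  summarise(
    min = min(Sepal.Length),
    max = max(Sepal.Length),
    sd = sd(Sepal.Length)
  ) %>%
  # Move the summary columns into a list-column of 1-row tibbles
  nest(data = c(min, max, sd)) %>%
  # .x is one nested tibble, so .x$min and .x$max are its bounds
  mutate(out = map(data, ~ runif(n = 100, min = .x$min, max = .x$max)))

res_nested
```

This keeps min, max and sd tucked inside data; when only the two bounds are needed, map2() over the un-nested columns is the simpler choice.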