I am trying to simulate a dataset for a linear regression in a bit of Bayesian stats.
Obviously the overall formula is Y = A + BX. I have simulated a variety of values of A and B using
A <- rnorm(10,0,1)
B <- rnorm(10,0,1)
#10 Random draws from a normal distribution for the values of each of A and B
I set up a list of possible values of X
stuff <- tibble(x = seq(130,170,10)) %>%
  # Make a table of possible values of X from 130 to 170 in steps of 10
  mutate(Y = A + B*x)
# Make a new column which is A plus B times each value of X
This works fine when I have only 1 value in A & B (i.e. if I do A <- rnorm(1,0,1)).
But obviously it doesn't work when the length of A & B is greater than 1.
What I am trying to figure out how to do is something like
mutate(Y[i] = A[i] + B[i]*x)
resulting in 10 new columns, Y1 to Y10.
Any suggestions welcomed
CodePudding user response:
Here's how I would do what I think you want. I'd start long and then convert to wide...
library(tidyverse)
set.seed(123)
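df <- tibble() %>%
  expand(
    # nesting() keeps each ID paired with its own A and B
    nesting(
      ID=1:10,
      A=rnorm(10,0,1),
      B=rnorm(10,0,1)
    ),
    # every (ID, A, B) triple is crossed with every value of X
    X=seq(130,170,10)
  ) %>%
  mutate(Y=A + B*X)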
df
# A tibble: 50 × 5
ID A B X Y
<int> <dbl> <dbl> <dbl> <dbl>
1 1 -1.07 0.426 130 54.4
2 1 -1.07 0.426 140 58.6
3 1 -1.07 0.426 150 62.9
4 1 -1.07 0.426 160 67.2
5 1 -1.07 0.426 170 71.4
6 2 -0.218 -0.295 130 -38.6
7 2 -0.218 -0.295 140 -41.5
8 2 -0.218 -0.295 150 -44.5
9 2 -0.218 -0.295 160 -47.4
10 2 -0.218 -0.295 170 -50.4
# … with 40 more rows
Now, pivot to wide...
df %>%
  pivot_wider(
    names_from=ID,
    values_from=Y,
    names_prefix="Y",
    id_cols=X
  )
# A tibble: 5 × 11
X Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 130 54.4 -38.6 115. 113. 106. 87.8 72.8 -7.90 -40.9 -48.2
2 140 58.6 -41.5 124. 122. 114. 94.7 78.4 -8.51 -44.0 -52.0
3 150 62.9 -44.5 133. 131. 123. 102. 83.9 -9.13 -47.0 -55.8
4 160 67.2 -47.4 142. 140. 131. 108. 89.5 -9.75 -50.1 -59.6
5 170 71.4 -50.4 151. 149. 139. 115. 95.0 -10.4 -53.2 -63.4
At this point you've lost A & B, because you'd need another 10 columns to store the original A's and another 10 to store the original B's.
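If the wide table is all you need, the same Y1 to Y10 columns can also be built directly, without going through the long format. This is only a sketch in base R, not part of the tidyverse approach above; the names `Y` and `wide` are made up for illustration, and because the random draws happen in a different order the numbers will not match the output shown earlier.

```r
set.seed(123)
A <- rnorm(10, 0, 1)
B <- rnorm(10, 0, 1)
x <- seq(130, 170, 10)

# outer(x, B) is a 5 x 10 matrix whose [i, j] entry is x[i] * B[j];
# sweep() then adds A[j] to every entry of column j
Y <- sweep(outer(x, B), 2, A, "+")
colnames(Y) <- paste0("Y", seq_along(A))

# one row per value of x, one column per (A, B) pair
wide <- data.frame(x = x, Y)
wide
```

A and B stay available as ordinary vectors here, so column Yj can always be traced back to A[j] and B[j].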
Personally, I'd probably stick with the long format, because that's most likely going to make your future workflow easier. And I get to keep the A's and B's.
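As one example of why the long format tends to be easier downstream, here is a hedged sketch that plots every simulated regression line in a single call. It rebuilds the long table with `expand_grid()` rather than `expand(nesting(...))`, which is an alternative I am assuming is acceptable here; the object name `p` is just for illustration.

```r
library(tidyverse)

set.seed(123)
# one row per (A, B) pair, crossed with every value of X
df <- tibble(ID = 1:10, A = rnorm(10, 0, 1), B = rnorm(10, 0, 1)) %>%
  expand_grid(X = seq(130, 170, 10)) %>%
  mutate(Y = A + B * X)

# long data maps straight onto aesthetics: one line per ID
p <- ggplot(df, aes(x = X, y = Y, group = ID, colour = factor(ID))) +
  geom_line() +
  labs(colour = "ID")
p
```

With the wide table you would have to pivot back to long before ggplot2 could draw one line per column.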