Make multiple new columns (ideally tidyverse) by applying mutate across a vector?

Time:05-25

I am trying to simulate a dataset for a linear regression in a bit of Bayesian stats.

The overall formula is Y = A + BX. I have simulated a variety of values of A and B using

A <- rnorm(10,0,1)
B <- rnorm(10,0,1)
#10 Random draws from a normal distribution for the values of each of A and B

I set up a list of possible values of X

stuff <- tibble(x = seq(130,170,10)) %>%
#Make table for possible values of X between 130 and 170 in intervals of 10
mutate(Y = A + B*x)
#Make new value which is A plus B*each value of X

This works fine when I have only one value each for A and B (i.e. if I do A <- rnorm(1,0,1)), but it doesn't work when the length of A and B is greater than 1.

What I am trying to figure out is how to do something like

mutate(Y[i] = A[i] + B[i]*x)

resulting in 10 new columns, Y1 to Y10.

Any suggestions welcomed

CodePudding user response:

Here's how I would do what I think you want. I'd start long and then convert to wide...

library(tidyverse)

set.seed(123)

df <- tibble() %>% 
        expand(
          nesting(
            ID=1:10,
            A=rnorm(10,0,1),
            B=rnorm(10,0,1)
          ),
          X=seq(130,170,10)
        ) %>% 
        mutate(Y=A + B*X)
df
# A tibble: 50 × 5
      ID      A      B     X     Y
   <int>  <dbl>  <dbl> <dbl> <dbl>
 1     1 -1.07   0.426   130  54.4
 2     1 -1.07   0.426   140  58.6
 3     1 -1.07   0.426   150  62.9
 4     1 -1.07   0.426   160  67.2
 5     1 -1.07   0.426   170  71.4
 6     2 -0.218 -0.295   130 -38.6
 7     2 -0.218 -0.295   140 -41.5
 8     2 -0.218 -0.295   150 -44.5
 9     2 -0.218 -0.295   160 -47.4
10     2 -0.218 -0.295   170 -50.4
# … with 40 more rows
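
A roughly equivalent way to build the same long table, offered as a sketch rather than as the answer's method: keep the draws in an explicit parameter table and cross it with the X values using tidyr's crossing(). The names params and df_long are made up for illustration, and this assumes a tidyr version in which crossing() accepts a data frame as an unnamed argument. The simulated numbers need not match the printed output above.

```r
library(tidyverse)

set.seed(123)

# One row per simulated (A, B) pair; ID labels the draw
params <- tibble(
  ID = 1:10,
  A  = rnorm(10, 0, 1),
  B  = rnorm(10, 0, 1)
)

# Every parameter row paired with every X value: 10 draws x 5 X values
df_long <- crossing(params, X = seq(130, 170, 10)) %>%
  mutate(Y = A + B * X)

df_long
```

Keeping ID, A and B together in one table is what nesting() does in the code above: only the combinations that actually occur are expanded against X.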

Now, pivot to wide...

df %>% 
  pivot_wider(
    names_from=ID,
    values_from=Y,
    names_prefix="Y",
    id_cols=X
  )
# A tibble: 5 × 11
      X    Y1    Y2    Y3    Y4    Y5    Y6    Y7     Y8    Y9   Y10
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
1   130  54.4 -38.6  115.  113.  106.  87.8  72.8  -7.90 -40.9 -48.2
2   140  58.6 -41.5  124.  122.  114.  94.7  78.4  -8.51 -44.0 -52.0
3   150  62.9 -44.5  133.  131.  123. 102.   83.9  -9.13 -47.0 -55.8
4   160  67.2 -47.4  142.  140.  131. 108.   89.5  -9.75 -50.1 -59.6
5   170  71.4 -50.4  151.  149.  139. 115.   95.0 -10.4  -53.2 -63.4

At this point you've lost A & B, because you'd need another 10 columns to store the original A's and another 10 to store the original B's.
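
If the wide layout is needed anyway, one possible workaround is to keep the A's and B's in a small separate lookup table keyed by ID, rather than in the wide table itself. This is only a sketch; the names wide and lookup are hypothetical, and the long table is rebuilt here with the same expand()/nesting() call so the snippet runs on its own.

```r
library(tidyverse)

set.seed(123)

# Long table, built as in the answer
df <- tibble() %>%
  expand(
    nesting(ID = 1:10, A = rnorm(10, 0, 1), B = rnorm(10, 0, 1)),
    X = seq(130, 170, 10)
  ) %>%
  mutate(Y = A + B * X)

# Wide table: one row per X, one column per draw (Y1 ... Y10)
wide <- df %>%
  pivot_wider(id_cols = X, names_from = ID, values_from = Y, names_prefix = "Y")

# Lookup table: one row per draw, so column Yk corresponds to ID == k
lookup <- df %>%
  distinct(ID, A, B)

lookup
```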

Personally, I'd probably stick with the long format, because that's most likely going to make your future workflow easier. And I get to keep the A's and B's.
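
As one illustration of why the long format can be convenient, here is a minimal sketch that draws each simulated regression line with ggplot2. It assumes the goal is to eyeball the simulated lines, which the question does not state, and the object name p is made up for the example.

```r
library(tidyverse)

set.seed(123)

# Long table, built as in the answer
df <- tibble() %>%
  expand(
    nesting(ID = 1:10, A = rnorm(10, 0, 1), B = rnorm(10, 0, 1)),
    X = seq(130, 170, 10)
  ) %>%
  mutate(Y = A + B * X)

# One line per draw: grouping by ID keeps each (A, B) pair separate
p <- ggplot(df, aes(x = X, y = Y, group = ID)) +
  geom_line()

p
```

With the wide layout, the same plot would first need the Y1 to Y10 columns gathered back into a single column.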
