How do I create interaction columns for all columns in tidyverse?

I am trying to create interaction variables for all 20 variables in a dataframe, so I would have in total 20 base variables and 380 interaction variables. For any single variable, I am able to create a dataframe of 19 variables by using:

in_sample[3:22] %>%
transmute(across(.cols = -c(frpm_frac_s), .fns = function(x){x*frpm_frac_s}))

But I am unable to iterate across the columns. I tried to use map over a vector of column names but am unable to get the function inside map to read as.symbol(character). Here is a sample of my data from dput:

structure(list(frpm_frac_s = c(0.870400011539459, 0.904699981212616, 
0.98089998960495, 0.838800013065338, 0.919900000095367, 0.837700009346008, 
0.84799998998642, 0.925999999046326, 0.963900029659271, 0.887899994850159
), enrollment_s = c(364, 608, 571, 705, 566, 838, 421, 757, 693, 
535), ell_frac_s = c(0.46000000834465, 0.334000021219254, 0.300999999046326, 
0.209999993443489, 0.706999957561493, 0.552999973297119, 0.412999987602234, 
0.359000027179718, 0.726000010967255, 0.646999955177307), edi_s = c(8, 
38, 39, 37, 11, 35, 15, 39, 9, 4), te_fte_s = c(23, 22, 20, 25, 
24.5, 36, 18, 30.2999992370605, 24.3999996185303, 19)), row.names = c(NA, 
10L), class = "data.frame")

When using:

 in_sample[3:22] %>%
    transmute(across(.cols = -c(frpm_frac_s), .fns = function(x){x*frpm_frac_s}))

I get:

structure(list(enrollment_s = c(316.825604200363, 550.057588577271, 
560.093894064426, 591.354009211063, 520.663400053978, 701.992607831955, 
357.007995784283, 700.981999278069, 667.982720553875, 475.026497244835
), ell_frac_s = c(0.400384012571335, 0.302169812922072, 0.295250895935631, 
0.17614799724412, 0.650369261028242, 0.463248082799339, 0.350223985351086, 
0.33243402482605, 0.699791432103968, 0.574471256869984), edi_s = c(6.96320009231567, 
34.3785992860794, 38.255099594593, 31.0356004834175, 10.118900001049, 
29.3195003271103, 12.7199998497963, 36.1139999628067, 8.67510026693344, 
3.55159997940063), te_fte_s = c(20.0192002654076, 19.9033995866776, 
19.617999792099, 20.9700003266335, 22.5375500023365, 30.1572003364563, 
15.2639998197556, 28.0577992646217, 23.5191603559875, 16.870099902153
)), row.names = c(NA, 10L), class = "data.frame")

I would like to do this for all variables and then cbind them together. Thank you for your help.

CodePudding user response：

You can use model.matrix to create interaction terms. (This is what's done under the hood in most modeling functions.)

m = model.matrix(~ .^2 - . + 0, data = df)
m
#    frpm_frac_s:enrollment_s frpm_frac_s:ell_frac_s frpm_frac_s:edi_s frpm_frac_s:te_fte_s
# 1                  316.8256              0.4003840            6.9632             20.01920
# 2                  550.0576              0.3021698           34.3786             19.90340
# 3                  560.0939              0.2952509           38.2551             19.61800
# 4                  591.3540              0.1761480           31.0356             20.97000
# 5                  520.6634              0.6503693           10.1189             22.53755
# 6                  701.9926              0.4632481           29.3195             30.15720
# 7                  357.0080              0.3502240           12.7200             15.26400
# 8                  700.9820              0.3324340           36.1140             28.05780
# 9                  667.9827              0.6997914            8.6751             23.51916
# 10                 475.0265              0.5744713            3.5516             16.87010
#    enrollment_s:ell_frac_s enrollment_s:edi_s enrollment_s:te_fte_s ell_frac_s:edi_s
# 1                  167.440               2912                8372.0            3.680
# 2                  203.072              23104               13376.0           12.692
# 3                  171.871              22269               11420.0           11.739
# 4                  148.050              26085               17625.0            7.770
# 5                  400.162               6226               13867.0            7.777
# 6                  463.414              29330               30168.0           19.355
# 7                  173.873               6315                7578.0            6.195
# 8                  271.763              29523               22937.1           14.001
# 9                  503.118               6237               16909.2            6.534
# 10                 346.145               2140               10165.0            2.588
#    ell_frac_s:te_fte_s edi_s:te_fte_s
# 1              10.5800          184.0
# 2               7.3480          836.0
# 3               6.0200          780.0
# 4               5.2500          925.0
# 5              17.3215          269.5
# 6              19.9080         1260.0
# 7               7.4340          270.0
# 8              10.8777         1181.7
# 9              17.7144          219.6
# 10             12.2930           76.0
# attr(,"assign")
#  [1]  1  2  3  4  5  6  7  8  9 10

Your math is a little off: because order doesn't matter in multiplication, there are n * (n - 1) / 2 possibilities (the same as n choose 2), so you should expect 190 output columns for 20 input columns.
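
As a quick check of that count, and as a rough sketch of building the same unordered pairs by hand, base R's combn() can enumerate the column pairs. The data frame df below is an assumption for illustration: it uses three columns and four rows from the question's sample, and the "_x_" name separator is an arbitrary choice, not something from the original post.

```r
# Number of unordered pairs among 20 columns
n_pairs_20 <- choose(20, 2)
n_pairs_20

# Small subset of the question's sample data (three columns, four rows)
df <- data.frame(
  enrollment_s = c(364, 608, 571, 705),
  edi_s        = c(8, 38, 39, 37),
  te_fte_s     = c(23, 22, 20, 25)
)

# Every unordered pair of column names, as a list of length-2 vectors
pairs <- combn(names(df), 2, simplify = FALSE)

# Multiply the two columns in each pair
inter <- lapply(pairs, function(p) df[[p[1]]] * df[[p[2]]])

# Name each product after its pair ("_x_" is an illustrative separator)
names(inter) <- vapply(pairs, paste, character(1), collapse = "_x_")
inter_df <- as.data.frame(inter)

# One column per unordered pair: choose(3, 2) of them here
ncol(inter_df)
```

The result can be attached to the base variables with cbind(df, inter_df), which is the shape the question asks for, just with each pair appearing once instead of twice.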

I wrote the formula to include only interaction terms; you can use ~ .^2 + 0 to include the first-order terms too, or ~ .^2 to also include an intercept.
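
To make the difference between those formulas concrete, here is a small sketch that compares the columns each one produces. It assumes a data frame df built from three columns and four rows of the question's sample; the variable names m_int, m_main and m_full are only for illustration.

```r
# Small subset of the question's sample data (three columns, four rows)
df <- data.frame(
  enrollment_s = c(364, 608, 571, 705),
  edi_s        = c(8, 38, 39, 37),
  te_fte_s     = c(23, 22, 20, 25)
)

# Interaction terms only: main effects and intercept removed
m_int <- model.matrix(~ .^2 - . + 0, data = df)
colnames(m_int)

# First-order terms plus interactions, no intercept
m_main <- model.matrix(~ .^2 + 0, data = df)
colnames(m_main)

# Intercept, first-order terms, and interactions
m_full <- model.matrix(~ .^2, data = df)
colnames(m_full)

# Column counts for the three variants
c(ncol(m_int), ncol(m_main), ncol(m_full))
```

With three input columns, the interaction-only matrix has choose(3, 2) columns, the second adds the three base variables, and the third adds an "(Intercept)" column on top of that.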