I am trying to create interaction variables for all 20 variables in a dataframe, so I would have in total 20 base variables and 380 interaction variables. For any single variable, I am able to create a dataframe of 19 variables by using:
in_sample[3:22] %>%
transmute(across(.cols = -c(frpm_frac_s), .fns = function(x){x*frpm_frac_s}))
However, I cannot iterate this across all the columns. I tried using map over a vector of column names, but could not get the function inside map to treat as.symbol(character) as a column reference. Here is a sample of my data from dput:
structure(list(frpm_frac_s = c(0.870400011539459, 0.904699981212616,
0.98089998960495, 0.838800013065338, 0.919900000095367, 0.837700009346008,
0.84799998998642, 0.925999999046326, 0.963900029659271, 0.887899994850159
), enrollment_s = c(364, 608, 571, 705, 566, 838, 421, 757, 693,
535), ell_frac_s = c(0.46000000834465, 0.334000021219254, 0.300999999046326,
0.209999993443489, 0.706999957561493, 0.552999973297119, 0.412999987602234,
0.359000027179718, 0.726000010967255, 0.646999955177307), edi_s = c(8,
38, 39, 37, 11, 35, 15, 39, 9, 4), te_fte_s = c(23, 22, 20, 25,
24.5, 36, 18, 30.2999992370605, 24.3999996185303, 19)), row.names = c(NA,
10L), class = "data.frame")
When using:
in_sample[3:22] %>%
transmute(across(.cols = -c(frpm_frac_s), .fns = function(x){x*frpm_frac_s}))
I get:
structure(list(enrollment_s = c(316.825604200363, 550.057588577271,
560.093894064426, 591.354009211063, 520.663400053978, 701.992607831955,
357.007995784283, 700.981999278069, 667.982720553875, 475.026497244835
), ell_frac_s = c(0.400384012571335, 0.302169812922072, 0.295250895935631,
0.17614799724412, 0.650369261028242, 0.463248082799339, 0.350223985351086,
0.33243402482605, 0.699791432103968, 0.574471256869984), edi_s = c(6.96320009231567,
34.3785992860794, 38.255099594593, 31.0356004834175, 10.118900001049,
29.3195003271103, 12.7199998497963, 36.1139999628067, 8.67510026693344,
3.55159997940063), te_fte_s = c(20.0192002654076, 19.9033995866776,
19.617999792099, 20.9700003266335, 22.5375500023365, 30.1572003364563,
15.2639998197556, 28.0577992646217, 23.5191603559875, 16.870099902153
)), row.names = c(NA, 10L), class = "data.frame")
I would like to do this for all variables and then cbind them together. Thank you for your help.
Answer:
You can use model.matrix to create interaction terms. (This is what most modeling functions do under the hood.)
m = model.matrix(~ .^2 - . + 0, data = df)
m
# frpm_frac_s:enrollment_s frpm_frac_s:ell_frac_s frpm_frac_s:edi_s frpm_frac_s:te_fte_s
# 1 316.8256 0.4003840 6.9632 20.01920
# 2 550.0576 0.3021698 34.3786 19.90340
# 3 560.0939 0.2952509 38.2551 19.61800
# 4 591.3540 0.1761480 31.0356 20.97000
# 5 520.6634 0.6503693 10.1189 22.53755
# 6 701.9926 0.4632481 29.3195 30.15720
# 7 357.0080 0.3502240 12.7200 15.26400
# 8 700.9820 0.3324340 36.1140 28.05780
# 9 667.9827 0.6997914 8.6751 23.51916
# 10 475.0265 0.5744713 3.5516 16.87010
# enrollment_s:ell_frac_s enrollment_s:edi_s enrollment_s:te_fte_s ell_frac_s:edi_s
# 1 167.440 2912 8372.0 3.680
# 2 203.072 23104 13376.0 12.692
# 3 171.871 22269 11420.0 11.739
# 4 148.050 26085 17625.0 7.770
# 5 400.162 6226 13867.0 7.777
# 6 463.414 29330 30168.0 19.355
# 7 173.873 6315 7578.0 6.195
# 8 271.763 29523 22937.1 14.001
# 9 503.118 6237 16909.2 6.534
# 10 346.145 2140 10165.0 2.588
# ell_frac_s:te_fte_s edi_s:te_fte_s
# 1 10.5800 184.0
# 2 7.3480 836.0
# 3 6.0200 780.0
# 4 5.2500 925.0
# 5 17.3215 269.5
# 6 19.9080 1260.0
# 7 7.4340 270.0
# 8 10.8777 1181.7
# 9 17.7144 219.6
# 10 12.2930 76.0
# attr(,"assign")
# [1] 1 2 3 4 5 6 7 8 9 10
Your math is a little off. Because order doesn't matter in multiplication, there are n * (n - 1) / 2 possibilities (the same as n choose 2), so you should expect 190 columns of output for 20 columns of input, not 380.
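As a quick sanity check of that count, here is a minimal sketch in base R. The 20-column data frame toy is made-up random data used only for illustration; it is not the asker's in_sample.

```r
# Hypothetical 20-column data frame of random numbers (illustration only)
set.seed(1)
toy <- as.data.frame(matrix(rnorm(200), nrow = 10, ncol = 20))

n <- ncol(toy)
n_pairs_formula <- n * (n - 1) / 2   # 190
n_pairs_choose  <- choose(n, 2)      # 190

# model.matrix yields one column per unordered pair of variables
m_toy <- model.matrix(~ .^2 - . + 0, data = toy)
ncol(m_toy)                          # 190
```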
I wrote the formula so that it includes only the interaction terms. You can use ~ .^2 + 0 to include the first-order terms too, or ~ .^2 to also include an intercept.
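To make the difference between these formulas concrete, here is a small sketch on a made-up three-column data frame (the names small, a, b, and c are hypothetical and chosen only for illustration). It also shows one way to bind the interaction columns back onto the original data, which is what the question asks for.

```r
# Made-up 3-column data frame (illustration only)
small <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# Interaction terms only
inter <- model.matrix(~ .^2 - . + 0, data = small)

# First-order terms plus interactions, no intercept
both <- model.matrix(~ .^2 + 0, data = small)

# First-order terms, interactions, and an "(Intercept)" column
full <- model.matrix(~ .^2, data = small)

colnames(inter)  # "a:b" "a:c" "b:c"
ncol(both)       # 6
ncol(full)       # 7

# Bind the interaction columns onto the original variables
combined <- cbind(small, inter)
ncol(combined)   # 6
```

With 20 input columns, the same cbind call would give 20 base columns plus 190 interaction columns.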