Is there a way to create a correlation matrix for showing correlations among the average store sales

I'm trying to create a correlation matrix showing the correlations among the average store sales for all product categories. The product categories are columns 10-18.

Here is my head(df2):

> head(df2)
  store       city  region province size revenue   units    cost gross_profit promo_units energy_units regularBars_units
1   105 BROCKVILLE ONTARIO       ON  496  984.70  470.46  590.73       393.97      210.23        72.13             38.63
2   117 BURLINGTON ONTARIO       ON  875 2629.32 1131.38 1621.58      1007.74      401.46       192.77             75.04
3   122 BURLINGTON ONTARIO       ON  691 2786.73 1229.46 1709.45      1077.27      450.04       240.48             93.73
4   123 BURLINGTON ONTARIO       ON  763 2834.49 1257.63 1719.61      1114.88      476.83       194.21             99.44
5   182  DON MILLS ONTARIO       ON  784 4118.36 1949.50 2485.83      1632.53      664.71       199.73            175.48
7   186 NORTH YORK ONTARIO       ON  966 8195.26 3695.46 5069.99      3125.27     1143.33       419.19            271.58
  gum_units bagpegCandy_units isotonics_units singleServePotato_units takeHomePotato_units kingBars_units flatWater_units
1     29.29             13.38           20.69                   18.60                 7.71          17.87           56.54
2     55.85             42.15           87.62                   36.44                33.46          47.44           98.42
3     64.27             29.85          105.65                   47.96                19.90          45.21          130.27
4     73.25             54.15          118.19                   39.67                22.10          45.33          132.77
5    145.81             68.06          109.35                   85.71                42.33          79.81          204.06
7    212.42            153.90          166.37                  130.79               136.79         114.50          328.63
  psd591Ml_units
1          39.71
2          38.73
3          47.31
4          39.87
5          50.29
7         112.38

Here is my dput(head(df2, 4)):

> dput(head(df2,4))
structure(list(store = c(105L, 117L, 122L, 123L), city = c("BROCKVILLE", 
"BURLINGTON", "BURLINGTON", "BURLINGTON"), region = c("ONTARIO", 
"ONTARIO", "ONTARIO", "ONTARIO"), province = c("ON", "ON", "ON", 
"ON"), size = c(496L, 875L, 691L, 763L), revenue = c(984.7, 2629.32, 
2786.73, 2834.49), units = c(470.46, 1131.38, 1229.46, 1257.63
), cost = c(590.73, 1621.58, 1709.45, 1719.61), gross_profit = c(393.97, 
1007.74, 1077.27, 1114.88), promo_units = c(210.23, 401.46, 450.04, 
476.83), energy_units = c(72.13, 192.77, 240.48, 194.21), regularBars_units = c(38.63, 
75.04, 93.73, 99.44), gum_units = c(29.29, 55.85, 64.27, 73.25
), bagpegCandy_units = c(13.38, 42.15, 29.85, 54.15), isotonics_units = c(20.69, 
87.62, 105.65, 118.19), singleServePotato_units = c(18.6, 36.44, 
47.96, 39.67), takeHomePotato_units = c(7.71, 33.46, 19.9, 22.1
), kingBars_units = c(17.87, 47.44, 45.21, 45.33), flatWater_units = c(56.54, 
98.42, 130.27, 132.77), psd591Ml_units = c(39.71, 38.73, 47.31, 
39.87)), na.action = structure(c(`6` = 6L, `169` = 169L, `173` = 173L, 
`177` = 177L, `182` = 182L, `191` = 191L, `193` = 193L, `195` = 195L, 
`196` = 196L, `198` = 198L, `204` = 204L, `277` = 277L, `385` = 385L, 
`452` = 452L, `527` = 527L, `601` = 601L), class = "omit"), row.names = c(NA, 
4L), class = "data.frame")

CodePudding user response:

You can try:

cor(df2[,10:18])

which gives the following output (computed on the four rows from your dput):

                        promo_units energy_units regularBars_units gum_units bagpegCandy_units isotonics_units
promo_units               1.0000000    0.9341821         0.9910344 0.9909434         0.8449146       0.9993738
energy_units              0.9341821    1.0000000         0.9174646 0.8830945         0.6324464       0.9223899
regularBars_units         0.9910344    0.9174646         1.0000000 0.9929161         0.8075351       0.9932908
gum_units                 0.9909434    0.8830945         0.9929161 1.0000000         0.8711067       0.9950740
bagpegCandy_units         0.8449146    0.6324464         0.8075351 0.8711067         1.0000000       0.8532315
isotonics_units           0.9993738    0.9223899         0.9932908 0.9950740         0.8532315       1.0000000
singleServePotato_units   0.9317931    0.9922160         0.9317428 0.8901737         0.5995540       0.9225072
takeHomePotato_units      0.6708652    0.6744549         0.5657719 0.6040891         0.7302509       0.6543614
kingBars_units            0.9459992    0.9363726         0.8960974 0.9021115         0.8223868       0.9360713
                        singleServePotato_units takeHomePotato_units kingBars_units
promo_units                           0.9317931            0.6708652      0.9459992
energy_units                          0.9922160            0.6744549      0.9363726
regularBars_units                     0.9317428            0.5657719      0.8960974
gum_units                             0.8901737            0.6040891      0.9021115
bagpegCandy_units                     0.5995540            0.7302509      0.8223868
isotonics_units                       0.9225072            0.6543614      0.9360713
singleServePotato_units               1.0000000            0.5804878      0.8960888
takeHomePotato_units                  0.5804878            1.0000000      0.8649255
kingBars_units                        0.8960888            0.8649255      1.0000000

Explanation:

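The cor() function computes the pairwise correlations (Pearson by default) between the columns of the object it receives as input. In this case the input is df2[,10:18], i.e. columns 10 to 18 of your df2 data frame, so the result is a 9 x 9 matrix with one row and one column per selected variable.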
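
If you prefer not to depend on column positions, the same call also works with columns selected by name. The following is only a minimal sketch: it rebuilds three of the columns from the dput in the question, and the object names df_small, cols and cor_mat are made up for illustration.

```r
# Rebuild three of the product columns from the dput(head(df2, 4)) in the question.
# df_small is a hypothetical name used only for this illustration.
df_small <- data.frame(
  bagpegCandy_units    = c(13.38, 42.15, 29.85, 54.15),
  isotonics_units      = c(20.69, 87.62, 105.65, 118.19),
  takeHomePotato_units = c(7.71, 33.46, 19.90, 22.10)
)

# Select columns by name instead of by position.
cols <- c("bagpegCandy_units", "isotonics_units", "takeHomePotato_units")

# Pearson is the default method; it is spelled out here only for clarity.
cor_mat <- cor(df_small[, cols], method = "pearson")

# Round to two decimals so the matrix is easier to read.
print(round(cor_mat, 2))
```

The unrounded entries of cor_mat should agree with the corresponding entries in the output above, since both are computed from the same four rows. On your full df2 you would pass the names of all the product category columns in cols.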