I'm trying to create a correlation matrix showing the correlations among the average store sales for all product categories. The product categories are in columns 10-18.
Here is my head(df2):
> head(df2)
store city region province size revenue units cost gross_profit promo_units energy_units regularBars_units
1 105 BROCKVILLE ONTARIO ON 496 984.70 470.46 590.73 393.97 210.23 72.13 38.63
2 117 BURLINGTON ONTARIO ON 875 2629.32 1131.38 1621.58 1007.74 401.46 192.77 75.04
3 122 BURLINGTON ONTARIO ON 691 2786.73 1229.46 1709.45 1077.27 450.04 240.48 93.73
4 123 BURLINGTON ONTARIO ON 763 2834.49 1257.63 1719.61 1114.88 476.83 194.21 99.44
5 182 DON MILLS ONTARIO ON 784 4118.36 1949.50 2485.83 1632.53 664.71 199.73 175.48
7 186 NORTH YORK ONTARIO ON 966 8195.26 3695.46 5069.99 3125.27 1143.33 419.19 271.58
gum_units bagpegCandy_units isotonics_units singleServePotato_units takeHomePotato_units kingBars_units flatWater_units
1 29.29 13.38 20.69 18.60 7.71 17.87 56.54
2 55.85 42.15 87.62 36.44 33.46 47.44 98.42
3 64.27 29.85 105.65 47.96 19.90 45.21 130.27
4 73.25 54.15 118.19 39.67 22.10 45.33 132.77
5 145.81 68.06 109.35 85.71 42.33 79.81 204.06
7 212.42 153.90 166.37 130.79 136.79 114.50 328.63
psd591Ml_units
1 39.71
2 38.73
3 47.31
4 39.87
5 50.29
7 112.38
Here is my dput(head(df2, 4)):
> dput(head(df2,4))
structure(list(store = c(105L, 117L, 122L, 123L), city = c("BROCKVILLE",
"BURLINGTON", "BURLINGTON", "BURLINGTON"), region = c("ONTARIO",
"ONTARIO", "ONTARIO", "ONTARIO"), province = c("ON", "ON", "ON",
"ON"), size = c(496L, 875L, 691L, 763L), revenue = c(984.7, 2629.32,
2786.73, 2834.49), units = c(470.46, 1131.38, 1229.46, 1257.63
), cost = c(590.73, 1621.58, 1709.45, 1719.61), gross_profit = c(393.97,
1007.74, 1077.27, 1114.88), promo_units = c(210.23, 401.46, 450.04,
476.83), energy_units = c(72.13, 192.77, 240.48, 194.21), regularBars_units = c(38.63,
75.04, 93.73, 99.44), gum_units = c(29.29, 55.85, 64.27, 73.25
), bagpegCandy_units = c(13.38, 42.15, 29.85, 54.15), isotonics_units = c(20.69,
87.62, 105.65, 118.19), singleServePotato_units = c(18.6, 36.44,
47.96, 39.67), takeHomePotato_units = c(7.71, 33.46, 19.9, 22.1
), kingBars_units = c(17.87, 47.44, 45.21, 45.33), flatWater_units = c(56.54,
98.42, 130.27, 132.77), psd591Ml_units = c(39.71, 38.73, 47.31,
39.87)), na.action = structure(c(`6` = 6L, `169` = 169L, `173` = 173L,
`177` = 177L, `182` = 182L, `191` = 191L, `193` = 193L, `195` = 195L,
`196` = 196L, `198` = 198L, `204` = 204L, `277` = 277L, `385` = 385L,
`452` = 452L, `527` = 527L, `601` = 601L), class = "omit"), row.names = c(NA,
4L), class = "data.frame")
Answer:
You can try:
cor(df2[,10:18])
On the four rows from your dput, this gives the following output:
promo_units energy_units regularBars_units gum_units bagpegCandy_units isotonics_units
promo_units 1.0000000 0.9341821 0.9910344 0.9909434 0.8449146 0.9993738
energy_units 0.9341821 1.0000000 0.9174646 0.8830945 0.6324464 0.9223899
regularBars_units 0.9910344 0.9174646 1.0000000 0.9929161 0.8075351 0.9932908
gum_units 0.9909434 0.8830945 0.9929161 1.0000000 0.8711067 0.9950740
bagpegCandy_units 0.8449146 0.6324464 0.8075351 0.8711067 1.0000000 0.8532315
isotonics_units 0.9993738 0.9223899 0.9932908 0.9950740 0.8532315 1.0000000
singleServePotato_units 0.9317931 0.9922160 0.9317428 0.8901737 0.5995540 0.9225072
takeHomePotato_units 0.6708652 0.6744549 0.5657719 0.6040891 0.7302509 0.6543614
kingBars_units 0.9459992 0.9363726 0.8960974 0.9021115 0.8223868 0.9360713
singleServePotato_units takeHomePotato_units kingBars_units
promo_units 0.9317931 0.6708652 0.9459992
energy_units 0.9922160 0.6744549 0.9363726
regularBars_units 0.9317428 0.5657719 0.8960974
gum_units 0.8901737 0.6040891 0.9021115
bagpegCandy_units 0.5995540 0.7302509 0.8223868
isotonics_units 0.9225072 0.6543614 0.9360713
singleServePotato_units 1.0000000 0.5804878 0.8960888
takeHomePotato_units 0.5804878 1.0000000 0.8649255
kingBars_units 0.8960888 0.8649255 1.0000000
Explanation:
The cor() function calculates the pairwise correlations (Pearson by default) between the columns of its input. In this case the input is df2[,10:18], that is, columns 10 to 18 of your df2 data frame.
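As a variation, here is a minimal sketch that selects the columns by name instead of by position, which keeps working if the column order changes. It assumes the columns of interest are the ones whose names end in `_units`; `df_sample`, `unit_cols` and `cor_mat` are illustrative names, and the data is a small subset of the dput() output from the question.

```r
# Sample rebuilt from the dput() output in the question (a few columns only).
df_sample <- data.frame(
  store        = c(105L, 117L, 122L, 123L),
  size         = c(496L, 875L, 691L, 763L),
  promo_units  = c(210.23, 401.46, 450.04, 476.83),
  energy_units = c(72.13, 192.77, 240.48, 194.21),
  gum_units    = c(29.29, 55.85, 64.27, 73.25)
)

# Assumption: the columns to correlate are those whose names end in "_units".
unit_cols <- grep("_units$", names(df_sample), value = TRUE)

# Pairwise Pearson correlations (the default method of cor()).
cor_mat <- cor(df_sample[, unit_cols])

# Round for display only; cor_mat itself keeps full precision.
print(round(cor_mat, 4))
```

On the full df2, this name pattern would also pick up flatWater_units and psd591Ml_units (columns 19 and 20), which df2[,10:18] leaves out, so check which set of columns you actually want.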