Is there a way to create box plots of "revenue" for each "region"?

Time:06-14

I'm trying to create box plots of the total revenue for each region, but I cannot figure out how to do it.

Here is my head(df2):

> head(df2)
  store       city  region province size revenue   units    cost gross_profit promo_units energy_units regularBars_units
1   105 BROCKVILLE ONTARIO       ON  496  984.70  470.46  590.73       393.97      210.23        72.13             38.63
2   117 BURLINGTON ONTARIO       ON  875 2629.32 1131.38 1621.58      1007.74      401.46       192.77             75.04
3   122 BURLINGTON ONTARIO       ON  691 2786.73 1229.46 1709.45      1077.27      450.04       240.48             93.73
4   123 BURLINGTON ONTARIO       ON  763 2834.49 1257.63 1719.61      1114.88      476.83       194.21             99.44
5   182  DON MILLS ONTARIO       ON  784 4118.36 1949.50 2485.83      1632.53      664.71       199.73            175.48
7   186 NORTH YORK ONTARIO       ON  966 8195.26 3695.46 5069.99      3125.27     1143.33       419.19            271.58
  gum_units bagpegCandy_units isotonics_units singleServePotato_units takeHomePotato_units kingBars_units flatWater_units
1     29.29             13.38           20.69                   18.60                 7.71          17.87           56.54
2     55.85             42.15           87.62                   36.44                33.46          47.44           98.42
3     64.27             29.85          105.65                   47.96                19.90          45.21          130.27
4     73.25             54.15          118.19                   39.67                22.10          45.33          132.77
5    145.81             68.06          109.35                   85.71                42.33          79.81          204.06
7    212.42            153.90          166.37                  130.79               136.79         114.50          328.63
  psd591Ml_units
1          39.71
2          38.73
3          47.31
4          39.87
5          50.29
7         112.38

We are only concerned with region and revenue here; the goal is a box plot of revenue for each region.

Here is my str(df2):

> str(df2)
'data.frame':   755 obs. of  20 variables:
 $ store                  : int  105 117 122 123 182 186 194 227 233 236 ...
 $ city                   : chr  "BROCKVILLE" "BURLINGTON" "BURLINGTON" "BURLINGTON" ...
 $ region                 : chr  "ONTARIO" "ONTARIO" "ONTARIO" "ONTARIO" ...
 $ province               : chr  "ON" "ON" "ON" "ON" ...
 $ size                   : int  496 875 691 763 784 966 710 973 967 1001 ...
 $ revenue                : num  985 2629 2787 2834 4118 ...
 $ units                  : num  470 1131 1229 1258 1950 ...
 $ cost                   : num  591 1622 1709 1720 2486 ...
 $ gross_profit           : num  394 1008 1077 1115 1633 ...
 $ promo_units            : num  210 401 450 477 665 ...
 $ energy_units           : num  72.1 192.8 240.5 194.2 199.7 ...
 $ regularBars_units      : num  38.6 75 93.7 99.4 175.5 ...
 $ gum_units              : num  29.3 55.9 64.3 73.2 145.8 ...
 $ bagpegCandy_units      : num  13.4 42.1 29.9 54.1 68.1 ...
 $ isotonics_units        : num  20.7 87.6 105.7 118.2 109.3 ...
 $ singleServePotato_units: num  18.6 36.4 48 39.7 85.7 ...
 $ takeHomePotato_units   : num  7.71 33.46 19.9 22.1 42.33 ...
 $ kingBars_units         : num  17.9 47.4 45.2 45.3 79.8 ...
 $ flatWater_units        : num  56.5 98.4 130.3 132.8 204.1 ...
 $ psd591Ml_units         : num  39.7 38.7 47.3 39.9 50.3 ...
 - attr(*, "na.action")= 'omit' Named int [1:16] 6 169 173 177 182 191 193 195 196 198 ...
  ..- attr(*, "names")= chr [1:16] "6" "169" "173" "177" ... 

CodePudding user response:

Have you tried

boxplot(revenue ~ region, data = df2)

?
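For anyone who wants to try the formula interface without the original data, here is a minimal sketch. The `toy` data frame is made up for illustration (the "QUEBEC" rows and their values are assumptions, since the question only shows ONTARIO stores); with the real data you would pass `df2` instead.

```r
# Toy stand-in for df2; the QUEBEC rows are hypothetical values for illustration
toy <- data.frame(
  region  = c("ONTARIO", "ONTARIO", "ONTARIO", "QUEBEC", "QUEBEC", "QUEBEC"),
  revenue = c(984.70, 2629.32, 2786.73, 1500, 2100, 3300)
)

# revenue ~ region reads as "revenue grouped by region";
# a character column such as region is used as the grouping variable
bp <- boxplot(revenue ~ region, data = toy,
              xlab = "Region", ylab = "Revenue",
              main = "Revenue by region")

# boxplot() invisibly returns what it drew, which is handy for checking
bp$names  # group labels, one per box
bp$stats  # one column per region: lower whisker, Q1, median, Q3, upper whisker
```

Since `region` is stored as `chr` in `str(df2)`, no conversion to factor should be needed for the call to work.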

CodePudding user response:

In ggplot2:

library(ggplot2)
df2 |> ggplot(aes(region, revenue)) + geom_boxplot()

More info and examples here: https://ggplot2.tidyverse.org/reference/geom_boxplot.html
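A self-contained version of the ggplot2 approach, as a rough sketch: it assumes the ggplot2 package is installed, and the `toy` data frame (including the "QUEBEC" rows) is invented for illustration rather than taken from the question. Swap in `df2` to plot the real data.

```r
library(ggplot2)

# Toy stand-in for df2; the QUEBEC rows are hypothetical values for illustration
toy <- data.frame(
  region  = c("ONTARIO", "ONTARIO", "ONTARIO", "QUEBEC", "QUEBEC", "QUEBEC"),
  revenue = c(984.70, 2629.32, 2786.73, 1500, 2100, 3300)
)

# Layers are combined with `+`: one box per region on x, revenue on y
p <- ggplot(toy, aes(x = region, y = revenue)) +
  geom_boxplot() +
  labs(x = "Region", y = "Revenue", title = "Revenue by region")

print(p)  # draws the plot; at the console, typing p does the same
```

Note that the layers must be joined with `+`; without it, `geom_boxplot()` is not attached to the plot.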
