Is there a way to create box plots of "revenue" for each "region"?


I'm trying to create box plots of the total revenue for each region, but I can't figure out how.

Here is my head(df2):

```
> head(df2)
  store       city  region province size revenue   units    cost gross_profit promo_units energy_units regularBars_units
1   105 BROCKVILLE ONTARIO       ON  496  984.70  470.46  590.73       393.97      210.23        72.13             38.63
2   117 BURLINGTON ONTARIO       ON  875 2629.32 1131.38 1621.58      1007.74      401.46       192.77             75.04
3   122 BURLINGTON ONTARIO       ON  691 2786.73 1229.46 1709.45      1077.27      450.04       240.48             93.73
4   123 BURLINGTON ONTARIO       ON  763 2834.49 1257.63 1719.61      1114.88      476.83       194.21             99.44
5   182  DON MILLS ONTARIO       ON  784 4118.36 1949.50 2485.83      1632.53      664.71       199.73            175.48
7   186 NORTH YORK ONTARIO       ON  966 8195.26 3695.46 5069.99      3125.27     1143.33       419.19            271.58
  gum_units bagpegCandy_units isotonics_units singleServePotato_units takeHomePotato_units kingBars_units flatWater_units
1     29.29             13.38           20.69                   18.60                 7.71          17.87           56.54
2     55.85             42.15           87.62                   36.44                33.46          47.44           98.42
3     64.27             29.85          105.65                   47.96                19.90          45.21          130.27
4     73.25             54.15          118.19                   39.67                22.10          45.33          132.77
5    145.81             68.06          109.35                   85.71                42.33          79.81          204.06
7    212.42            153.90          166.37                  130.79               136.79         114.50          328.63
  psd591Ml_units
1          39.71
2          38.73
3          47.31
4          39.87
5          50.29
7         112.38
```

Only region and revenue matter here: I want one box plot of revenue for each region.

Here is my str(df2):

```
> str(df2)
'data.frame':   755 obs. of  20 variables:
 $ store                  : int  105 117 122 123 182 186 194 227 233 236 ...
 $ city                   : chr  "BROCKVILLE" "BURLINGTON" "BURLINGTON" "BURLINGTON" ...
 $ region                 : chr  "ONTARIO" "ONTARIO" "ONTARIO" "ONTARIO" ...
 $ province               : chr  "ON" "ON" "ON" "ON" ...
 $ size                   : int  496 875 691 763 784 966 710 973 967 1001 ...
 $ revenue                : num  985 2629 2787 2834 4118 ...
 $ units                  : num  470 1131 1229 1258 1950 ...
 $ cost                   : num  591 1622 1709 1720 2486 ...
 $ gross_profit           : num  394 1008 1077 1115 1633 ...
 $ promo_units            : num  210 401 450 477 665 ...
 $ energy_units           : num  72.1 192.8 240.5 194.2 199.7 ...
 $ regularBars_units      : num  38.6 75 93.7 99.4 175.5 ...
 $ gum_units              : num  29.3 55.9 64.3 73.2 145.8 ...
 $ bagpegCandy_units      : num  13.4 42.1 29.9 54.1 68.1 ...
 $ isotonics_units        : num  20.7 87.6 105.7 118.2 109.3 ...
 $ singleServePotato_units: num  18.6 36.4 48 39.7 85.7 ...
 $ takeHomePotato_units   : num  7.71 33.46 19.9 22.1 42.33 ...
 $ kingBars_units         : num  17.9 47.4 45.2 45.3 79.8 ...
 $ flatWater_units        : num  56.5 98.4 130.3 132.8 204.1 ...
 $ psd591Ml_units         : num  39.7 38.7 47.3 39.9 50.3 ...
 - attr(*, "na.action")= 'omit' Named int [1:16] 6 169 173 177 182 191 193 195 196 198 ...
  ..- attr(*, "names")= chr [1:16] "6" "169" "173" "177" ...
```

Have you tried base R's formula interface? It draws one box per region:

```
boxplot(revenue ~ region, data = df2)
```
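To check what that call does without the full data set, here is a minimal sketch on a hypothetical stand-in for df2. The six ONTARIO revenues are the rows printed by head(df2) above; the second region, REGION_B, and its values are made up purely for illustration. Besides drawing the plot, boxplot() invisibly returns the statistics it drew, which is handy for sanity checks.

```r
# Hypothetical stand-in for df2 with only the two columns the plot needs.
# ONTARIO revenues come from head(df2); REGION_B values are invented.
toy <- data.frame(
  region  = c(rep("ONTARIO", 6), rep("REGION_B", 5)),
  revenue = c(984.70, 2629.32, 2786.73, 2834.49, 4118.36, 8195.26,
              1500, 1800, 2100, 2600, 3200)
)

# One box per distinct region; the character column is turned into a
# factor, so the boxes appear in alphabetical order.
bp <- boxplot(revenue ~ region, data = toy,
              xlab = "Region", ylab = "Revenue",
              main = "Store revenue by region")

# bp$stats has one column per region, with rows: lower whisker, lower
# hinge, median, upper hinge, upper whisker. Points beyond 1.5 * IQR
# from the hinges are drawn individually and listed in bp$out.
bp$names
bp$stats
bp$out
```

Here the 8195.26 store in ONTARIO lies beyond the upper fence, so it is drawn as a separate point rather than stretching the whisker.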


In ggplot2 (the native pipe |> needs R 4.1 or later):

```
library(ggplot2)

df2 |> ggplot(aes(region, revenue)) + geom_boxplot()
```
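Both versions place the regions in alphabetical order. A common tweak is to sort the boxes by median revenue with reorder(); in ggplot2 that would be aes(reorder(region, revenue, FUN = median), revenue). The sketch below shows the same idea in base R so it runs without extra packages; the three regions and their revenues are hypothetical.

```r
# Hypothetical data: three made-up regions with three stores each.
sales <- data.frame(
  region  = rep(c("REGION_A", "REGION_B", "REGION_C"), each = 3),
  revenue = c(2600, 2800, 4100,   900, 1200, 1500,   1800, 2000, 2300)
)

# reorder() converts region to a factor whose levels are sorted by the
# median revenue of each group, lowest median first.
region_by_median <- reorder(sales$region, sales$revenue, FUN = median)
levels(region_by_median)

# Boxes are drawn left to right in factor-level order, i.e. from the
# lowest to the highest median revenue.
bp <- boxplot(sales$revenue ~ region_by_median,
              xlab = "Region", ylab = "Revenue")
bp$names
```

With these numbers the medians are 2800 (REGION_A), 1200 (REGION_B) and 2000 (REGION_C), so the boxes come out as REGION_B, REGION_C, REGION_A.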

More info and examples here: https://ggplot2.tidyverse.org/reference/geom_boxplot.html

