How do you plot multiple bars (each bar representing a column) that show the average value of numeric columns?

Time:01-31

I have four columns that contain numerical values (hundreds of rows). I would like to plot a bar chart that shows the average value of each of those columns on one chart. It would show four bars, and each bar would represent one column.

The columns are veryactive, fairlyactive, lightlyactive and sedentary. I already know the mean of each column from the summary function, but I want to plot the means on a chart. Do I need another variable for the other axis?

I was able to plot one of the columns as a bar chart with calories on the x axis, but what I want is to compare the means of all four columns within one bar chart.

library(ggplot2)

ggplot(Activity_Zero, aes(x = calories, y = veryactive)) +
  stat_summary(geom = 'bar', fun = 'mean')  # `fun` replaces `fun.y`, deprecated since ggplot2 3.3.0

Here is a sample of my data: [image: tibble of my data]

Answer:

It depends on how your data are formatted. I've provided two examples below.

If you're starting with a table with the summary values, you can do this

library(ggplot2)

levels <- c("veryactive", "fairlyactive", "lightlyactive", "sedentary")
df1 <- data.frame(activitylevel = factor(levels, levels = levels),
                 meancalories = c(3000, 2500, 2000, 1500))
ggplot(df1, aes(x = activitylevel, y = meancalories)) +
  geom_col()  # geom_col() uses the y values directly as bar heights

Created on 2023-01-30 by the reprex package (v2.0.1)
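The table above has hard-coded means. If your data are in wide form, a table with the same shape can be computed with `colMeans()`. A minimal sketch, assuming a hypothetical data frame `activity` filled with random values in place of the asker's `Activity_Zero`; the column name `meanvalue` is also just a placeholder:

```r
library(ggplot2)

# Hypothetical wide-form data standing in for Activity_Zero (random values)
set.seed(1)
activity <- data.frame(
  veryactive    = runif(100, 0, 60),
  fairlyactive  = runif(100, 0, 40),
  lightlyactive = runif(100, 100, 300),
  sedentary     = runif(100, 600, 1200)
)

cols <- c("veryactive", "fairlyactive", "lightlyactive", "sedentary")

# One mean per column, returned as a named numeric vector
means <- colMeans(activity[, cols])

# Summary table in the same shape as df1; the factor levels fix the bar order
summary_df <- data.frame(activitylevel = factor(names(means), levels = cols),
                         meanvalue     = unname(means))

p <- ggplot(summary_df, aes(x = activitylevel, y = meanvalue)) +
  geom_col()
```

Printing `p` draws four bars in the order given by `cols`.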

And if you're starting with your original data in long form, you can do this.

library(ggplot2)
levels <- c("veryactive", "fairlyactive", "lightlyactive", "sedentary")
df2 <- data.frame(activitylevel = factor(rep(levels,
                                     each = 20), levels = levels),
                 calories = c(rnorm(20, 3000, 100),
                              rnorm(20, 2500, 100),
                              rnorm(20, 2000, 100),
                              rnorm(20, 1500, 100))
                 )
ggplot(df2, aes(x = activitylevel, y = calories)) +
  stat_summary(geom = "col", fun = "mean")  # one bar per activity level, height = mean calories

Finally, if you're starting with your data in wide form (i.e. a column for each activity level), then I'd suggest looking up the function tidyr::pivot_longer, which will reshape your data into the long form required for stat_summary.
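As a swapped-in alternative to `tidyr::pivot_longer` (not what this answer used), base R's `stack()` can also turn wide columns into a long data frame with columns `values` and `ind`. A sketch, assuming a hypothetical wide data frame `activity` with random values:

```r
library(ggplot2)

# Hypothetical wide-form data: one column per activity level (random values)
set.seed(2)
activity <- data.frame(
  veryactive    = rnorm(20, 3000, 100),
  fairlyactive  = rnorm(20, 2500, 100),
  lightlyactive = rnorm(20, 2000, 100),
  sedentary     = rnorm(20, 1500, 100)
)

cols <- c("veryactive", "fairlyactive", "lightlyactive", "sedentary")

# stack() returns a long data frame with columns `values` and `ind`
long <- stack(activity[, cols])

# Set the level order explicitly so the bars follow `cols`
long$ind <- factor(long$ind, levels = cols)

p <- ggplot(long, aes(x = ind, y = values)) +
  stat_summary(geom = "col", fun = "mean")
```

The result has the same long layout as `df2` above, so the same `stat_summary()` call applies.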

Answer:

Using colMeans

cols <- c("veryactive", "fairlyactive", "lightlyactive", "sedentary")
# base R
barplot(colMeans(Activity_Zero[, cols]))
# ggplot
library(ggplot2)

# stack() turns the named vector of means into a data frame with columns `values` and `ind`
ggplot(stack(colMeans(Activity_Zero[, cols])), aes(ind, values)) + geom_col()
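One thing to watch with `colMeans()`: if a column contains any `NA`, its mean comes back as `NA` and that bar will be missing from the chart. Passing `na.rm = TRUE` averages only the non-missing values. A sketch with a small hypothetical data frame `activity`:

```r
library(ggplot2)

# Hypothetical data with one missing value in veryactive
activity <- data.frame(
  veryactive    = c(10, 20, NA),
  fairlyactive  = c(5, 15, 25),
  lightlyactive = c(100, 200, 300),
  sedentary     = c(600, 700, 800)
)
cols <- c("veryactive", "fairlyactive", "lightlyactive", "sedentary")

colMeans(activity[, cols])                  # veryactive is NA
means <- colMeans(activity[, cols], na.rm = TRUE)
means                                       # veryactive is 15

p <- ggplot(stack(means), aes(ind, values)) +
  geom_col()
```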

Answer:

You'll likely need to pivot your data like so:

library(tidyverse)

df <- tibble(
  sedentary = sample(10:50, 100, replace = TRUE),
  lightlyactive = sample(10:20, 100, replace = TRUE),
  fairlyactive = sample(100:400, 100, replace = TRUE),
  veryactive = sample(300:600, 100, replace = TRUE),
  calories = sample(1500:2500, 100, replace = TRUE)
)

df |> 
  pivot_longer(c(veryactive, fairlyactive, lightlyactive, sedentary), 
               names_to = "group",
               values_to = "vals") |> 
  ggplot(aes(fct_inorder(group), vals)) +  # fct_inorder() orders the bars by first appearance
  stat_summary(geom = "bar", fun = mean)
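To confirm that each bar's height really is the column mean, you can inspect the data ggplot2 computed for the layer with `layer_data()`. A sketch, assuming hypothetical random data made reproducible with `set.seed()`, and using base `factor()` with explicit levels in place of `fct_inorder()` so the bar order is unambiguous:

```r
library(ggplot2)
library(tidyr)

# Hypothetical wide-form data, reproducible via set.seed()
set.seed(3)
df <- data.frame(
  sedentary     = sample(10:50, 100, replace = TRUE),
  lightlyactive = sample(10:20, 100, replace = TRUE),
  fairlyactive  = sample(100:400, 100, replace = TRUE),
  veryactive    = sample(300:600, 100, replace = TRUE)
)
cols <- c("veryactive", "fairlyactive", "lightlyactive", "sedentary")

p <- df |>
  pivot_longer(c(veryactive, fairlyactive, lightlyactive, sedentary),
               names_to = "group", values_to = "vals") |>
  ggplot(aes(factor(group, levels = cols), vals)) +
  stat_summary(geom = "bar", fun = mean)

# layer_data() builds the plot and returns the computed data for layer 1;
# sorting by x position lines the bar heights up with `cols`
ld <- layer_data(p)
bar_heights <- ld$y[order(unclass(ld$x))]

direct_means <- colMeans(df[, cols])
all.equal(bar_heights, unname(direct_means))   # TRUE
```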
