I have four columns that contain numerical values (hundreds of rows). I would like to plot a bar chart that shows the average value of each of those columns on one chart, so it would show four bars, each representing one column.
Columns are veryactive, fairlyactive, lightlyactive, sedentary. I already know the mean for each column from the summary function, but I want to plot it on a chart. Do I need another variable for the other axis?
I was able to plot one of the columns in a bar chart with calories on the x axis, but I would just like to compare the means of the columns within one bar chart.
ggplot(Activity_Zero, aes(x = calories, y = veryactive)) +
  stat_summary(geom = 'bar', fun.y = 'mean')
Here is a sample of my data: tibble of my data
CodePudding user response:
It depends on how your data are formatted. I've provided two examples below.
If you're starting with a table of the summary values, you can do this:
library(ggplot2)
levels <- c("veryactive", "fairlyactive", "lightlyactive", "sedentary")
df1 <- data.frame(activitylevel = factor(levels, levels = levels),
                  meancalories = c(3000, 2500, 2000, 1500))
ggplot(df1, aes(x = activitylevel, y = meancalories)) +
  geom_col()
Created on 2023-01-30 by the reprex package (v2.0.1)
And if you're starting with your original data in long form, you can do this.
library(ggplot2)
levels <- c("veryactive", "fairlyactive", "lightlyactive", "sedentary")
df2 <- data.frame(activitylevel = factor(rep(levels, each = 20),
                                         levels = levels),
                  calories = c(rnorm(20, 3000, 100),
                               rnorm(20, 2500, 100),
                               rnorm(20, 2000, 100),
                               rnorm(20, 1500, 100)))
ggplot(df2, aes(x = activitylevel, y = calories)) +
  stat_summary(geom = "col", fun = "mean")
Finally, if you're starting with your data in wide form (i.e. a column for each activity level), then I'd suggest you look up the function tidyr::pivot_longer, which will wrangle your data into the form required for stat_summary.
CodePudding user response:
Using colMeans
cols <- c("veryactive", "fairlyactive", "lightlyactive", "sedentary")
# base R
barplot(colMeans(Activity_Zero[, cols]))
# ggplot
library(ggplot2)
ggplot(stack(colMeans(Activity_Zero[, cols])), aes(ind, values)) +
  geom_col()
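To see where the ind and values names in aes() come from, it may help to look at the intermediate object. The sketch below uses a small made-up data frame called activity as a stand-in for Activity_Zero (the numbers are hypothetical, not the asker's data) and prints what stack() returns for the named vector produced by colMeans().

```r
# Hypothetical stand-in for Activity_Zero; the values are made up for illustration.
activity <- data.frame(
  veryactive    = c(10, 20, 30),
  fairlyactive  = c(5, 15, 25),
  lightlyactive = c(100, 200, 300),
  sedentary     = c(700, 800, 900)
)
cols <- c("veryactive", "fairlyactive", "lightlyactive", "sedentary")

# colMeans() returns a named numeric vector with one mean per column.
means <- colMeans(activity[, cols])

# stack() turns that named vector into a two-column data frame:
# `values` holds the means and `ind` is a factor of the column names.
means_df <- stack(means)
print(means_df)
```

Because ind is a factor whose levels follow the order of cols, the bars should appear in that same order rather than alphabetically.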
CodePudding user response:
You'll likely need to pivot your data like so:
library(tidyverse)
df <- tibble(
  sedentary = sample(10:50, 100, replace = TRUE),
  lightlyactive = sample(10:20, 100, replace = TRUE),
  fairlyactive = sample(100:400, 100, replace = TRUE),
  veryactive = sample(300:600, 100, replace = TRUE),
  calories = sample(1500:2500, 100, replace = TRUE)
)
df |>
  pivot_longer(c(veryactive, fairlyactive, lightlyactive, sedentary),
               names_to = "group",
               values_to = "vals") |>
  ggplot(aes(fct_inorder(group), vals)) +
  stat_summary(geom = "bar", fun = mean)