There's a behaviour of position_dodge2 in ggplot2 that I can't seem to understand. This question has been asked before and is also covered on tidyverse's position FAQ page, but I don't understand what's going on here.
I have a simple dataset with a numeric Cost column and two factors, SEX and RACE. I make a basic bar graph as follows:
library(ggplot2)
ggplot(healthdata) +
  geom_col(aes(x = RACE, y = Costs, fill = SEX), position = "dodge")
I need RACE 5 to have the same bar width as the other groups, so I did what every source recommends: use position_dodge2 and set its preserve parameter to "single". Yet this is the output I get. Can someone help me understand it?
ggplot(healthdata) +
  geom_col(aes(x = RACE, y = HealthcareCosts, fill = SEX), position = position_dodge2(preserve = "single"))
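One way to see where the extra bars come from is to check how many rows sit behind each RACE/SEX combination. The sketch below assumes healthdata has several rows per combination and uses a small hypothetical stand-in, health_demo, with made-up values. It relies on ggplot2's layer_data(), which returns the rectangles ggplot2 actually computed, so you can count the distinct horizontal slots each position adjustment produces.

```r
library(ggplot2)

# Hypothetical stand-in for healthdata (made-up values): several rows per
# RACE/SEX cell, and RACE 5 has only one SEX present
health_demo <- data.frame(
  RACE  = factor(c(1, 1, 1, 1, 2, 2, 2, 5)),
  SEX   = factor(c("F", "F", "M", "M", "F", "M", "M", "F")),
  Costs = c(100, 250, 80, 120, 300, 90, 60, 200)
)

# Rows per RACE/SEX cell: a count above 1 means more than one bar in that cell
table(health_demo$RACE, health_demo$SEX)

p_dodge <- ggplot(health_demo) +
  geom_col(aes(x = RACE, y = Costs, fill = SEX), position = "dodge")
p_dodge2 <- ggplot(health_demo) +
  geom_col(aes(x = RACE, y = Costs, fill = SEX),
           position = position_dodge2(preserve = "single"))

ld_dodge  <- layer_data(p_dodge)
ld_dodge2 <- layer_data(p_dodge2)

# "dodge" separates groups only: rows in the same RACE/SEX cell share one
# slot and are drawn on top of each other, so only the tallest is visible
nrow(unique(ld_dodge[, c("xmin", "xmax")]))

# position_dodge2() gives every row its own slot
nrow(unique(ld_dodge2[, c("xmin", "xmax")]))
```

If the position_dodge2() slot count matches the number of rows, each thin bar is a single observation rather than a group total.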
Why does the scaling go all over the place? I can adjust the padding parameter to make the bars thicker, but I don't understand how changing "dodge" to position_dodge2(preserve = "single") has caused this change in the graph. The tallest of the many bars in the second plot match the heights of the bars in the first. So what are all the extra bars? I followed the instructions from the second example on that FAQ page:
library(tidyverse)  # loads ggplot2, dplyr and forcats (for as_factor())
ggplot(mtcars) +
  geom_col(aes(x = as_factor(cyl), y = mpg), position = "stack")
mtcars |> group_by(cyl) |> summarise(sum = sum(mpg))
This shows the sum of the values, as the bars are stacked.
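To confirm that the stacked column heights really are the group sums, you can compare the top of each column in layer_data() with the summarise() result. This is a minimal sketch; it uses base factor(cyl) instead of forcats::as_factor(cyl) so the levels are sorted 4, 6, 8, which keeps the x positions aligned with the sorted summary.

```r
library(ggplot2)
library(dplyr)

# factor(cyl) guarantees levels 4, 6, 8 at x positions 1, 2, 3
p_stack <- ggplot(mtcars) +
  geom_col(aes(x = factor(cyl), y = mpg), position = "stack")

ld_stack <- layer_data(p_stack)

# Height of each stacked column = largest ymax among the bars at that x
stack_tops <- tapply(ld_stack$ymax, as.numeric(ld_stack$x), max)

sums <- mtcars |> group_by(cyl) |> summarise(sum = sum(mpg))

# Stacked heights and group sums agree (up to floating-point error)
all.equal(as.numeric(stack_tops), sums$sum)
```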
ggplot(mtcars) +
  geom_col(aes(x = as_factor(cyl), y = mpg), position = "dodge2")
mtcars |> group_by(cyl) |> summarise(count = n())
This shows all the individual values, as the bars are fanned out.
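The same layer_data() check shows what "fanned out" means here: with "dodge2", geom_col() yields one rectangle per row, each as tall as a single mpg value, and the number of bars per cyl equals the row count from summarise(count = n()). This sketch again uses factor(cyl) in place of as_factor(cyl); the round() step assumes the dodged bar centres stay within ±0.45 of their category, which holds for the default bar width of 0.9.

```r
library(ggplot2)
library(dplyr)

p_dodge2 <- ggplot(mtcars) +
  geom_col(aes(x = factor(cyl), y = mpg), position = "dodge2")

ld_dodge2 <- layer_data(p_dodge2)

# One rectangle per row of mtcars, not one per cyl
nrow(ld_dodge2)

# Each bar's height is a single mpg value (compared as sorted sets,
# since the position adjustment may reorder rows)
all.equal(sort(ld_dodge2$ymax), sort(mtcars$mpg))

# dodge2 shifts x within each category; round() recovers the category
bars_per_cyl <- as.vector(table(round(as.numeric(ld_dodge2$x))))
counts <- mtcars |> group_by(cyl) |> summarise(count = n())
bars_per_cyl == counts$count
```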
Answer:
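Your example uses geom_col, while the reference uses geom_bar, and the two behave differently: geom_bar() with its default stat = "count" collapses the rows into one bar per group, whereas geom_col() draws one bar for every row of the data without summarizing anything. With position = "dodge", rows that share the same RACE and SEX are drawn on top of each other, so you only see the tallest one. When position_dodge2(preserve = "single") is used, every row is placed side by side instead, which is where the extra bars come from. To get what I suspect you're looking for, do the summarizing first and then pipe the result into ggplot, using geom_bar(stat = "identity") (which behaves exactly like geom_col()):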
library(ggplot2)
library(dplyr)
mtcars |>
  group_by(cyl, vs) |>
  mutate(total = sum(mpg)) |>
  ungroup() |>
  select(cyl, vs, total) |>
  distinct() |> # important: keeps only one row per cyl/vs combination
  ggplot(aes(x = factor(cyl), y = total, fill = factor(vs))) +
  geom_bar(position = position_dodge2(preserve = "single"), stat = "identity")
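A slightly shorter variant of the same idea is sketched below, using summarise() instead of mutate() plus distinct(), and geom_col() instead of geom_bar(stat = "identity"). mtcars has no 8-cylinder cars with vs = 1, so cyl 8 gets a single bar, which makes it a handy test case for preserve = "single". The layer_data() check confirms that, once there is one row per group, every bar has the same width.

```r
library(ggplot2)
library(dplyr)

# One row per cyl/vs combination
totals <- mtcars |>
  group_by(cyl, vs) |>
  summarise(total = sum(mpg), .groups = "drop")

# geom_col() is shorthand for geom_bar(stat = "identity")
p_single <- ggplot(totals, aes(x = factor(cyl), y = total, fill = factor(vs))) +
  geom_col(position = position_dodge2(preserve = "single"))

ld_single <- layer_data(p_single)

# The lone cyl 8 bar should be as wide as the paired bars at cyl 4 and 6
widths <- ld_single$xmax - ld_single$xmin
all.equal(min(widths), max(widths))
```

Switching back to the default preserve = "total" should make the lone cyl 8 bar noticeably wider than the others, which is exactly what preserve = "single" is meant to prevent.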