R Plotly - Bar Chart Bars Have Random Lengths

Time:03-29

I have come across a bizarre problem. I created a bar chart with plotly, but the bars have lengths that are seemingly random. In some cases, a higher value will have a higher bar, but in other cases, a higher value will have a lower bar, depending on which bars you are comparing. I cannot figure out how this is possible, let alone how it happened here.

Here is the code I have:

priceplot2 <- listings4 %>% 
  plot_ly(hovertemplate = "",
          x = ~logavgprice, 
          y = ~neighbourhood,
          span = I(1)
          ) %>% 
  add_bars(color = ~logavgprice,
           hovertemplate = ~paste0(neighbourhood, ", Average Price: $", round(avgprice, 2), "<extra></extra>"))
priceplot2

Here is a photo of the result: [image: Result]

Earlier, I grouped the data by a variable (neighbourhood), and I took the average of price for each neighbourhood, and then I did a log transformation of the data.

I hope this makes sense, and I apologize that this is not easily reproducible.

I appreciate any help anyone can provide.

Thank you!

P.S. If anyone could show me how to reorder the neighbourhoods, I'd appreciate that as well. Also, I cannot seem to get rid of that legend, but that's not a huge deal.

CodePudding user response:

It looks like you are displaying unaggregated data, so the bars are stacking all the observations. See this example:

library(plotly)
data.frame(neighborhood = c("A","B","B","C"),
           avgprice = c(10,100,100,1000),
           logavgprice = c(1,2,2,3)) %>%
  plot_ly(x = ~logavgprice,
          y = ~neighborhood) %>%
  add_bars(marker = list(line = list(color = 'rgb(200,200,200)', width = 3)))

Note that each of the two B observations draws a segment of length 2, but because the segments stack, the total bar for B looks 4 long (and it would be hundreds or thousands long if I had more observations).
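If you want to confirm that this is what is happening in your data, a quick check is to count the rows per neighbourhood. This is only a sketch: the `listings4` below is a small hypothetical stand-in for your data frame, and I am assuming the average is repeated on every listing row.

library(dplyr)

# Hypothetical stand-in for `listings4`: one row per listing,
# with the per-neighbourhood average repeated on every row.
listings4 <- data.frame(
  neighbourhood = c("A", "B", "B", "C"),
  logavgprice   = c(1, 2, 2, 3)
)

# Count rows per neighbourhood; any neighbourhood with n > 1
# will be drawn as a stack of several bar segments.
dupes <- listings4 %>%
  count(neighbourhood, name = "n") %>%
  filter(n > 1)

dupes

If `dupes` comes back empty for your real data, the stacking explanation does not apply and the cause is something else.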


To fix this, you could adjust your prior steps -- maybe you should use summarise instead of mutate when you calculate avgprice and then logavgprice -- or use dplyr::distinct(neighborhood, .keep_all = TRUE) to keep just one observation per neighborhood.
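As a rough illustration of the difference (the `listings` data frame and its values are made up for this sketch), compare how many rows each approach leaves you with:

library(dplyr)

# Hypothetical raw data: one row per listing.
listings <- data.frame(
  neighborhood = c("A", "B", "B", "C"),
  price        = c(1000, 90, 110, 10)
)

# mutate() keeps every row, so the group average is repeated per listing.
with_mutate <- listings %>%
  group_by(neighborhood) %>%
  mutate(avgprice = mean(price)) %>%
  ungroup()

# summarise() collapses each group to a single row.
with_summarise <- listings %>%
  group_by(neighborhood) %>%
  summarise(avgprice = mean(price))

# distinct() on the mutate() result also leaves one row per neighborhood.
with_distinct <- with_mutate %>%
  distinct(neighborhood, .keep_all = TRUE)

nrow(with_mutate)     # 4
nrow(with_summarise)  # 3
nrow(with_distinct)   # 3

Either of the three-row versions gives plotly exactly one bar segment per neighborhood.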

We can also use forcats::fct_reorder to turn neighborhood into a factor whose levels are sorted by logavgprice, so the bars will display in the order we want.

Here's an example:

library(dplyr)   # group_by, summarize, mutate, arrange
library(plotly)

data.frame(neighborhood = c("A","B","B","C"),
           price = c(1000, 90, 110, 10)) %>%
  group_by(neighborhood) %>%
  summarize(avg_price = mean(price)) %>% # one row per neighborhood
  mutate(logavgprice = log(avg_price)) %>%
  # reorder the factor levels by logavgprice
  mutate(neighborhood = forcats::fct_reorder(neighborhood, logavgprice)) %>%
  arrange(neighborhood) %>% # optional, just to show reordering before plotting
  plot_ly(x = ~logavgprice,
          y = ~neighborhood) %>%
  add_bars(marker = list(line = list(color = 'rgb(200,200,200)', width = 3)))
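
As for the legend: I am guessing that what you see is the colour scale that plotly adds because of color = ~logavgprice. If so, plotly's hide_colorbar() should remove it. Here is a sketch of your plot with all three changes applied. The `listings` data frame is a hypothetical stand-in for your raw data, so adjust the names to match yours.

library(dplyr)
library(plotly)

# Hypothetical stand-in for the raw listings data (one row per listing).
listings <- data.frame(
  neighbourhood = c("A", "B", "B", "C"),
  price         = c(1000, 90, 110, 10)
)

plot_data <- listings %>%
  group_by(neighbourhood) %>%
  summarise(avgprice = mean(price)) %>%   # one row per neighbourhood
  mutate(logavgprice = log(avgprice),
         neighbourhood = forcats::fct_reorder(neighbourhood, logavgprice))

priceplot2 <- plot_data %>%
  plot_ly(x = ~logavgprice, y = ~neighbourhood) %>%
  add_bars(color = ~logavgprice,
           hovertemplate = ~paste0(neighbourhood, ", Average Price: $",
                                   round(avgprice, 2), "<extra></extra>")) %>%
  hide_colorbar()   # drops the colour scale added by color = ~logavgprice

priceplot2

If the thing you want to remove is an ordinary trace legend rather than a colour scale, hide_legend() is the function to try instead.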