Summarize values below threshold for graph in R

Time:06-03

I am struggling with a problem manipulating data in R.

Consider the data set:

Color   Count
Red     550
Blue    309
Green   296
Purple  49
Yellow  36
Brown   19
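
For anyone who wants to reproduce this, the table can be rebuilt as a data frame roughly as follows. This is only a sketch: the object name `dat` is an assumption (the answers below call the data `df` or `dat`), and `Color` is kept as a character column.

```r
# Rebuild the example data from the question.
# `dat` is an assumed name; the original post does not show how the data is stored.
dat <- data.frame(
  Color = c("Red", "Blue", "Green", "Purple", "Yellow", "Brown"),
  Count = c(550, 309, 296, 49, 36, 19),
  stringsAsFactors = FALSE
)

# Sum of the counts below the threshold of 50 (Purple + Yellow + Brown)
other_total <- sum(dat$Count[dat$Count < 50])
other_total
```

The last value is the count the "Other" category should end up with (104 in the question).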

I'd like to use ggplot to graph the set with a column of "Other" summing all values < 50. I'd end up with categories: Red, Blue, Green, Other (with count of 104).

I can filter the set to exclude counts < 50 but don't know how to either create a new row with the sum of the others or achieve it in another way. BTW, it would be completely acceptable to stack the "Other" bar with the counts from Purple, Yellow, and Brown.

Any help is much appreciated. Thanks.

CodePudding user response:

You could do this using forcats::fct_lump, but with prop it uses a minimum share of the total as the threshold (like 0.04), not an absolute number like 50.

Then we can get the totals for the resulting categories using count(), now that a few rows have "Other" as their color.

Finally we can plot it. The default colors will not be related to the name of the Color, so here I put them in manually.

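library(tidyverse)
df %>% 
  # lump colors whose share of the total Count is below 4% into "Other"
  mutate(Color = fct_lump(Color, w = Count, prop = 0.04)) %>%
  # sum Count within each remaining category
  count(Color, wt = Count, name = "Count") %>%
  ggplot(aes(x = 1, y = Count, fill = Color)) +
  geom_col() +
  geom_text(aes(label = Count), position = position_stack(vjust = 0.5)) +
  # fill values follow the factor levels (assuming alphabetical: Blue, Green, Red, then Other)
  scale_fill_manual(values = c("blue", "green", "red", "gray70"))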
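
If an absolute cutoff is preferred over a share of the total, forcats also provides fct_lump_min(), which takes a `min` argument and an optional weight `w`. The sketch below assumes the data from the question is stored in a data frame called `dat` with a character `Color` column; it only computes the lumped totals and leaves the plotting as above.

```r
library(dplyr)
library(forcats)

# Example data from the question (`dat` is an assumed name)
dat <- data.frame(
  Color = c("Red", "Blue", "Green", "Purple", "Yellow", "Brown"),
  Count = c(550, 309, 296, 49, 36, 19),
  stringsAsFactors = FALSE
)

lumped <- dat %>%
  # colors whose weighted total (w = Count) is below 50 become "Other"
  mutate(Color = fct_lump_min(Color, min = 50, w = Count)) %>%
  # sum Count within each resulting category
  count(Color, wt = Count, name = "Count")

lumped
```

The result should have one row each for Blue, Green, Red and Other, and can be piped into the same ggplot() call.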
  


CodePudding user response:

Let's assume you have a data frame like that, named dat. Then rbind the rows with Count >= 50 to a list made up of the desired name and the sum of the remaining counts:

dat2 <- rbind( dat[dat$Count>=50, ], 
               list(Color="Other", Count=sum( dat[dat$Count<50, "Count"] ) ) )
dat2
  Color Count
1   Red   550
2  Blue   309
3 Green   296
4 Other   104

Then it's just:

library(ggplot2)
ggplot( data=dat2, aes(x= Color, y=Count) ) + geom_col()
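
The question also mentions that stacking the "Other" bar from the Purple, Yellow, and Brown counts would be acceptable. One possible way to sketch that is to keep all six rows and add a grouping column for the x axis. The column name `Group` is a hypothetical helper, and `dat` is assumed to hold the original data with `Color` as character.

```r
library(ggplot2)

# Example data from the question (`dat` is an assumed name)
dat <- data.frame(
  Color = c("Red", "Blue", "Green", "Purple", "Yellow", "Brown"),
  Count = c(550, 309, 296, 49, 36, 19),
  stringsAsFactors = FALSE
)

# `Group` (hypothetical helper column): the color itself, or "Other" below the cutoff
dat$Group <- ifelse(dat$Count < 50, "Other", dat$Color)
# keep the bars in order of first appearance rather than alphabetical order
dat$Group <- factor(dat$Group, levels = unique(dat$Group))

# geom_col() stacks by default, so the three small colors share the "Other" bar
p <- ggplot(dat, aes(x = Group, y = Count, fill = Color)) +
  geom_col()
p
```

Because no row is dropped, the "Other" bar still shows how much each small color contributes.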