I am struggling with a problem manipulating data in R.
Consider the data set:
Color Count
Red 550
Blue 309
Green 296
Purple 49
Yellow 36
Brown 19
I'd like to use ggplot to graph the set with a column of "Other" summing all values < 50. I'd end up with categories: Red, Blue, Green, Other (with count of 104).
I can filter the set to exclude counts < 50 but don't know how to either create a new row with the sum of the others or achieve it in another way. BTW, it would be completely acceptable to stack the "Other" bar with the counts from Purple, Yellow, and Brown.
Any help is much appreciated. Thanks.
CodePudding user response:
You could do this using forcats::fct_lump, but it uses a minimum share of the total for the threshold (like 0.04), not an absolute number like 50.
Then we can get the totals for the resulting categories using count(), now that a few rows have "Other" as their color.
Finally we can plot it. The default colors will not be related to the name of the Color, so here I put them in manually.
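library(tidyverse)
# df is assumed to hold the data from the question
df <- data.frame(
  Color = c("Red", "Blue", "Green", "Purple", "Yellow", "Brown"),
  Count = c(550, 309, 296, 49, 36, 19)
)
df %>%
  mutate(Color = fct_lump(Color, w = Count, prop = 0.04)) %>%
  count(Color, wt = Count, name = "Count") %>%
  ggplot(aes(x = 1, y = Count, fill = Color)) +
  geom_col() +
  geom_text(aes(label = Count), position = position_stack(vjust = 0.5)) +
  # fill values follow the factor level order: Blue, Green, Red, Other
  scale_fill_manual(values = c("blue", "green", "red", "gray70"))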
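If an absolute cutoff of 50 is needed rather than a share of the total, one possible variation is to relabel the small rows directly instead of using fct_lump. This is only a sketch, not part of the approach above: it assumes Color is a character column (not a factor), reuses the name df, and introduces a hypothetical result name, lumped.

```r
library(dplyr)
library(ggplot2)

# Data from the question; the name df is an assumption
df <- data.frame(
  Color = c("Red", "Blue", "Green", "Purple", "Yellow", "Brown"),
  Count = c(550, 309, 296, 49, 36, 19),
  stringsAsFactors = FALSE
)

# Relabel every row below the absolute cutoff as "Other",
# then sum the counts within each label
lumped <- df %>%
  mutate(Color = if_else(Count < 50, "Other", Color)) %>%
  count(Color, wt = Count, name = "Count")

# One bar per remaining category
ggplot(lumped, aes(x = Color, y = Count)) +
  geom_col()
```

Because the cutoff is applied to Count itself, changing 50 to another value needs no recalculation of proportions.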
CodePudding user response:
Let's assume you have a data frame like that named dat. Then rbind the rows with Counts >= 50 to a list made up of the desired name and the sum of the other Counts:
dat2 <- rbind( dat[dat$Count>=50, ],
               list(Color="Other", Count=sum( dat[dat$Count<50, "Count"] ) ) )
dat2
Color Count
1 Red 550
2 Blue 309
3 Green 296
4 Other 104
Then it's just:
ggplot( data=dat2, aes(x= Color, y=Count) ) + geom_col()
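The question also mentions that stacking the "Other" bar from the Purple, Yellow, and Brown counts would be acceptable. A rough sketch of that variant follows, assuming dat holds the original six rows with Color as a character column; the Group column is a hypothetical helper introduced only for this illustration.

```r
library(ggplot2)

# Data from the question; the name dat follows the answer above
dat <- data.frame(
  Color = c("Red", "Blue", "Green", "Purple", "Yellow", "Brown"),
  Count = c(550, 309, 296, 49, 36, 19),
  stringsAsFactors = FALSE
)

# Bar each row belongs to: its own color if Count >= 50, otherwise "Other"
dat$Group <- ifelse(dat$Count >= 50, dat$Color, "Other")

# geom_col() stacks rows that share an x value, so the "Other" bar
# is built from the Purple, Yellow, and Brown segments
ggplot(dat, aes(x = Group, y = Count, fill = Color)) +
  geom_col()
```

No rows are summed beforehand here; the stacked segments add up to the "Other" total on the plot itself.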